class GraphFrame extends Logging with Serializable
A representation of a graph using DataFrames.
Instance Constructors
- new GraphFrame()
Default constructor is provided to support serialization
- Attributes
- protected
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- def +(other: String): String
- Implicit
- This member is added by an implicit conversion from GraphFrame to any2stringadd[GraphFrame] performed by method any2stringadd in scala.Predef.
- Definition Classes
- any2stringadd
- def ->[B](y: B): (GraphFrame, B)
- Implicit
- This member is added by an implicit conversion from GraphFrame to ArrowAssoc[GraphFrame] performed by method ArrowAssoc in scala.Predef.
- Definition Classes
- ArrowAssoc
- Annotations
- @inline()
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def aggregateMessages: AggregateMessages
This is a primitive for implementing graph algorithms. This method aggregates values from the neighboring edges and vertices of each vertex. See AggregateMessages for detailed documentation.
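The send-then-aggregate idea can be illustrated without Spark. The sketch below is a plain-Scala model, not the AggregateMessages API: it assumes each vertex carries an integer value, every edge sends the value of each endpoint to the opposite endpoint, and the messages received by each vertex are summed. All names in it are hypothetical.

```scala
object AggregateMessagesSketch {
  // Hypothetical in-memory model: vertex values keyed by ID, edges as (src, dst) pairs.
  def sumNeighborValues(values: Map[Long, Int], edges: Seq[(Long, Long)]): Map[Long, Int] = {
    // "Send" phase: each edge emits one message to each endpoint, carrying the
    // value of the vertex at the opposite end.
    val messages: Seq[(Long, Int)] = edges.flatMap { case (src, dst) =>
      Seq(dst -> values(src), src -> values(dst))
    }
    // "Aggregate" phase: group messages by receiving vertex and sum them.
    // In this sketch, vertices that receive no message do not appear in the result.
    messages.groupBy(_._1).map { case (v, msgs) => v -> msgs.map(_._2).sum }
  }
}
```

For example, with values 10, 20, 30 on vertices 1, 2, 3 and edges 1 -> 2 and 2 -> 3, vertex 2 receives 10 + 30 = 40.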
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def asUndirected(): GraphFrame
Converts the directed graph into an undirected graph by ensuring that all directed edges are bidirectional. For every directed edge (src, dst), a corresponding edge (dst, src) is added.
- returns
a new GraphFrame representing the undirected graph.
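The edge rule described above (add (dst, src) for every (src, dst)) can be sketched on a plain list of ID pairs. This is an illustration of the stated rule only; whether asUndirected also removes duplicate edges is not stated here, so this sketch deliberately does not deduplicate.

```scala
object AsUndirectedSketch {
  // Adds the reverse (dst, src) of every directed edge (src, dst).
  // No deduplication: an edge that already had a reverse twin ends up
  // appearing twice in each direction.
  def asUndirected(edges: Seq[(Long, Long)]): Seq[(Long, Long)] =
    edges ++ edges.map { case (src, dst) => (dst, src) }
}
```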
- def bfs: BFS
Breadth-first search (BFS)
Refer to the documentation of org.graphframes.lib.BFS for the description of the output.
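As a reminder of what breadth-first search computes, here is a minimal sequential sketch over an in-memory edge list. It works on plain vertex IDs and returns one path with the fewest edges, following edge direction; it is not the org.graphframes.lib.BFS API, and its output shape differs from the DataFrame that class returns.

```scala
import scala.collection.immutable.Queue

object BfsSketch {
  // Returns one shortest (fewest-edges) path from `from` to `to`, following edge
  // direction, or None if `to` is unreachable.
  def shortestPath(edges: Seq[(Long, Long)], from: Long, to: Long): Option[List[Long]] = {
    val out: Map[Long, Seq[Long]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }
    // parent(v) records the vertex from which v was first discovered.
    var parent = Map(from -> from)
    var frontier = Queue(from)
    while (frontier.nonEmpty && !parent.contains(to)) {
      val (u, rest) = frontier.dequeue
      frontier = rest
      for (v <- out.getOrElse(u, Nil) if !parent.contains(v)) {
        parent += v -> u
        frontier = frontier.enqueue(v)
      }
    }
    if (!parent.contains(to)) None
    else {
      // Walk the parent pointers back from the target to the source.
      var path = List(to)
      while (path.head != from) path = parent(path.head) :: path
      Some(path)
    }
  }
}
```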
- def cache(): GraphFrame.this.type
Persist the dataframe representation of vertices and edges of the graph with the default storage level.
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
- def connectedComponents: ConnectedComponents
Connected component algorithm.
See org.graphframes.lib.ConnectedComponents for more details.
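For intuition about the output, the following union-find sketch assigns every vertex a component label, ignoring edge direction. Labeling each component by its smallest vertex ID is a choice made for this sketch; the component IDs produced by org.graphframes.lib.ConnectedComponents are not specified here and may differ.

```scala
object ConnectedComponentsSketch {
  // Labels each vertex with the smallest vertex ID in its connected component,
  // ignoring edge direction. Isolated vertices form their own component.
  // Assumes every edge endpoint is listed in `vertices`.
  def components(vertices: Seq[Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
    val parent = scala.collection.mutable.Map.empty[Long, Long]
    vertices.foreach(v => parent(v) = v)
    def find(v: Long): Long = {
      val p = parent(v)
      if (p == v) v
      else { val root = find(p); parent(v) = root; root } // path compression
    }
    for ((a, b) <- edges) {
      val (ra, rb) = (find(a), find(b))
      // Keep the smaller ID as the root, so every root is its component's minimum.
      if (ra < rb) parent(rb) = ra else if (rb < ra) parent(ra) = rb
    }
    vertices.map(v => v -> find(v)).toMap
  }
}
```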
- lazy val degrees: DataFrame
The degree of each vertex in the graph, returned as a DataFrame with two columns:
- GraphFrame.ID the ID of the vertex
- 'degree' (integer) the degree of the vertex
Note that vertices with 0 edges are not returned in the result.
- Annotations
- @transient()
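The degree computation, including the rule that vertices with no edges are absent, can be sketched with plain Scala collections. In this sketch a self-loop (v, v) counts twice, once per endpoint; that is an assumption of the sketch, not a statement about GraphFrames.

```scala
object DegreesSketch {
  // Total degree (in + out) per vertex: every edge contributes one to each endpoint.
  // Vertices with no edges never appear, matching the note above.
  def degrees(edges: Seq[(Long, Long)]): Map[Long, Int] =
    edges.flatMap { case (src, dst) => Seq(src, dst) }
      .groupBy(identity)
      .map { case (v, occurrences) => v -> occurrences.size }
}
```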
- def detectingCycles: DetectingCycles
Find all cycles in the graph. An implementation of the Rocha–Thatte cycle detection algorithm.
Rocha, Rodrigo Caetano, and Bhalchandra D. Thatte. "Distributed cycle detection in large-scale sparse graphs." Proceedings of Simpósio Brasileiro de Pesquisa Operacional (SBPO’15) (2015): 1-11.
Returns a DataFrame with unique cycles.
- returns
an instance of DetectingCycles initialized with the current context
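The core idea of the Rocha–Thatte algorithm can be simulated sequentially: every vertex starts a sequence containing only itself, sequences are forwarded along out-edges, a sequence that returns to its first vertex is a cycle, and a sequence that runs into one of its other vertices is discarded. The sketch below is a single-machine approximation of that scheme, not the DetectingCycles API; reporting each cycle only from its smallest vertex is how it keeps cycles unique. The number of candidate paths can grow exponentially, so it is only suitable for toy graphs.

```scala
import scala.collection.mutable

object RochaThatteSketch {
  def findCycles(vertices: Seq[Long], edges: Seq[(Long, Long)]): Set[List[Long]] = {
    val out: Map[Long, Seq[Long]] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.map(_._2) }
    val cycles = mutable.Set.empty[List[Long]]
    // Round 0: every vertex starts a one-element sequence containing itself.
    var active: Seq[List[Long]] = vertices.map(v => List(v))
    while (active.nonEmpty) {
      // Each sequence is forwarded along every out-edge of its last vertex.
      val received = for {
        seq <- active
        next <- out.getOrElse(seq.last, Nil)
      } yield (next, seq)
      active = received.flatMap { case (v, seq) =>
        if (v == seq.head) {
          // Back at the origin: a cycle. Record it only from its smallest vertex so
          // that each cycle is reported once rather than once per rotation.
          if (v == seq.min) cycles += seq
          None
        } else if (seq.contains(v)) None // runs into the middle of the path: discard
        else Some(seq :+ v)              // extend the path and keep it active
      }
    }
    cycles.toSet
  }
}
```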
- def dropIsolatedVertices(): GraphFrame
Drop isolated vertices, i.e., vertices not contained in any edge.
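On an in-memory model of the graph, dropping isolated vertices amounts to keeping only the IDs that occur as the source or destination of some edge. This is a plain-Scala illustration with hypothetical names, not the GraphFrames implementation.

```scala
object DropIsolatedSketch {
  // Keeps only the vertices that appear as the source or destination of some edge.
  def dropIsolated(vertices: Seq[Long], edges: Seq[(Long, Long)]): Seq[Long] = {
    val touched: Set[Long] = edges.flatMap { case (s, d) => Seq(s, d) }.toSet
    vertices.filter(touched.contains)
  }
}
```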
- def edgeColumnMap: Map[String, Int]
Version of edgeColumns which maps column names to indices in the Rows.
- def edgeColumns: Array[String]
The column names in the edges DataFrame, in order.
- def edges: DataFrame
The dataframe representation of the edges of the graph.
The dataframe representation of the edges of the graph.
It contains two columns called GraphFrame.SRC and GraphFrame.DST that contain the ids of the source vertex and the destination vertex of each edge, respectively. It may also contain various other columns with user-defined attributes for each edge.
For symmetric graphs, both pairs src -> dst and dst -> src are present with the same attributes for each pair.
The order of the columns is available in edgeColumns.
- def ensuring(cond: (GraphFrame) => Boolean, msg: => Any): GraphFrame
- Implicit
- This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
- Definition Classes
- Ensuring
- def ensuring(cond: (GraphFrame) => Boolean): GraphFrame
- Implicit
- This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
- Definition Classes
- Ensuring
- def ensuring(cond: Boolean, msg: => Any): GraphFrame
- Implicit
- This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
- Definition Classes
- Ensuring
- def ensuring(cond: Boolean): GraphFrame
- Implicit
- This member is added by an implicit conversion from GraphFrame to Ensuring[GraphFrame] performed by method Ensuring in scala.Predef.
- Definition Classes
- Ensuring
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def filterEdges(conditionExpr: String): GraphFrame
Filter the edges according to a String expression, keeping all vertices.
- def filterEdges(condition: Column): GraphFrame
Filter the edges according to a Column expression, keeping all vertices.
- def filterVertices(conditionExpr: String): GraphFrame
Filter the vertices according to a String expression, removing edges that contain any dropped vertex.
- def filterVertices(condition: Column): GraphFrame
Filter the vertices according to a Column expression, removing edges that contain any dropped vertex.
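The asymmetry between the two filters (filterEdges never removes vertices, while filterVertices also removes edges touching a dropped vertex) can be modeled with plain collections. The sketch uses Scala predicates in place of String or Column expressions; the Graph case class and method names are hypothetical.

```scala
object FilterSketch {
  final case class Graph(vertices: Seq[Long], edges: Seq[(Long, Long)])

  // filterEdges semantics: drop edges failing the predicate, keep every vertex.
  def filterEdges(g: Graph)(keep: ((Long, Long)) => Boolean): Graph =
    g.copy(edges = g.edges.filter(keep))

  // filterVertices semantics: drop vertices failing the predicate, then drop every
  // edge that has a dropped vertex at either end.
  def filterVertices(g: Graph)(keep: Long => Boolean): Graph = {
    val kept = g.vertices.filter(keep)
    val keptSet = kept.toSet
    Graph(kept, g.edges.filter { case (s, d) => keptSet(s) && keptSet(d) })
  }
}
```

For example, `FilterSketch.filterVertices(g)(_ != 2L)` removes vertex 2 together with every edge into or out of it.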
- def find(pattern: String): DataFrame
Motif finding: searching the graph for structural patterns.
Motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries. For example, graph.find("(a)-[e]->(b); (b)-[e2]->(a)") will search for pairs of vertices a, b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif. In this case, the returned columns will be in the order of the pattern: "a, e, b, e2".
DSL for expressing structural patterns:
- The basic unit of a pattern is an edge. For example, "(a)-[e]->(b)" expresses an edge e from vertex a to vertex b. Note that vertices are denoted by parentheses (a), while edges are denoted by square brackets [e].
- A pattern is expressed as a union of edges. Edge patterns can be joined with semicolons. Motif "(a)-[e]->(b); (b)-[e2]->(c)" specifies two edges from a to b to c.
- Within a pattern, names can be assigned to vertices and edges. For example, "(a)-[e]->(b)" has three named elements: vertices a, b and edge e. These names serve two purposes:
  - The names can identify common elements among edges. For example, "(a)-[e]->(b); (b)-[e2]->(c)" specifies that the same vertex b is the destination of edge e and the source of edge e2.
  - The names are used as column names in the result DataFrame. If a motif contains a named vertex a, then the result DataFrame will contain a column "a" which is a StructType with sub-fields equivalent to the schema (columns) of GraphFrame.vertices. Similarly, an edge e in a motif will produce a column "e" in the result DataFrame with sub-fields equivalent to the schema (columns) of GraphFrame.edges.
  - Be aware that names do *not* identify *distinct* elements: two elements with different names may refer to the same graph element. For example, in the motif "(a)-[e]->(b); (b)-[e2]->(c)", the names a and c could refer to the same vertex. To restrict named elements to be distinct vertices or edges, use post-hoc filters such as resultDataframe.filter("a.id != c.id").
- It is acceptable to omit names for vertices or edges in motifs when not needed. E.g., "(a)-[]->(b)" expresses an edge between vertices a, b but does not assign a name to the edge. There will be no column for the anonymous edge in the result DataFrame. Similarly, "(a)-[e]->()" indicates an out-edge of vertex a but does not name the destination vertex. These are called *anonymous* vertices and edges.
- An edge can be negated to indicate that the edge should *not* be present in the graph. E.g., "(a)-[]->(b); !(b)-[]->(a)" finds edges from a to b for which there is *no* edge from b to a.
Restrictions:
- Motifs are not allowed to contain edges without any named elements: "()-[]->()" and "!()-[]->()" are prohibited terms.
- Motifs are not allowed to contain named edges within negated terms (since these named edges would never appear within results). E.g., "!(a)-[ab]->(b)" is invalid, but "!(a)-[]->(b)" is valid.
More complex queries, such as queries which operate on vertex or edge attributes, can be expressed by applying filters to the result DataFrame.
This can return duplicate rows. E.g., a query "(u)-[]->()" will return a result for each matching edge, even if those edges share the same vertex u.
- pattern
Pattern specifying a motif to search for.
- returns
DataFrame containing all instances of the motif.
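To make the matching semantics concrete, the sketch below hand-codes what the motif "(a)-[e]->(b); (b)-[e2]->(a)" matches over an in-memory edge list, returning rows in pattern order (a, e, b, e2). It is not the find parser, and the Edge class is hypothetical. It shows two points made above: each reciprocal pair appears twice (once per orientation), and because names do not imply distinctness, a self-loop matches with a == b and e == e2.

```scala
object MotifSketch {
  final case class Edge(id: String, src: Long, dst: Long)

  // Hand-written equivalent of the motif "(a)-[e]->(b); (b)-[e2]->(a)":
  // every pair of edges e: a -> b and e2: b -> a, as rows (a, e, b, e2).
  def reciprocal(edges: Seq[Edge]): Seq[(Long, String, Long, String)] =
    for {
      e  <- edges
      e2 <- edges
      if e2.src == e.dst && e2.dst == e.src // e2 must lead back from b to a
    } yield (e.src, e.id, e.dst, e2.id)
}
```

A post-hoc filter such as `a != b` on the result plays the role of `resultDataframe.filter("a.id != b.id")` in the DataFrame version.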
- def findAugmentedPatterns(pattern: String): DataFrame
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @IntrinsicCandidate() @native()
- lazy val inDegrees: DataFrame
The in-degree of each vertex in the graph, returned as a DataFrame with two columns:
- GraphFrame.ID the ID of the vertex
- "inDegree" (int) storing the in-degree of the vertex
Note that vertices with 0 in-edges are not returned in the result.
- Annotations
- @transient()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def kCore: KCore
K-Core decomposition.
See org.graphframes.lib.KCore for more details.
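The quantity behind K-Core decomposition is usually the core number of each vertex: the largest k such that the vertex belongs to a subgraph in which every vertex has degree at least k. The sketch below computes it by sequential peeling on an in-memory edge list. It is not the distributed algorithm in org.graphframes.lib.KCore, and whether that class reports exactly this value is an assumption here.

```scala
import scala.collection.mutable

object KCoreSketch {
  // Core number of every non-isolated vertex, via sequential peeling: repeatedly
  // remove vertices whose remaining degree is at most k, raising k only when no
  // such vertex is left. Edge direction, self-loops and parallel edges are ignored.
  def coreNumbers(edges: Seq[(Long, Long)]): Map[Long, Int] = {
    val adj = mutable.Map.empty[Long, mutable.Set[Long]]
    for ((a, b) <- edges if a != b) {
      adj.getOrElseUpdate(a, mutable.Set.empty[Long]) += b
      adj.getOrElseUpdate(b, mutable.Set.empty[Long]) += a
    }
    val core = mutable.Map.empty[Long, Int]
    var k = 0
    while (adj.nonEmpty) {
      val peel = adj.collect { case (v, ns) if ns.size <= k => v }.toList
      if (peel.isEmpty) k += 1
      else peel.foreach { v =>
        core(v) = k
        adj(v).foreach(n => adj(n) -= v) // each remaining neighbour loses one degree
        adj -= v
      }
    }
    core.toMap
  }
}
```

For a triangle 1-2-3 with a pendant vertex 4 attached to 3, the triangle vertices have core number 2 and vertex 4 has core number 1.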
- def labelPropagation: LabelPropagation
Label propagation algorithm.
See org.graphframes.lib.LabelPropagation for more details.
- def logDebug(s: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(s: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(s: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarn(s: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def maximalIndependentSet: MaximalIndependentSet
Maximal Independent Set algorithm.
See org.graphframes.lib.MaximalIndependentSet for more details.
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @IntrinsicCandidate() @native()
- lazy val outDegrees: DataFrame
The out-degree of each vertex in the graph, returned as a DataFrame with two columns:
- GraphFrame.ID the ID of the vertex
- "outDegree" (integer) storing the out-degree of the vertex
Note that vertices with 0 out-edges are not returned in the result.
- Annotations
- @transient()
- def pageRank: PageRank
PageRank algorithm.
See org.graphframes.lib.PageRank for more details.
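For readers new to PageRank, here is a textbook power-iteration sketch on an in-memory graph. It normalizes ranks to sum to 1 and spreads the rank of dangling vertices (no out-edges) uniformly; both choices, and the parameter names, belong to this sketch and may differ from the normalization and defaults used by org.graphframes.lib.PageRank.

```scala
object PageRankSketch {
  // Textbook PageRank by power iteration. Each round, a vertex splits its rank evenly
  // over its out-edges; rank held by dangling vertices is spread over all vertices.
  // Assumes every edge endpoint is listed in `vertices`.
  def pageRank(vertices: Seq[Long], edges: Seq[(Long, Long)],
               resetProbability: Double = 0.15, maxIter: Int = 50): Map[Long, Double] = {
    val n = vertices.size.toDouble
    val outDeg: Map[Long, Int] = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    var rank: Map[Long, Double] = vertices.map(v => v -> 1.0 / n).toMap
    for (_ <- 1 to maxIter) {
      val dangling = vertices.filterNot(outDeg.contains).map(rank).sum
      // Rank flowing into each vertex along its in-edges.
      val incoming: Map[Long, Double] = edges
        .map { case (src, dst) => dst -> rank(src) / outDeg(src) }
        .groupBy(_._1).map { case (v, cs) => v -> cs.map(_._2).sum }
      rank = vertices.map { v =>
        val flow = incoming.getOrElse(v, 0.0) + dangling / n
        v -> (resetProbability / n + (1 - resetProbability) * flow)
      }.toMap
    }
    rank
  }
}
```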
- def parallelPersonalizedPageRank: ParallelPersonalizedPageRank
Parallel personalized PageRank algorithm.
See org.graphframes.lib.ParallelPersonalizedPageRank for more details.
- def persist(newLevel: StorageLevel): GraphFrame.this.type
Persist the dataframe representation of vertices and edges of the graph with the given storage level.
- newLevel
One of:
MEMORY_ONLY, MEMORY_AND_DISK, MEMORY_ONLY_SER, MEMORY_AND_DISK_SER, DISK_ONLY, MEMORY_ONLY_2, MEMORY_AND_DISK_2, etc.
- def persist(): GraphFrame.this.type
Persist the dataframe representation of vertices and edges of the graph with the default storage level.
- def powerIterationClustering(k: Int, maxIter: Int, weightCol: Option[String]): DataFrame
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
PowerIterationClustering algorithm.
- k
The number of clusters to create (k).
- maxIter
Param for maximum number of iterations (>= 0).
- weightCol
Param for weight column name.
- def pregel: Pregel
Pregel algorithm.
- def resultIsPersistent(): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def shortestPaths: ShortestPaths
Shortest paths algorithm.
See org.graphframes.lib.ShortestPaths for more details.
- def stronglyConnectedComponents: StronglyConnectedComponents
Strongly connected components algorithm.
See org.graphframes.lib.StronglyConnectedComponents for more details.
- def svdPlusPlus: SVDPlusPlus
SVD++ algorithm.
See org.graphframes.lib.SVDPlusPlus for more details.
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toGraphX: Graph[Row, Row]
Converts this GraphFrame instance to a GraphX Graph. Vertex and edge attributes are the original rows in vertices and edges, respectively.
Note that vertex (and edge) attributes include vertex IDs (and source, destination IDs) in order to support non-Long vertex IDs. If the vertex IDs are not convertible to Long values, then the values are indexed in order to generate corresponding Long vertex IDs (which is an expensive operation).
The column ordering of the returned Graph vertex and edge attributes is specified by vertexColumns and edgeColumns, respectively.
- def toString(): String
- Definition Classes
- GraphFrame → AnyRef → Any
- def triangleCount: TriangleCount
Triangle count algorithm.
See org.graphframes.lib.TriangleCount for more details.
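What a per-vertex triangle count means can be illustrated on an in-memory edge list. This sketch treats edges as undirected and ignores self-loops and duplicate edges; those normalization rules are assumptions of the sketch, and org.graphframes.lib.TriangleCount may handle such edges differently.

```scala
object TriangleCountSketch {
  // Number of triangles through each vertex, treating edges as undirected and
  // ignoring self-loops and duplicate edges. Vertices in no triangle map to 0.
  def triangleCounts(edges: Seq[(Long, Long)]): Map[Long, Int] = {
    val undirected: Set[(Long, Long)] =
      edges.collect { case (a, b) if a != b => if (a < b) (a, b) else (b, a) }.toSet
    val nbrs: Map[Long, Set[Long]] = undirected.toSeq
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1).map { case (v, ps) => v -> ps.map(_._2).toSet }
    // Each triangle {a, b, c} with a < b < c is found exactly once, from edge (a, b).
    val triangles = for {
      (a, b) <- undirected.toSeq
      c <- nbrs(a).intersect(nbrs(b)).toSeq
      if c > b
    } yield Seq(a, b, c)
    val counts = triangles.flatten.groupBy(identity).map { case (v, occ) => v -> occ.size }
    nbrs.keys.map(v => v -> counts.getOrElse(v, 0)).toMap
  }
}
```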
- lazy val triplets: DataFrame
Returns triplets: (source vertex)-[edge]->(destination vertex) for all edges in the graph. The DataFrame returned has 3 columns, with names: GraphFrame.SRC, GraphFrame.EDGE, and GraphFrame.DST. Each column is a struct. The 2 vertex columns have schema matching GraphFrame.vertices, and the edge column has a schema matching GraphFrame.edges. For example, triplets.select(col(SRC)(ID)) selects the ID field of the source vertex column.
- def typeDegree(edgeTypeCol: String, edgeTypes: Option[Seq[Any]] = None): DataFrame
The total degree of each vertex per edge type (both in and out), returned as a DataFrame with two columns:
- GraphFrame.ID the ID of the vertex
- "degrees" a struct with a field for each edge type, storing the total degree count
- edgeTypeCol
Name of the column in edges DataFrame that contains edge types
- edgeTypes
Optional sequence of edge type values. If None, edge types will be discovered automatically.
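The per-type breakdown can be modeled in plain Scala, with a nested Map standing in for the "degrees" struct column. The sketch assumes string-valued edge types, gives 0 to a listed type a vertex has no edges of, and, when no types are given, discovers them from the edges, mirroring the edgeTypes description above. Names and types here are hypothetical.

```scala
object TypeDegreeSketch {
  // Total (in + out) degree of each vertex broken down by edge type. Edges are
  // (src, dst, type) triples; the inner Map plays the role of the "degrees" struct.
  def typeDegree(edges: Seq[(Long, Long, String)],
                 edgeTypes: Option[Seq[String]] = None): Map[Long, Map[String, Int]] = {
    // With None, discover the edge types from the data itself.
    val types = edgeTypes.getOrElse(edges.map(_._3).distinct)
    // Each edge contributes one (vertex, type) hit for each of its endpoints.
    val endpoints: Seq[(Long, String)] =
      edges.flatMap { case (src, dst, t) => Seq(src -> t, dst -> t) }
    endpoints.groupBy(_._1).map { case (v, hits) =>
      v -> types.map(t => t -> hits.count(_._2 == t)).toMap
    }
  }
}
```

Restricting to in-edges or out-edges only (as in typeInDegree and typeOutDegree) would mean emitting just the destination or just the source in the endpoints step.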
- def typeInDegree(edgeTypeCol: String, edgeTypes: Option[Seq[Any]] = None): DataFrame
The in-degree of each vertex per edge type, returned as a DataFrame with two columns:
- GraphFrame.ID the ID of the vertex
- "inDegrees" a struct with a field for each edge type, storing the in-degree count
- edgeTypeCol
Name of the column in edges DataFrame that contains edge types
- edgeTypes
Optional sequence of edge type values. If None, edge types will be discovered automatically.
- def typeOutDegree(edgeTypeCol: String, edgeTypes: Option[Seq[Any]] = None): DataFrame
The out-degree of each vertex per edge type, returned as a DataFrame with two columns:
- GraphFrame.ID the ID of the vertex
- "outDegrees" a struct with a field for each edge type, storing the out-degree count
- edgeTypeCol
Name of the column in edges DataFrame that contains edge types
- edgeTypes
Optional sequence of edge type values. If None, edge types will be discovered automatically.
- def unpersist(blocking: Boolean): GraphFrame.this.type
Mark the dataframe representation of vertices and edges of the graph as non-persistent, and remove all blocks for it from memory and disk.
- blocking
Whether to block until all blocks are deleted.
- def unpersist(): GraphFrame.this.type
Mark the dataframe representation of vertices and edges of the graph as non-persistent, and remove all blocks for it from memory and disk.
- def validate(checkVertices: Boolean, intermediateStorageLevel: StorageLevel): Unit
Validates the consistency and integrity of a graph by performing checks on the vertices and edges.
- checkVertices
a flag to indicate whether additional vertex consistency checks should be performed. If true, the method will verify that all vertices in the vertex DataFrame are represented in the edge DataFrame and vice versa. It is slow on big graphs.
- intermediateStorageLevel
the storage level to be used when persisting intermediate DataFrame computations during the validation process.
- returns
Unit; the method performs validation checks and throws an exception if validation fails.
- Exceptions thrown
InvalidGraphException if there are any inconsistencies in the graph, such as duplicate vertices, mismatched vertices between the edge and vertex DataFrames, or missing connections.
- def validate(): Unit
Validates the consistency and integrity of a graph by performing checks on the vertices and edges.
- returns
Unit; the method performs validation checks and throws an exception if validation fails.
- Exceptions thrown
InvalidGraphException if there are any inconsistencies in the graph, such as duplicate vertices, mismatched vertices between the edge and vertex DataFrames, or missing connections.
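The kinds of checks listed above can be sketched on an in-memory graph. This sketch always rejects duplicate vertex IDs and, only when checkVertices is true, applies the two-way membership check exactly as the checkVertices description states (every vertex appears in some edge and every edge endpoint is a known vertex). The exception class and this split of checks are assumptions of the sketch, not the GraphFrames implementation.

```scala
object ValidateSketch {
  // Hypothetical stand-in for InvalidGraphException.
  final class InvalidGraphSketchException(msg: String) extends Exception(msg)

  def validate(vertices: Seq[Long], edges: Seq[(Long, Long)], checkVertices: Boolean): Unit = {
    // Always: vertex IDs must be unique.
    val dupes = vertices.groupBy(identity).collect { case (v, occ) if occ.size > 1 => v }
    if (dupes.nonEmpty)
      throw new InvalidGraphSketchException(s"duplicate vertex IDs: ${dupes.mkString(", ")}")
    if (checkVertices) {
      // Optional, slower: vertices and edge endpoints must match in both directions.
      val ids = vertices.toSet
      val endpoints = edges.flatMap { case (s, d) => Seq(s, d) }.toSet
      val missing = endpoints -- ids
      if (missing.nonEmpty)
        throw new InvalidGraphSketchException(s"edges reference unknown vertices: ${missing.mkString(", ")}")
      val unused = ids -- endpoints
      if (unused.nonEmpty)
        throw new InvalidGraphSketchException(s"vertices not in any edge: ${unused.mkString(", ")}")
    }
  }
}
```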
- def vertexColumnMap: Map[String, Int]
Version of vertexColumns which maps column names to indices in the Rows.
- def vertexColumns: Array[String]
The column names in the vertices DataFrame, in order.
- def vertices: DataFrame
The dataframe representation of the vertices of the graph.
It contains a column called GraphFrame.ID with the id of the vertex, and may also contain other columns with user-defined vertex attributes.
The order of the columns is available in vertexColumns.
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable]) @Deprecated
- Deprecated
(Since version 9)
- def formatted(fmtstr: String): String
- Implicit
- This member is added by an implicit conversion from GraphFrame to StringFormat[GraphFrame] performed by method StringFormat in scala.Predef.
- Definition Classes
- StringFormat
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.12.16) Use formatString.format(value) instead of value.formatted(formatString), or use the f"" string interpolator. In Java 15 and later, formatted resolves to the new method in String which has reversed parameters.
- def →[B](y: B): (GraphFrame, B)
- Implicit
- This member is added by an implicit conversion from GraphFrame to ArrowAssoc[GraphFrame] performed by method ArrowAssoc in scala.Predef.
- Definition Classes
- ArrowAssoc
- Annotations
- @deprecated
- Deprecated
(Since version 2.13.0) Use -> instead. If you still wish to display it as one character, consider using a font with programming ligatures such as Fira Code.