class BFS extends Arguments with Serializable
Breadth-first search (BFS)
This method returns a DataFrame of valid shortest paths from vertices matching fromExpr
to
vertices matching toExpr
. If multiple paths are valid and have the same length, the DataFrame
will return one Row for each path. If no paths are valid, the DataFrame will be empty. Note:
"Shortest" means globally shortest path. I.e., if the shortest path between two vertices
matching fromExpr
and toExpr
is length 5 (edges) but no path is shorter than 5, then all
paths returned by BFS will have length 5.
The returned DataFrame will have the following columns:
from
start vertex of pathe[i]
edge i in the path, indexed from 0v[i]
intermediate vertex i in the path, indexed from 1to
end vertex of path Each of these columns is a StructType whose fields are the same as the columns of GraphFrame.vertices or GraphFrame.edges.
For example, suppose we have a graph g. Say the vertices DataFrame of g has columns "id" and "job", and the edges DataFrame of g has columns "src", "dst", and "relation".
// Search from vertex "Joe" to find the closet vertices with attribute job = CEO. g.bfs(col("id") === "Joe", col("job") === "CEO").run()
If we found a path of 3 edges, each row would have columns:
from | e0 | v1 | e1 | v2 | e2 | to
In the above row, each vertex column (from, v1, v2, to) would have fields "id" and "job" (just like g.vertices). Each edge column (e0, e1, e2) would have fields "src", "dst", and "relation".
If there are ties, then each of the equal paths will be returned as a separate Row.
If one or more vertices match both the from and to conditions, then there is a 0-hop path. The
returned DataFrame will have the "from" and "to" columns (as above); however, the "from" and
"to" columns will be exactly the same. There will be one row for each vertex in
GraphFrame.vertices matching both fromExpr
and toExpr
.
Parameters:
fromExpr
Spark SQL expression specifying valid starting vertices for the BFS. This condition will be matched against each vertex's id or attributes. To start from a specific vertex, this could be "id = [start vertex id]". To start from multiple valid vertices, this can operate on vertex attributes.toExpr
Spark SQL expression specifying valid target vertices for the BFS. This condition will be matched against each vertex's id or attributes.maxPathLength
Limit on the length of paths. If no valid paths of length <= maxPathLength are found, then the BFS is terminated. (default = 10)edgeFilter
Spark SQL expression specifying edges which may be used in the search. This allows the user to disallow crossing certain edges. Such filters can be applied post-hoc after BFS, run specifying the filter here is more efficient.
Returns:
- DataFrame of valid shortest paths found in the BFS
from | e0 | v1 | e1 | v2 | e2 | to }}} to) would have fields "id" and "job" (just like g.vertices). Each edge column (e0, e1, e2) would have fields "src", "dst", and "relation".
If there are ties, then each of the equal paths will be returned as a separate Row.
If one or more vertices match both the from and to conditions, then there is a 0-hop path. The
returned DataFrame will have the "from" and "to" columns (as above); however, the "from" and
"to" columns will be exactly the same. There will be one row for each vertex in
GraphFrame.vertices matching both fromExpr
and toExpr
.
Parameters:
fromExpr
Spark SQL expression specifying valid starting vertices for the BFS. This condition will be matched against each vertex's id or attributes. To start from a specific vertex, this could be "id = [start vertex id]". To start from multiple valid vertices, this can operate on vertex attributes.toExpr
Spark SQL expression specifying valid target vertices for the BFS. This condition will be matched against each vertex's id or attributes.maxPathLength
Limit on the length of paths. If no valid paths of length <= maxPathLength are found, then the BFS is terminated. (default = 10)edgeFilter
Spark SQL expression specifying edges which may be used in the search. This allows the user to disallow crossing certain edges. Such filters can be applied post-hoc after BFS, run specifying the filter here is more efficient.
Returns:
- DataFrame of valid shortest paths found in the BFS
- Alphabetic
- By Inheritance
- BFS
- Serializable
- Serializable
- Arguments
- AnyRef
- Any
- by any2stringadd
- by StringFormat
- by Ensuring
- by ArrowAssoc
- Hide All
- Show All
- Public
- All
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
- def +(other: String): String
- def ->[B](y: B): (BFS, B)
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- def edgeFilter(value: String): BFS.this.type
- def edgeFilter(value: Column): BFS.this.type
- def ensuring(cond: (BFS) ⇒ Boolean, msg: ⇒ Any): BFS
- def ensuring(cond: (BFS) ⇒ Boolean): BFS
- def ensuring(cond: Boolean, msg: ⇒ Any): BFS
- def ensuring(cond: Boolean): BFS
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def fromExpr(value: String): BFS.this.type
- def fromExpr(value: Column): BFS.this.type
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def maxPathLength(value: Int): BFS.this.type
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def run(): DataFrame
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toExpr(value: String): BFS.this.type
- def toExpr(value: Column): BFS.this.type
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- def →[B](y: B): (BFS, B)
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated
-
def
formatted(fmtstr: String): String
- Implicit
- This member is added by an implicit conversion from BFS to StringFormat[BFS] performed by method StringFormat in scala.Predef.
- Definition Classes
- StringFormat
- Annotations
- @deprecated @inline()
- Deprecated
(Since version 2.12.16) Use
formatString.format(value)
instead ofvalue.formatted(formatString)
, or use thef""
string interpolator. In Java 15 and later,formatted
resolves to the new method in String which has reversed parameters.