Installation

If you are new to using Apache Spark, refer to the Apache Spark Documentation and its Quick-Start Guide for more information.

Spark Versions Compatibility

Component | Spark 3.x (Scala 2.12) | Spark 3.x (Scala 2.13) | Spark 4.x (Scala 2.13)
graphframes
graphframes-connect

The following examples show how to start the Spark shell and PySpark with the GraphFrames package. The --packages argument downloads the graphframes package and its dependencies automatically.

Spark 3.x

Spark Shell

$ ./bin/spark-shell --packages io.graphframes:graphframes-spark3_2.12:0.9.3

To use Scala 2.13 instead, run:

$ ./bin/spark-shell --packages io.graphframes:graphframes-spark3_2.13:0.9.3

PySpark

$ pip install graphframes-py==0.9.3
$ ./bin/pyspark --packages io.graphframes:graphframes-spark3_2.12:0.9.3

Spark 4.x

Spark Shell

$ ./bin/spark-shell --packages io.graphframes:graphframes-spark4_2.13:0.9.3

PySpark

$ pip install graphframes-py==0.9.3
$ ./bin/pyspark --packages io.graphframes:graphframes-spark4_2.13:0.9.3
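
The commands above differ only in the Spark major version and the Scala version embedded in the artifact name. As a minimal sketch, assuming you start PySpark from a script rather than through ./bin/pyspark, the same coordinate can be passed through the spark.jars.packages setting. The helper names graphframes_coordinate and build_session are hypothetical and not part of GraphFrames, and build_session needs pyspark installed.

```python
def graphframes_coordinate(spark_major: int, scala_version: str,
                           version: str = "0.9.3") -> str:
    """Build the Maven coordinate passed to --packages (hypothetical helper)."""
    return f"io.graphframes:graphframes-spark{spark_major}_{scala_version}:{version}"


def build_session(spark_major: int = 3, scala_version: str = "2.12"):
    """Create a SparkSession that downloads GraphFrames on startup.

    pyspark is imported lazily so that graphframes_coordinate can be used
    without Spark installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("graphframes-install-check")  # arbitrary application name
        .config("spark.jars.packages",
                graphframes_coordinate(spark_major, scala_version))
        .getOrCreate()
    )
```

For example, graphframes_coordinate(4, "2.13") returns io.graphframes:graphframes-spark4_2.13:0.9.3, the same coordinate used in the Spark 4.x commands.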

Spark Connect Server Extension

To add GraphFrames to your Spark Connect server, specify the plugin class through spark.connect.extensions.relation.classes and add the graphframes-connect package:

For Spark 4.x:

./sbin/start-connect-server.sh \
  --conf spark.connect.extensions.relation.classes=org.apache.spark.sql.graphframes.GraphFramesConnect \
  --packages io.graphframes:graphframes-connect-spark4_2.13:0.9.3

For Spark 3.x:

./sbin/start-connect-server.sh \
  --conf spark.connect.extensions.relation.classes=org.apache.spark.sql.graphframes.GraphFramesConnect \
  --packages io.graphframes:graphframes-connect-spark3_2.12:0.9.3
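
Both server commands follow one pattern: a --conf entry that registers the plugin class and a --packages entry with the matching artifact. The sketch below assembles that argument list in Python, which makes it explicit that the key=value pair must be a single argument and that the Maven coordinate separates group ID, artifact ID, and version with colons. The function name connect_server_command is hypothetical, and the relative script path assumes the current directory is the Spark installation root.

```python
import shlex

PLUGIN_CLASS = "org.apache.spark.sql.graphframes.GraphFramesConnect"


def connect_server_command(spark_major: int, scala_version: str,
                           version: str = "0.9.3") -> list:
    """Return the argument list for starting a Spark Connect server
    with the GraphFrames extension (hypothetical helper)."""
    package = (
        f"io.graphframes:graphframes-connect-spark{spark_major}"
        f"_{scala_version}:{version}"
    )
    return [
        "./sbin/start-connect-server.sh",
        "--conf",
        # key=value is one token: no whitespace around the equals sign
        f"spark.connect.extensions.relation.classes={PLUGIN_CLASS}",
        "--packages",
        package,
    ]


def as_shell(command: list) -> str:
    """Render the argument list as a single shell-safe command line."""
    return shlex.join(command)
```

The list returned by connect_server_command(4, "2.13") can be handed to subprocess.run, or printed with as_shell for copying into a terminal.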

WARNING: The GraphFrames Connect Server Extension is not compatible with managed Spark Connect from Databricks. To make it work, build the GraphFrames Connect Server Extension from source with the following flag:

./build/sbt -Dvendor.name=dbx connect/assembly

Spark Connect Clients

At the moment, GraphFrames ships only a PySpark client, which is bundled with the Python package: pip install graphframes-py==0.9.3. At runtime, the GraphFrames PySpark client automatically handles the connection to the GraphFrames Connect Server Extension when it runs in a Spark Connect environment.
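
As a minimal sketch of the client side, the function below opens a Spark Connect session and runs a motif query on a tiny graph. It assumes a Connect server with the GraphFrames extension is reachable at sc://localhost:15002 (the address is an assumption; adjust it to your deployment) and that pyspark with Spark Connect support and graphframes-py are installed. The sample data and the function name are made up for illustration.

```python
# Sample graph data, invented for this example.
VERTICES = [("a", "Alice"), ("b", "Bob"), ("c", "Carol")]
EDGES = [("a", "b", "follows"), ("b", "c", "follows"), ("c", "a", "follows")]


def find_edges_over_connect(remote_url: str = "sc://localhost:15002"):
    """Build a GraphFrame through Spark Connect and return all matched edges.

    pyspark and graphframes are imported lazily, so this module can be
    loaded on a machine where they are not installed.
    """
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.remote(remote_url).getOrCreate()
    vertices = spark.createDataFrame(VERTICES, ["id", "name"])
    edges = spark.createDataFrame(EDGES, ["src", "dst", "relationship"])
    graph = GraphFrame(vertices, edges)
    # Motif finding: every pair of vertices joined by a directed edge.
    return graph.find("(a)-[e]->(b)").collect()
```

The code is the same as it would be for a classic Spark session; only the way the session is created changes.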

Messages

At the moment, the following APIs are exposed:

message GraphFramesAPI {
  bytes vertices = 1;
  bytes edges = 2;
  oneof method {
    AggregateMessages aggregate_messages = 3;
    BFS bfs = 4;
    ConnectedComponents connected_components = 5;
    DropIsolatedVertices drop_isolated_vertices = 6;
    FilterEdges filter_edges = 7;
    FilterVertices filter_vertices = 8;
    Find find = 9;
    LabelPropagation label_propagation = 10;
    PageRank page_rank = 11;
    ParallelPersonalizedPageRank parallel_personalized_page_rank = 12;
    PowerIterationClustering power_iteration_clustering = 13;
    Pregel pregel = 14;
    ShortestPaths shortest_paths = 15;
    StronglyConnectedComponents strongly_connected_components = 16;
    SVDPlusPlus svd_plus_plus = 17;
    TriangleCount triangle_count = 18;
    Triplets triplets = 19;
  }
}
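
The oneof in the message means that a single request carries the graph (vertices and edges) plus at most one method to run on it. The sketch below models that rule with a plain Python dict. It is not the generated protobuf class, and the function name selected_method is hypothetical; only the field names and numbers are taken from the message above.

```python
# Field numbers copied from the oneof in the GraphFramesAPI message.
METHOD_FIELDS = {
    "aggregate_messages": 3,
    "bfs": 4,
    "connected_components": 5,
    "drop_isolated_vertices": 6,
    "filter_edges": 7,
    "filter_vertices": 8,
    "find": 9,
    "label_propagation": 10,
    "page_rank": 11,
    "parallel_personalized_page_rank": 12,
    "power_iteration_clustering": 13,
    "pregel": 14,
    "shortest_paths": 15,
    "strongly_connected_components": 16,
    "svd_plus_plus": 17,
    "triangle_count": 18,
    "triplets": 19,
}


def selected_method(request: dict):
    """Return the name of the one method set in request, or None.

    A real protobuf oneof silently keeps only the last member assigned;
    this dict model raises instead, to make the rule visible.
    """
    chosen = [name for name in request if name in METHOD_FIELDS]
    if len(chosen) > 1:
        raise ValueError(f"oneof allows a single method, got {chosen}")
    return chosen[0] if chosen else None
```

A request such as {"vertices": b"", "edges": b"", "page_rank": {}} selects page_rank, while a request that names both bfs and find is rejected by this model.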

Building GraphFrames from Source

./build/sbt package

Nightly Builds

The GraphFrames project publishes SNAPSHOTS (nightly builds) to the "Central Portal Snapshots" repository. Please read this section of the Sonatype documentation to learn how to use snapshots in your project.
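
For a Spark application, one way to consume a snapshot is to add the snapshots repository through spark.jars.repositories and request the snapshot coordinate through spark.jars.packages. The sketch below only builds those two settings. The repository URL is an assumption that should be checked against the Sonatype documentation, the function name snapshot_conf is hypothetical, and the artifact ID and snapshot version must be supplied by the caller because they are not stated here.

```python
# Assumed URL of the Central Portal Snapshots repository; verify it against
# the Sonatype documentation before relying on it.
SNAPSHOTS_REPO = "https://central.sonatype.com/repository/maven-snapshots/"


def snapshot_conf(artifact_id: str, snapshot_version: str) -> dict:
    """Return Spark settings that resolve a GraphFrames snapshot build.

    artifact_id and snapshot_version are caller-supplied; this helper does
    not know which snapshot versions are actually published.
    """
    if not snapshot_version.endswith("-SNAPSHOT"):
        raise ValueError("snapshot versions end with -SNAPSHOT")
    return {
        "spark.jars.repositories": SNAPSHOTS_REPO,
        "spark.jars.packages": f"io.graphframes:{artifact_id}:{snapshot_version}",
    }
```

Each key and value of the returned dict can be passed to SparkSession.builder.config, or to spark-shell and pyspark as --conf key=value.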

GroupId: io.graphframes ArtifactIds: