Benchmarks
Graphalytics Benchmarks
This benchmark measures the performance of GraphFrames algorithms, not of Apache Spark itself. All graphs are therefore read from Parquet files on disk and persisted in memory in serialized format before timing starts. As a result, only the run time of the GraphFrames algorithms is measured; the time needed to read and parse the source files and to serialize and persist the data is excluded.
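The measurement approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual benchmark harness: the object name `TimedRunSketch`, the helpers `loadPersisted` and `timeSeconds`, the file paths, and the checkpoint directory are all hypothetical, and the sketch assumes Spark and GraphFrames are on the classpath and that the Parquet files already contain the `id`, `src`, and `dst` columns GraphFrames expects. The benchmark itself uses local checkpoints; the sketch sets a checkpoint directory instead, so that it does not have to assume a particular GraphFrames option name.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel
import org.graphframes.GraphFrame

object TimedRunSketch {
  // Read a Parquet dataset, persist it in memory in serialized form,
  // and run an action so the cache is filled before any timing starts.
  def loadPersisted(spark: SparkSession, path: String): DataFrame = {
    val df = spark.read.parquet(path).persist(StorageLevel.MEMORY_ONLY_SER)
    df.count() // forces the read, parse, serialize, and persist work
    df
  }

  // Evaluate `body` once and return its result with the elapsed wall-clock seconds.
  def timeSeconds[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e9)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("graphframes-benchmark-sketch")
      .master("local[*]")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Hypothetical directory; stands in for the local checkpoints used by the benchmark.
    spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

    // Hypothetical paths; loading happens outside the timed region.
    val vertices = loadPersisted(spark, "/data/wiki-Talk/vertices.parquet")
    val edges = loadPersisted(spark, "/data/wiki-Talk/edges.parquet")
    val graph = GraphFrame(vertices, edges)

    // Only the algorithm run (plus the action that materializes its result) is timed.
    val (_, seconds) = timeSeconds {
      graph.connectedComponents.run().count()
    }
    println(f"Connected Components: $seconds%.4f s")

    spark.stop()
  }
}
```

Calling `count()` right after `persist` matters here: `persist` is lazy, so without an action the read and serialization cost would be paid inside the timed region.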
Configurations
- Serializer: org.apache.spark.serializer.KryoSerializer
- GraphFrame checkpoints: localCheckpoints
- Spark Version: 4.0.0
- Scala Version: 2.13.18
- VM: standard GitHub Actions runner for open source projects.
Graph: wiki-Talk
- Vertices: 2M
- Edges: 5M
- Size Category: XS
- Source files format: Parquet
| Algorithm | Measurements | Time (s) |
|---|---|---|
| Shortest Paths GraphFrames | 3 | 50.2069 |
| Shortest Paths GraphX | 3 | 18.2139 |
| Connected Components GraphFrames | 3 | 29.5346 |
| Connected Components GraphX | 3 | 17.6575 |
| Label Propagation GraphFrames | 3 | 68.2840 |
| Label Propagation GraphX | 3 | 110.6229 |
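
To make the table easier to compare, the GraphFrames and GraphX times for each algorithm can be turned into a ratio. The following is a small sketch in plain Scala with no Spark dependency; the object name `SpeedRatios` and its members are hypothetical, and the numbers are copied from the table above.

```scala
object SpeedRatios {
  // (algorithm, GraphFrames seconds, GraphX seconds), copied from the results table
  val results: Seq[(String, Double, Double)] = Seq(
    ("Shortest Paths", 50.2069, 18.2139),
    ("Connected Components", 29.5346, 17.6575),
    ("Label Propagation", 68.2840, 110.6229)
  )

  // A ratio above 1 means the GraphX run was faster; below 1 means GraphFrames was faster.
  def ratio(graphFramesSeconds: Double, graphXSeconds: Double): Double =
    graphFramesSeconds / graphXSeconds

  def main(args: Array[String]): Unit =
    results.foreach { case (name, gf, gx) =>
      println(f"$name%-22s GraphFrames/GraphX = ${ratio(gf, gx)}%.2f")
    }
}
```

On this graph the ratios come out to about 2.76 for Shortest Paths, 1.67 for Connected Components, and 0.62 for Label Propagation, so GraphX was faster on the first two and GraphFrames on the third. With only three measurements per algorithm on a shared CI runner, these ratios are best read as rough indications.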