GraphFrames 0.9.3 release

The brand-new website and documentation

Documentation was significantly improved. The website is now built with TypeLevel Laika. Laika was chosen because GraphFrames is a Scala first library, and it is nice to have a website that is built with Scala and well integrated to the sbt build. Documentation was split into multiple sections for better navigation and readability. A new section was added for the DataFrame-based Pregel API. A lot of things were fixed or updated.

The new Property Graph Model

While Graphframes was always good at providing graph algorithms, the API for constructing property graphs was very low-level and with a high level of variations about how to do the same thing. As well, the full pattern matching support was not available. The first step towards a more user-friendly API was to introduce a new PropertyGraph model. This model is a pure logical structure that wraps multiple tables or dataframes to VertexPropertyGroup and EdgePropertyGroup objects. This objects can be used to construct a PropertyGraphFrame that handles everything and provides an API to convert it to a GraphFrame and back to run graph algorithms. What is cool is that the PropertyGraphFrame supports arbitrary multigraphs that may have directed and undirected, weighted and unweighted edges, etc. Next steps are to add support for pattern matching with an ability to match by property group type as well as filtering by any property internally that is already very close to the OpenCypher standard. Additional benefit of the new graph abstraction is a possibility to integrate GraphFrames with native graph storage formats like Apache GraphAr (incubating).

Local checkpoints in DataFrame-based Pregel and Connected Components

It was a long-awaited feature to be able to not rely on persistent checkpoints for the DataFrame-based Pregel and Connected Components. With the 0.9.3 release, this is now possible by either direct API or global configuration.

Bug fixes in DataFrame-based LabelPropagation

Due to a bug in tests code, the DataFrame-based LabelPropagation was not tested and was not working. This has been fixed.

Benchmarks for GraphFrames

For the first time, we have introduced benchmarks for GraphFrames. Benchmarks are run on the LDBC Graphalytics benchmark suite. The results are not "official" because they were not checked by the LDBC Graphalytics team. However, they are a good starting point for future improvements and tracking performance. The results are available here.

Compatibility with Scala 3

We are continuing to work on the compatibility with Scala 3. In 0.9.3 the new scalac flags were introduced to improve the compatibility. As well as some deprecation warnings were fixed.

Future steps