GraphFrames is back!

Originally published on the Graphlet AI Blog

GraphFrames 0.9.2 is out on PyPI as graphframes-py and on Maven Central (Sonatype) as io.graphframes! Documentation is now available on graphframes.io… and we even have a new logo!

The new GraphFrames logo, introduced with this release :)

GraphFrames is BACK!

As you can see below, GraphFrames is back! It has seen contributions every week for most of the year, and we now have half a dozen active contributors. This release is due to the efforts of many people, but I want to express our deep gratitude to Sem Sinchenko, who drove it.

Semyon Sinchenko deserves the appreciation and respect of all GraphFrames users :)

The project has gone from effectively dead to vibrant in the six months since GraphX was deprecated in Spark, which prompted us to get to work on an all-DataFrame replacement. As the chart below shows, contributions are now more frequent than at any time since the project's inception!

Contributions chart of the GraphFrames project

New Features in GraphFrames 0.9.2

GraphFrames needed to support both Spark 4 and Spark Connect to remain integral to the Spark community. Many issues were resolved in this release, but the core of it was:

State of the Union

The GraphFrames community has achieved its first goal: making the project viable again! So what comes next?

Property Graphs

Sem has started implementing property graphs for GraphFrames, which currently has a relationship attribute for edges but no type for nodes. In practice, this means property graph processing requires you to merge all your node schemas into a single kitchen-sink schema before using GraphFrames' algorithms. It is a real drag… property graphs will be a huge improvement! Sem recently outlined a beautiful vision for property graphs as part of the Open Lakehouse. Check it out!
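To make the kitchen-sink problem concrete, here is a minimal sketch in plain Python (no Spark required) of what merging typed node tables into one schema looks like. It mimics what PySpark's `DataFrame.unionByName(other, allowMissingColumns=True)` does: columns missing from one table are filled with nulls. The helper name `kitchen_sink_union`, the added `type` column, and the sample person/company data are hypothetical, chosen only for illustration.

```python
def kitchen_sink_union(tables):
    """Union node tables whose schemas differ, matching columns by name.

    tables: dict mapping a node type (e.g. "person") to a list of row dicts.
    Returns (columns, rows): every row carries every column, columns that a
    node type lacks are filled with None, and a "type" column records which
    table each row came from (overwriting any existing "type" key).
    """
    # Collect the union of all column names, preserving first-seen order.
    columns = ["type"]
    for rows in tables.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)

    # Pad every row out to the full "kitchen sink" schema.
    merged = []
    for node_type, rows in tables.items():
        for row in rows:
            padded = {name: row.get(name) for name in columns}
            padded["type"] = node_type
            merged.append(padded)
    return columns, merged


if __name__ == "__main__":
    # Hypothetical node tables with different schemas.
    cols, nodes = kitchen_sink_union({
        "person": [{"id": "a", "name": "Alice", "age": 34}],
        "company": [{"id": "b", "name": "Acme", "founded": 1999}],
    })
    print(cols)   # ['type', 'id', 'name', 'age', 'founded']
    for node in nodes:
        print(node)
```

Every node type ends up carrying every other type's columns, and the `type` column is bookkeeping you maintain by hand; native node types in property graphs are meant to remove that burden.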

Inclusion in Spark

This is actively debated: shipping GraphFrames as part of Spark would be a lot of trouble, but judging by the number of search hits for GraphX versus GraphFrames, it could bring us 10x as many users. When I put it that way, GraphFrames in Spark sounds pretty good!

GraphX is Deprecated

Spark's deprecation of GraphX was the call to action that led us to revive GraphFrames, and we heard it loud and clear. We're building DataFrame implementations of all GraphX components. GraphX has already been removed from the ShortestPaths and LabelPropagation implementations, and the rest of the work is being tracked and is underway. GraphX will be deprecated in GraphFrames as of 1.0, and GraphFrames 2.0 will remove it completely. Soon GraphFrames will be built entirely on DataFrames!
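The idea behind a DataFrame implementation of an algorithm like label propagation is to express each iteration as relational operations: join the edges with the current vertex labels to produce messages, then group the messages by receiving vertex and keep the most frequent label. The sketch below emulates those joins and group-bys with plain Python lists and dicts so it runs without Spark; it is not GraphFrames' actual implementation. The function name, the tie-breaking rule (smallest label wins), and the early-exit convergence check are assumptions. In GraphFrames itself you would call `GraphFrame.labelPropagation(maxIter=...)`.

```python
from collections import Counter, defaultdict


def label_propagation(vertices, edges, max_iter=5):
    """Synchronous label propagation over plain Python "tables".

    vertices: list of vertex ids; edges: list of (src, dst) pairs.
    Each round mirrors DataFrame operations: join edges with the current
    labels to build messages, then group messages by receiving vertex.
    """
    # Every vertex starts in its own community, labeled by its own id.
    labels = {v: v for v in vertices}
    for _ in range(max_iter):
        # "Join": each edge sends the source's label to the destination and
        # the destination's label to the source (graph treated as undirected).
        messages = []
        for src, dst in edges:
            messages.append((dst, labels[src]))
            messages.append((src, labels[dst]))

        # "Group by" receiving vertex, counting how often each label arrives.
        inbox = defaultdict(Counter)
        for vertex, label in messages:
            inbox[vertex][label] += 1

        # Adopt the most frequent incoming label; ties go to the smallest
        # label (an assumption for determinism). Isolated vertices keep theirs.
        new_labels = {}
        for v in vertices:
            counts = inbox.get(v)
            if not counts:
                new_labels[v] = labels[v]
            else:
                best = max(counts.values())
                new_labels[v] = min(label for label, c in counts.items() if c == best)

        if new_labels == labels:
            break  # converged: no vertex changed its label this round
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two disjoint triangles plus an isolated vertex 7.
    result = label_propagation(
        [1, 2, 3, 4, 5, 6, 7],
        [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)],
    )
    print(result)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 4, 7: 7}
```

Because each round is just a join followed by an aggregation, the same loop maps naturally onto Spark DataFrame operations.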

The Sedona Alliance!

Developers from Apache Sedona joined the development of GraphFrames 0.9, and Sedona 1.8.0 will depend on the new version. They've been a huge help! James Willis, Adam Binford, and the Apache Sedona team contributed new configurations, helped us fix our CI to enable the 0.9 release, and drove Spark 4 support. James Willis became an official maintainer of GraphFrames to coordinate efforts between the two projects.

New Contributors

We have a lot of new contributors for this release!

A Call for Help

We are building a list of dependent projects, so if you use GraphFrames, please let us know! We would also welcome your help testing new versions before they are released.

Got questions or concerns? Let us know what you think! Find us in the #graphframes channel on the GraphGeeks Discord, or join the GraphFrames Google Group.