class ConnectedComponents extends Arguments with Logging with WithCheckpointInterval with WithBroadcastThreshold with WithIntermediateStorageLevel with WithUseLabelsAsComponents with WithMaxIter with WithLocalCheckpoints

Connected Components algorithm.

Computes the connected component membership of each vertex and returns a DataFrame of vertex information with each vertex assigned a component ID.

The resulting DataFrame contains all the vertex information and one additional column:

  • component (LongType): unique ID for this component
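A minimal usage sketch, assuming a local Spark session with GraphFrames on the classpath. The toy vertex and edge data, the object name `ConnectedComponentsExample`, and the checkpoint path are illustrative placeholders, not part of this API. The graph has two components, so the `component` column should take two distinct values; which IDs are chosen is not specified here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.graphframes.GraphFrame

object ConnectedComponentsExample {
  // Builds a toy graph with two components, {1, 2, 3} and {4, 5}, and runs the algorithm.
  def components(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Placeholder path (assumption); some configurations require a checkpoint directory.
    spark.sparkContext.setCheckpointDir("/tmp/graphframes-checkpoints")

    // GraphFrame expects an `id` column on vertices and `src`/`dst` columns on edges.
    val vertices = Seq((1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e")).toDF("id", "name")
    val edges = Seq((1L, 2L), (2L, 3L), (4L, 5L)).toDF("src", "dst")

    // The result keeps the vertex columns (id, name) and adds a `component` column.
    GraphFrame(vertices, edges).connectedComponents.run()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cc-example").getOrCreate()
    components(spark).orderBy("id").show()
    spark.stop()
  }
}
```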
Linear Supertypes
WithLocalCheckpoints, WithMaxIter, WithUseLabelsAsComponents, WithIntermediateStorageLevel, WithBroadcastThreshold, WithCheckpointInterval, Logging, Arguments, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. def +(other: String): String
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to any2stringadd[ConnectedComponents] performed by method any2stringadd in scala.Predef.
    Definition Classes
    any2stringadd
  4. def ->[B](y: B): (ConnectedComponents, B)
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to ArrowAssoc[ConnectedComponents] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @inline()
  5. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. val broadcastThreshold: Int
    Attributes
    protected
    Definition Classes
    WithBroadcastThreshold
  8. val checkpointInterval: Int
    Attributes
    protected
    Definition Classes
    WithCheckpointInterval
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  10. def ensuring(cond: (ConnectedComponents) => Boolean, msg: => Any): ConnectedComponents
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to Ensuring[ConnectedComponents] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  11. def ensuring(cond: (ConnectedComponents) => Boolean): ConnectedComponents
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to Ensuring[ConnectedComponents] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  12. def ensuring(cond: Boolean, msg: => Any): ConnectedComponents
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to Ensuring[ConnectedComponents] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  13. def ensuring(cond: Boolean): ConnectedComponents
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to Ensuring[ConnectedComponents] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  14. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  16. def getAlgorithm: String

    Gets the algorithm used for computing connected components.

  17. def getBroadcastThreshold: Int

    Gets broadcast threshold in propagating component assignment.

    Definition Classes
    WithBroadcastThreshold
    See also

    org.graphframes.lib.ConnectedComponents.setBroadcastThreshold

  18. def getCheckpointInterval: Int

    Gets checkpoint interval.

    Definition Classes
    WithCheckpointInterval
  19. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  20. def getIntermediateStorageLevel: StorageLevel

    Gets storage level for intermediate datasets that require multiple passes.

    Definition Classes
    WithIntermediateStorageLevel
  21. def getUseLabelsAsComponents: Boolean

    Gets whether to use vertex labels as component identifiers.

    Definition Classes
    WithUseLabelsAsComponents
  22. def getUseLocalCheckpoints: Boolean

    Gets whether local checkpoints are being used instead of regular checkpoints.

    returns

    true if local checkpoints are enabled, false otherwise

    Definition Classes
    WithLocalCheckpoints
  23. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  24. val intermediateStorageLevel: StorageLevel
    Attributes
    protected
    Definition Classes
    WithIntermediateStorageLevel
  25. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  26. def logDebug(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logInfo(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logTrace(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. def logWarn(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def maxIter(value: Int): ConnectedComponents.this.type

    The maximum number of iterations of the algorithm to perform.

    Definition Classes
    WithMaxIter
  31. val maxIter: Option[Int]
    Attributes
    protected
    Definition Classes
    WithMaxIter
  32. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  33. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  34. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  35. def resultIsPersistent(): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. def run(): DataFrame

    Runs the algorithm.

  37. def setAlgorithm(value: String): ConnectedComponents.this.type

    Sets the algorithm to use for computing connected components. Supported values:

  38. def setBroadcastThreshold(value: Int): ConnectedComponents.this.type

    Sets the broadcast threshold for propagating component assignments (default: 1,000,000). If a node's degree exceeds this threshold at some iteration, its component assignment is collected and then broadcast back to propagate the assignment to its neighbors. Otherwise, the assignment is propagated by a normal Spark join. This parameter is only used when the algorithm is set to "graphframes". If the value is -1, skew handling is left to Apache Spark's adaptive query execution (AQE) optimizer.

    **WARNING** Using a broadcast threshold is not free. Under the hood it triggers a Spark action, and when a broadcast threshold is set, AQE is disabled to avoid wrong results. If your graph does not contain gigantic components, it is strongly recommended to set this value to -1. In benchmarks, setting it to -1 improved performance by about 5x.

    **WARNING** The current default value of 1,000,000 is kept for backward compatibility only. In future versions it may be changed to -1, which is more reasonable for most real-world cases (e.g., the data deduplication problem).

    Definition Classes
    WithBroadcastThreshold
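The setBroadcastThreshold guidance above can be sketched as follows: a small helper that disables the broadcast path and leaves skew handling to AQE. The object and method names (`BroadcastThresholdSketch`, `withoutBroadcast`) are hypothetical, and the caller is assumed to have set a checkpoint directory already, since the "graphframes" algorithm checkpoints by default.

```scala
import org.apache.spark.sql.DataFrame
import org.graphframes.GraphFrame

object BroadcastThresholdSketch {
  // Runs connected components with the broadcast threshold set to -1,
  // which the documentation recommends for graphs without gigantic components.
  def withoutBroadcast(graph: GraphFrame): DataFrame =
    graph.connectedComponents
      .setAlgorithm("graphframes") // the threshold is only used by this algorithm
      .setBroadcastThreshold(-1)   // -1: leave skew handling to Spark AQE
      .run()
}
```

Whether -1 is actually faster depends on the graph. The 5x figure quoted above comes from the library's own benchmarks, so it is worth measuring on your data.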
  39. def setCheckpointInterval(value: Int): ConnectedComponents.this.type

    Sets the checkpoint interval in number of iterations (default: 2). Checkpointing regularly helps recover from failures, clean up shuffle files, shorten the lineage of the computation graph, and reduce the complexity of plan optimization. As of Spark 2.0, the complexity of plan optimization grows exponentially without checkpointing. Hence, disabling checkpointing or setting a longer-than-default interval is not recommended. Checkpoint data is saved under org.apache.spark.SparkContext.getCheckpointDir with the algorithm name as a prefix. If the checkpoint directory is not set, this throws a java.io.IOException. Set a nonpositive value to disable checkpointing. This parameter is only used when the algorithm is set to "graphframes". Its default value might change in the future.

    Definition Classes
    WithCheckpointInterval
    See also

    org.apache.spark.SparkContext.setCheckpointDir in Spark API doc
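As a sketch of the checkpointing setup described for setCheckpointInterval, the helper below sets the checkpoint directory before running, then passes the interval explicitly. The names `CheckpointIntervalSketch`, `run`, and the `checkpointDir` parameter are hypothetical. On a cluster the directory would normally be on a shared, reliable file system.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.graphframes.GraphFrame

object CheckpointIntervalSketch {
  def run(spark: SparkSession, graph: GraphFrame, checkpointDir: String): DataFrame = {
    // Must be set before running; otherwise the documentation says an IOException is thrown.
    spark.sparkContext.setCheckpointDir(checkpointDir)

    graph.connectedComponents
      .setAlgorithm("graphframes") // the interval is only used by this algorithm
      .setCheckpointInterval(2)    // checkpoint every 2 iterations (the documented default)
      .run()
  }
}
```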

  40. def setIntermediateStorageLevel(value: StorageLevel): ConnectedComponents.this.type

    Sets storage level for intermediate datasets that require multiple passes (default: MEMORY_AND_DISK).

    Definition Classes
    WithIntermediateStorageLevel
  41. def setIsGraphPrepared(value: Boolean): ConnectedComponents.this.type

    !! WARNING: INTERNAL API — FOR VERY EXPERIENCED USERS ONLY !!

    Sets whether the graph has already been prepared before being passed to the algorithm, skipping the internal graph preparation step. The default is false, meaning the algorithm will always prepare the graph itself, which is the safe and recommended behaviour.

    Only set this to true if you have already performed all required preparation steps yourself and you fully understand what those steps are for the specific algorithm you are using. The preparation requirements differ significantly between algorithms:

    • two_phase and randomized_contraction each require their own distinct preparation steps. These are NOT interchangeable. You MUST study the internal source code of the algorithm you intend to use and replicate its exact preparation logic before enabling this flag.

    Incorrect use of this flag WILL produce silently wrong results with no error or warning at runtime. There is no validation that the graph has been correctly prepared. You are entirely responsible for ensuring correctness.

    value

    true if the graph is already prepared, false otherwise (default: false)

  42. def setUseLabelsAsComponents(value: Boolean): ConnectedComponents.this.type

    Sets whether to use vertex labels as component identifiers (default: false). When true, vertex labels will be used as component identifiers instead of computing connected components.

    Definition Classes
    WithUseLabelsAsComponents
  43. def setUseLocalCheckpoints(value: Boolean): ConnectedComponents.this.type

    Sets whether to use local checkpoints instead of regular checkpoints (default: false). Local checkpoints are faster but less reliable as they don't survive node failures.

    value

    true to use local checkpoints, false for regular checkpoints

    returns

    this instance

    Definition Classes
    WithLocalCheckpoints
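A short sketch of opting in to local checkpoints via setUseLocalCheckpoints, trading reliability for speed as described above. The object and method names are hypothetical. This page does not say whether a checkpoint directory is still required in this mode, so the sketch leaves that to the caller.

```scala
import org.apache.spark.sql.DataFrame
import org.graphframes.GraphFrame

object LocalCheckpointSketch {
  // Uses local checkpoints: faster, but they do not survive node failures.
  def run(graph: GraphFrame): DataFrame =
    graph.connectedComponents
      .setUseLocalCheckpoints(true)
      .run()
}
```

This is most reasonable for short-lived jobs or single-node experiments, where losing an executor just means rerunning the job.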
  44. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  45. def toString(): String
    Definition Classes
    AnyRef → Any
  46. val useLabelsAsComponents: Boolean
    Attributes
    protected
    Definition Classes
    WithUseLabelsAsComponents
  47. val useLocalCheckpoints: Boolean
    Attributes
    protected
    Definition Classes
    WithLocalCheckpoints
  48. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  49. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  50. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

  2. def formatted(fmtstr: String): String
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to StringFormat[ConnectedComponents] performed by method StringFormat in scala.Predef.
    Definition Classes
    StringFormat
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.12.16) Use formatString.format(value) instead of value.formatted(formatString), or use the f"" string interpolator. In Java 15 and later, formatted resolves to the new method in String which has reversed parameters.

  3. def run(graph: GraphFrame): DataFrame
    Annotations
    @deprecated
    Deprecated

    (Since version 0.11.0) use graph.connectedComponents instead

  4. def →[B](y: B): (ConnectedComponents, B)
    Implicit
    This member is added by an implicit conversion from ConnectedComponents to ArrowAssoc[ConnectedComponents] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @deprecated
    Deprecated

    (Since version 2.13.0) Use -> instead. If you still wish to display it as one character, consider using a font with programming ligatures such as Fira Code.
