class HyperANF extends Serializable with Logging with WithCheckpointInterval with WithIntermediateStorageLevel with WithLocalCheckpoints

HyperANF-style approximation of the neighbourhood function on top of GraphFrames.

This implementation is inspired by Boldi, Paolo; Rosa, Marco; Vigna, Sebastiano. "HyperANF: Approximating the Neighbourhood Function of Very Large Graphs on a Budget." arXiv preprint arXiv:1011.5599 (2010).

The input graph is treated as directed: for each vertex, reachability is computed by following outgoing edges from src to dst.

Compared with the cumulative neighbourhood-function presentation in the paper, this implementation returns one column per hop, hop_0, hop_1, hop_2, ..., hop_N. The hop_0 column contains a HyperLogLog sketch of the source vertex itself, and each hop_k column for k >= 1 contains a HyperLogLog sketch of the set of vertices reachable in exactly k hops. To derive the cumulative approximate neighbourhood function for distances up to some hop k, a user can combine hop_0 through hop_k with hll_union and then apply hll_sketch_estimate to the merged sketch.
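
The union-then-estimate step described above can be sketched as a small helper that builds the Spark SQL expression for a given hop k. This is a minimal sketch, not part of the HyperANF API: the object name CumulativeHops and its methods are hypothetical, and it assumes a Spark version in which hll_union and hll_sketch_estimate are available as SQL functions and that the result columns are named hop_0 through hop_N as documented here.

```scala
// Hypothetical helper (not part of GraphFrames): builds a SQL expression that
// merges the per-hop sketches hop_0..hop_k and estimates the merged cardinality.
object CumulativeHops {

  // Column names follow the hop_0, hop_1, ..., hop_N convention of the result.
  def hopColumns(k: Int): Seq[String] = {
    require(k >= 0, "k must be non-negative")
    (0 to k).map(i => s"hop_$i")
  }

  // Fold the hop columns into nested binary hll_union calls, then wrap the
  // merged sketch in hll_sketch_estimate.
  def cumulativeEstimateExpr(k: Int): String = {
    val merged = hopColumns(k).reduceLeft((acc, c) => s"hll_union($acc, $c)")
    s"hll_sketch_estimate($merged)"
  }
}

// Assumed usage, where `result` stands for the DataFrame returned by run():
//   result.selectExpr("id", s"${CumulativeHops.cumulativeEstimateExpr(2)} AS within_2_hops")
```

Building the expression as a string keeps the helper independent of any particular Spark session; the same nesting could be written with the Column-based functions if preferred.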

The computation can also be restricted to a subgraph by supplying an edge filter expression via setEdgesFilterExpression. A common use case is to filter on src, for example src IN (...), to obtain sketches only for a selected set of starting vertices.
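
As an illustration of the src IN (...) filter mentioned above, the following sketch renders such a predicate from a list of start vertices. The object name EdgeFilters is hypothetical, and the sketch assumes numeric vertex ids; string ids would need quoting and escaping.

```scala
// Hypothetical helper (not part of GraphFrames): renders a SQL predicate that
// keeps only edges whose src is one of the given start vertices.
object EdgeFilters {

  // Assumes numeric vertex ids, so the values can be embedded without quoting.
  def srcIn(ids: Seq[Long]): String = {
    require(ids.nonEmpty, "at least one start vertex is required")
    ids.mkString("src IN (", ", ", ")")
  }
}

// Assumed usage, where `hyperAnf` stands for an already constructed HyperANF instance:
//   hyperAnf.setEdgesFilterExpression(
//     org.apache.spark.sql.functions.expr(EdgeFilters.srcIn(Seq(1L, 2L, 3L))))
```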

Linear Supertypes
WithLocalCheckpoints, WithIntermediateStorageLevel, WithCheckpointInterval, Logging, Serializable, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. def +(other: String): String
    Implicit
    This member is added by an implicit conversion from HyperANF to any2stringadd[HyperANF] performed by method any2stringadd in scala.Predef.
    Definition Classes
    any2stringadd
  4. def ->[B](y: B): (HyperANF, B)
    Implicit
    This member is added by an implicit conversion from HyperANF to ArrowAssoc[HyperANF] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @inline()
  5. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. val checkpointInterval: Int
    Attributes
    protected
    Definition Classes
    WithCheckpointInterval
  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  9. def ensuring(cond: (HyperANF) => Boolean, msg: => Any): HyperANF
    Implicit
    This member is added by an implicit conversion from HyperANF to Ensuring[HyperANF] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  10. def ensuring(cond: (HyperANF) => Boolean): HyperANF
    Implicit
    This member is added by an implicit conversion from HyperANF to Ensuring[HyperANF] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  11. def ensuring(cond: Boolean, msg: => Any): HyperANF
    Implicit
    This member is added by an implicit conversion from HyperANF to Ensuring[HyperANF] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  12. def ensuring(cond: Boolean): HyperANF
    Implicit
    This member is added by an implicit conversion from HyperANF to Ensuring[HyperANF] performed by method Ensuring in scala.Predef.
    Definition Classes
    Ensuring
  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  15. def getCheckpointInterval: Int

    Gets checkpoint interval.

    Definition Classes
    WithCheckpointInterval
  16. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  17. def getIntermediateStorageLevel: StorageLevel

    Gets storage level for intermediate datasets that require multiple passes.

    Definition Classes
    WithIntermediateStorageLevel
  18. def getUseLocalCheckpoints: Boolean

    Gets whether local checkpoints are being used instead of regular checkpoints.

    returns

    true if local checkpoints are enabled, false otherwise

    Definition Classes
    WithLocalCheckpoints
  19. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  20. val intermediateStorageLevel: StorageLevel
    Attributes
    protected
    Definition Classes
    WithIntermediateStorageLevel
  21. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  22. def logDebug(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def logTrace(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  25. def logWarn(s: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  27. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  28. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  29. def resultIsPersistent(): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def run(): DataFrame

    Runs the HyperANF-style computation.

    The returned DataFrame has one row per source vertex present in the filtered edge set. It contains the vertex id column id and one sketch column per hop: hop_0, hop_1, hop_2, ..., hop_N. The hop_0 column stores a HyperLogLog sketch containing id itself. Each hop_k column for k >= 1 stores a HyperLogLog sketch for the set of vertices reachable from id in exactly k directed hops.

    To obtain an approximate cumulative neighbourhood size up to hop k, union hop_0 through hop_k with hll_union and then apply hll_sketch_estimate.

    returns

    a DataFrame with exact-hop HyperLogLog sketches per source vertex

  31. def setCheckpointInterval(value: Int): HyperANF.this.type

    Sets checkpoint interval in terms of number of iterations (default: 2). Checkpointing regularly helps recover from failures, clean shuffle files, shorten the lineage of the computation graph, and reduce the complexity of plan optimization. As of Spark 2.0, the complexity of plan optimization would grow exponentially without checkpointing. Hence, disabling checkpointing or setting a longer-than-default checkpoint interval is not recommended.

    Checkpoint data is saved under org.apache.spark.SparkContext.getCheckpointDir with the algorithm name as a prefix. If the checkpoint directory is not set, this throws a java.io.IOException. Set a nonpositive value to disable checkpointing. This parameter is only used when the algorithm is set to "graphframes". Its default value might change in the future.

    Definition Classes
    WithCheckpointInterval
    See also

    org.apache.spark.SparkContext.setCheckpointDir in Spark API doc

  32. def setEdgesFilterExpression(value: Column): HyperANF.this.type

    Sets the edge filter expression used before running the computation.

    Only edges satisfying this predicate participate in the directed reachability expansion. This effectively runs the algorithm on the subgraph induced by the filtered edge set.

    A common use case is filtering on src, for example src IN (...), to limit the result to a chosen set of starting vertices.

    value

    filter expression applied to graph.edges

    returns

    this HyperANF instance

  33. def setIntermediateStorageLevel(value: StorageLevel): HyperANF.this.type

    Sets storage level for intermediate datasets that require multiple passes (default: MEMORY_AND_DISK).

    Definition Classes
    WithIntermediateStorageLevel
  34. def setLgNomEntries(value: Int): HyperANF.this.type

    Sets the log2 of nominal entries used by HLL sketch aggregations.

  35. def setNHops(value: Int): HyperANF.this.type

    Sets the maximum hop distance to compute.

    The result will contain hop_0, hop_1, hop_2, ..., hop_N, where N is the configured number of hops.

    value

    positive number of hops to compute

    returns

    this HyperANF instance

  36. def setUseLocalCheckpoints(value: Boolean): HyperANF.this.type

    Sets whether to use local checkpoints instead of regular checkpoints (default: false). Local checkpoints are faster but less reliable as they don't survive node failures.

    value

    true to use local checkpoints, false for regular checkpoints

    returns

    this instance

    Definition Classes
    WithLocalCheckpoints
  37. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  38. def toString(): String
    Definition Classes
    AnyRef → Any
  39. val useLocalCheckpoints: Boolean
    Attributes
    protected
    Definition Classes
    WithLocalCheckpoints
  40. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  41. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  42. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)

  2. def formatted(fmtstr: String): String
    Implicit
    This member is added by an implicit conversion from HyperANF to StringFormat[HyperANF] performed by method StringFormat in scala.Predef.
    Definition Classes
    StringFormat
    Annotations
    @deprecated @inline()
    Deprecated

    (Since version 2.12.16) Use formatString.format(value) instead of value.formatted(formatString), or use the f"" string interpolator. In Java 15 and later, formatted resolves to the new method in String which has reversed parameters.

  3. def →[B](y: B): (HyperANF, B)
    Implicit
    This member is added by an implicit conversion from HyperANF to ArrowAssoc[HyperANF] performed by method ArrowAssoc in scala.Predef.
    Definition Classes
    ArrowAssoc
    Annotations
    @deprecated
    Deprecated

    (Since version 2.13.0) Use -> instead. If you still wish to display it as one character, consider using a font with programming ligatures such as Fira Code.
