pyspark.sql.functions.tuple_union_theta_integer
- pyspark.sql.functions.tuple_union_theta_integer(col1, col2, lgNomEntries=None, mode=None)
Merges a DataSketches TupleSketch that has integer summaries with a ThetaSketch, producing a single TupleSketch.
New in version 4.2.0.
- Parameters
- col1
Column or column name The TupleSketch column with integer summaries
- col2
Column or column name The ThetaSketch column
- lgNomEntries
Column or int, optional The log-base-2 of nominal entries (must be between 4 and 26, defaults to 12)
- mode
Column or str, optional The summary mode: “sum” (default), “min”, “max”, or “alwaysone”
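Because lgNomEntries is a base-2 logarithm, the default of 12 corresponds to 2**12 = 4096 nominal entries. The following is a minimal sketch of that arithmetic together with the documented argument constraints. The helper check_union_args is hypothetical and not part of PySpark. It assumes that the 4 to 26 range is inclusive and that mode strings are matched case-sensitively, neither of which is confirmed by the text above.

```python
# Hypothetical helper, NOT part of PySpark: it mirrors the documented
# constraints on lgNomEntries and mode so they can be checked up front.
VALID_MODES = ("sum", "min", "max", "alwaysone")


def check_union_args(lg_nom_entries=12, mode="sum"):
    """Validate the arguments and return the implied nominal entry count."""
    # Assumption: the documented bounds (4 and 26) are inclusive.
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26")
    # Assumption: mode names are compared case-sensitively.
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of: " + ", ".join(VALID_MODES))
    # lgNomEntries is log-base-2 of the nominal entries, so invert it.
    return 2 ** lg_nom_entries


print(check_union_args())           # 4096
print(check_union_args(14, "max"))  # 16384
```

Each increment of lgNomEntries doubles the nominal entry count, so the largest documented value, 26, corresponds to 67,108,864 nominal entries.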
- Returns
Column The binary representation of the merged TupleSketch.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 10, 3), (2, 20, 4)], ["key1", "v1", "key2"])  # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_integer("key1", "v1").alias("sketch1"),
...     sf.theta_sketch_agg("key2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_integer(sf.tuple_union_theta_integer(df.sketch1, "sketch2"))).show()  # noqa
+-----------------------------------------------------------------------------------+
|tuple_sketch_estimate_integer(tuple_union_theta_integer(sketch1, sketch2, 12, sum))|
+-----------------------------------------------------------------------------------+
|                                                                                4.0|
+-----------------------------------------------------------------------------------+