python - Count in PySpark
I have a Spark DataFrame df with a column "id" (string) and a column "values" (array of strings). I want to create a column called "count" that contains the count of values for each id.
df looks like this:
id      values
1fdf67  [dhjy1, jh87w3, 89yt5re]
df45l1  [hj098, hg45l0, sass65r4, dh6t21]
The result should be:
id      values                             count
1fdf67  [dhjy1, jh87w3, 89yt5re]           3
df45l1  [hj098, hg45l0, sass65r4, dh6t21]  4
I am trying the below:
df = df.select(id, values).toDF(id, values, values.count())
This doesn't seem to be working as per my requirement.
Please use the size function:
from pyspark.sql.functions import size

df = spark.createDataFrame([
    ("1fdf67", ["dhjy1", "jh87w3", "89yt5re"]),
    ("df45l1", ["hj098", "hg45l0", "sass65r4", "dh6t21"])
], ("id", "values"))

df.select("*", size("values").alias("count")).show(2, False)

+------+---------------------------------+-----+
|id    |values                           |count|
+------+---------------------------------+-----+
|1fdf67|[dhjy1, jh87w3, 89yt5re]         |3    |
|df45l1|[hj098, hg45l0, sass65r4, dh6t21]|4    |
+------+---------------------------------+-----+
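If you would rather add the count as a new column on the existing DataFrame instead of projecting it through select, withColumn works the same way. A minimal sketch, assuming the same df as above:

from pyspark.sql.functions import size

# Add a "count" column in place rather than re-selecting all columns
df = df.withColumn("count", size("values"))
df.show(truncate=False)

One thing to watch: for a NULL array, size returns -1 by default (controlled by the spark.sql.legacy.sizeOfNull and spark.sql.ansi.enabled settings), so rows with missing arrays may need a coalesce or fillna afterwards.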