python - Count in PySpark
I have a Spark DataFrame df with a column "id" (string) and a column "values" (array of strings). I want to create a column called "count" that contains the count of values for each id.
df looks like this:
id      values
1fdf67  [dhjy1, jh87w3, 89yt5re]
df45l1  [hj098, hg45l0, sass65r4, dh6t21]
The result should be:
id      values                             count
1fdf67  [dhjy1, jh87w3, 89yt5re]           3
df45l1  [hj098, hg45l0, sass65r4, dh6t21]  4
I am trying the below:
df = df.select(id, values).toDF(id, values, values.count())
This doesn't seem to be working as per my requirement.
Please use the size function:
from pyspark.sql.functions import size

df = spark.createDataFrame([
    ("1fdf67", ["dhjy1", "jh87w3", "89yt5re"]),
    ("df45l1", ["hj098", "hg45l0", "sass65r4", "dh6t21"])
], ("id", "values"))

df.select("*", size("values").alias("count")).show(2, False)

+------+---------------------------------+-----+
|id    |values                           |count|
+------+---------------------------------+-----+
|1fdf67|[dhjy1, jh87w3, 89yt5re]         |3    |
|df45l1|[hj098, hg45l0, sass65r4, dh6t21]|4    |
+------+---------------------------------+-----+
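If you would rather add the count as a new column on the existing DataFrame instead of projecting it through select, withColumn works the same way. A minimal sketch, assuming the same df as above:

from pyspark.sql.functions import size

# Add a "count" column in place rather than re-selecting all columns
df = df.withColumn("count", size("values"))
df.show(truncate=False)

One thing to watch: for a NULL array, size returns -1 by default (controlled by the spark.sql.legacy.sizeOfNull and spark.sql.ansi.enabled settings), so rows with missing arrays may need a coalesce or fillna afterwards.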