How to delete the first few rows of a DataFrame in Scala/Spark?

June 15, 2010

I have a DataFrame and I want to delete its first and second rows. How should I do this?

This is the input:

+-----+
|value|
+-----+
|    1|
|    4|
|    3|
|    5|
|    4|
|   18|
+-----+

This is the expected result:

+-----+
|value|
+-----+
|    3|
|    5|
|    4|
|   18|
+-----+

In my opinion it does not make sense to speak of the first or second record if you cannot define an ordering of the DataFrame. The order of the records in the result of a show statement is "arbitrary": it depends on how the data is partitioned.
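To make the ordering argument concrete, here is a plain Scala collections analogy (no Spark involved; the object name OrderingDemo and the two "partition layouts" are hypothetical). The same six rows, split into partitions that are concatenated in a different order, give different "first two" rows, while sorting by year first and numbering the rows gives the same answer for both layouts.

```scala
object OrderingDemo {
  // (year, value) rows, the same data as the starting DataFrame in this answer.
  val rows: Seq[(Int, Int)] =
    Seq((2007, 1), (2008, 4), (2009, 3), (2010, 5), (2011, 4), (2012, 18))

  // Two hypothetical partition layouts of the same data.
  val layoutA: Seq[Seq[(Int, Int)]] = Seq(rows.take(3), rows.drop(3))
  val layoutB: Seq[Seq[(Int, Int)]] = Seq(rows.drop(3), rows.take(3))

  // "Drop the first two rows" with no ordering: the result depends on the layout.
  def naiveDropTwo(parts: Seq[Seq[(Int, Int)]]): Seq[(Int, Int)] =
    parts.flatten.drop(2)

  // Analogue of row_number() over an ordering by year, keeping rn > 2:
  // sort by year, number rows from 1, and keep those numbered above 2.
  def orderedDropTwo(parts: Seq[Seq[(Int, Int)]]): Seq[(Int, Int)] =
    parts.flatten
      .sortBy(_._1)
      .zipWithIndex
      .collect { case (row, i) if i + 1 > 2 => row }
}
```

Here naiveDropTwo returns different rows for layoutA and layoutB, whereas orderedDropTwo returns the 2009 to 2012 rows for both.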

Suppose you have a column by which you can order the records; then you can use window functions. Starting DataFrame:

+----+-----+
|year|value|
+----+-----+
|2007|    1|
|2008|    4|
|2009|    3|
|2010|    5|
|2011|    4|
|2012|   18|
+----+-----+

You can do:

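import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
import spark.implicits._ // enables the $"column" syntax

df
  .withColumn("rn", row_number().over(Window.orderBy($"year"))) // number rows 1, 2, 3, ... by year
  .where($"rn" > 2)                                            // skip the first two rows
  .drop("rn")                                                  // remove the helper column
  .show()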
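A self-contained version of the same approach, as a sketch assuming Spark SQL is on the classpath and a local SparkSession is acceptable (the object name DropFirstRows and the method run are just for illustration). Note that Window.orderBy without a partitionBy moves all rows into a single partition; Spark logs a warning about this, and it can be slow on large data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object DropFirstRows {
  // Builds the example DataFrame, drops the two earliest years, and returns the remaining values.
  def run(): List[Int] = {
    // Local session for trying the snippet; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .appName("drop-first-rows")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    try {
      // Same data as the starting DataFrame above.
      val df = Seq((2007, 1), (2008, 4), (2009, 3), (2010, 5), (2011, 4), (2012, 18))
        .toDF("year", "value")

      val result = df
        .withColumn("rn", row_number().over(Window.orderBy($"year")))
        .where($"rn" > 2)
        .drop("rn")
        .orderBy($"year") // filtering does not guarantee output order, so sort before collecting

      result.select($"value").as[Int].collect().toList
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run()) // prints List(3, 5, 4, 18)
}
```

In spark-shell you only need the part between creating the session and stopping it, since `spark` and its implicits are already available there.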
