How to delete the first few rows of a DataFrame in Scala/Spark?
I have a DataFrame and I want to delete its first and second rows. What should I do?
This is the input:
+-----+
|value|
+-----+
|    1|
|    4|
|    3|
|    5|
|    4|
|   18|
+-----+
This is the expected result:
+-----+
|value|
+-----+
|    3|
|    5|
|    4|
|   18|
+-----+
In my opinion, it does not make sense to speak of a first or second record if you cannot define an ordering of your DataFrame. The order of the records in the output of a show statement is "arbitrary" and depends on the partitioning of your data.
Suppose you have a column by which you can order your records; then you can use window functions. Starting with this DataFrame:
+----+-----+
|year|value|
+----+-----+
|2007|    1|
|2008|    4|
|2009|    3|
|2010|    5|
|2011|    4|
|2012|   18|
+----+-----+
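You can do:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.row_number
    // The $"..." column syntax needs: import spark.implicits._
    // (already in scope in spark-shell)

    df
      .withColumn("rn", row_number().over(Window.orderBy($"year"))) // number rows by year
      .where($"rn" > 2)                                             // skip the first two
      .drop($"rn")                                                  // remove the helper column
      .show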
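For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of the same approach. The object name DropFirstRows, the helper dropFirst, the method demo, and the local[1] master are assumptions made for illustration; they are not part of the original answer. The final orderBy is only there so that the collected values come back in a well-defined order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object DropFirstRows {

  // Hypothetical helper: drops the first n rows of df, where "first" is
  // defined by ascending order of orderColumn.
  def dropFirst(df: DataFrame, orderColumn: String, n: Int): DataFrame =
    df
      .withColumn("rn", row_number().over(Window.orderBy(col(orderColumn))))
      .where(col("rn") > n)
      .drop("rn")

  // Builds the sample DataFrame from the answer and returns the remaining values.
  def demo(): Seq[Int] = {
    // Assumption: a local single-threaded session is enough for this demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-first-rows")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (2007, 1), (2008, 4), (2009, 3),
      (2010, 5), (2011, 4), (2012, 18)
    ).toDF("year", "value")

    val remaining = dropFirst(df, "year", 2)
      .orderBy(col("year")) // explicit ordering before collecting
      .select("value")
      .as[Int]
      .collect()
      .toSeq

    spark.stop()
    remaining
  }

  def main(args: Array[String]): Unit =
    println(demo().mkString(", "))
}
```

Note that a window with orderBy but no partitionBy makes Spark move all rows into a single partition, so this is probably only reasonable for small or moderately sized DataFrames.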