Scala - How to replace values in Spark DataFrames after recalculation?
I have the following schema in Spark:
    root
     |-- atom: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- dailydata: array (nullable = true)
     |    |    |    |-- element: struct (containsNull = true)
     |    |    |    |    |-- datatimezone: string (nullable = true)
     |    |    |    |    |-- intervaltime: long (nullable = true)
     |    |    |    |    |-- intervalvalue: long (nullable = true)
     |    |    |    |    |-- utcacquisitiontime: string (nullable = true)
     |    |    |-- usage: string (nullable = true)
     |-- titlename: string (nullable = true)

I have extracted utcacquisitiontime and datatimezone from the above schema as follows:
    val result = q.selectExpr("explode(dailydata) r").select("r.utcacquisitiontime", "r.datatimezone")

    +--------------------+------------+
    |  utcacquisitiontime|datatimezone|
    +--------------------+------------+
    |2017-03-27T22:00:00Z|      +02:00|
    |2017-03-27T22:15:00Z|      +02:00|
    |2017-03-27T22:30:00Z|      +02:00|
    |2017-03-27T22:45:00Z|      +02:00|
    |2017-03-27T23:00:00Z|      +02:00|
    |2017-03-27T23:15:00Z|      +02:00|
    |2017-03-27T23:30:00Z|      +02:00|
    |2017-03-27T23:45:00Z|      +02:00|
    |2017-03-28T00:00:00Z|      +02:00|
    |2017-03-28T00:15:00Z|      +02:00|
    |2017-03-28T00:30:00Z|      +02:00|
    |2017-03-28T00:45:00Z|      +02:00|
    |2017-03-28T01:00:00Z|      +02:00|
    |2017-03-28T01:15:00Z|      +02:00|
    |2017-03-28T01:30:00Z|      +02:00|
    |2017-03-28T01:45:00Z|      +02:00|
    |2017-03-28T02:00:00Z|      +02:00|
    |2017-03-28T02:15:00Z|      +02:00|
    |2017-03-28T02:30:00Z|      +02:00|
    |2017-03-28T02:45:00Z|      +02:00|
    +--------------------+------------+

I need to calculate the local time from these two columns and replace them with the local time after the calculation. How should I calculate the local time and replace the columns with it?
You can rely on a UDF (user-defined function) in Spark. org.apache.spark.sql.functions._ also contains plenty of predefined functions that might help you. Here is how you can make it work, starting from a row like this:
    +-------------------+------------+
    | utcacquisitiontime|datatimezone|
    +-------------------+------------+
    |2017-03-27T22:00:00|      +02:00|
    +-------------------+------------+

Note that I have removed the unnecessary "Z" from the time column. Using the Joda-Time dependency, define the UDF like this:
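    import org.apache.spark.sql.functions.{col, udf}
    import org.joda.time.DateTimeZone
    import org.joda.time.format.DateTimeFormat

    val toTimestamp = udf((time: String, zone: String) => {
      // Build a Joda time zone from an offset string such as "+02:00"
      val timezone = DateTimeZone.forID(zone)
      // MM is month of year; a lowercase mm would be parsed as minutes
      val formatter = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss")
      // Interpret the string in that zone and convert it to a SQL timestamp
      new java.sql.Timestamp(formatter.withZone(timezone).parseDateTime(time).getMillis)
    })

Apply it to the two columns with withColumn: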
    df.withColumn("timestamp", toTimestamp(col("utcacquisitiontime"), col("datatimezone")))

This shows the result (note that in the schema the timestamp column is of type timestamp, so you can perform date operations on it):
    +-------------------+------------+--------------------+
    | utcacquisitiontime|datatimezone|           timestamp|
    +-------------------+------------+--------------------+
    |2017-03-27T22:00:00|      +02:00|2017-01-27 22:00:...|
    +-------------------+------------+--------------------+

    root
     |-- utcacquisitiontime: string (nullable = true)
     |-- datatimezone: string (nullable = true)
     |-- timestamp: timestamp (nullable = true)

Note that the month in this output reads 01 rather than 03. That is what a lowercase mm (minute of hour) in the date pattern produces; with MM (month of year) the month should come out as 03.
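The UDF above reads the time string as wall-clock time in the given zone. If the goal is instead to shift a UTC instant by the offset to get the local time, the core calculation might look like the sketch below. It uses java.time rather than Joda-Time (my substitution, not part of the answer above), keeps the trailing "Z", and assumes the offset column always holds values like "+02:00". The names LocalTimeSketch and toLocalTime are hypothetical.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object LocalTimeSketch {
  // Formats the shifted time without a zone suffix
  private val localFormat = DateTimeFormatter.ISO_LOCAL_DATE_TIME

  // Hypothetical helper: shifts a UTC instant such as "2017-03-27T22:00:00Z"
  // by an offset such as "+02:00" and returns the local wall-clock time.
  def toLocalTime(utc: String, offset: String): String = {
    val instant = Instant.parse(utc)       // requires the trailing "Z"
    val zone = ZoneOffset.of(offset.trim)  // trim guards against padded values
    instant.atOffset(zone).toLocalDateTime.format(localFormat)
  }

  def main(args: Array[String]): Unit = {
    // prints 2017-03-28T00:00:00
    println(toLocalTime("2017-03-27T22:00:00Z", "+02:00"))
  }
}
```

In Spark, a function like this could be wrapped with udf(LocalTimeSketch.toLocalTime _) and applied through withColumn("localtime", ...). Calling drop("utcacquisitiontime", "datatimezone") afterwards would then leave only the new column in place of the two source columns.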