scala - How to replace values in Spark DataFrames after recalculation?


I have the following schema in Spark:

root
 |-- atom: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- dailydata: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- datatimezone: string (nullable = true)
 |    |    |    |    |-- intervaltime: long (nullable = true)
 |    |    |    |    |-- intervalvalue: long (nullable = true)
 |    |    |    |    |-- utcacquisitiontime: string (nullable = true)
 |    |    |-- usage: string (nullable = true)
 |-- titlename: string (nullable = true)

I have extracted utcacquisitiontime and datatimezone from the above schema as follows:

val result = q.selectExpr("explode(dailydata) r").select("r.utcacquisitiontime", "r.datatimezone")

+--------------------+------------+
|  utcacquisitiontime|datatimezone|
+--------------------+------------+
|2017-03-27T22:00:00Z|      +02:00|
|2017-03-27T22:15:00Z|      +02:00|
|2017-03-27T22:30:00Z|      +02:00|
|2017-03-27T22:45:00Z|      +02:00|
|2017-03-27T23:00:00Z|      +02:00|
|2017-03-27T23:15:00Z|      +02:00|
|2017-03-27T23:30:00Z|      +02:00|
|2017-03-27T23:45:00Z|      +02:00|
|2017-03-28T00:00:00Z|      +02:00|
|2017-03-28T00:15:00Z|      +02:00|
|2017-03-28T00:30:00Z|      +02:00|
|2017-03-28T00:45:00Z|      +02:00|
|2017-03-28T01:00:00Z|      +02:00|
|2017-03-28T01:15:00Z|      +02:00|
|2017-03-28T01:30:00Z|      +02:00|
|2017-03-28T01:45:00Z|      +02:00|
|2017-03-28T02:00:00Z|      +02:00|
|2017-03-28T02:15:00Z|      +02:00|
|2017-03-28T02:30:00Z|      +02:00|
|2017-03-28T02:45:00Z|      +02:00|
+--------------------+------------+

I need to calculate the local time from these two columns and replace them with the local time after the calculation. How should I calculate the local time and replace the columns with it?

You can rely on a UDF (user-defined function) in Spark. org.apache.spark.sql.functions._ also contains plenty of predefined functions that might help you. Here is how you can make it work, starting from this input:

+-------------------+------------+
| utcacquisitiontime|datatimezone|
+-------------------+------------+
|2017-03-27T22:00:00|      +02:00|
+-------------------+------------+

Note that I have removed the unnecessary "Z" from the time column. Using the Joda-Time dependency, define the UDF like this:

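import org.apache.spark.sql.functions.udf
import org.joda.time.DateTimeZone
import org.joda.time.format.DateTimeFormat

val totimestamp = udf((time: String, zone: String) => {
  val timezone = DateTimeZone.forID(zone)
  // MM is the month and HH the 24-hour clock; lowercase mm would be parsed as minutes, not months
  val df = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss")
  // the string is read as a wall-clock time in the given zone and stored as a SQL timestamp
  new java.sql.Timestamp(df.withZone(timezone).parseDateTime(time).getMillis)
})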

Apply it to the columns with withColumn:

df.withColumn("timestamp", totimestamp(col("utcacquisitiontime"), col("datatimezone")))

The results are shown below. Note that in the schema the timestamp column is of type timestamp, so you can perform date operations on it.

+-------------------+------------+--------------------+
| utcacquisitiontime|datatimezone|           timestamp|
+-------------------+------------+--------------------+
|2017-03-27T22:00:00|      +02:00|2017-01-27 22:00:...|
+-------------------+------------+--------------------+

root
 |-- utcacquisitiontime: string (nullable = true)
 |-- datatimezone: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
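
The UDF above reads the string as a wall-clock time in the given zone. If the goal is instead to shift the UTC value by the offset, as the question describes, a minimal sketch using java.time could look like the following. It is not part of the original answer: LocalTimeSketch, toLocalDateTime and toLocalString are hypothetical names, the output pattern is an assumption, and the input is assumed to keep its trailing "Z".

```scala
import java.time.{Instant, LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper object, introduced only for illustration.
object LocalTimeSketch {
  // Assumed output layout for the local wall-clock time.
  private val outFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse a UTC instant such as "2017-03-27T22:00:00Z" and shift it
  // by an offset such as "+02:00" to get the local date and time.
  def toLocalDateTime(utc: String, offset: String): LocalDateTime =
    Instant.parse(utc).atOffset(ZoneOffset.of(offset)).toLocalDateTime

  // Same calculation, rendered as a string for a DataFrame column.
  def toLocalString(utc: String, offset: String): String =
    toLocalDateTime(utc, offset).format(outFormat)
}

// Evaluates to "2017-03-28 00:00:00"
val sample = LocalTimeSketch.toLocalString("2017-03-27T22:00:00Z", "+02:00")

// With Spark on the classpath, the function could be wrapped and used to
// replace the two source columns (assumed usage, not run here):
//   val toLocal = udf(LocalTimeSketch.toLocalString _)
//   result.withColumn("localtime", toLocal(col("utcacquisitiontime"), col("datatimezone")))
//         .drop("utcacquisitiontime", "datatimezone")
```

Returning a string keeps the value independent of the JVM or session time zone, which would otherwise affect how a java.sql.Timestamp is displayed.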
