python - PySpark Type Conversion Issue from Date to String


I am using PySpark 2.1. Below is the DataFrame content:

expecteddays,date
139,30.jul.2017
134,01.nov.2018

My output should look like this:

138,30.jul.2017,<30/sep/2018,4/feb/2019> 

The population of the last column is taken care of by the two functions below, daterangebetween and get_date.

Below is the code:

from datetime import datetime, timedelta

import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

maintenance_final_join = spark.read.csv('/user/naveensri/adh_dev_engg/test.csv', header=True)


def get_date(dateformat="%d-%m-%y", adddays=0, timenow=0):
    # print('inside date', timesnow)
    if (adddays != 0):
        anothertime = timenow + timedelta(days=adddays)
    else:
        anothertime = timenow
    return anothertime.strftime(dateformat)


def daterangebetween(expecteddate, estimateddays):
    output_format = '%d-%m-%y'
    daterangelist = []
    j = 2
    # print('inside date range', expecteddate)
    rangeenddate = datetime.strptime(get_date(output_format, 730, expecteddate), '%d-%m-%y').date()
    # print('rangeenddate---', rangeenddate)
    calculateddate = datetime.strptime(get_date(output_format, estimateddays, expecteddate), '%d-%m-%y').date()
    # print('calculateddate----', calculateddate)
    while (calculateddate <= rangeenddate):
        # print(calculateddate)
        # print(estimateddays)
        daterangelist.append(calculateddate)
        calculateddate = datetime.strptime(get_date(output_format, estimateddays, calculateddate), '%d-%m-%y').date()
    return daterangelist


daterange = udf(daterangebetween, types.ArrayType(types.StringType()))
adddays = 182
result = maintenance_final_join.withColumn('part_dates', daterange(maintenance_final_join.expected, maintenance_final_join.estimateddays)).show()

After executing it, I get this error:

TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found

First of all, please keep the indentation consistent. The daterangebetween() function was difficult to read as posted.

However, the problem is in this part:

daterangelist.append(calculateddate)
calculateddate = datetime.strptime(get_date(output_format, estimateddays, calculateddate), '%d-%m-%y').date()

Your calculateddate is a datetime.date object. You append that object (not its string representation) to daterangelist and return the list. In the main program, the UDF is declared as returning an array of strings, yet it is handed an array of date objects.
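The mismatch can be seen in plain Python, without Spark. This is only a small illustration: the sample value "30-07-17" is made up, and the format string is the one used in the question's code.

```python
from datetime import datetime

# Parsing and calling .date() yields a datetime.date, not a string.
parsed = datetime.strptime("30-07-17", "%d-%m-%y").date()
print(type(parsed).__name__)      # date
print(isinstance(parsed, str))    # False

# strftime() turns it back into text, which is what a string column expects.
as_text = parsed.strftime("%d-%m-%y")
print(as_text)                    # 30-07-17
```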

I assume your intention is to use string representations. If you changed the append to

daterangelist.append(calculateddate.strftime("......")) 

and inserted the correct format string in place of the dots, you would at least be processing string objects instead of datetimes.
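A minimal sketch of how the whole function might look with that change, in plain Python so it can be tried outside Spark. The name date_range_between and the two format constants are assumptions for illustration, since the question does not state the exact input and output formats. The sketch also parses its arguments first, because columns read by spark.read.csv without a schema arrive as strings.

```python
from datetime import datetime, timedelta

# Assumed formats (not confirmed by the question): input such as
# "30.Jul.2017", output such as "30/Jul/2017".
INPUT_FORMAT = "%d.%b.%Y"
OUTPUT_FORMAT = "%d/%b/%Y"
RANGE_DAYS = 730  # the same 730-day window the question's code uses


def date_range_between(expected_date, estimated_days):
    """Return formatted dates every `estimated_days` days after
    `expected_date`, up to RANGE_DAYS days later."""
    # Both arguments are expected as strings, so convert them before
    # doing any date arithmetic.
    start = datetime.strptime(expected_date, INPUT_FORMAT).date()
    step = timedelta(days=int(estimated_days))
    if step <= timedelta(0):
        return []  # a non-positive step would never reach the end date
    range_end = start + timedelta(days=RANGE_DAYS)

    dates = []
    current = start + step
    while current <= range_end:
        # Append the string form so the result is a list of strings.
        dates.append(current.strftime(OUTPUT_FORMAT))
        current += step
    return dates


result = date_range_between("30.Jul.2017", "139")
print(result)
```

Such a function could then be wrapped the same way as in the question, with udf(date_range_between, ArrayType(StringType())), so that the declared return type matches what the function actually returns.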

