python - PySpark Type Conversion Issue from Date to String
I am using PySpark 2.1. Below is the DataFrame content:

expecteddays,date
139,30.jul.2017
134,01.nov.2018

My output should look like this:

138,30.jul.2017,<30/sep/2018,4/feb/2019>

The last column is populated by the functions daterangebetween and get_date below.
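The sample rows use two different date styles: 30.jul.2017 on input, and 30/sep/2018 or 4/feb/2019 in the desired output. Below is a minimal sketch of parsing and rendering those styles with the standard library. It assumes English month abbreviations and that the output drops the day's leading zero, as in 4/feb/2019. parse_input_date and format_output_date are hypothetical helper names.

```python
from datetime import date, datetime


def parse_input_date(value):
    """Parse values like '30.jul.2017' (day.month-abbreviation.year).

    Assumes English month abbreviations; strptime matches %b case-insensitively.
    """
    return datetime.strptime(value.strip(), "%d.%b.%Y").date()


def format_output_date(d):
    """Render a date like '4/feb/2019': unpadded day, lower-case month abbreviation."""
    return "{}/{}/{}".format(d.day, d.strftime("%b").lower(), d.year)


print(parse_input_date("30.jul.2017"))       # 2017-07-30
print(format_output_date(date(2019, 2, 4)))  # 4/feb/2019
```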
Here is the code:

from datetime import datetime, timedelta
import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SparkSession, types
from pyspark.sql.functions import concat, explode, udf
from pyspark.sql.types import ArrayType, StringType, StructType, StructField, IntegerType

maintenance_final_join = spark.read.csv('/user/naveensri/adh_dev_engg/test.csv', header=True)

def get_date(dateformat="%d-%m-%y", adddays=0, timenow=0):
    if adddays != 0:
        anothertime = timenow + timedelta(days=adddays)
    else:
        anothertime = timenow
    return anothertime.strftime(dateformat)

def daterangebetween(expecteddate, estimateddays):
    output_format = '%d-%m-%y'
    daterangelist = []
    j = 2
    rangeenddate = datetime.strptime(get_date(output_format, 730, expecteddate), '%d-%m-%y').date()
    calculateddate = datetime.strptime(get_date(output_format, estimateddays, expecteddate), '%d-%m-%y').date()
    while calculateddate <= rangeenddate:
        daterangelist.append(calculateddate)
        calculateddate = datetime.strptime(get_date(output_format, estimateddays, calculateddate), '%d-%m-%y').date()
    return daterangelist

daterange = udf(daterangebetween, types.ArrayType(types.StringType()))
adddays = 182
result = maintenance_final_join.withColumn('part_dates', daterange(maintenance_final_join.expected, maintenance_final_join.estimateddays)).show()

After executing it, I get this error:
TypeError: coercing to Unicode: need string or buffer, datetime.timedelta found
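The message also suggests where the inputs come from. With header=True and no schema, spark.read.csv reads every column as a string, so get_date() receives something like '30.jul.2017' rather than a date. Adding a timedelta to a (Python 2 unicode) string is what produces a "coercing to Unicode" TypeError of this kind. The sketch below reproduces this in plain Python, without Spark. The %d.%b.%Y format is an assumption based on the sample rows, and under Python 3 the TypeError messages are worded differently.

```python
from datetime import datetime, timedelta

# Values as spark.read.csv(..., header=True) delivers them: plain strings.
raw_date = "30.jul.2017"
raw_days = "139"


def raises_type_error(fn):
    """Return True if calling fn() raises TypeError."""
    try:
        fn()
    except TypeError:
        return True
    return False


# What get_date() effectively does with the raw inputs:
str_plus_delta_fails = raises_type_error(lambda: raw_date + timedelta(days=730))
str_days_fails = raises_type_error(lambda: timedelta(days=raw_days))

# Parsing first (English month abbreviations assumed) makes the arithmetic valid.
start = datetime.strptime(raw_date, "%d.%b.%Y").date()
range_end = start + timedelta(days=730)
first_step = start + timedelta(days=int(raw_days))

print(str_plus_delta_fails, str_days_fails)  # True True
print(range_end, first_step)                 # 2019-07-30 2017-12-16
```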
First of all, please fix the indentation; the daterangebetween() function is difficult to read.

However, the problem is here:

daterangelist.append(calculateddate)
calculateddate = datetime.strptime(get_date(output_format, estimateddays, calculateddate), '%d-%m-%y').date()

Your calculateddate is a date object (datetime.strptime(...).date() returns a datetime.date), not a string. You append the object itself, not its string representation, to daterangelist and return that list. In the main program, you then try to put this array of date objects into a UDF column declared as an array of strings.
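To see the mismatch without Spark, here is a minimal sketch that reproduces the loop's stepping on already-parsed dates. step_dates is a hypothetical helper, and it adds the timedelta directly instead of going through get_date's strftime/strptime round-trip. The list it returns holds datetime.date objects, while the UDF is declared as ArrayType(StringType()).

```python
from datetime import date, timedelta


def step_dates(start, step_days, span_days=730):
    """Same stepping as daterangebetween(), but on date objects directly."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")  # the loop would never end
    end = start + timedelta(days=span_days)
    current = start + timedelta(days=step_days)
    collected = []
    while current <= end:
        collected.append(current)  # a datetime.date, just like calculateddate
        current += timedelta(days=step_days)
    return collected


dates = step_dates(date(2017, 7, 30), 139)
print(type(dates[0]).__name__)  # date

# ArrayType(StringType()) describes a list of str, so convert explicitly.
as_strings = [d.strftime("%d-%m-%Y") for d in dates]
print(as_strings[0])  # 16-12-2017
```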
I assume your intention is to use string representations. If you changed the append to

daterangelist.append(calculateddate.strftime("......"))

and inserted the correct format string in place of the dots, you would at least be processing string objects instead of datetimes.
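Combining the strftime change with parsing the inputs gives one possible string-in/string-out version of daterangebetween. This is a sketch under several assumptions. The date column holds values like 30.jul.2017 with English month abbreviations. The day count arrives as a numeric string, because spark.read.csv without a schema reads every column as a string. OUTPUT_FORMAT is a placeholder for whatever format you actually want. The name daterangebetween_str is hypothetical. The function needs no Spark, so you can test it on its own before wrapping it in a UDF.

```python
from datetime import datetime, timedelta

INPUT_FORMAT = "%d.%b.%Y"   # assumed format of the CSV dates, e.g. 30.jul.2017
OUTPUT_FORMAT = "%d-%m-%Y"  # placeholder: put your real output format here


def daterangebetween_str(expecteddate, estimateddays, span_days=730):
    """Hypothetical string-in/string-out variant of daterangebetween()."""
    if expecteddate is None or estimateddays is None:
        return None  # null cells reach a Python UDF as None
    start = datetime.strptime(expecteddate.strip(), INPUT_FORMAT).date()
    step = int(estimateddays)  # CSV columns arrive as strings
    if step <= 0:
        return []  # a non-positive step would loop forever
    end = start + timedelta(days=span_days)
    current = start + timedelta(days=step)
    out = []
    while current <= end:
        out.append(current.strftime(OUTPUT_FORMAT))  # a str, not a date object
        current += timedelta(days=step)
    return out


print(daterangebetween_str("30.jul.2017", "139"))
# ['16-12-2017', '04-05-2018', '20-09-2018', '06-02-2019', '25-06-2019']
```

In the script, it would be wrapped the same way as before, for example daterange = udf(daterangebetween_str, types.ArrayType(types.StringType())), and passed the two columns in withColumn. Because the function parses its inputs and formats every element, the values it returns match the UDF's declared return type.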