pyspark - Inserting and Deleting data in a Spark Dataframe
I have a PySpark DataFrame, input_dataframe, as shown below:

| **cust_id** | **source_id** | **value** |
|---|---|---|
| 10 | 11 | test_value |
| 10 | 12 | test_value2 |

I have another DataFrame, delta_dataframe, which contains updated records for input_dataframe along with some new records, as shown below:

| **cust_id** | **source_id** | **value** |
|---|---|---|
| 10 | 11 | update_value |
| 10 | 15 | new_value2 |

In both DataFrames, the primary key is the combination of cust_id and source_id.

I have to generate a new DataFrame, output_dataframe, which contains the records from input_dataframe, updated with the records from delta_dataframe. The final DataFrame should look like this:

| **cust_id** | **source_id** | **value** |
|---|---|---|
| 10 | 11 | update_value |
| 10 | 12 | test_value2 |
| 10 | 15 | new_value2 |

Can anyone please suggest how I can achieve this in PySpark? Any help on this is appreciated.
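Subtract the two DataFrames on the primary key (cust_id, source_id) to find the keys in input_dataframe that do not appear in delta_dataframe. Inner join that result with input_dataframe to get the full unchanged rows. Then take the union with delta_dataframe. That gives the proper output.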