pyspark - Inserting and Deleting data in a Spark Dataframe


I have a PySpark DataFrame, input_dataframe, shown below:

| cust_id | source_id | value       |
|---------|-----------|-------------|
| 10      | 11        | test_value  |
| 10      | 12        | test_value2 |

I have another DataFrame, delta_dataframe, which holds updated records for input_dataframe as well as new records, shown below:

| cust_id | source_id | value        |
|---------|-----------|--------------|
| 10      | 11        | update_value |
| 10      | 15        | new_value2   |

In both DataFrames, the primary key is the combination of cust_id and source_id.

I have to generate a new DataFrame, output_dataframe, which contains the records of input_dataframe with the updated and new records of delta_dataframe applied. The final DataFrame should look like this:

| cust_id | source_id | value        |
|---------|-----------|--------------|
| 10      | 11        | update_value |
| 10      | 12        | test_value2  |
| 10      | 15        | new_value2   |

Can anyone please suggest how I can achieve this in PySpark? Any help on this is appreciated.

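Subtract the 2 DataFrames based on the primary key, so that you are left with the keys that exist in input_dataframe but not in delta_dataframe. Make an inner join of that output with input_dataframe to get the full rows that were not touched. Then take the union of the result with delta_dataframe. That gives the proper output.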

