apache spark - Scala DataFrame: calling a function inside withColumn?
Here is what I am trying to do: I have 2 tables that have the same column names.
Each table looks like this:
a  b  c  d
-----------
1  2  3  4
5  6  3  4
7  8  3  4
The logic I need is: compare the a, b, c, d columns of table1 and table2. If c is 3, return a new column value of 0, else check d: if d is 4, return 1; otherwise, if a and b match each other, return 0, else return 1. Exactly one value should be returned per row, with priority c > d > a=b.
I joined the 2 tables (DataFrames); the result is in combinedDF. This is how I join them (aliasing each side so the qualified column names used below resolve):

table1.as("table1").join(table2.as("table2"), col("table1.a") === col("table2.a"))
So here is what I did:
def func(a: mutable.WrappedArray[String], b: mutable.WrappedArray[String], c: String, d: String) = {
  if (c == "3") "0"
  else if (d == "4") "1"
  else if ((0 to a.length - 1).exists(i => a(i) == b(i))) "0"
  else "1"
}
For this function I want to put the a and b columns of table1 into one array and the a and b columns of table2 into another array, then run a loop checking equality. (I need arrays because in my real case I have a variable number of columns to compare.)
And here is how I call the function:
combinedDF.withColumn("returnVal",
  func(array(col("table1.a"), col("table1.b")),
       array(col("table2.a"), col("table2.b")),
       col("table1.c"), col("table1.d")))
But it doesn't work: even though I put the columns inside arrays using the array function, it still tells me there is a type mismatch.
Error message:

<console>:67: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: String
Thanks in advance!
You can try this. Help me understand one thing: why do you need to combine the dataframes, and what do you mean by "if a and b match" (my assumption is it's per row, right)? If the a, b, c, d columns are strings, change Integer to String below.
def func(a: Integer, b: Integer, c: Integer, d: Integer) = {
  if (c == 3) "0"
  else if (d == 4) "1"
  else if (a == b) "0"
  else "1"
}

val udfFunc = udf(func _)

combinedDF.withColumn("returnVal",
  udfFunc(col("table1.a"), col("table1.b"), col("table1.c"), col("table1.d")))
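Since the question needs a variable number of compared columns, the array form can be kept as well: the root cause of the error is that a plain Scala function applied to Column arguments receives Column objects, not row values, so it must be wrapped with udf. Below is a minimal sketch of the per-row logic over Seq[String] (Spark passes an array column to a Scala UDF as a Seq); the Spark wiring is shown in comments because it assumes a live SparkSession and the combinedDF from the question.

```scala
object ReturnVal {
  // Per-row logic with priority c > d > element-wise a/b match.
  def func(a: Seq[String], b: Seq[String], c: String, d: String): String =
    if (c == "3") "0"
    else if (d == "4") "1"
    else if (a.indices.exists(i => a(i) == b(i))) "0"
    else "1"

  // Spark wiring (assumes a SparkSession and the aliased join from the question):
  //   import org.apache.spark.sql.functions.{array, col, udf}
  //   val funcUdf = udf(func _)
  //   combinedDF.withColumn("returnVal",
  //     funcUdf(array(col("table1.a"), col("table1.b")),
  //             array(col("table2.a"), col("table2.b")),
  //             col("table1.c"), col("table1.d")))
}
```

The key point is udf(func _): Spark then evaluates the Column expressions per row and passes plain Scala values to func, instead of handing it Column objects directly.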