scala - Case-insensitive search in an array-type column of a Spark DataFrame
I have a Spark DataFrame like the following:
+----------+-------------------------------------------------+
|col1      |words                                            |
+----------+-------------------------------------------------+
|an        |[an, attractive, ,, thin, low, profile]          |
|attractive|[an, attractive, ,, thin, low, profile]          |
|,         |[an, attractive, ,, thin, low, profile]          |
|thin      |[an, attractive, ,, thin, low, profile]          |
|rail      |[an, attractive, ,, thin, low, profile]          |
|profile   |[an, attractive, ,, thin, low, profile]          |
|lighter   |[lighter, than, metal, ,, level, ,, and, tes]    |
|than      |[lighter, than, metal, ,, level, ,, and, tww]    |
|steel     |[lighter, than, metal, ,, level, ,, and, test]   |
|,         |[lighter, than, metal, ,, level, ,, and, test]   |
|level     |[lighter, than, metal, ,, level, ,, and, test]   |
|,         |[lighter, than, metal, ,, level, ,, and, ste]    |
|and       |[lighter, than, metal, ,, level, ,, and, ste]    |
|test      |[lighter, than, metal, ,, level, ,, and, ste]    |
|renewable |[renewable, resource]                            |
|resource  |[renewable, resource]                            |
|no        |[no1, bal, testme, saves, time, and, money]      |
+----------+-------------------------------------------------+
I want to filter the data above on the words column, ignoring case. I am doing this:
df.filter(array_contains('words, "Level")).show(false)
But it is not showing any data. Please help me resolve this issue.
For this you can create a simple UDF that converts both the array elements and the search term to lower case and then filters on the result.
Here is a simple example:
scala> import spark.implicits._
import spark.implicits._

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val df = Seq(("an", List("an", "attractive", " ", "", "thin", "low", "profile")), ("lighter", List("lighter", "than", "metal", " ", "", "level", " ", "", "and", "tes"))).toDF("col1", "words")
df: org.apache.spark.sql.DataFrame = [col1: string, words: array<string>]

scala> val filterudf = udf((arr: Seq[String]) => arr.map(_.toLowerCase).contains("level".toLowerCase))
filterudf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,BooleanType,Some(List(ArrayType(StringType,true))))

scala> df.filter(filterudf($"words")).show(false)
+-------+-------------------------------------------------+
|col1   |words                                            |
+-------+-------------------------------------------------+
|lighter|[lighter, than, metal,  , , level,  , , and, tes]|
+-------+-------------------------------------------------+
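If you would rather avoid a UDF, the same check can probably be written with Spark SQL's built-in `exists` higher-order function. This is a different technique from the UDF above, not a replacement the original answer proposed. The sketch below assumes Spark 2.4 or later (where `exists` and the `w -> ...` lambda syntax are available in SQL expressions); the sample rows, the mixed-case "Level" element and the name `matched` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for this sketch; inside spark-shell a `spark` value already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("case-insensitive-array-search")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; note the mixed-case "Level" in the second row.
val df = Seq(
  ("an", List("an", "attractive", "thin", "low", "profile")),
  ("lighter", List("lighter", "than", "metal", "Level", "and", "tes"))
).toDF("col1", "words")

// exists(array, lambda) is true when the lambda holds for at least one element.
// lower(w) normalizes each element, so it is compared against a lower-case literal.
val matched = df.filter(expr("exists(words, w -> lower(w) = 'level')"))

matched.show(false)
```

Only the "lighter" row should remain, because it is the only one whose array contains an element equal to "level" once case is ignored. Keeping the predicate as a SQL expression lets Spark evaluate it without calling into a Scala closure for each row, which is the usual reason to prefer built-ins over UDFs where they exist.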
Hope this helps!