scala - Case insensitive search in array type column spark dataframe


I have a Spark DataFrame like the following:

+----------+-------------------------------------------------+
|col1      |words                                            |
+----------+-------------------------------------------------+
|an        |[an, attractive, ,, thin, low, profile]          |
|attractive|[an, attractive, ,, thin, low, profile]          |
|,         |[an, attractive, ,, thin, low, profile]          |
|thin      |[an, attractive, ,, thin, low, profile]          |
|rail      |[an, attractive, ,, thin, low, profile]          |
|profile   |[an, attractive, ,, thin, low, profile]          |
|lighter   |[lighter, than, metal, ,, level, ,, and, tes]    |
|than      |[lighter, than, metal, ,, level, ,, and, tww]    |
|steel     |[lighter, than, metal, ,, level, ,, and, test]   |
|,         |[lighter, than, metal, ,, level, ,, and, test]   |
|level     |[lighter, than, metal, ,, level, ,, and, test]   |
|,         |[lighter, than, metal, ,, level, ,, and, ste]    |
|and       |[lighter, than, metal, ,, level, ,, and, ste]    |
|test      |[lighter, than, metal, ,, level, ,, and, ste]    |
|renewable |[renewable, resource]                            |
|resource  |[renewable, resource]                            |
|no        |[no1, bal, testme, saves, time, and, money]      |
+----------+-------------------------------------------------+

I want to filter the data above on the words column, case-insensitively. I am doing this:

df.filter(array_contains('words, "level")).show(false)

but it is not showing any data. Please help me resolve this issue.

array_contains does an exact, case-sensitive comparison, so it misses entries whose case differs from the search term. Instead, you can create a simple UDF that converts both sides to lower case and then filters on the result.

Here is a simple example:

scala> import spark.implicits._
import spark.implicits._

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val df = Seq(("an", List("an", "attractive", " ", "", "thin", "low", "profile")), ("lighter", List("lighter", "than", "metal", " ", "", "level", " ", "", "and", "tes"))).toDF("col1", "words")
df: org.apache.spark.sql.DataFrame = [col1: string, words: array<string>]

scala> val filterUdf = udf((arr: Seq[String]) => arr.map(_.toLowerCase).contains("level".toLowerCase))
filterUdf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,BooleanType,Some(List(ArrayType(StringType,true))))

scala> df.filter(filterUdf($"words")).show(false)
+-------+-------------------------------------------------+
|col1   |words                                            |
+-------+-------------------------------------------------+
|lighter|[lighter, than, metal,  , , level,  , , and, tes]|
+-------+-------------------------------------------------+
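If the search term needs to vary, here is a minimal sketch that parameterizes the UDF so the same helper works for any key (assuming Spark 2.x; containsIgnoreCase is just an illustrative name, not a built-in):

import org.apache.spark.sql.functions.udf

// Hypothetical helper: builds a UDF that reports whether the array
// contains the given key, ignoring case; also guards against null arrays.
def containsIgnoreCase(key: String) =
  udf((arr: Seq[String]) => arr != null && arr.exists(_.equalsIgnoreCase(key)))

df.filter(containsIgnoreCase("LEVEL")($"words")).show(false)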

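As an aside, if you happen to be on Spark 2.4 or later, you can avoid the UDF entirely with the exists higher-order function, which keeps the predicate inside Catalyst; a sketch under that assumption:

import org.apache.spark.sql.functions.expr

// exists(words, w -> ...) is true when any element of the array satisfies the lambda
df.filter(expr("exists(words, w -> lower(w) = 'level')")).show(false)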
Hope this helps!

