scala - Case insensitive search in array type column spark dataframe

June 15, 2011

I have a Spark DataFrame like the following:

+----------+-------------------------------------------------+
|col1      |words                                            |
+----------+-------------------------------------------------+
|an        |[an, attractive, ,, thin, low, profile]          |
|attractive|[an, attractive, ,, thin, low, profile]          |
|,         |[an, attractive, ,, thin, low, profile]          |
|thin      |[an, attractive, ,, thin, low, profile]          |
|rail      |[an, attractive, ,, thin, low, profile]          |
|profile   |[an, attractive, ,, thin, low, profile]          |
|lighter   |[lighter, than, metal, ,, level, ,, and, tes]    |
|than      |[lighter, than, metal, ,, level, ,, and, tww]    |
|steel     |[lighter, than, metal, ,, level, ,, and, test]   |
|,         |[lighter, than, metal, ,, level, ,, and, test]   |
|level     |[lighter, than, metal, ,, level, ,, and, test]   |
|,         |[lighter, than, metal, ,, level, ,, and, ste]    |
|and       |[lighter, than, metal, ,, level, ,, and, ste]    |
|test      |[lighter, than, metal, ,, level, ,, and, ste]    |
|renewable |[renewable, resource]                            |
|resource  |[renewable, resource]                            |
|no        |[no1, bal, testme, saves, time, and, money]      |
+----------+-------------------------------------------------+

I want to filter the data above on the words column, ignoring case. This is what I am doing:

df.filter(array('words, "level")).show(false)

But it is not showing any data. Please help me resolve this issue.

For this you can create a simple UDF that converts both the array elements and the search term to lower case and then filters on the result.

Here is a simple example:

scala> import spark.implicits._
import spark.implicits._

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val df = Seq(("an", List("an", "attractive", " ", "", "thin", "low", "profile")), ("lighter", List("lighter", "than", "metal", " ", "", "level", " ", "", "and", "tes"))).toDF("col1", "words")
df: org.apache.spark.sql.DataFrame = [col1: string, words: array<string>]

scala> val filterUdf = udf((arr: Seq[String]) => arr.map(_.toLowerCase).contains("level".toLowerCase))
filterUdf: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,BooleanType,Some(List(ArrayType(StringType,true))))

scala> df.filter(filterUdf($"words")).show(false)
+-------+-------------------------------------------------+
|col1   |words                                            |
+-------+-------------------------------------------------+
|lighter|[lighter, than, metal,  , , level,  , , and, tes]|
+-------+-------------------------------------------------+
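If you are on Spark 3.0 or later, the same lower-case-then-compare idea can probably be expressed without a UDF, using the built-in `exists` and `lower` column functions. The sketch below is only an illustration under that version assumption: the object name `CaseInsensitiveArraySearch`, the helper `filterContainsIgnoreCase`, and the mixed-case sample rows are made up for this example and are not part of the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, exists, lower}

// Hypothetical helper object, named here only for illustration.
object CaseInsensitiveArraySearch {

  // Keeps the rows whose array column holds `needle`, ignoring case.
  // Assumes Spark 3.0+, where functions.exists(Column, Column => Column) is available.
  def filterContainsIgnoreCase(df: DataFrame, arrayCol: String, needle: String): DataFrame = {
    val target = needle.toLowerCase
    // Lower-case each element on the Spark side and compare it with the lower-cased needle.
    df.filter(exists(col(arrayCol), element => lower(element) === target))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("case-insensitive-array-search")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows with mixed case, loosely modelled on the question's data.
    val df = Seq(
      ("an", List("An", "Attractive", "thin", "low", "profile")),
      ("lighter", List("Lighter", "than", "metal", "LEVEL", "and", "tes"))
    ).toDF("col1", "words")

    // Only the second row contains "level" in some casing.
    filterContainsIgnoreCase(df, "words", "Level").show(false)

    spark.stop()
  }
}
```

Because the comparison stays inside Spark's own expressions, the optimizer can see it, which is the usual reason to prefer built-in functions over a UDF when they are available. On older versions the UDF above remains the straightforward option.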

Hope this helps!
