regex - Spark column rlike converts int to boolean
I'm using a regex with Spark's Column rlike to extract the last digits of a string. The problem is that after it extracts the digit, the result is automatically converted to a Boolean. Is there a way for me to stop it from being automatically converted to a Boolean?
test.withColumn("quarter", $"month".rlike("\\d+$"))
For example:
Input:
2015 q 1
2015 q 1
2015 q 2
2015 q 2
Output:
true
true
true
true
Expected:
1
1
2
2
I tried casting to an integer afterwards, but it always returns 1, because the Boolean gets converted to an Int.
test.withColumn("quarter", $"month".rlike("\\d+$").cast("integer"))
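rlike only tests whether the column matches the pattern, so it always returns a Boolean. Spark has a separate function for extracting the text that matches a regex: you can use the regexp_extract function for this.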
scala> val df = Seq("2015 q 1", "2015 q 1", "2015 q 2", "2015 q 2").toDF("col1")
df: org.apache.spark.sql.DataFrame = [col1: string]

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> df.withColumn("quarter", regexp_extract($"col1", ".*(\\d+)$", 1)).show
+--------+-------+
|    col1|quarter|
+--------+-------+
|2015 q 1|      1|
|2015 q 1|      1|
|2015 q 2|      2|
|2015 q 2|      2|
+--------+-------+
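Since the question expected integers, and regexp_extract returns a string column, a cast may still be needed. The following is a minimal sketch rather than a definitive solution. It assumes a local SparkSession, uses the question's column name month, and adds a has_digits column (a made-up name) only to show rlike and regexp_extract side by side.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract}

object QuarterExtract {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("quarter-extract")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("2015 q 1", "2015 q 1", "2015 q 2", "2015 q 2").toDF("month")

    val result = df
      // rlike only tests for a match, so this column is Boolean.
      .withColumn("has_digits", col("month").rlike("\\d+$"))
      // regexp_extract returns the text of capture group 1 as a string;
      // the cast turns that string into an integer column.
      .withColumn("quarter", regexp_extract(col("month"), "(\\d+)$", 1).cast("integer"))

    result.printSchema()
    result.show()

    spark.stop()
  }
}
```

Note that regexp_extract returns an empty string for rows where the pattern does not match, so how the cast treats such rows depends on your Spark version and ANSI settings.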