scala - Check column datatype and execute SQL only on Integer and Decimal in Spark SQL
I'm trying to check the datatype of each column in an input Parquet file, and if the datatype is Integer or Decimal, run a Spark SQL query on that column.
//get the array of StructFields
val datatypes = parquetRDD_subset.schema.fields

//check the datatype of each column
for (val_datatype <- datatypes) if (val_datatype.dataType.typeName == "integer" || val_datatype.dataType.typeName.contains("decimal")) {
  //get the field names
  val x = parquetRDD_subset.schema.fieldNames
  val dfs = x.map(field => spark.sql(s"select 'DataProfilerStats' as Table_Name, (select 100 * approx_count_distinct($field) / count(1) from parquetDFTable) as Percentage_Unique_Value from parquetDFTable"))
}
The issue is that, although the datatype validation succeeds, inside the loop I take all field names from the schema, not just the Integer or Decimal ones, so the query ends up being executed on String columns as well. How can I restrict the fields to only the Decimal or Integer columns? How do I address this?
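One way to address this, as a minimal sketch: build the list of field names only from the columns whose type is Integer or Decimal, and map the query over that list instead of over schema.fieldNames. The input path and appName below are placeholders, the temp view name parquetDFTable is taken from the question's SQL, and the per-column SQL is collapsed here into a single aggregate query:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DecimalType, IntegerType}

val spark = SparkSession.builder().appName("DataProfilerStats").getOrCreate()

// placeholder path; point this at the real input Parquet file
val parquetRDD_subset = spark.read.parquet("/path/to/input.parquet")

// register the view name used in the question's SQL
parquetRDD_subset.createOrReplaceTempView("parquetDFTable")

// keep only the names of Integer and Decimal columns,
// rather than schema.fieldNames, which returns every column
val numericFieldNames = parquetRDD_subset.schema.fields
  .filter(f => f.dataType == IntegerType || f.dataType.isInstanceOf[DecimalType])
  .map(_.name)

// run the profiling query only over those columns
val dfs = numericFieldNames.map { field =>
  spark.sql(
    s"select 'DataProfilerStats' as Table_Name, " +
    s"100 * approx_count_distinct($field) / count(1) as Percentage_Unique_Value " +
    s"from parquetDFTable")
}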
This is how you can filter the columns of Integer and Double type:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, IntegerType}

// filter the columns
val columns = df.schema.fields.filter(x => x.dataType == IntegerType || x.dataType == DoubleType)

// use these filtered columns in the select
df.select(columns.map(x => col(x.name)): _*)
I hope this helps!
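Note that the filter above matches Integer and Double, while the question asked for Integer and Decimal; DecimalType carries precision and scale, so it has to be matched on the type rather than compared with ==. Below is a hedged variant combining that filter with the profiling computation via the DataFrame API; the helper names numericColumns and uniquePercentages and the output column names are just illustrative:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{approx_count_distinct, col, count, lit}
import org.apache.spark.sql.types.{DecimalType, IntegerType}

// keep Integer and Decimal columns (DecimalType has precision/scale,
// so match on the class rather than comparing with ==)
def numericColumns(df: DataFrame) =
  df.schema.fields.filter(f => f.dataType == IntegerType || f.dataType.isInstanceOf[DecimalType])

// percentage of (approximately) distinct values for each filtered column,
// computed with the DataFrame API instead of a SQL string
def uniquePercentages(df: DataFrame): Seq[DataFrame] =
  numericColumns(df).toSeq.map { f =>
    df.agg((approx_count_distinct(col(f.name)) * 100 / count(lit(1))).as("Percentage_Unique_Value"))
      .withColumn("Table_Name", lit("DataProfilerStats"))
      .withColumn("Column_Name", lit(f.name))
  }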