How to convert an array extracted from a json string field to a bigquery Repeated field? -
we have loaded json blobs in string field in bigquery table. need create view (using standard sql)over table extract array field bigquery array/repeated field of "record" type (which includes repeated field).
here sample record (json_blob):
{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}
i hoping end view has following layout:
[ { "name": "order_id", "type": "string", "mode": "nullable" }, { "mode": "nullable", "name": "customer_id", "type": "string" }, { "mode": "repeated", "name": "items", "type": "record", "fields": [ { "mode": "nullable", "name": "line", "type": "string" }, { "mode": "repeated", "name": "ref_ids", "type": "string" }, { "mode": "nullable", "name": "sku", "type": "string" }, { "mode": "nullable", "name": "amount", "type": "integer" } ] } ]
json_extract(json_blob, '$.items') extracts items parts, how convert bigquery array of type "record" can processed normal bigquery array/repeated of struct?
appreciate help.
there no way using sql functions in bigquery @ time of writing unless can impose hard limit on number of values in json array; see the relevant issue tracker item. options are:
- process data differently (e.g. using cloud dataflow or tool) can load newline-delimited json bigquery.
- use javascript udf takes input json , returns desired type; straightforward uses more cpu (and hence may require higher billing tier).
- use sql functions understanding solution breaks down if there many elements.
here approach using javascript udf:
#standardsql create temp function jsontoitems(input string) returns struct<order_id int64, customer_id string, items array<struct<line string, ref_ids array<string>, sku string, amount int64>>> language js """ return json.parse(input); """; input ( select '{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}' json ) select jsontoitems(json).* input;
if want try sql-based approach without javascript, here's of hack until feature request above resolved, number of array elements must no more 10:
#standardsql create temp function jsonextractrefids(json string) ( (select array_agg(v ignore nulls) unnest([ json_extract_scalar(json, '$.ref_ids[0]'), json_extract_scalar(json, '$.ref_ids[1]'), json_extract_scalar(json, '$.ref_ids[2]'), json_extract_scalar(json, '$.ref_ids[3]'), json_extract_scalar(json, '$.ref_ids[4]'), json_extract_scalar(json, '$.ref_ids[5]'), json_extract_scalar(json, '$.ref_ids[6]'), json_extract_scalar(json, '$.ref_ids[7]'), json_extract_scalar(json, '$.ref_ids[8]'), json_extract_scalar(json, '$.ref_ids[9]')]) v) ); create temp function jsontoitem(json string) returns struct<line string, ref_ids array<string>, sku string, amount int64> ( if(json null, null, struct( json_extract_scalar(json, '$.line'), jsonextractrefids(json), json_extract_scalar(json, '$.sku'), cast(json_extract_scalar(json, '$.amount') int64) ) ) ); create temp function jsontoitems(json string) ( (select struct cast(json_extract_scalar(json, '$.order_id') int64) order_id, json_extract_scalar(json, '$.customer_id') customer_id, (select array_agg(v ignore nulls) unnest([ jsontoitem(json_extract(json, '$.items[0]')), jsontoitem(json_extract(json, '$.items[1]')), jsontoitem(json_extract(json, '$.items[2]')), jsontoitem(json_extract(json, '$.items[3]')), jsontoitem(json_extract(json, '$.items[4]')), jsontoitem(json_extract(json, '$.items[5]')), jsontoitem(json_extract(json, '$.items[6]')), jsontoitem(json_extract(json, '$.items[7]')), jsontoitem(json_extract(json, '$.items[8]')), jsontoitem(json_extract(json, '$.items[9]'))]) v) items ) ); input ( select '{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}' json ) select jsontoitems(json).* input;
Comments
Post a Comment