How to convert an array extracted from a json string field to a bigquery Repeated field? -


we have loaded json blobs in string field in bigquery table. need create view (using standard sql)over table extract array field bigquery array/repeated field of "record" type (which includes repeated field).

here sample record (json_blob):

{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]} 

i hoping end view has following layout:

[ {     "name": "order_id",     "type": "string",     "mode": "nullable" }, {     "mode": "nullable",     "name": "customer_id",     "type": "string" }, {     "mode": "repeated",     "name": "items",     "type": "record",     "fields": [         {             "mode": "nullable",             "name": "line",             "type": "string"         },         {             "mode": "repeated",             "name": "ref_ids",             "type": "string"         },         {             "mode": "nullable",             "name": "sku",             "type": "string"         },         {             "mode": "nullable",             "name": "amount",             "type": "integer"         }     ] } ] 

json_extract(json_blob, '$.items') extracts items parts, how convert bigquery array of type "record" can processed normal bigquery array/repeated of struct?

appreciate help.

there no way using sql functions in bigquery @ time of writing unless can impose hard limit on number of values in json array; see the relevant issue tracker item. options are:

  • process data differently (e.g. using cloud dataflow or tool) can load newline-delimited json bigquery.
  • use javascript udf takes input json , returns desired type; straightforward uses more cpu (and hence may require higher billing tier).
  • use sql functions understanding solution breaks down if there many elements.

here approach using javascript udf:

#standardsql create temp function jsontoitems(input string) returns struct<order_id int64, customer_id string, items array<struct<line string, ref_ids array<string>, sku string, amount int64>>> language js """ return json.parse(input); """;  input (   select '{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}' json ) select   jsontoitems(json).* input; 

if want try sql-based approach without javascript, here's of hack until feature request above resolved, number of array elements must no more 10:

#standardsql create temp function jsonextractrefids(json string) (   (select array_agg(v ignore nulls)    unnest([      json_extract_scalar(json, '$.ref_ids[0]'),      json_extract_scalar(json, '$.ref_ids[1]'),      json_extract_scalar(json, '$.ref_ids[2]'),      json_extract_scalar(json, '$.ref_ids[3]'),      json_extract_scalar(json, '$.ref_ids[4]'),      json_extract_scalar(json, '$.ref_ids[5]'),      json_extract_scalar(json, '$.ref_ids[6]'),      json_extract_scalar(json, '$.ref_ids[7]'),      json_extract_scalar(json, '$.ref_ids[8]'),      json_extract_scalar(json, '$.ref_ids[9]')]) v) );  create temp function jsontoitem(json string) returns struct<line string, ref_ids array<string>, sku string, amount int64> (   if(json null, null,     struct(       json_extract_scalar(json, '$.line'),       jsonextractrefids(json),       json_extract_scalar(json, '$.sku'),       cast(json_extract_scalar(json, '$.amount') int64)     )   ) );  create temp function jsontoitems(json string) (   (select struct     cast(json_extract_scalar(json, '$.order_id') int64) order_id,     json_extract_scalar(json, '$.customer_id') customer_id,     (select array_agg(v ignore nulls)      unnest([        jsontoitem(json_extract(json, '$.items[0]')),        jsontoitem(json_extract(json, '$.items[1]')),        jsontoitem(json_extract(json, '$.items[2]')),        jsontoitem(json_extract(json, '$.items[3]')),        jsontoitem(json_extract(json, '$.items[4]')),        jsontoitem(json_extract(json, '$.items[5]')),        jsontoitem(json_extract(json, '$.items[6]')),        jsontoitem(json_extract(json, '$.items[7]')),        jsontoitem(json_extract(json, '$.items[8]')),        jsontoitem(json_extract(json, '$.items[9]'))]) v) items   ) );  input (   select '{"order_id":"123456","customer_id":"2abcd", "items":[{"line":"1","ref_ids":["66b56e60","9e7ca2b7"],"sku":"1111","amount":40 },{"line":"2","ref_ids":["7777h0","8888j0"],"sku":"2222","amount":10 }]}' json ) select   jsontoitems(json).* input; 

Comments

Popular posts from this blog

node.js - Node js - Trying to send POST request, but it is not loading javascript content -

javascript - Replicate keyboard event with html button -

javascript - Web audio api 5.1 surround example not working in firefox -