Google Cloud Dataflow (Python): function to read from and write to a .csv file?
I am not able to figure out the precise functions in the GCP Dataflow Python SDK for reading from and writing to CSV files (or non-text files, for that matter). For BigQuery, I have figured out the following functions:
    beam.io.Read(beam.io.BigQuerySource('%table_id%'))
    beam.io.Write(beam.io.BigQuerySink('%table_id%'))
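For reference, a minimal pipeline sketch using those two transforms; the table specs, schema, and dispositions below are placeholder assumptions:

    import apache_beam as beam

    with beam.Pipeline() as p:
        # Read rows (as dicts) from a BigQuery table; the table spec is a placeholder.
        rows = p | 'ReadFromBQ' >> beam.io.Read(
            beam.io.BigQuerySource(table='my-project:my_dataset.input_table'))
        # Write the rows to another table, creating it if it does not exist.
        rows | 'WriteToBQ' >> beam.io.Write(
            beam.io.BigQuerySink(
                table='my-project:my_dataset.output_table',
                schema='name:STRING,value:INTEGER',
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))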
For reading text files, the ReadFromText and WriteToText functions are known to me.
However, I am not able to find examples in which the GCP Dataflow Python SDK writes data to, or reads data from, CSV files. Could you please provide the GCP Dataflow Python SDK functions for reading and writing CSV files, in the same manner as the BigQuery functions above?
CSV files are text files. The simplest (though inelegant) way of reading them is to do a ReadFromText, and then split the lines read on the commas (e.g. beam.Map(lambda x: x.split(','))).
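A minimal sketch of that approach, with assumed bucket paths (note that a bare split(',') breaks on quoted fields that contain commas; Python's csv module handles those correctly):

    import csv
    import apache_beam as beam
    from apache_beam.io import ReadFromText, WriteToText

    with beam.Pipeline() as p:
        lines = p | 'ReadCsv' >> ReadFromText('gs://my-bucket/input.csv',
                                              skip_header_lines=1)
        # Naive split; fine for simple CSVs without quoted commas.
        rows = lines | 'Split' >> beam.Map(lambda line: line.split(','))
        # Safer alternative using the csv module:
        # rows = lines | 'Parse' >> beam.Map(lambda line: next(csv.reader([line])))
        # Rejoin the fields and write back out; output is sharded by default.
        (rows
         | 'Format' >> beam.Map(lambda fields: ','.join(fields))
         | 'WriteCsv' >> WriteToText('gs://my-bucket/output',
                                     file_name_suffix='.csv'))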
For a more elegant option, check out this question, or use the beam_utils pip package and its beam_utils.sources.CsvFileSource as a source to read from.
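A sketch of that route, assuming beam_utils is installed (pip install beam_utils); the file path and the 'name' column are placeholders, and my understanding is that CsvFileSource emits each row as a dictionary keyed by the CSV header:

    import apache_beam as beam
    from beam_utils.sources import CsvFileSource

    with beam.Pipeline() as p:
        # Each element should be a dict keyed by the header line of the CSV.
        rows = p | 'ReadCsv' >> beam.io.Read(
            CsvFileSource('gs://my-bucket/input.csv'))
        rows | 'ExtractName' >> beam.Map(lambda row: row['name'])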