Google Cloud Datastore - Dataflow not acking PubSub messages
A simple gcloud Dataflow pipeline:

PubsubIO.readStrings().fromSubscription -> Window -> ParDo -> DatastoreIO.v1().write()
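For reference, a minimal Beam 2.x Java sketch of a pipeline shaped like the one above (the subscription path, project ID, window size, entity kind, and the message-to-Entity conversion are placeholders, not the poster's actual code):

    import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
    import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

    import com.google.datastore.v1.Entity;
    import java.util.UUID;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
    import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.transforms.windowing.FixedWindows;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.joda.time.Duration;

    public class PubsubToDatastore {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply("ReadFromPubsub", PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-subscription"))
            .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(1))))
            .apply("ToEntity", ParDo.of(new DoFn<String, Entity>() {
              @ProcessElement
              public void processElement(ProcessContext c) {
                // Placeholder conversion: store the raw message body under a random key.
                Entity.Builder entity = Entity.newBuilder();
                entity.setKey(makeKey("Message", UUID.randomUUID().toString()));
                entity.putProperties("body", makeValue(c.element()).build());
                c.output(entity.build());
              }
            }))
            .apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId("my-project"));

        p.run();
      }
    }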
When load is applied to the PubSub topic, the messages are read but not acked:
Jul 25, 2017 4:20:38 PM org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource$PubsubReader stats
INFO: Pubsub projects/my-project/subscriptions/my-subscription has 1000 received messages, 950 current unread messages, 843346 current unread bytes, 970 current in-flight msgs, 28367ms oldest in-flight, 1 current in-flight checkpoints, 2 max in-flight checkpoints, 770B/s recent read, 1000 recent received, 0 recent extended, 0 recent late extended, 50 recent acked, 990 recent nacked, 0 recent expired, 898ms recent message timestamp skew, 9224873061464212ms recent watermark skew, 0 recent late messages, 2017-07-25T23:16:49.437Z last reported watermark
Which pipeline step is supposed to ack the messages?
- The Stackdriver dashboard shows acks, but the number of unacked messages stays stable.
- There are no error messages in the trace indicating that message processing failed.
- Entries show up in Datastore.
Dataflow acknowledges PubSub messages only after they have been durably committed somewhere else. In a pipeline that consists of PubSub -> ParDo -> one or more sinks, acknowledgement may be delayed by any of the sinks having problems (even if the writes are being retried, the retries slow things down). This is part of ensuring that results appear to be processed effectively-once. See the earlier question on when Dataflow acknowledges a message for more details.
One (easy) option to change this behavior is to add a GroupByKey (using a randomly generated key) after the PubSub source and before the sinks. That will cause messages to be acknowledged earlier, but it may perform worse, since PubSub is generally better at holding on to unprocessed inputs than a GroupByKey is. A sketch of this is shown below.
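As a sketch of that suggestion, assuming the transform is inserted after the Window step (a GroupByKey on an unbounded source needs a non-global window or a trigger); the key range of 100 and the transform names are arbitrary:

    import java.util.concurrent.ThreadLocalRandom;
    import org.apache.beam.sdk.transforms.Flatten;
    import org.apache.beam.sdk.transforms.GroupByKey;
    import org.apache.beam.sdk.transforms.Values;
    import org.apache.beam.sdk.transforms.WithKeys;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class RandomKeyCheckpoint {
      /** Checkpoints the (already windowed) PubSub input via a random-key GroupByKey. */
      public static PCollection<String> viaRandomKey(PCollection<String> windowedMessages) {
        return windowedMessages
            // Assign a small random key so elements spread evenly across workers.
            .apply("AddRandomKey",
                WithKeys.of((String msg) -> ThreadLocalRandom.current().nextInt(100))
                    .withKeyType(TypeDescriptors.integers()))
            // The GroupByKey durably commits its input, so the PubSub source can ack
            // messages here instead of waiting for the Datastore write to finish.
            .apply("GroupByRandomKey", GroupByKey.<Integer, String>create())
            // Drop the keys and flatten the grouped iterables back into single elements.
            .apply("DropKeys", Values.<Iterable<String>>create())
            .apply("FlattenGroups", Flatten.<String>iterables());
      }
    }

In the pipeline above this would sit between the Window step and the ParDo / DatastoreIO.v1().write() steps; Beam's built-in Reshuffle.viaRandomKey() transform packages up essentially the same pattern.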