google cloud datastore - Dataflow not acking PubSub messages


I have a simple Google Cloud Dataflow pipeline:

PubsubIO.readStrings().fromSubscription() -> Window -> ParDo -> DatastoreIO.v1().write()
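
For reference, a minimal Java sketch of such a pipeline, assuming the Beam Java SDK; the class name, subscription path, window size, entity kind, and project ID are placeholders, not values from the question:

```java
import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

import com.google.datastore.v1.Entity;
import java.util.UUID;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PubsubToDatastore {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromPubsub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-subscription"))
        // Window the unbounded stream so downstream steps work on bounded chunks.
        .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(1))))
        // Convert each message payload into a Datastore entity (conversion here is illustrative).
        .apply("ToEntity", ParDo.of(new DoFn<String, Entity>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            Entity.Builder entity = Entity.newBuilder();
            entity.setKey(makeKey("Message", UUID.randomUUID().toString()));
            entity.putProperties("payload", makeValue(c.element()).build());
            c.output(entity.build());
          }
        }))
        .apply("WriteToDatastore", DatastoreIO.v1().write().withProjectId("my-project"));

    p.run();
  }
}
```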

When load is applied to the Pub/Sub topic, messages are read but not acked:

Jul 25, 2017 4:20:38 PM org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSource$PubsubReader stats INFO: Pubsub projects/my-project/subscriptions/my-subscription has 1000 received messages, 950 current unread messages, 843346 current unread bytes, 970 current in-flight msgs, 28367ms oldest in-flight, 1 current in-flight checkpoints, 2 max in-flight checkpoints, 770B/s recent read, 1000 recent received, 0 recent extended, 0 recent late extended, 50 recent ACKed, 990 recent NACKed, 0 recent expired, 898ms recent message timestamp skew, 9224873061464212ms recent watermark skew, 0 recent late messages, 2017-07-25T23:16:49.437Z last reported watermark

Which pipeline step is supposed to ack the messages?

  • The Stackdriver dashboard shows some acks, but the number of unacked messages stays stable.
  • There are no error messages in the trace indicating that message processing failed.
  • Entries do show up in Datastore.

Dataflow acknowledges Pub/Sub messages only after they have been durably committed somewhere else. In a pipeline that consists of Pub/Sub -> ParDo -> one or more sinks, acknowledgements may therefore be delayed by any of the sinks having problems (even if the writes are being retried, retries slow things down). This is part of ensuring that results appear to be processed effectively-once. See the earlier question on when Dataflow acknowledges a message for more details.

One (easy) option to change this behavior is to add a GroupByKey (using a randomly generated key) after the Pub/Sub source and before the sinks, as sketched below. This causes the messages to be acknowledged earlier, but it may perform worse, since Pub/Sub is generally better at holding unprocessed inputs than the state used by a GroupByKey.
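
A rough sketch of that workaround, again assuming the Beam Java SDK; the class and method names, key range, and step names are illustrative. The helper would be applied between the existing ParDo and the DatastoreIO write:

```java
import java.util.concurrent.ThreadLocalRandom;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class RandomKeyGroup {

  /** Groups elements by a throwaway random key so the upstream read is committed earlier. */
  public static PCollection<String> groupByRandomKey(PCollection<String> input) {
    return input
        // Pair each element with a random key so the grouping spreads evenly across workers.
        .apply("AddRandomKey", ParDo.of(new DoFn<String, KV<Integer, String>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            c.output(KV.of(ThreadLocalRandom.current().nextInt(1024), c.element()));
          }
        }))
        // The GroupByKey forces a durable commit of its input, so Pub/Sub acks can happen here
        // instead of waiting on the sinks.
        .apply("GroupByRandomKey", GroupByKey.<Integer, String>create())
        // Drop the keys again and re-emit the original elements.
        .apply("DropKeys", ParDo.of(new DoFn<KV<Integer, Iterable<String>>, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            for (String value : c.element().getValue()) {
              c.output(value);
            }
          }
        }));
  }
}
```

If your Beam version includes Reshuffle.viaRandomKey(), that transform packages essentially the same random-key-group-ungroup pattern.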

