ElasticSearch - Update or new index?
requirements:
- a single Elasticsearch index needs to be constructed from a bunch of flat files that get dropped every week
- apart from the weekly feed, there are intermittent diff files providing additional data that was not part of the original feed (insert or update only, no delete)
- the time to parse and load these files (weekly full feed or diff files) into Elasticsearch is not huge
- the weekly feeds received in 2 consecutive weeks are expected to have significant differences (deletes, additions, updates)
- the index is critical for the app to function and needs to have close to 0 downtime
- we are not concerned with the exact changes made in a feed, but we need the ability to roll back to a previous version in case the current load fails for any reason
- to state the obvious, searches need to be fast and responsive
given these requirements, I am planning the following:
- for incremental updates (diffs), insert or update records as-is using the bulk API
- for full updates, reconstruct a new index and swap the alias, as mentioned in this post. In case of a rollback, we can revert to the previous working index (backups are maintained in case the rollback needs to go back a few versions)
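The diff load described above can be sketched as a `_bulk` upsert body. This is a minimal illustration, assuming each document carries a stable `id` field to use as the Elasticsearch `_id` (the index name and field names are hypothetical); the `index` action creates a document or fully replaces an existing one with the same `_id`, which matches the insert-or-update, no-delete semantics of the diff files:

```python
import json

def build_bulk_body(index, docs):
    """Build an NDJSON _bulk body that upserts each doc by its id.

    The "index" action creates the document or replaces an existing
    one with the same _id (insert-or-update, no delete).
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # _bulk bodies must be newline-delimited and end with a newline
    return "\n".join(lines) + "\n"

body = build_bulk_body("products-2024-w01", [
    {"id": "a1", "name": "widget", "price": 9.99},
    {"id": "a2", "name": "gadget", "price": 19.99},
])
# POST this body to /_bulk with Content-Type: application/x-ndjson
```

If a diff should merge into existing documents rather than replace them, the `update` action with a `doc` payload would be used instead, but for feed-shaped records full replacement via `index` is usually simpler.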
questions:
- is this the best approach, or is it better to CRUD the documents in the already-created index using the built-in versioning, rather than re-constructing the index?
- what is the impact of modifying data (delete, update) on the underlying Lucene indices/shards? Can such modifications cause fragmentation or inefficiency?
At first glance, I'd say your overall approach is sound. Creating a new index every week with the new data and swapping the alias is the approach to take if you need
- zero downtime and
- to be able to roll back to previous indices for whatever reason
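The alias swap itself can be done atomically in a single `_aliases` call, so searches against the alias never see a gap between the old and new index. A sketch of the request payload (alias and index names are hypothetical):

```python
import json

def alias_swap_actions(alias, old_index, new_index):
    """Build the _aliases payload that moves an alias atomically.

    Both actions are applied in one request, so there is no window
    in which the alias points at nothing.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

payload = alias_swap_actions("products", "products-2024-w01", "products-2024-w02")
body = json.dumps(payload)
# POST body to /_aliases. Rolling back is the same call with the
# index names reversed, as long as the old index is still around.
```

Keeping the application pointed only at the alias, never at a concrete index name, is what makes both the weekly swap and the rollback invisible to searches.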
If you keep a single index and CRUD the documents in there, you'd not be able to roll back if something goes wrong, and you could end up in a mixed state with data from the current week and data from the week earlier.
- Every time you update (even a single field) or delete a document, its previous version is flagged as deleted in the underlying Lucene segment. When the Lucene segments have grown sufficiently big, ES merges them and wipes out the deleted documents. However, in your case, since you're creating a new index every week (and deleting the index from the week prior), you won't land in a situation where you'd have space and/or fragmentation issues.
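For completeness, if you ever did keep one long-lived index and CRUD it, you could watch how much of it is flagged deleted via the index stats. A sketch, assuming a response shaped like the `docs` section of `GET /<index>/_stats` (the numbers below are invented):

```python
def deleted_ratio(stats):
    """Fraction of stored documents flagged as deleted.

    `stats` mirrors the `docs` block of GET /<index>/_stats, where
    `count` is live documents and `deleted` is soft-deleted ones.
    """
    docs = stats["docs"]
    total = docs["count"] + docs["deleted"]
    return docs["deleted"] / total if total else 0.0

ratio = deleted_ratio({"docs": {"count": 900_000, "deleted": 100_000}})
# A persistently high ratio can be reclaimed with
# POST /<index>/_forcemerge?only_expunge_deletes=true
```

With the new-index-every-week scheme this monitoring is unnecessary, since each index is written once and then dropped.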