Amazon Web Services - Query S3 logs content using Athena or DynamoDB
I have a use case where I need to query the request URLs in S3 logs. Amazon has introduced Athena to query S3 file contents. Which is the best option with respect to cost and performance?
- Use Athena to query the S3 files for request URLs
- Store metadata of each file, with the request URL information, in a DynamoDB table and query that
Amazon DynamoDB would be a poor choice for running queries on web logs.
DynamoDB is super-fast, but only if you are retrieving data based upon the Primary Key (a "Query"). If you are running a query against all data in a table (eg to find a particular IP address in a key that is not indexed), DynamoDB will need to scan through all rows in the table, which takes a lot of time (a "Scan"). For example, if your table is configured for 100 reads per second and you are scanning 10000 rows, it will take 100 seconds (100 x 100 = 10000).
Tip: Do not do full-table scans in a NoSQL database.
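To make the scan arithmetic concrete, here is a minimal Python sketch. It assumes the same simplification as the example above (one read per row, throughput fully used); real DynamoDB capacity accounting depends on item size and read consistency, so treat the result as a rough estimate only. The function name is made up for illustration.

```python
def estimated_scan_seconds(row_count: int, reads_per_second: int) -> float:
    """Rough time for a full-table scan, assuming one read per row."""
    if reads_per_second <= 0:
        raise ValueError("reads_per_second must be positive")
    return row_count / reads_per_second


# The example from the answer: 10000 rows at 100 reads per second.
print(estimated_scan_seconds(10_000, 100))  # 100.0
```

The time grows linearly with the number of rows, which is why a Scan over a large log table gets slow (and expensive) quickly.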
Amazon Athena is ideal for scanning log files! There is no need to pre-load data - simply run the query against the logs already stored in Amazon S3. Use standard SQL to find the data you're seeking. Plus, you only pay for the data that is read from disk. The file format is a bit weird, so you'll need the correct CREATE TABLE statement.
See: Using AWS Athena to query S3 Server Access Logs
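To show why the format is "a bit weird", here is a rough Python sketch that pulls the request URL out of one S3 server access log record. The records are space-delimited, but the timestamp sits in square brackets and the request URI is quoted, so a plain split on spaces does not work; a regular expression does the same job a regex-based table definition would do in Athena. The sample line is made up for illustration, the function and field names are my own, and only the leading fields are matched.

```python
import re

# Leading fields of an S3 server access log record, in order:
# bucket owner, bucket, [timestamp], remote IP, requester, request ID,
# operation, key, "request URI", HTTP status. Later fields are ignored here.
LOG_PATTERN = re.compile(
    r'^(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request_uri>[^"]*)" '
    r'(?P<http_status>\S+)'
)


def parse_access_log_line(line):
    """Return a dict of the leading fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample record, for illustration only.
SAMPLE = (
    'ownerid123 example-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'requester456 REQID789 REST.GET.OBJECT images/cat.png '
    '"GET /images/cat.png HTTP/1.1" 200 - 1024 1024 20 19 "-" "curl/7.0" -'
)

record = parse_access_log_line(SAMPLE)
print(record["request_uri"])  # GET /images/cat.png HTTP/1.1
```

With Athena you would not run this yourself; the point is only that the table definition has to describe the same bracketed and quoted fields before SQL queries on the request URL will work.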
Another choice is to use Amazon Redshift, which can query GBs, TBs and even PBs of data across billions of rows. If you are going to run frequent queries against the log data, Redshift is great. However, being a standard SQL database, you will need to pre-load the data into Redshift. Unfortunately, Amazon S3 log files are not in CSV format, so you would need to ETL the files into a suitable format. This isn't worthwhile for occasional, ad-hoc requests.
Many people also use Amazon Elasticsearch Service for scanning log files. Again, the file format needs some special handling and the pipeline to load the data needs some work, but the result is near-realtime interactive analysis of your S3 log files.