Amazon Web Services - Query S3 logs content using Athena or DynamoDB

April 15, 2015

I have a use case where I need to query request URLs in S3 logs. Amazon has introduced Athena to query S3 file contents. Which of these is the best option with respect to cost and performance?

Use Athena to query the S3 files for request URLs
Store the metadata of each file, with its request URL information, in a DynamoDB table and query that

Amazon DynamoDB would be a poor choice for running queries on web logs.

DynamoDB is super-fast if you are retrieving data based upon its primary key ("Query"). If you are running a query against all data in a table (e.g. to find a particular IP address in a key that is not indexed), DynamoDB will need to scan through every row in the table, which takes a lot of time ("Scan"). For example, if your table is configured for 100 reads per second and you are scanning 10,000 rows, it will take 100 seconds (100 reads per second x 100 seconds = 10,000 rows).
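
The back-of-the-envelope arithmetic above can be written as a tiny Python helper. It is only a sketch: like the answer, it assumes each item costs exactly one read, whereas actual DynamoDB read-capacity consumption depends on item size and on whether reads are strongly or eventually consistent.

```python
def scan_seconds(item_count: int, reads_per_second: int) -> float:
    """Rough duration of a full-table Scan.

    Simplifying assumption: every item consumes one read and the table is
    throttled at exactly `reads_per_second`. Real read-capacity usage also
    depends on item size and consistency mode.
    """
    if reads_per_second <= 0:
        raise ValueError("reads_per_second must be positive")
    return item_count / reads_per_second


print(scan_seconds(10_000, 100))  # 100.0
```

The point is the linear scaling: doubling the table doubles the scan time unless you also pay for more read capacity.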

Tip: Do not do full-table scans in a NoSQL database.

Amazon Athena is ideal for scanning log files! There is no need to pre-load data: simply run queries against the logs already stored in Amazon S3, using standard SQL to find the data you're seeking. Plus, you only pay for the data that is read from disk. The log file format is a bit weird, though, so you'll need the correct CREATE TABLE statement.

See: Using AWS Athena to query S3 Server Access Logs
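
To illustrate why the format is "a bit weird", here is a minimal Python sketch of the kind of parsing an Athena table definition has to express (the AWS documentation's table for these logs uses a regular-expression SerDe). The pattern below is a simplified assumption that covers only the first ten fields, not the official DDL, and the sample line is hypothetical with placeholder identifiers. The timestamp sits in `[brackets]` and contains a space, and the request URI is wrapped in double quotes, so a plain split on spaces breaks.

```python
import re
from typing import Dict, Iterable, List, Optional

# Simplified pattern for the first ten fields of an S3 server access log line.
# The [timestamp] contains a space and the request URI is "quoted", which is
# why a naive split on spaces does not work. Later fields are ignored here.
LOG_PATTERN = re.compile(
    r'(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request_uri>[^"]*)" '
    r'(?P<http_status>\S+)'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def find_requests(lines: Iterable[str], needle: str) -> List[Dict[str, str]]:
    """Keep parsed records whose request URI contains `needle` (like LIKE '%needle%')."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and needle in r["request_uri"]]


# Hypothetical sample line; all identifiers are placeholders.
SAMPLE = (
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 requesterid '
    '3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
    '"GET /examplebucket/photos/cat.jpg HTTP/1.1" 200 - 2662992 3462992 70 10 '
    '"-" "curl/7.15.1" -'
)

print(find_requests([SAMPLE], "cat.jpg")[0]["remote_ip"])  # 192.0.2.3
```

With Athena you describe this structure once in the CREATE TABLE statement and then filter on the request URI column in SQL, instead of running code like this over every file yourself.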

Another choice would be to use Amazon Redshift, which can query GBs, TBs and even PBs of data across billions of rows. If you are going to run frequent queries against log data, Redshift is great. However, because Redshift is a standard SQL database, the data must first be loaded into it. Unfortunately, Amazon S3 log files are not in CSV format, so you would need to ETL the files into a suitable format. This isn't worthwhile for occasional, ad-hoc requests.
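
The ETL step can be as simple as rewriting each log line as a CSV row before loading it (Redshift's COPY command can load CSV files from S3). The following Python sketch is an assumption about how one might do the conversion, not a prescribed pipeline: it tokenizes each line into bracketed values, quoted values, or runs of non-space characters, and the sample line is hypothetical.

```python
import csv
import io
import re
from typing import Iterable, List

# A token is a [bracketed value], a "quoted value", or a run of non-space
# characters. Exactly one of the three groups participates in each match.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def log_line_to_row(line: str) -> List[str]:
    """Split one S3 access log line into fields, dropping brackets and quotes."""
    return [m.group(m.lastindex) for m in TOKEN.finditer(line)]


def logs_to_csv(lines: Iterable[str]) -> str:
    """Convert raw log lines to CSV text, skipping blank lines."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for line in lines:
        if line.strip():
            writer.writerow(log_line_to_row(line))
    return buf.getvalue()


# Hypothetical sample line; all identifiers are placeholders.
SAMPLE = (
    'ownerid examplebucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 requesterid '
    '3E57427F3EXAMPLE REST.GET.OBJECT photos/cat.jpg '
    '"GET /examplebucket/photos/cat.jpg HTTP/1.1" 200 - 2662992 3462992 70 10 '
    '"-" "curl/7.15.1" -'
)

print(logs_to_csv([SAMPLE]))
```

In practice this would run as a batch job over new log files, which is exactly the ongoing work that makes Redshift less attractive for occasional queries.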

Many people also use Amazon Elasticsearch Service for scanning log files. Again, the file format needs special handling and the pipeline to load the data needs some work, but the result is near-realtime interactive analysis of your S3 log files.

See: Using the ELK stack to analyze your S3 logs
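
One piece of that loading pipeline is turning parsed log records into a request body for the Elasticsearch bulk API, which takes newline-delimited JSON: an action line followed by the document, for every record. This Python sketch only builds that body; the index name and the record are hypothetical, and sending it (an HTTP POST to `/_bulk` with content type `application/x-ndjson`) is left out.

```python
import json
from typing import Dict, Iterable


def to_bulk_ndjson(records: Iterable[Dict[str, str]], index: str) -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API:
    one action line followed by the document itself, for every record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical record, e.g. the output of a log-line parser.
body = to_bulk_ndjson(
    [{"remote_ip": "192.0.2.3", "request_uri": "GET /examplebucket/photos/cat.jpg HTTP/1.1"}],
    index="s3-access-logs",  # hypothetical index name
)
print(body)
```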
