Amazon CloudFront real-time logs: importing them into Amazon Elasticsearch Service.

Not long ago, AWS announced that its CloudFront CDN would support real-time logs. Until then, we could only use the S3 log feature, which introduced a delay before the data could actually be analysed. For certain scenarios, this was not sufficient.


Fortunately, AWS is constantly thinking about us. So, they unveiled a solution that makes our lives a bit easier and more real-time (or, more precisely, near real-time; see the details below).
The task was to provide a quick and easy way to process CloudFront log data, ideally with the ability to build dashboards and perform some analytics.
We decided to export the logs into Amazon Elasticsearch Service.
High-level solution diagram: CloudFront → Kinesis Data Stream → Kinesis Firehose (with a Lambda transformation) → Elasticsearch, with S3 as a backup for records that fail processing.

The first step is to prepare the Kinesis Data Stream.


There is nothing special in the configuration here; it is pretty much a case of ‘next-next-next’ clicks. You should pay attention to scaling: in our case, one shard was enough. One important note: as CloudFront is a global service, your Kinesis and Elasticsearch infrastructure should run in the us-east-1 region. It is probably possible to send the data to another region, but in our case we wanted to follow the KISS approach.
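If you prefer to script this step, a minimal boto3 sketch could look like the following. The stream name, ARNs, sampling rate and field list are illustrative assumptions, not values from a real setup:

    import boto3

    # CloudFront is a global service, so the stream lives in us-east-1.
    kinesis = boto3.client("kinesis", region_name="us-east-1")
    cloudfront = boto3.client("cloudfront")

    # One shard was enough in our case; size this for your own traffic.
    kinesis.create_stream(StreamName="cf-realtime-logs", ShardCount=1)
    # In a real script, wait until the stream is ACTIVE before continuing.

    # Attach a real-time log configuration to the stream. The RoleARN is a
    # placeholder for an IAM role that allows CloudFront to write to Kinesis.
    cloudfront.create_realtime_log_config(
        Name="cf-realtime-logs-config",
        SamplingRate=100,  # log 100% of requests
        Fields=["timestamp", "c-ip", "cs-method", "cs-uri-stem", "sc-status"],
        EndPoints=[{
            "StreamType": "Kinesis",
            "KinesisStreamConfig": {
                "RoleARN": "arn:aws:iam::123456789012:role/cf-realtime-role",
                "StreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/cf-realtime-logs",
            },
        }],
    )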

Once you have the Kinesis Data Stream, it’s time to configure your Kinesis Firehose.

Some deeper explanation is needed for this step. Kinesis delivers the CloudFront log data as a plain log-entry string. There is no built-in mechanism to transform that string into JSON (the format Elasticsearch expects). So, if you try to export the data without any transformation, you will get a mapper_parsing_exception error.
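To make the mismatch concrete, here is one record in both forms, assuming a log configuration with the fields timestamp, c-ip, cs-method, cs-uri-stem and sc-status (the values are made up):

    Raw tab-delimited log entry arriving from the stream:

        1610462640.897  203.0.113.10  GET  /index.html  200

    JSON document that Elasticsearch can actually ingest:

        {"timestamp": "1610462640.897", "c_ip": "203.0.113.10",
         "cs_method": "GET", "cs_uri_stem": "/index.html", "sc_status": "200"}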

The transformation is done by a Lambda function (AWS offers no native way to achieve this).

As you can see from the code below, we used only some of the log fields from our CloudFront configuration to illustrate the approach. You can find the full list of available fields in the AWS documentation.

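A minimal sketch of such a transformation Lambda in Python, assuming the five example fields used above (the list and its order must match your real-time log configuration):

    import base64
    import json

    # Must match the fields, and their order, configured in the
    # CloudFront real-time log configuration.
    FIELDS = ["timestamp", "c_ip", "cs_method", "cs_uri_stem", "sc_status"]

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            # Firehose hands each record over Base64-encoded.
            payload = base64.b64decode(record["data"]).decode("utf-8")
            # Real-time log entries are tab-delimited strings.
            values = payload.strip().split("\t")
            doc = dict(zip(FIELDS, values))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                # The output must be Base64-encoded again, otherwise
                # Kinesis rejects the record.
                "data": base64.b64encode(
                    (json.dumps(doc) + "\n").encode("utf-8")
                ).decode("utf-8"),
            })
        return {"records": output}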

The function extracts the fields from each log entry and forms a JSON object. Make sure your fields are listed in the same order as in your real-time log configuration. The output must be Base64-encoded, otherwise Kinesis will throw an error.

Now, let’s talk about buffering. Kinesis Firehose can buffer the data based either on time or on volume, and flushes when the first of the two thresholds is reached. In our case, we chose the minimal possible values: 60 s or 1 MiB. Your settings may vary, depending on your needs.


The last thing here is the S3 backup.

If a record cannot be processed by Lambda, or delivered to Elasticsearch, it is written to S3 for later manual processing.
The output configuration is pretty straightforward. You should create the Elasticsearch domain in us-east-1. It is also useful to set up index rotation; in our case, we chose daily rotation. Both the buffering values above and these settings appear in the sketch below.
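Expressed in boto3, the whole delivery stream (source stream, Lambda processor, buffering, daily index rotation and the S3 backup for failed documents) could be sketched as follows; every ARN and name here is a placeholder:

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    firehose.create_delivery_stream(
        DeliveryStreamName="cf-logs-to-es",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/cf-realtime-logs",
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        },
        ElasticsearchDestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/cf-logs",
            "IndexName": "cloudfront-logs",
            "IndexRotationPeriod": "OneDay",  # daily index rotation
            # Flush on whichever threshold is reached first.
            "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 1},
            # Only records that fail are backed up to S3.
            "S3BackupMode": "FailedDocumentsOnly",
            "S3Configuration": {
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
                "BucketARN": "arn:aws:s3:::cf-logs-backup",
            },
            "ProcessingConfiguration": {
                "Enabled": True,
                "Processors": [{
                    "Type": "Lambda",
                    "Parameters": [{
                        "ParameterName": "LambdaArn",
                        "ParameterValue": "arn:aws:lambda:us-east-1:123456789012:function:cf-log-transform",
                    }],
                }],
            },
        },
    )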

There are several settings on the Elasticsearch side that you need to configure so that Kinesis can write to it:

  • Create the role in Elasticsearch (for example, firehose-role) with the cluster and index permissions that Firehose needs;


  • Create the role mapping that maps your Firehose IAM role to that Elasticsearch role (both calls are sketched below).

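Assuming fine-grained access control is enabled on the domain, both steps can be performed against the security plugin's REST API. Here is a Python sketch; the endpoint, credentials, index pattern and IAM role ARN are placeholders:

    import requests

    ES = "https://search-cf-logs.us-east-1.es.amazonaws.com"
    AUTH = ("master-user", "master-password")  # domain master user

    # 1. Create the Elasticsearch role with the permissions Firehose needs.
    requests.put(
        f"{ES}/_opendistro/_security/api/roles/firehose-role",
        auth=AUTH,
        json={
            "cluster_permissions": ["cluster_composite_ops", "cluster_monitor"],
            "index_permissions": [{
                "index_patterns": ["cloudfront-logs*"],
                "allowed_actions": ["create_index", "manage", "crud"],
            }],
        },
    )

    # 2. Map the IAM role that Firehose assumes to that Elasticsearch role.
    requests.put(
        f"{ES}/_opendistro/_security/api/rolesmapping/firehose-role",
        auth=AUTH,
        json={"backend_roles": [
            "arn:aws:iam::123456789012:role/firehose-delivery-role",
        ]},
    )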

After carrying out these steps, you should see the logs in your Elasticsearch cluster.

As I mentioned before, this solution will not give you truly real-time log ingestion into Elasticsearch.
Kinesis still works in batches, but you can achieve near real-time ingestion, which should be enough for most needs.