Outputting JSON logs on Elastic Beanstalk with Amazon Linux 2

Question

We're trying to migrate our Java applications from from the current Elastic Beanstalk JDK 8 platforms to the new ones running Corretto 11 on Amazon Linux 2. The app works well, but the way logs are handled changed. The output from the web process is now stored in /var/log/web.stdout.log and every line is prefixed with the timestamp and process name, ie:

May 20 17:00:00 ip-10-48-41-129 web: {"timestamp":"2020-05-20T17:00:00.035Z","message":"...","logger":"...","thread":"MessageBroker-2","level":"INFO"}

How can we get rid of the prefix? These logs are streamed into CloudWatch and we're outputting them to stdout in JSON so that we can later query them with Logs Insights. But with the prefix, Insights doesn't "see" the JSON and just treats the whole line as a text blob.

I cannot find any documentation for that on AWS. Almost all of Elastic Beanstalk's documentation refers to the first version of Amazon Linux.

Wit a bit of luck, this is still based on JUL. If it is, check this out: logicbig.com/tutorials/core-java-tutorial/logging/… . JUL lets you decide how StdOut should be formatted. — Gerard H. Pille, Commented May 20, 2020 at 18:28
I'm having simiar issue, but in addition to that my log collector agent is not able to read the file due to permissions [ -rw------- 1 root root 10176111 Sep 23 13:42 web.stdout.log ] Did you manage to make that file readable to other users? — igor, Commented Sep 23, 2020 at 13:47
The fact that there still isn't a solution that doesn't involve major hackery is crazy — Anthony Webb, Commented Dec 3, 2021 at 19:15

Mike · Accepted Answer · 2021-08-06 22:15:25Z

I used Platform Hooks to accomplish this. The only catch is /etc/rsyslog.d/web.conf is replaced on both application and configuration deployments, so you need a hook for both.

This approach avoids messing with Elastic Beanstalk's internal files in /opt/elasticbeanstalk/config/private (which have changed since the previous answers - rsyslog.conf no longer exists). Also, platform hooks are now preferred over ebextensions.

If you use CodeBuild, don't forget to include the platformFiles directory (or wherever you put your files) in your output artifact.

NOTE: This code assumes the name of the process is web. If you defined a different process name in your Procfile, use that instead. However, I think the rsyslog config should always be at /etc/rsyslog.d/web.conf despite the process name.

Make sure all of your .sh files are executable using chmod +x.

.platform/hooks/predeploy/10_logs.sh

#!/bin/sh
sudo platformFiles/setupLogs.sh

.platform/confighooks/predeploy/10_logs.sh

#!/bin/sh
sudo platformFiles/setupLogs.sh

platformFiles/setupLogs.sh

#!/bin/sh
# By default logs output to /var/log/web.stdout.log are prefixed. We want just the raw logs from the app.
# This updates the rsyslog config. Also grants read permissions to the log files.

set -eu

mv platformFiles/rsyslogWebConf.conf /etc/rsyslog.d/web.conf

touch /var/log/web.stdout.log
touch /var/log/web.stderr.log
chmod +r /var/log/web.stdout.log
chmod +r /var/log/web.stderr.log

systemctl restart rsyslog.service

platformFiles/rsyslogWebConf.conf

# This file is created from Elastic Beanstalk platform hooks.

template(name="WebTemplate" type="string" string="%msg%\n")

if $programname == 'web' then {
  *.=warning;*.=err;*.=crit;*.=alert;*.=emerg; /var/log/web.stderr.log;WebTemplate
  *.=info;*.=notice /var/log/web.stdout.log;WebTemplate
}

Conjecture

It looks like /opt/elasticbeanstalk/config/private/rsyslog.conf was replaced with /opt/elasticbeanstalk/config/private/rsyslog.conf.template:

# This rsyslog file redirects Elastic Beanstalk platform logs.
# Logs are initially sent to syslog, but we also want to divide
# stdout and stderr into separate log files.

{{range .ProcessNames}}if $programname  == '{{.}}' then {
  *.=warning;*.=err;*.=crit;*.=alert;*.=emerg /var/log/{{.}}.stderr.log
  *.=info;*.=notice /var/log/{{.}}.stdout.log
}
{{end}}

Based on this, I speculate Elastic Beanstalk uses this template to generate a single /etc/rsyslog.d/web.conf file which contains a block for every defined process name. Because both an application and a configuration deployment can change the defined processes, it makes sense this file is recreated after both.

Amazing answer! This worked immediately for me. It was driving me so crazy, I was thinking I was going to replatform the app on to ECS to get JSON parsing working. — mmilleruva, Commented Apr 27, 2022 at 21:22
Thanks! Works in November 2022, tried with Corretto 17 running on 64bit Amazon Linux 2/3.4.0. — knok16, Commented Nov 2, 2022 at 19:43

Michał Paluchowski · Accepted Answer · 2020-05-21 07:17:13Z

4

I found a solution that works well enough, so I'll post it here for posterity. If someone can suggest a better one, please do.

Elastic Beanstalk on Amazon Linux 2 relies on rsyslog for log processing and output. There's a file at /opt/elasticbeanstalk/config/private/rsyslog.conf that during deployment gets copied over to /etc/rsyslog.d/web.conf and it's the one that's directing all output from the web application to /var/log/web.stdout.log.

The file doesn't include any custom template. It relies on rsyslog's default template, which prefixes any %msg% with a timestamp and $programname (which is web in this case).

I tried replacing this file via an .ebextensions config, but that didn't work, because Elastic Beanstalk seems to be overwriting this file after .ebextensions run. So I added an additional platform hook that deletes the file, keeping the custom one I added.

Here's the .ebextensions/logs.config file:

files:
  "/etc/rsyslog.d/web-files.conf":
    mode: "000644"
    owner: root
    group: root
    content: |
      template(name="WebTemplate" type="string" string="%msg%\n")

      if $programname == 'web' then {
        *.=warning;*.=err;*.=crit;*.=alert;*.=emerg; /var/log/web.stderr.log;WebTemplate
        *.=info;*.=notice /var/log/web.stdout.log;WebTemplate
      }

commands:
  remove-.bak-rsyslog:
    command: rm -f *.bak
    cwd: /etc/rsyslog.d

and .platform/hooks/predeploy/remove-default-rsyslog-conf.sh (make sure you chmod +x this one):

#!/bin/sh
rm /etc/rsyslog.d/web.conf
systemctl restart rsyslog.service

answered May 21, 2020 at 7:17

Michał Paluchowski

1411 silver badge6 bronze badges

The other option that I found would be to create your own log files and add it to Elastic Beanstalk's logtasks. The documentation is out of date but I do believe the folders are /opt/elasticbeanstalk/config/private/logtasks/bundle and /opt/elasticbeanstalk/config/private/logtasks/tail.
– Steven Lu
Commented May 23, 2020 at 15:49
I found that Elastic Beanstalk keeps its own private file for rsyslog. If you override /opt/elasticbeanstalk/config/private/rsyslog.conf instead that with your .ebextension it'll use and copy that file. That way you can avoid rm -f and needing to restart rsyslog.service on every deploy.
– Steven Lu
Commented May 23, 2020 at 16:03
I saw the files in /opt/elasticbeanstalk/config/private but the private bit suggests that they shouldn't be messed with. I therefore preferred less invasive means.
– Michał Paluchowski
Commented May 27, 2020 at 6:40
Hi @MichałPaluchowski did you find a better way to do this? Facing a similar problem with Node.js platform and this solution seems a bit too hacky for my comfort.
– Arsalan Ahmad
Commented Jun 19, 2020 at 13:26
Also experiencing this in node, @ArsalanAhmad. Currently trying to modify the cloudformation template to see if that works. I've been on the phone with AWS support and they didn't have a solution.
– puttputt
Commented Jun 24, 2020 at 0:06

Add a comment |

puttputt · Accepted Answer · 2020-06-24 14:44:43Z

I'm going to put my solution down here, although its a bit of a different problem than op - its around the same idea and hopefully answers some of the other commenters questions about nodejs.

This is for Amazon Linux 2 running Node.js 12.x

Problem: nodejs stdout logs are mixed in with nginx logs under "web" and are badly formatted.

Previously, in Amazon Linux 1 running Nodejs, these logs are separated into /var/log/nodejs/nodejs.log and /var/log/nginx/access.log. Having them combined and prefixed with the ip address just makes them an an absolute mess.

I followed the solutions put forwarded and modified them a bit.

.ebextension config that modifies rsyslog.conf to split logs into two different files. I'm filtering on a pattern I saw in my log files, but you could use any regex that is compatible with RainerScript.

I'm not sure if theey want you to be editing this file, as another commenter has pointed out, sincei is private. If you're not comfortable with this, I would recommend writing to your own log files rather than stdout. You have more control that way.

files:
  "/opt/elasticbeanstalk/config/private/rsyslog.conf" :
    mode: "000755"
    owner: root
    group: root
    content: |
        template(name="WebTemplate" type="string" string="%msg%\n")

        if $programname  == 'web' and $msg startswith '#033[' then {
            *.=warning;*.=err;*.=crit;*.=alert;*.=emerg; /var/log/web.stderr.log
            *.=info;*.=notice /var/log/web.stderr.log;
        } else if( $programname == 'web') then /var/log/node/nodejs.log;WebTemplate

Now that I have a new log file that is more or less, just node stdout, that needs to be streamed to cloudwatch as well as included in the log bundle. These configs are documented (although not updated for Amazon Linux 2). No configs are overwritten here, they simple add new configurations .

files:
  "/opt/elasticbeanstalk/config/private/logtasks/bundle/node.conf" :
    mode: "000755"
    owner: root
    group: root
    content: |
      /var/log/node/nodejs.log


files:
  "/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/node.json" :
    mode: "000755"
    owner: root
    group: root
    content: |
        {
            "logs": {
                "logs_collected": {
                    "files": {
                        "collect_list": [
                           
                            {
                                "file_path": "/var/log/node/nodejs.log",
                                "log_group_name": "/aws/elasticbeanstalk/[environment_name]/var/log/node/nodejs.log",
                                "log_stream_name": "{instance_id}"
                            }
                        ]
                    }
                }
            }
        }

commands:
    append_and_restart:
        command: /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a append-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/node.json -s

Anthony Webb · Accepted Answer · 2021-12-03 19:52:07Z

If you are using a lambda to move stuff into loggly (as per their documentation here https://documentation.solarwinds.com/en/success_center/loggly/content/admin/cloudwatch-logs.htm) you can simply modify the lambda to extract the JSON and drop off the leading string that is causing grief. Here is the lambda I am using, and it works great to clean stuff up and post my JSON logs.

Sidenote, I use NodeJS - Fastify which includes Pino under the hood for nice JSON logs. Some great insight on setting up the logs can be found here https://jaywolfe.dev/blog/setup-your-fastify-server-with-logging-the-right-way-no-more-express/

'use strict';

/*
 *  To encrypt your secrets use the following steps:
 *
 *  1. Create or use an existing KMS Key - http://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html
 *
 *  2. Expand "Encryption configuration" and click the "Enable helpers for encryption in transit" checkbox
 *
 *  3. Paste <your loggly customer token> into the kmsEncryptedCustomerToken environment variable and click "Encrypt"
 *
 *  4. Give your function's role permission for the `kms:Decrypt` action using the provided policy template
*/

const AWS = require('aws-sdk');
const http = require('http');
const zlib = require('zlib');


// loggly url, token and tag configuration
// user needs to edit environment variables when creating function via blueprint
// logglyHostName, e.g. logs-01.loggly.com
// logglyTags, e.g. CloudWatch2Loggly
const logglyConfiguration = {
    hostName: process.env.logglyHostName,
    tags: process.env.logglyTags,
};

// use KMS to decrypt customer token in kmsEncryptedCustomerToken environment variable
const decryptParams = {
    CiphertextBlob: Buffer.from(process.env.kmsEncryptedCustomerToken, 'base64'),
    EncryptionContext: { LambdaFunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME },
};

const kms = new AWS.KMS({ apiVersion: '2014-11-01' });

kms.decrypt(decryptParams, (error, data) => {
    if (error) {
        logglyConfiguration.tokenInitError = error;
        console.log(error);
    } else {
        logglyConfiguration.customerToken = data.Plaintext.toString('ascii');
    }
});

// entry point
exports.handler = (event, context, callback) => {
    const payload = Buffer.from(event.awslogs.data, 'base64');

    // converts the event to a valid JSON object with the sufficient infomation required
    function parseEvent(logEvent, logGroupName, logStreamName) {
        return {
            // remove '\n' character at the end of the event
            message: extractJSON(logEvent.message.trim())[0],
            logGroupName,
            logStreamName,
            timestamp: new Date(logEvent.timestamp).toISOString(),
        };
    }

    // joins all the events to a single event
    // and sends to Loggly using bulk endpoint
    function postEventsToLoggly(parsedEvents) {
        if (!logglyConfiguration.customerToken) {
            if (logglyConfiguration.tokenInitError) {
                console.log('error in decrypt the token. Not retrying.');
                return callback(logglyConfiguration.tokenInitError);
            }
            console.log('Cannot flush logs since authentication token has not been initialized yet. Trying again in 100 ms.');
            setTimeout(() => postEventsToLoggly(parsedEvents), 100);
            return;
        }

        // get all the events, stringify them and join them
        // with the new line character which can be sent to Loggly
        // via bulk endpoint
        const finalEvent = parsedEvents.map(JSON.stringify).join('\n');

        // creating logglyURL at runtime, so that user can change the tag or customer token in the go
        // by modifying the current script
        // create request options to send logs
        try {
            const options = {
                hostname: logglyConfiguration.hostName,
                path: `/bulk/${logglyConfiguration.customerToken}/tag/${encodeURIComponent(logglyConfiguration.tags)}`,
                method: 'POST',
                headers: {
                    'Content-Type': 'application/json',
                    'Content-Length': finalEvent.length,
                },
            };

            const req = http.request(options, (res) => {
                res.on('data', (data) => {
                    const result = JSON.parse(data.toString());
                    if (result.response === 'ok') {
                        callback(null, 'all events are sent to Loggly');
                    } else {
                        console.log(result.response);
                    }
                });
                res.on('end', () => {
                    console.log('No more data in response.');
                    callback();
                });
            });

            req.on('error', (err) => {
                console.log('problem with request:', err.toString());
                callback(err);
            });

            // write data to request body
            req.write(finalEvent);
            req.end();
        } catch (ex) {
            console.log(ex.message);
            callback(ex.message);
        }
    }
    
    function extractJSON(str) {
        var firstOpen, firstClose, candidate;
        firstOpen = str.indexOf('{', firstOpen + 1);
        do {
            firstClose = str.lastIndexOf('}');
            console.log('firstOpen: ' + firstOpen, 'firstClose: ' + firstClose);
            if(firstClose <= firstOpen) {
                return null;
            }
            do {
                candidate = str.substring(firstOpen, firstClose + 1);
                console.log('candidate: ' + candidate);
                try {
                    var res = JSON.parse(candidate);
                    console.log('...found');
                    return [res, firstOpen, firstClose + 1];
                }
                catch(e) {
                    console.log('...failed');
                }
                firstClose = str.substr(0, firstClose).lastIndexOf('}');
            } while(firstClose > firstOpen);
            firstOpen = str.indexOf('{', firstOpen + 1);
        } while(firstOpen != -1);
    }

    zlib.gunzip(payload, (error, result) => {
        if (error) {
            callback(error);
        } else {
            const resultParsed = JSON.parse(result.toString('ascii'));
            const parsedEvents = resultParsed.logEvents.map((logEvent) =>
                    parseEvent(logEvent, resultParsed.logGroup, resultParsed.logStream));

            postEventsToLoggly(parsedEvents);
        }
    });
};

Stack Exchange Network

Outputting JSON logs on Elastic Beanstalk with Amazon Linux 2

4 Answers 4

Conjecture

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged
amazon-web-services
elastic-beanstalk
amazon-linux-2
.

Hot Network Questions

Outputting JSON logs on Elastic Beanstalk with Amazon Linux 2

4 Answers 4

Conjecture

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged amazon-web-serviceselastic-beanstalkamazon-linux-2.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
amazon-web-services
elastic-beanstalk
amazon-linux-2
.