Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: provide skywalking logger plugin #5478

Merged
merged 13 commits into from
Nov 12, 2021

Conversation

dmsolr
Copy link
Member

@dmsolr dmsolr commented Nov 10, 2021

What this PR does / why we need it:

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

provide a new logger plugin to support log access log to SkyWalking in SkyWalking Log Format. If skywalking trace plugin is enabled or requests with SkyWalking Trace context header, this plugin will parse the trace context header and print trace context in log. So that the access log can relate to trace.

trace_context = {
traceId = base64.decode_base64url(ids[2]),
traceSegment = base64.decode_base64url(ids[3]),
spanId = tonumber(ids[4])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The span id is missing in the comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I update the comment. Currently, the 4th part, 3 means span id.

4. Parent span ID. Must be an integer. It begins with 0. This span ID points to the parent span in parent trace segment.

Ref SkyWalking Cross Process Propagation Headers Protocol

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if you also add the link to the skywalking's doc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

local ids = ngx_re.split(headers["sw8"], '-')

trace_context = {
traceId = base64.decode_base64url(ids[2]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check the length of ids and the result of decode_base64url

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

apisix/plugins/skywalking-logger.lua Outdated Show resolved Hide resolved
docs/en/latest/plugins/skywalking-logger.md Outdated Show resolved Hide resolved
end

if not log_body.route_id then
log_body.route_id = "no-matched"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the no-matched necessary for the skywalking? If not, we can leave it empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

dmsolr and others added 2 commits November 11, 2021 11:57
Co-authored-by: 罗泽轩 <spacewanderlzx@gmail.com>
@moonming
Copy link
Member

@wu-sheng please take a look, and welcome feedback:)

local headers = ngx.req.get_headers()
if headers then
-- 1-TRACEID-SEGMENTID-3-PARENT_SERVICE-PARENT_INSTANCE-PARENT_ENDPOINT-IPPORT
local ids = ngx_re.split(headers["sw8"], '-')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you want to do this? SkyWalking has tracer for APISIX, which has been integrated. Why don't you try to read it from the context?

Decoding the header is fine from tech perspective, but it could be a risk when one day, Lua agent supports namespace concept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And all following decode(twice for trace id and span id) and splitting cost unnecessary resources.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am considering about it still works when tracer plugin is not enabling.
If we give the version matching matric, it looks not very dangerous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call, SkyWalking wouldn't change v3 format for a long time(already years, expect more) AFAIK. namespace would be an issue, but not very serious, just duplicate configurations.
I just think resource costing is very sensitive. And this is not that risk, as when the incompatible issue happens, it happens at the first time both plugin activated. So anyone will notice this immediately, they could do fix or disable one/both quickly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.
But now, skywalking-nginx-lua does not expose trace context. I would like to get merged this pr first. And keep going on polishing.
We need to enhancement skywalking-nginx-lua and release a new version. And then, I back to polishing this feature.

Copy link
Member

@wu-sheng wu-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General review from design perspective(nothing related to code details).

local headers = ngx.req.get_headers()
if headers then
-- 1-TRACEID-SEGMENTID-3-PARENT_SERVICE-PARENT_INSTANCE-PARENT_ENDPOINT-IPPORT
local ids = ngx_re.split(headers["sw8"], '-')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And all following decode(twice for trace id and span id) and splitting cost unnecessary resources.

}

if not stale_timer_running then
-- run the timer every 30 mins if any log is present
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every 30 mins? For?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a huge latency in low traffic scenario. No one is willing to wait for 30 mins for log collecting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it sweeps the stale log data in every 30m.
When the data transport fails, the data will remain in memory. Therefore, regular cleaning is required.

Copy link
Member

@wu-sheng wu-sheng Nov 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not very general way to process logs. Usually, unless you have local cache or mmap(like SkyWalking Satellite), logs should be abandoned quickly once retry threshold reached.
The reason behind this logic is, observability presents a large data set but isn't always very useful, especially for logs. Metrics are small, and tolerable to keep latest values, but different from logs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me. @spacewander how do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err. @dmsolr's explain is incorrect. This timer doesn't clean up the stale log. It clean up the stale buffers when they are no longer holding the log.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is about clean up the stale buffers only, than, this is APISIX's call. Nothing relates to SkyWalking or observability preference.

-- 1-TRACEID-SEGMENTID-SPANID-PARENT_SERVICE-PARENT_INSTANCE-PARENT_ENDPOINT-IPPORT
local ids = ngx_re.split(sw_header, '-')
if #ids == 8 then
local trace_id, err = base64.decode_base64url(ids[2])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decode_base64url only returns str or nil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if so, we don't need to check the result of decode_base64url, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am correct, we should check if the trace_id is nil or not? The trace_id is required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trace_context is optional. If trace_id is missing, it cannot relate to trace. But missing trace_id log can be found in log view.

Refer https://github.com/apache/skywalking-data-collect-protocol/blob/master/logging/Logging.proto#L99-L107

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, I prefer to collect the missing trace_context log.

}
_EOC_

$block->set_value("http_config", $http_config);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New tests should contain a common section, like

if (!$block->request) {

There is no need to repeat them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -0,0 +1,117 @@
---
title: http-logger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right title?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

@wu-sheng wu-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to me. Thanks.

@spacewander spacewander merged commit 5f6afdb into apache:master Nov 12, 2021
--- response_body
opentracing
--- error_log eval
qr/.*\{\\\"json\\\":\\\"\{(\\\\\\\"\@timestamp\\\\\\\":\\\\\\\".*\\\\\\\"|\\\\\\\"client_ip\\\\\\\":\\\\\\\"127\.0\.0\.1\\\\\\\"|\\\\\\\"host\\\\\\\":\\\\\\\"localhost\\\\\\\"|\\\\\\\"route_id\\\\\\\":\\\\\\\"1\\\\\\\"|,){7}\}/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not easy to read, can there be a more readable way? @dmsolr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is raw skywalking log format.
Seem like have some approach to simplify it. I will have a try later.


=== TEST 7: test wrong trace context header
--- request
GET /opentracing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I suddenly noticed this, why /opentracing? Rather than /request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants