DevTalks Reimagined 2020 - Funnel Analysis with Spark and Druid
@ItaiYaffe
AD EXPOSURE: 100M
  85M drop-off
HOMEPAGE: 15M
  5M drop-off
PRODUCT PAGE: 10M
  7M drop-off
CHECKOUT: 3M
>10B events/day, >20TB/day
S3
1000's of nodes/day
10's of TB ingested/day
Druid
$100K's/month
Awareness: exposed to campaign (e.g. via online ad)
Consideration: interest is expressed (e.g. clicked ad)
Intent: steps taken towards making a purchase (e.g. added product to cart)
Purchase
Funnel steps are of two types: Tactic and Stage
Awareness -> drop-off -> Consideration -> drop-off -> Intent -> drop-off -> Purchase
AD EXPOSURE: 100M UUs
  85M drop-off
HOMEPAGE: 15M UUs
  5M drop-off
PRODUCT PAGE: 10M UUs
  7M drop-off
CHECKOUT: 3M UUs
* UUs = Unique Users
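The drop-off figures in the funnel above are the differences between consecutive steps. A minimal Python sketch of that arithmetic, using the example counts from the slide (the name `drop_offs` is illustrative and not from the talk):

```python
# Example funnel from the slide: (step name, unique users in millions).
funnel = [
    ("AD EXPOSURE", 100),
    ("HOMEPAGE", 15),
    ("PRODUCT PAGE", 10),
    ("CHECKOUT", 3),
]


def drop_offs(steps):
    """Return (from_step, to_step, users_lost) for each pair of consecutive steps."""
    return [
        (a_name, b_name, a_count - b_count)
        for (a_name, a_count), (b_name, b_count) in zip(steps, steps[1:])
    ]


for src, dst, lost in drop_offs(funnel):
    print(f"{src} -> {dst}: {lost}M drop-off")
```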
2 Unique Users, 7 Views, 2 Purchases
{event_time=2020-01-28T..., userid=uid1, attribute=online_ad}
{event_time=2020-01-28T..., userid=uid1, attribute=homepage}
{event_time=2020-01-28T..., userid=uid1, attribute=productX_page}
....
{event_time=2020-01-28T..., userid=uid1, attribute=online_ad, type=Tactic}
{event_time=2020-01-28T..., userid=uid1, attribute=homepage, type=Stage}
{event_time=2020-01-28T..., userid=uid1, attribute=productX_page, type=Stage}
....
{event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage}
{event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=productX_page}
....
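One way to read the three samples above: raw events are tagged as Tactic or Stage, and each of a user's tactics is then paired with each of that user's stages on the same day. The plain-Python sketch below illustrates that pairing. The talk does this step with Spark, whose code is not part of this transcript, so the lookup table `ATTRIBUTE_TYPES`, the function name and the placeholder times (the slide elides them) are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical lookup: which attributes count as tactics and which as stages.
ATTRIBUTE_TYPES = {
    "online_ad": "Tactic",
    "homepage": "Stage",
    "productX_page": "Stage",
}

# Raw events; the times of day are placeholders, not values from the slide.
events = [
    {"event_time": "2020-01-28T10:00", "userid": "uid1", "attribute": "online_ad"},
    {"event_time": "2020-01-28T10:01", "userid": "uid1", "attribute": "homepage"},
    {"event_time": "2020-01-28T10:02", "userid": "uid1", "attribute": "productX_page"},
]


def pair_tactics_with_stages(raw_events):
    """Pair every tactic with every stage seen for the same user on the same day."""
    tactics = defaultdict(set)
    stages = defaultdict(set)
    for e in raw_events:
        key = (e["event_time"][:10], e["userid"])  # (event_date, userid)
        if ATTRIBUTE_TYPES[e["attribute"]] == "Tactic":
            tactics[key].add(e["attribute"])
        else:
            stages[key].add(e["attribute"])

    rows = []
    for key in sorted(tactics):
        event_date, userid = key
        for tactic in sorted(tactics[key]):
            for stage in sorted(stages.get(key, ())):
                rows.append({"event_date": event_date, "userid": userid,
                             "tactic": tactic, "stage": stage})
    return rows


for row in pair_tactics_with_stages(events):
    print(row)
```

Note that this version ignores the order of events within the day.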
"type": "index_hadoop",
"spec": {
  "dataSchema": {
    "dataSource": "campaign_1472",
    "granularitySpec": {
      "queryGranularity": "day",
      "segmentGranularity": "day",
      "type": "uniform",
      "intervals": ["2020-01-01/2020-01-29"]
      ...
"timestampSpec": {
  "column": "event_date", "format": "yyyy-MM-dd"
},
"dimensionsSpec": {
  "dimensions": ["tactic", "stage"]
},
"metricsSpec": [{
  "fieldName": "userid", "type": "thetaSketch",
  "name": "user_id_sketch", "size": 65536}],
...
"inputSpec": {"type": "multi",
  "children": [
    {"type": "dataSource",
     "ingestionSpec": {
       "intervals": ["2020-01-01/2020-01-29"],
       "dataSource": "campaign_1472", ...}},
    {"type": "static",
     "paths": "s3://<BUCKET_NAME>/date=2020-01-28/campaign=1472",
     ...},
    ...
{__time=2020-01-28, tactic=online_ad, stage=homepage, user_id_sketch=<Object>}
{__time=2020-01-28, tactic=online_ad, stage=productX_page, user_id_sketch=<Object>}
....
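The `user_id_sketch` column holds a sketch of the user ids rather than the ids themselves, so distinct users can be estimated and sketches can be merged at query time. The toy K-Minimum-Values sketch below illustrates the general idea in Python. It is a simplified relative of a theta sketch, not the DataSketches implementation that Druid uses, and the class name `KMVSketch` and its parameters are assumptions made for this example.

```python
import hashlib
import heapq


class KMVSketch:
    """Toy K-Minimum-Values sketch: retains the k smallest hash values seen."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap of retained hashes (stored negated)
        self._members = set()  # retained hashes, for duplicate detection

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform float in [0, 1).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _add_hash(self, h):
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest retained hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def update(self, item):
        self._add_hash(self._hash(item))

    def union(self, other):
        """Merge two sketches into a new one (set union of the underlying ids)."""
        merged = KMVSketch(min(self.k, other.k))
        for h in self._members | other._members:
            merged._add_hash(h)
        return merged

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # exact while under capacity
        # k-th smallest hash is the largest retained one.
        return (self.k - 1) / -self._heap[0]


day1, day2 = KMVSketch(), KMVSketch()
for uid in ["uid1", "uid2"]:
    day1.update(uid)
for uid in ["uid2", "uid3"]:
    day2.update(uid)
print(day1.union(day2).estimate())  # 3.0, exact because fewer than k ids were seen
```

A larger k gives a more accurate estimate at the cost of a larger sketch.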
SELECT
  APPROX_COUNT_DISTINCT_DS_THETA(user_id_sketch, 65536)
    AS homepage_sketch
FROM campaign_1472
WHERE (("tactic" = 'online_ad')
  AND ("stage" = 'homepage'))
  AND __time BETWEEN '2020-01-01T00:00:00.000'
  AND '2020-01-29T23:59:59.000'
ONLINE AD: 8.1M UUs
HOMEPAGE: 3.1K UUs
  2.5K drop-off
PRODUCT PAGE: 1K UUs
...
* UUs = Unique Users
{event_time=2020-01-28T09:15, userid=uid1, attribute=productX_page}
{event_time=2020-01-28T10:10, userid=uid1, attribute=online_ad}
{event_time=2020-01-28T10:11, userid=uid1, attribute=homepage}
....
{event_time=2020-01-28T09:15, userid=uid1, attribute=productX_page, type=Stage}
{event_time=2020-01-28T10:10, userid=uid1, attribute=online_ad, type=Tactic}
{event_time=2020-01-28T10:11, userid=uid1, attribute=homepage, type=Stage}
....
{event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=productX_page}
{event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage}
....
{event_date=2020-01-28, userid=uid1, tactic=online_ad, stage=homepage}
....
....
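In the example above, the productX_page visit at 09:15 precedes the online_ad exposure at 10:10, so only the homepage visit at 10:11 remains paired with that tactic. The plain-Python sketch below assumes the rule is "a stage is attributed to a tactic only if it occurs after the user's first exposure to that tactic on the same day". The exact rule used in the talk's Spark job is not shown in this transcript, and `ATTRIBUTE_TYPES` and the function name are hypothetical.

```python
# Hypothetical lookup: which attributes count as tactics and which as stages.
ATTRIBUTE_TYPES = {
    "online_ad": "Tactic",
    "homepage": "Stage",
    "productX_page": "Stage",
}

# The three raw events from the slide.
events = [
    {"event_time": "2020-01-28T09:15", "userid": "uid1", "attribute": "productX_page"},
    {"event_time": "2020-01-28T10:10", "userid": "uid1", "attribute": "online_ad"},
    {"event_time": "2020-01-28T10:11", "userid": "uid1", "attribute": "homepage"},
]


def pair_stages_after_tactic(raw_events):
    """Return sorted (event_date, userid, tactic, stage) tuples, keeping only
    stages that happened after the first exposure to the tactic that day."""
    # First exposure time per (event_date, userid, tactic).
    first_seen = {}
    for e in raw_events:
        if ATTRIBUTE_TYPES[e["attribute"]] == "Tactic":
            key = (e["event_time"][:10], e["userid"], e["attribute"])
            if key not in first_seen or e["event_time"] < first_seen[key]:
                first_seen[key] = e["event_time"]

    rows = set()
    for e in raw_events:
        if ATTRIBUTE_TYPES[e["attribute"]] != "Stage":
            continue
        event_date = e["event_time"][:10]
        for (t_date, userid, tactic), t_time in first_seen.items():
            # ISO-8601 strings of equal format compare correctly as text.
            if t_date == event_date and userid == e["userid"] and e["event_time"] > t_time:
                rows.add((event_date, userid, tactic, e["attribute"]))
    return sorted(rows)


print(pair_stages_after_tactic(events))
```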
SELECT THETA_SKETCH_NOT(65536,
         THETA_SKETCH_INTERSECT(65536, a, b),
         THETA_SKETCH_UNION(65536, c, d, e)
       ) AS online_ad_596
FROM (
  SELECT
    DS_THETA("user_id_sketch") FILTER (WHERE stage = 'homepage') AS a,
    DS_THETA("user_id_sketch") FILTER (WHERE tactic = 'online_ad') AS b,
    DS_THETA("user_id_sketch") FILTER (WHERE stage = 'productX_page') AS c,
    DS_THETA("user_id_sketch") FILTER (WHERE stage = 'add_to_cart') AS d,
    DS_THETA("user_id_sketch") FILTER (WHERE stage = 'checkout') AS e
  FROM campaign_1472
  WHERE stage IN ('homepage', 'productX_page', 'checkout', 'add_to_cart')
    AND tactic = 'online_ad') subquery
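The query above combines sketches with set operations: the intersection of a and b (users who saw the online ad and reached the homepage) minus the union of c, d and e (users who reached any later stage), which leaves the users who dropped off at the homepage. The same logic is sketched below with exact Python sets standing in for theta sketches. The user ids are made up for illustration.

```python
# Exact sets standing in for the sketches a..e in the query (made-up ids).
a = {"uid1", "uid2", "uid3", "uid4"}          # stage = 'homepage'
b = {"uid1", "uid2", "uid3", "uid4", "uid5"}  # tactic = 'online_ad'
c = {"uid1", "uid2"}                          # stage = 'productX_page'
d = {"uid1"}                                  # stage = 'add_to_cart'
e = {"uid1"}                                  # stage = 'checkout'

reached_homepage_via_ad = a & b               # THETA_SKETCH_INTERSECT(a, b)
reached_later_stage = c | d | e               # THETA_SKETCH_UNION(c, d, e)

# THETA_SKETCH_NOT: in the first set but not in the second.
dropped_at_homepage = reached_homepage_via_ad - reached_later_stage

print(sorted(dropped_at_homepage))  # ['uid3', 'uid4']
```

With real sketches the result is an estimate rather than an exact set, and the ids themselves are not recoverable.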
ONLINE AD: 8.1M UUs
HOMEPAGE: 3.1K UUs
  2.5K drop-off
PRODUCT PAGE: 0.6K UUs
...
* UUs = Unique Users