#SMX #13B @AndreasReiffen
Creative ideas for testing procedures
How to test
(& perfect)
nearly everything
#SMX #13B @AndreasReiffen
About me
• Data-driven online advertising strategist
• Online retail expert
• Entrepreneur
About crealytics & camato
• Over €3 billion in customer revenues last year
• SaaS product for Google Shopping & Search
• 130 true experts in their field
• Offices in Germany & UK, new office in NYC
#SMX #13B @AndreasReiffen
ALL aspects of testing? At least some, I hope!
• 2 types of testing to take performance to the next level: testing is more than finding the perfect ad copy.
• 5 common pitfalls: depending on the setup and the analysis, tests can tell very different stories.
• 3 methods & tools to use for successful testing.
#SMX #13B @AndreasReiffen
Which methods to use
#SMX #13B @AndreasReiffen
These are our recommended methods:
1. Drafts and Experiments
2. Scheduled A/B tests
3. Before/after tests
4. Further tools for testing
#SMX #13B @AndreasReiffen
Drafts & Experiments is the most versatile
testing tool for almost everything
Structural — tests that change the structure within a campaign:
• Ads
• Landing pages
• Match types
Bidding — tests that influence bidding of some sort:
• Bids
• Modifiers (device, ad schedule, geo-targeting)
• Strategies (eCPC, target CPA)
Features — changes within features added to a campaign:
• RLSA
• Ad extensions
• Sitelinks
• Etc.
Drafts and Experiments allow you to test almost anything within a campaign.
Unfortunately, this feature is currently not available for Shopping campaigns.
1
#SMX #13B @AndreasReiffen
Set up a draft campaign to
collaborate or begin a new test
1
#SMX #13B @AndreasReiffen
Choose the % of traffic for testing
and set a timeframe
1
#SMX #13B @AndreasReiffen
A/B test landing pages with Drafts & Experiments
for conversion rate
1
Test not successful: the original landing pages led to a higher conversion rate.
Setup: create an Experiment; change only the landing pages.
Analysis: keep track of top-line performance using the automatic scorecard displayed in the Experiment
campaign. Nonetheless, always take a deep dive into performance after finishing the experiment to
rule out any irregularities.
#SMX #13B @AndreasReiffen
Manually scheduled A/B tests
still have some use cases
Search terms — tests where the query composition is important:
• Match type changes
• Negative keyword changes
Cross-campaign — tests that have to run across different campaigns:
• Quality Score development in new accounts/campaigns
Shopping — any of the tests you could run via D&E for text ads:
• Structure
• Bidding
• Features
Whatever can't be achieved through D&E.
Use this scheduling to avoid cannibalization while still staying independent from seasonality.
2
#SMX #13B @AndreasReiffen
Scheduled A/B tests use campaign settings to
share hours fairly between A and B
Copy the existing campaign and upload
two-hour ad schedules for both campaigns
so they run alternately.
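As a sketch of what that upload could look like: the snippet below generates alternating two-hour slots and shifts the assignment each day so neither campaign always owns the same hours. The campaign/day/start/end CSV layout is a hypothetical format for illustration, not an official Google Ads bulk template.

```python
# Minimal sketch: generate alternating two-hour ad-schedule rows so that
# campaigns A and B never run at the same time, with the A/B assignment
# shifted by one slot each day so both campaigns see all hours.
# The (campaign, day, start_hour, end_hour) CSV layout is hypothetical,
# not an official Google Ads bulk-upload format.
import csv

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def alternating_schedule(campaign_a, campaign_b, slot_hours=2):
    """Yield (campaign, day, start_hour, end_hour) rows with alternating slots."""
    for day_idx, day in enumerate(DAYS):
        for start in range(0, 24, slot_hours):
            # Flip the A/B assignment per slot; the day index shifts the
            # pattern daily so A doesn't always get the same hours.
            if (start // slot_hours + day_idx) % 2 == 0:
                campaign = campaign_a
            else:
                campaign = campaign_b
            yield campaign, day, start, start + slot_hours

with open("ad_schedule.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["campaign", "day", "start_hour", "end_hour"])
    writer.writerows(alternating_schedule("Original Campaign", "Test Campaign"))
```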
2
#SMX #13B @AndreasReiffen
Setup: duplicate the campaign & set its schedule to run against the original campaign.
Analysis: compare traffic & Quality Score levels.
Example A/B scheduling: how fast do quality
scores pick up after campaign transition?
Quality Score — original campaign: 8.3 (day 1-4), 8.3 (day 5-30); new campaign: 7.6 (-8%, day 1-4), 8.6 (+4%, day 5-30).
Traffic — original campaign: 1,391 (day 1-4), 1,252 (day 5-30); new campaign: 944 (-32%, day 1-4), 1,210 (-3%, day 5-30).
2
Quality Scores pick up within a few days.
Traffic picks up simultaneously.
#SMX #13B @AndreasReiffen
Before/after tests are versatile and used for feed
components. A control group is important.
Feed changes — changes in the feed:
• Test new titles
• Test new images
Product changes — tests that affect the product portfolio itself:
• Price changes
Use before/after for things that cannot easily be changed back and forth.
Make sure to have a control group that reveals seasonal or budget changes.
3
#SMX #13B @AndreasReiffen
A before/after test measures the change in the relation
between test and control.
[Chart: test and control lines across the before, during, and after phases]
3
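A minimal sketch of the arithmetic, using the indexed impression figures from the price example on the next slide; comparing the test's relative change against the control's (a ratio of ratios) is one reasonable way to net out what the control absorbed:

```python
# Minimal sketch: net effect of a before/after test measured against a
# control group. The control's own change absorbs seasonality and budget
# effects; the net effect is the ratio of the two relative changes.

def net_effect(test_before, test_after, ctrl_before, ctrl_after):
    return (test_after / test_before) / (ctrl_after / ctrl_before) - 1.0

# Indexed impressions from the price example on the next slide:
# test products 100 -> 33, whole account (control) 100 -> 93.
print(f"net effect: {net_effect(100, 33, 100, 93):+.1%}")  # -> -64.5%
```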
#SMX #13B @AndreasReiffen
Before/after example: Google rewards cheaper
product prices with more impressions
Impressions (indexed, before = 100) — test products: 100 → 33 (-67%); account level: 100 → 93 (-7%).
Clicks (indexed, before = 100) — test products: 100 → 62 (-38%); account level: 100 → 100 (0%).
3
Price changes not only affect CTR, they also have a massive impact on impression levels.
Setup: increased prices from the lowest to the highest among competitors.
Analysis: compare traffic before/after, using account traffic as the baseline.
[Line chart: own price vs. clicks over days; clicks +5%]
#SMX #13B @AndreasReiffen
Google Merchant Center experiments are a great idea,
but they lack attention from Google
Google is testing feed optimisations
directly in the Merchant Center
interface.
Tests compare phase 1 and phase 2
against a baseline. Not very well
documented, since still in beta.
4
#SMX #13B @AndreasReiffen
Merchant Center experiments
cover product titles and images for A/B testing
Product titles A vs B: alternative values are proposed from an additional column in the feed.
Shortcoming: the products included in test & control are randomized, not the impressions
or users! Google might discontinue it.
4
#SMX #13B @AndreasReiffen
Online A/B tools are a great help to find out whether tests
have a significant outcome.
Trials and successes can include:
impressions and clicks (CTR), or
clicks and conversions (conversion rate).
4
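If you prefer to run the check yourself, most of these calculators boil down to a two-proportion z-test; a minimal standard-library sketch (the figures below are illustrative, not from the deck):

```python
# Minimal sketch: the two-proportion z-test most online A/B calculators
# perform. "Trials" and "successes" can be impressions and clicks (CTR)
# or clicks and conversions (conversion rate). Figures are illustrative.
from statistics import NormalDist

def ab_p_value(trials_a, successes_a, trials_b, successes_b):
    """Two-sided p-value for H0: both underlying rates are equal."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = (pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = ab_p_value(trials_a=50_000, successes_a=1_400,   # control: 2.8% rate
               trials_b=50_000, successes_b=1_550)   # variant: 3.1% rate
print(f"p = {p:.3f}")  # ≈ 0.005 -> significant at the 5% level
```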
#SMX #13B @AndreasReiffen
Optimizing current accounts & performance
#SMX #13B @AndreasReiffen
Two types of testing: optimizing PPC vs. understanding Google
• Testing ads (Ad A vs Ad B) — optimise parameters within the Google sandbox to get better Google KPIs. Necessary so as not to fall behind.
• Testing Google (budget → revenue) — understand what the black box does to inform & improve strategy. Move first and gain an advantage.
#SMX #13B @AndreasReiffen
There are two different types of objectives:
1. Optimizing existing Google performance
2. Reverse engineering Google
#SMX #13B @AndreasReiffen
Hypothesis: splitting shopping queries into "generics vs designers"
can save cost at the same revenue.
• Campaign A — high intent (brand + specific product, e.g. "nike mercurial superfly"): high bid, $1.00
• Campaign B — low intent (generic + product type, e.g. "soccer shoes"): low bid, $0.50
1
#SMX #13B @AndreasReiffen
Google is forced to adopt the query split by campaign
priority and negatives:
Campaign                  Priority   Negatives
Generics                  high       designer names, product names
Designers                 medium     product names
Designer + Product Name   low        n/a
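As a thought model of the routing (a simplified sketch with made-up negatives, not a simulation of Google's actual auction): a query falls through to the highest-priority campaign whose negatives don't block it.

```python
# Simplified sketch of the priority-plus-negatives routing trick for
# Shopping campaigns: a query is captured by the highest-priority
# campaign whose negative keywords do not match it. The negatives below
# are illustrative ("chi chi london" as a designer name, "mercurial" as
# a product name); this models the intended behaviour only.

CAMPAIGNS = [  # ordered high -> low priority
    {"name": "Generics",                "negatives": ["chi chi london", "mercurial"]},
    {"name": "Designers",               "negatives": ["mercurial"]},
    {"name": "Designer + Product Name", "negatives": []},
]

def route(query: str) -> str:
    """Return the campaign that captures the query."""
    for campaign in CAMPAIGNS:
        if not any(neg in query for neg in campaign["negatives"]):
            return campaign["name"]
    return "unmatched"

print(route("soccer shoes"))             # -> Generics
print(route("chi chi london dress"))     # -> Designers
print(route("nike mercurial superfly"))  # -> Designer + Product Name
```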
#SMX #13B @AndreasReiffen
Test design: rotating A/B test — split vs non-split.
Duplicate the products, split the queries in "test", and increase the share of designer
queries via higher bids; rotate by scheduling. (Today you could do this with Drafts & Experiments.)
Query share in the split (test) campaigns across phases 1-3: generics 80% / 75% / 70%, designers 20% / 28% / 35%. Control: original setup, no split (100%).
Revenue, test vs control: 100% / 128% / 137% (phases 1-3).
Cost, test vs control: 98% / 103% / 96% (phases 1-3).
Hypothesis holds: queries with a higher conversion probability get more exposure,
overcompensating the higher CPCs.
1
#SMX #13B @AndreasReiffen
Conclusions for testing Google hacks
• A/B testing complex campaign setups is possible
• Keep results comparable: you should keep either cost or revenue stable
• Don't measure the uplift of the "test" campaign itself, only its change
relative to "control", to eliminate seasonality
1
#SMX #13B @AndreasReiffen
Hypothesis: bidding on products is like "broad match" —
higher bids = a larger share of less-converting traffic.
[Chart: impressions vs. max CPC, split into specific and generic queries]
2
#SMX #13B @AndreasReiffen
Test design: increase bids on brands by 200%. (Today you could do this with Drafts & Experiments.)
Chi Chi London before/after (k impressions):
[Stacked bar chart: impressions (k) at bid = 0.50 vs bid = 1.50, split into designer only
[chi chi london], designer + cats [chi chi dress], and generic terms [party dresses];
the generic share grows most at the higher bid]
CPC by query type (bid 0.50 → bid 1.50): designer only 0.40 → 0.85; designer + cats 0.09 → 0.25; generic terms 0.22 → 0.63.
Hypothesis holds: traffic quality gets weaker, as with broad match.
Surprising: you pay more for the same traffic! Overbidding on Shopping is dangerous.
2
#SMX #13B @AndreasReiffen
Conclusions from reverse engineering tests
• Pure before/after tests need multiple sibling tests to
validate: we tested several brands with the same results
• Look beyond your hypothesis for additional learnings: the
same traffic at a higher CPC was surprising
• Always segment: queries, device, top vs other, search
partners, audience vs non-audience
2
#SMX #13B @AndreasReiffen
Common pitfalls
#SMX #13B @AndreasReiffen
Common challenges we have encountered:
1. Statistical significance
2. Don't aggregate
3. Think outside the box
4. Know your surroundings
5. Look out for cannibalization
#SMX #13B @AndreasReiffen
Only end testing when statistical significance is reached
1
Tip: use the tools mentioned above to evaluate whether the data is statistically meaningful.
Done wrong: the eCPC test ran for two weeks only. Result: eCPC does not work.
Done right: consider that Google's algorithm needs time to learn and that there was not enough
traffic for statistical significance. With more data, the result is that eCPC does indeed work.
              CPC          eCPC
Impressions   1,032,007    1,010,246 (-2%)
Conversions   2,800        2,930 (+5%)
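A rough sample-size estimate (normal approximation, alpha = 0.05, power = 0.8; a sketch, not a full power analysis) shows why two weeks can simply be too little data at a rate of roughly 2,800 conversions per 1,000,000 impressions:

```python
# Rough sketch: minimum sample size per variant to detect a relative
# uplift in a base rate (normal approximation, two-sided alpha = 0.05,
# power = 0.80). Not a substitute for a proper power analysis.
from statistics import NormalDist

def required_sample(base_rate, rel_uplift, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p1, p2 = base_rate, base_rate * (1 + rel_uplift)
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p2 - p1) ** 2

# Detecting a +5% relative change on a 0.28% conversions-per-impression
# rate (the slide's order of magnitude) needs a lot of traffic:
print(f"{required_sample(0.0028, 0.05):,.0f} impressions per variant")
# -> ≈ 2,292,000 impressions per variant
```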
#SMX #13B @AndreasReiffen
Don't analyse totals; measure changes on the actual
changed elements
2
Done wrong: only analyzed top-line data. Result: title changes hurt performance.
Done right: the total decrease was caused by one term only; on average, impressions increased by 116%.
Result: title changes work well.
[Bar charts: indexed impressions, before vs after, for the account and test totals and per individual term; a single outlier term causes the total decline]
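A minimal sketch of the per-element analysis, with made-up terms and figures: the total is dominated by one outlier, while the average per-term change is clearly positive.

```python
# Minimal sketch: judge a change per changed element (here: per term)
# instead of by the account total, so one heavy outlier cannot hide a
# broadly positive result. Terms and figures are made up.
before = {"term_a": 120, "term_b": 80, "term_c": 60, "outlier": 9_000}
after  = {"term_a": 260, "term_b": 170, "term_c": 130, "outlier": 5_500}

total = sum(after.values()) / sum(before.values()) - 1.0
per_term = {t: after[t] / before[t] - 1.0 for t in before}
average = sum(per_term.values()) / len(per_term)

print(f"total:   {total:+.0%}")        # -35%, dragged down by the outlier
for term, change in sorted(per_term.items()):
    print(f"  {term}: {change:+.0%}")
print(f"average: {average:+.0%}")      # clearly positive
```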
#SMX #13B @AndreasReiffen
Don't limit yourself to the original question; there are
more insights to gain
3
Done wrong: eCPC works, but some interesting insights slipped our attention!
Done right: analyzing further, we noticed that eCPC helped manage tablet performance (before Google
reintroduced tablet bid adjustments). This opened up a new way of optimizing device performance.
              CPC          eCPC
Impressions   1,032,007    1,010,246 (-2%)
Conversions   2,800        2,930 (+5%)
Tablet CPO: -10%, driven by lower CPCs, higher CR, and a traffic shift towards desktop.
#SMX #13B @AndreasReiffen
Be aware of your surroundings! What else could
influence the test results?
4
Done wrong: image changes sometimes work, sometimes they don't; result inconclusive.
Done right: looking at the test environment shows that if competitors' images are mixed, there's no
change; if competitors' images are uniform, there's an improvement. Result: you have to stand out.
CTR, test vs control, with the original image set to 100%:
  Test A: 102.6% (+2.6%, not significant)
  Test B: 127.0% (+27.0%, significant)
#SMX #13B @AndreasReiffen
Look out for cannibalization! Gains on one product
may come at the expense of others.
5
Done wrong: measure query clicks on one single product after increasing bids and report the nominal uplift.
Done right: the product diverted queries away from other products, so the actual increment is
much lower.
Bid raised from 0.50 to 1.50 (+200%). Nominal increase over baseline: +1,581%.
After subtracting the cannibalised impressions, the actual increase is only +114%.
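Once the cannibalised traffic is measured, the correction is simple arithmetic; a sketch using the slide's percentages, with the before/after levels back-solved from them:

```python
# Minimal sketch: correct a nominal uplift for cannibalisation by
# subtracting the impressions the boosted product diverted away from
# sibling products. Levels are back-solved from the slide's +1,581%
# nominal and +114% actual figures (indexed baseline = 100).
def uplift(baseline, after, cannibalised):
    nominal = after - baseline
    actual = nominal - cannibalised
    return nominal / baseline, actual / baseline

nominal_pct, actual_pct = uplift(baseline=100, after=1_681, cannibalised=1_467)
print(f"nominal: +{nominal_pct:.0%}, actual: +{actual_pct:.0%}")
# -> nominal: +1581%, actual: +114%
```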
#SMX #13B @AndreasReiffen
Takeaways
#SMX #13B @AndreasReiffen
Takeaways
Knack for numbers — you have to like playing with numbers and think analytically.
More than just numbers — data miners and scientists are not everything; you need to understand the bigger picture.
Experience — for elaborate testing, you need to be an experienced PPC pro.
Loads of data — you need access to the data warehouse yourself, or know someone who has it.
#SMX #13B @AndreasReiffen
LEARN MORE: UPCOMING @SMX EVENTS
THANK YOU!
SEE YOU AT THE NEXT #SMX