#SMX #13B @AndreasReiffen
Creative ideas for testing procedures
How to test
(& perfect)
nearly everything
#SMX #13B @AndreasReiffen
About me
• Data-driven online advertising strategist
• Online retail expert
• Entrepreneur
About crealytics & camato
• Over €3 billion in customer revenues last year
• SaaS product for Google Shopping & Search
• 130 true experts in their field
• Offices in Germany & UK, new office in NYC
#SMX #13B @AndreasReiffen
ALL aspects of testing? At least some, I hope!
• 2 types of testing to take performance to the next level: testing is more than finding the perfect ad copy.
• 5 common pitfalls: depending on the setup and the analysis, tests can tell very different stories.
• 3 methods & tools to use for successful testing.
#SMX #13B @AndreasReiffen
Which methods to use
#SMX #13B @AndreasReiffen
These are our recommended methods:
1. Drafts and Experiments
2. Scheduled A/B tests
3. Before/after tests
4. Further tools for testing
#SMX #13B @AndreasReiffen
Drafts & Experiments is the most versatile
testing tool for almost everything
Structural — tests that change the structure within a campaign:
• Ads
• Landing pages
• Match types
Bidding — tests that influence bidding of some sort:
• Bids
• Modifiers (device, ad schedule, geo-targeting)
• Strategies (eCPC, target CPA)
Features — changes within features added to a campaign:
• RLSA
• Ad extensions
• Sitelinks
• Etc.
Drafts and Experiments allow you to test almost anything within a campaign.
Unfortunately, this feature is currently not available for Shopping campaigns.
1
#SMX #13B @AndreasReiffen
Set up a draft campaign to
collaborate or begin a new test
1
#SMX #13B @AndreasReiffen
Choose the % of traffic for testing
and set a timeframe
1
#SMX #13B @AndreasReiffen
A/B test landing pages with Drafts & Experiments
for conversion rate
1
Test not successful: the original landing pages led to a higher conversion rate.
Setup: create an Experiment; change only the landing pages.
Analysis: keep track of top-line performance using the automatic scorecard displayed in the Experiment
campaign. Nonetheless, always take a deep dive into performance after finishing the experiment to
rule out any irregularities.
#SMX #13B @AndreasReiffen
Manually scheduled A/B tests
still have some use cases
Search terms — tests where the query composition is important:
• Match type changes
• Negative keyword changes
Cross-campaign — tests that have to run across different campaigns:
• Quality Score development in new accounts/campaigns
Shopping — any of the tests you could run via D&E for text ads:
• Structure
• Bidding
• Features
Whatever can't be achieved through D&E.
Use this scheduling to avoid cannibalization while still staying independent from seasonality.
2
#SMX #13B @AndreasReiffen
Scheduled A/B tests use campaign settings to
share hours fairly between A and B
Copy the existing campaign and upload
two-hour ad schedules for both campaigns
so they run alternately.
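As a sketch of what that upload could look like: the snippet below generates alternating two-hour slots and shifts the assignment each day so neither campaign always owns the same hours. The campaign/day/start/end CSV layout is a hypothetical format for illustration, not an official Google Ads bulk template.

```python
# Minimal sketch: generate alternating two-hour ad-schedule rows so that
# campaigns A and B never run at the same time, with the A/B assignment
# shifted by one slot each day so both campaigns see all hours.
# The (campaign, day, start_hour, end_hour) CSV layout is hypothetical,
# not an official Google Ads bulk-upload format.
import csv

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def alternating_schedule(campaign_a, campaign_b, slot_hours=2):
    """Yield (campaign, day, start_hour, end_hour) rows with alternating slots."""
    for day_idx, day in enumerate(DAYS):
        for start in range(0, 24, slot_hours):
            # Flip the A/B assignment per slot; the day index shifts the
            # pattern daily so A doesn't always get the same hours.
            if (start // slot_hours + day_idx) % 2 == 0:
                campaign = campaign_a
            else:
                campaign = campaign_b
            yield campaign, day, start, start + slot_hours

with open("ad_schedule.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["campaign", "day", "start_hour", "end_hour"])
    writer.writerows(alternating_schedule("Original Campaign", "Test Campaign"))
```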
2
#SMX #13B @AndreasReiffen
Setup: duplicate the campaign & set its schedule to run against the original campaign.
Analysis: compare traffic & Quality Score levels.
Example A/B scheduling: how fast do quality
scores pick up after campaign transition?
Quality Score — original campaign: 8.3 (day 1-4), 8.3 (day 5-30); new campaign: 7.6 (-8%, day 1-4), 8.6 (+4%, day 5-30).
Traffic — original campaign: 1,391 (day 1-4), 1,252 (day 5-30); new campaign: 944 (-32%, day 1-4), 1,210 (-3%, day 5-30).
2
Quality Scores pick up within a few days.
Traffic picks up simultaneously.
#SMX #13B @AndreasReiffen
Before/after tests are versatile and used for feed
components. A control group is important.
Feed changes — changes in the feed:
• Test new titles
• Test new images
Product changes — tests that affect the product portfolio itself:
• Price changes
Use before/after for things that cannot easily be changed back and forth.
Make sure to have a control group that reveals seasonal or budget changes.
3
#SMX #13B @AndreasReiffen
A before/after test measures the change in the relation
between test and control.
[Chart: test and control lines across the before, during, and after phases]
3
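A minimal sketch of the arithmetic, using the indexed impression figures from the price example on the next slide; comparing the test's relative change against the control's (a ratio of ratios) is one reasonable way to net out what the control absorbed:

```python
# Minimal sketch: net effect of a before/after test measured against a
# control group. The control's own change absorbs seasonality and budget
# effects; the net effect is the ratio of the two relative changes.

def net_effect(test_before, test_after, ctrl_before, ctrl_after):
    return (test_after / test_before) / (ctrl_after / ctrl_before) - 1.0

# Indexed impressions from the price example on the next slide:
# test products 100 -> 33, whole account (control) 100 -> 93.
print(f"net effect: {net_effect(100, 33, 100, 93):+.1%}")  # -> -64.5%
```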
#SMX #13B @AndreasReiffen
Before/after example: Google rewards cheaper
product prices with more impressions
Impressions (indexed, before = 100) — test products: 100 → 33 (-67%); account level: 100 → 93 (-7%).
Clicks (indexed, before = 100) — test products: 100 → 62 (-38%); account level: 100 → 100 (0%).
3
Price changes not only affect CTR, they also have a massive impact on impression levels.
Setup: increased prices from the lowest to the highest among competitors.
Analysis: compare traffic before/after, using account traffic as the baseline.
[Line chart: own price vs. clicks over days; clicks +5%]
#SMX #13B @AndreasReiffen
Google Merchant Center experiments are a great idea,
but they lack attention from Google
Google is testing feed optimisations
directly in the Merchant Center
interface.
Tests compare phase 1 and phase 2
against a baseline. Not very well
documented, since still in beta.
4
#SMX #13B @AndreasReiffen
Merchant Center experiments
cover product titles and images for A/B testing
Product titles A vs B: alternative values are proposed from an additional column in the feed.
Shortcoming: the products included in test & control are randomized, not the impressions
or users! Google might discontinue it.
4
#SMX #13B @AndreasReiffen
Online A/B tools are a great help to find out whether tests
have a significant outcome.
Trials and successes can include:
impressions and clicks (CTR), or
clicks and conversions (conversion rate).
4
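If you prefer to run the check yourself, most of these calculators boil down to a two-proportion z-test; a minimal standard-library sketch (the figures below are illustrative, not from the deck):

```python
# Minimal sketch: the two-proportion z-test most online A/B calculators
# perform. "Trials" and "successes" can be impressions and clicks (CTR)
# or clicks and conversions (conversion rate). Figures are illustrative.
from statistics import NormalDist

def ab_p_value(trials_a, successes_a, trials_b, successes_b):
    """Two-sided p-value for H0: both underlying rates are equal."""
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = (pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

p = ab_p_value(trials_a=50_000, successes_a=1_400,   # control: 2.8% rate
               trials_b=50_000, successes_b=1_550)   # variant: 3.1% rate
print(f"p = {p:.3f}")  # ≈ 0.005 -> significant at the 5% level
```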
#SMX #13B @AndreasReiffen
Optimizing current accounts & performance
#SMX #13B @AndreasReiffen
Two types of testing: optimizing PPC vs. understanding Google
• Testing ads (Ad A vs Ad B) — optimise parameters within the Google sandbox to get better Google KPIs. Necessary so as not to fall behind.
• Testing Google (budget → revenue) — understand what the black box does to inform & improve strategy. Move first and gain an advantage.
#SMX #13B @AndreasReiffen
There are two different types of objectives:
1. Optimizing existing Google performance
2. Reverse engineering Google
#SMX #13B @AndreasReiffen
Hypothesis: splitting shopping queries into "generics vs designers"
can save cost at the same revenue.
• Campaign A — high intent (brand + specific product, e.g. "nike mercurial superfly"): high bid, $1.00
• Campaign B — low intent (generic + product type, e.g. "soccer shoes"): low bid, $0.50
1
#SMX #13B @AndreasReiffen
Google is forced to adopt the query split by campaign
priority and negatives:
Campaign                  Priority   Negatives
Generics                  high       designer names, product names
Designers                 medium     product names
Designer + Product Name   low        n/a
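As a thought model of the routing (a simplified sketch with made-up negatives, not a simulation of Google's actual auction): a query falls through to the highest-priority campaign whose negatives don't block it.

```python
# Simplified sketch of the priority-plus-negatives routing trick for
# Shopping campaigns: a query is captured by the highest-priority
# campaign whose negative keywords do not match it. The negatives below
# are illustrative ("chi chi london" as a designer name, "mercurial" as
# a product name); this models the intended behaviour only.

CAMPAIGNS = [  # ordered high -> low priority
    {"name": "Generics",                "negatives": ["chi chi london", "mercurial"]},
    {"name": "Designers",               "negatives": ["mercurial"]},
    {"name": "Designer + Product Name", "negatives": []},
]

def route(query: str) -> str:
    """Return the campaign that captures the query."""
    for campaign in CAMPAIGNS:
        if not any(neg in query for neg in campaign["negatives"]):
            return campaign["name"]
    return "unmatched"

print(route("soccer shoes"))             # -> Generics
print(route("chi chi london dress"))     # -> Designers
print(route("nike mercurial superfly"))  # -> Designer + Product Name
```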
#SMX #13B @AndreasReiffen
Test design: rotating A/B test — split vs non-split.
Duplicate the products, split the queries in "test", and increase the share of designer
queries via higher bids; rotate by scheduling. (Today you could do this with Drafts & Experiments.)
Query share in the split (test) campaigns across phases 1-3: generics 80% / 75% / 70%, designers 20% / 28% / 35%. Control: original setup, no split (100%).
Revenue, test vs control: 100% / 128% / 137% (phases 1-3).
Cost, test vs control: 98% / 103% / 96% (phases 1-3).
Hypothesis holds: queries with a higher conversion probability get more exposure,
overcompensating the higher CPCs.
1
#SMX #13B @AndreasReiffen
Conclusions for testing Google hacks
• A/B testing complex campaign setups is possible
• Keep results comparable: you should keep either cost or revenue stable
• Don't measure the uplift of the "test" campaign itself, only its change
relative to "control", to eliminate seasonality
1
#SMX #13B @AndreasReiffen
Hypothesis: bidding on products is like "broad match" —
higher bids = a larger share of less-converting traffic.
[Chart: impressions vs. max CPC, split into specific and generic queries]
2
#SMX #13B @AndreasReiffen
Test design: increase bids on brands by 200%. (Today you could do this with Drafts & Experiments.)
Chi Chi London before/after (k impressions):
[Stacked bar chart: impressions (k) at bid = 0.50 vs bid = 1.50, split into designer only
[chi chi london], designer + cats [chi chi dress], and generic terms [party dresses];
the generic share grows most at the higher bid]
CPC by query type (bid 0.50 → bid 1.50): designer only 0.40 → 0.85; designer + cats 0.09 → 0.25; generic terms 0.22 → 0.63.
Hypothesis holds: traffic quality gets weaker, as with broad match.
Surprising: you pay more for the same traffic! Overbidding on Shopping is dangerous.
2
#SMX #13B @AndreasReiffen
Conclusions from reverse engineering tests
• Pure before/after tests need multiple sibling tests to
validate: we tested several brands with the same results
• Look beyond your hypothesis for additional learnings: the
same traffic at a higher CPC was surprising
• Always segment: queries, device, top vs other, search
partners, audience vs non-audience
2
#SMX #13B @AndreasReiffen
Common pitfalls
#SMX #13B @AndreasReiffen
Common challenges we have encountered:
1. Statistical significance
2. Don't aggregate
3. Think outside the box
4. Know your surroundings
5. Look out for cannibalization
#SMX #13B @AndreasReiffen
Only end testing when statistical significance is reached
1
Tip: use the tools mentioned above to evaluate whether the data is statistically meaningful.
Done wrong: the eCPC test ran for two weeks only. Result: eCPC does not work.
Done right: consider that Google's algorithm needs time to learn and that there was not enough
traffic for statistical significance. With more data, the result is that eCPC does indeed work.
              CPC          eCPC
Impressions   1,032,007    1,010,246 (-2%)
Conversions   2,800        2,930 (+5%)
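A rough sample-size estimate (normal approximation, alpha = 0.05, power = 0.8; a sketch, not a full power analysis) shows why two weeks can simply be too little data at a rate of roughly 2,800 conversions per 1,000,000 impressions:

```python
# Rough sketch: minimum sample size per variant to detect a relative
# uplift in a base rate (normal approximation, two-sided alpha = 0.05,
# power = 0.80). Not a substitute for a proper power analysis.
from statistics import NormalDist

def required_sample(base_rate, rel_uplift, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p1, p2 = base_rate, base_rate * (1 + rel_uplift)
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p2 - p1) ** 2

# Detecting a +5% relative change on a 0.28% conversions-per-impression
# rate (the slide's order of magnitude) needs a lot of traffic:
print(f"{required_sample(0.0028, 0.05):,.0f} impressions per variant")
# -> ≈ 2,292,000 impressions per variant
```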
#SMX #13B @AndreasReiffen
Don't analyse totals; measure changes on the actual
changed elements
2
Done wrong: only analyzed top-line data. Result: title changes hurt performance.
Done right: the total decrease was caused by one term only; on average, impressions increased by 116%.
Result: title changes work well.
[Bar charts: indexed impressions, before vs after, for the account and test totals and per individual term; a single outlier term causes the total decline]
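A minimal sketch of the per-element analysis, with made-up terms and figures: the total is dominated by one outlier, while the average per-term change is clearly positive.

```python
# Minimal sketch: judge a change per changed element (here: per term)
# instead of by the account total, so one heavy outlier cannot hide a
# broadly positive result. Terms and figures are made up.
before = {"term_a": 120, "term_b": 80, "term_c": 60, "outlier": 9_000}
after  = {"term_a": 260, "term_b": 170, "term_c": 130, "outlier": 5_500}

total = sum(after.values()) / sum(before.values()) - 1.0
per_term = {t: after[t] / before[t] - 1.0 for t in before}
average = sum(per_term.values()) / len(per_term)

print(f"total:   {total:+.0%}")        # -35%, dragged down by the outlier
for term, change in sorted(per_term.items()):
    print(f"  {term}: {change:+.0%}")
print(f"average: {average:+.0%}")      # clearly positive
```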
#SMX #13B @AndreasReiffen
Don't limit yourself to the original question; there are
more insights to gain
3
Done wrong: eCPC works, but some interesting insights slipped our attention!
Done right: analyzing further, we noticed that eCPC helped manage tablet performance (before Google
reintroduced tablet bid adjustments). This opened up a new way of optimizing device performance.
              CPC          eCPC
Impressions   1,032,007    1,010,246 (-2%)
Conversions   2,800        2,930 (+5%)
Tablet CPO: -10%, driven by lower CPCs, higher CR, and a traffic shift towards desktop.
#SMX #13B @AndreasReiffen
Be aware of your surroundings! What else could
influence the test results?
4
Done wrong: image changes sometimes work, sometimes they don't; result inconclusive.
Done right: looking at the test environment shows that if competitors' images are mixed, there's no
change; if competitors' images are uniform, there's an improvement. Result: you have to stand out.
CTR, test vs control, with the original image set to 100%:
  Test A: 102.6% (+2.6%, not significant)
  Test B: 127.0% (+27.0%, significant)
#SMX #13B @AndreasReiffen
Look out for cannibalization! Gains on one product
may come at the expense of others.
5
Done wrong: measure query clicks on one single product after increasing bids and report the nominal uplift.
Done right: the product diverted queries away from other products, so the actual increment is
much lower.
Bid raised from 0.50 to 1.50 (+200%). Nominal increase over baseline: +1,581%.
After subtracting the cannibalised impressions, the actual increase is only +114%.
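Once the cannibalised traffic is measured, the correction is simple arithmetic; a sketch using the slide's percentages, with the before/after levels back-solved from them:

```python
# Minimal sketch: correct a nominal uplift for cannibalisation by
# subtracting the impressions the boosted product diverted away from
# sibling products. Levels are back-solved from the slide's +1,581%
# nominal and +114% actual figures (indexed baseline = 100).
def uplift(baseline, after, cannibalised):
    nominal = after - baseline
    actual = nominal - cannibalised
    return nominal / baseline, actual / baseline

nominal_pct, actual_pct = uplift(baseline=100, after=1_681, cannibalised=1_467)
print(f"nominal: +{nominal_pct:.0%}, actual: +{actual_pct:.0%}")
# -> nominal: +1581%, actual: +114%
```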
#SMX #13B @AndreasReiffen
Takeaways
#SMX #13B @AndreasReiffen
Takeaways
Knack for numbers — you have to like playing with numbers and think analytically.
More than just numbers — data miners and scientists are not everything; you need to understand the bigger picture.
Experience — for elaborate testing, you need to be an experienced PPC pro.
Loads of data — you need access to the data warehouse yourself, or know someone who has it.
#SMX #13B @AndreasReiffen
LEARN MORE: UPCOMING @SMX EVENTS
THANK YOU!
SEE YOU AT THE NEXT #SMX