PayPal's Fraud Detection with Deep Learning at H2O World 2014: flexible deployment, seamless integration with big data, accuracy, and responsive support. Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai. To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
A short presentation on different big data stream processing systems such as Spark, Samza, and Storm, and the differences between their architectures and purposes. It also covers streaming-layer tools such as Kafka and RabbitMQ. The presentation draws on this paper: https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Forecasting time-series data has applications in many fields, including finance and health. There are potential pitfalls when applying classic statistical and machine learning methods to time-series problems. This talk will give folks a basic toolbox to analyze time-series data and perform forecasting using statistical and machine learning models, as well as to interpret and convey the outputs.
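One classic pitfall of the kind the talk alludes to is leakage from random train/test splits; below is a minimal sketch of the chronological split that avoids it (toy data and an assumed workflow, not material from the talk):

```python
# A common pitfall: random train/test splits leak future information
# into training. For time series, split chronologically instead.
import numpy as np
import pandas as pd

# Hypothetical daily series with a lagged feature.
rng = pd.date_range("2020-01-01", periods=200, freq="D")
df = pd.DataFrame({"y": np.random.randn(200).cumsum()}, index=rng)
df["y_lag1"] = df["y"].shift(1)   # yesterday's value as a feature
df = df.dropna()

# Chronological split: train on the past, test on the future.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]
```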
Graph analytics has a wide range of applications, from information propagation and network flow optimization to fraud and anomaly detection. The rise of social networks and the Internet of Things has given us complex web-scale graphs with billions of vertices and edges. However, in order to extract the hidden gems of understanding and information within those graphs, you need tools to analyze the graphs easily and efficiently. At Spark Summit 2016, Databricks introduced GraphFrames, which implements graph queries and pattern matching on top of Spark SQL to simplify graph analytics. In this talk, we’ll discuss the work that has made graph algorithms in GraphFrames faster and more scalable. For example, new implementations of connected components have received algorithm improvements based on recent research, as well as performance improvements from Spark DataFrames. Discover lessons learned from scaling the implementation from millions to billions of nodes; see its performance in the context of other popular graph libraries; and hear about real-world applications.
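For readers new to the library, a minimal connected-components run with GraphFrames looks roughly like the sketch below (it assumes a local Spark session with the graphframes package available; the toy graph is illustrative, not from the talk):

```python
# Minimal GraphFrames connected-components example.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("cc-demo").getOrCreate()
# The connected-components algorithm requires a checkpoint directory.
spark.sparkContext.setCheckpointDir("/tmp/graphframes-cc")

vertices = spark.createDataFrame([("a",), ("b",), ("c",), ("d",)], ["id"])
edges = spark.createDataFrame([("a", "b"), ("b", "c")], ["src", "dst"])  # "d" is isolated

g = GraphFrame(vertices, edges)
components = g.connectedComponents()  # adds a "component" column per vertex
components.show()
```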
Text mining techniques like sentiment analysis, topic modeling, named entity extraction, and event extraction are used to map unstructured text onto the structured representations used by conventional data stores.
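As one concrete example of such a mapping, here is a minimal named-entity extraction sketch with spaCy (it assumes the en_core_web_sm model has been downloaded; the sentence is invented for illustration):

```python
# Extract (text, label) records from raw text with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline
doc = nlp("Apple acquired a London startup for $1 billion in 2020.")

# Map unstructured text to structured (text, label) records.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)  # e.g. [('Apple', 'ORG'), ('London', 'GPE'), ...]
```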
The document provides an overview of Long Short Term Memory (LSTM) networks. It discusses: 1) The vanishing gradient problem in traditional RNNs and how LSTMs address it through gated cells that allow information to persist without decay. 2) The key components of LSTMs - forget gates, input gates, output gates and cell states - and how they control the flow of information. 3) Common variations of LSTMs including peephole connections, coupled forget/input gates, and Gated Recurrent Units (GRUs). Applications of LSTMs in areas like speech recognition, machine translation and more are also mentioned.
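As a rough illustration of the gating described above, here is a minimal single-step LSTM cell in NumPy (weights are random and shapes illustrative; a sketch, not the document's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # One affine transform produces all four gate pre-activations.
    z = W @ np.concatenate([x, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    f = sigmoid(f)          # forget gate: what to erase from the cell state
    i = sigmoid(i)          # input gate: what new information to write
    o = sigmoid(o)          # output gate: what to expose as the hidden state
    g = np.tanh(g)          # candidate cell update
    c = f * c_prev + i * g  # cell state persists unless the forget gate decays it
    h = o * np.tanh(c)
    return h, c

n_in, n_hidden = 4, 8
W = np.random.randn(4 * n_hidden, n_in + n_hidden) * 0.1
b = np.zeros(4 * n_hidden)
h, c = lstm_step(np.random.randn(n_in), np.zeros(n_hidden), np.zeros(n_hidden), W, b)
```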
Text mining refers to extracting knowledge from unstructured text data. It is needed because most biological knowledge exists in unstructured research papers, making it difficult for scientists to manually analyze large amounts of text. Challenges include dealing with noisy, unstructured data and complex relationships between concepts. The text mining process involves preprocessing text through steps like tokenization, feature selection, and parsing to extract meaningful features before analysis can be done through classification, clustering, or other techniques. Potential applications are wide-ranging across domains like customer profiling, trend analysis, and web search.
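As an illustration of the preprocess-then-analyze flow described above, here is a minimal text-classification sketch in scikit-learn (the tiny corpus and labels are invented for the example):

```python
# Tokenization + feature extraction (TF-IDF) feeding a classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["protein binds receptor", "gene expression in cells",
        "stock prices fell sharply", "market trends this quarter"]
labels = ["bio", "bio", "finance", "finance"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)
print(model.predict(["receptor signalling pathways"]))  # expected: ['bio']
```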
Basics of GAN neural networks. A GAN (generative adversarial network) is an advanced technique in the area of neural networks that helps generate new data. This new data is produced based on patterns learned from past experience and raw training data.
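A minimal GAN training loop in PyTorch might look like the sketch below, where a generator learns to produce samples resembling a 1-D Gaussian (purely illustrative, not the presentation's code):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # noise -> fake sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # sample -> P(real)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # "past experience": real data
    fake = G(torch.randn(64, 8))

    # Discriminator: learn to tell real from generated samples.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: learn to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 8)).detach().squeeze())  # newly generated data
```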
The document discusses data recovery, including what it is, common uses, and techniques. Data recovery involves retrieving deleted or inaccessible data from electronic storage media. It is commonly used by average users to recover important files, and by law enforcement to locate illegal data or restore deleted information for criminal investigations. Techniques discussed include software and hardware recovery methods, secure deletion standards, and overwriting schemes to prevent recovery.
Every investigator needs the skills and knowledge to use OSINT competently in investigations. As online information continues to multiply in volume and complexity, the tools required to find, sift through, authenticate and preserve that information become more and more important for investigators. Failure to master these tools to tap into the rich resources of the web can hamper your investigations. Learn the intricacies of online investigating from an expert in the field. Join Sandra Stibbards, owner and president of Camelot Investigations and a financial fraud investigator, speaker and trainer, for a free webinar on How to Use OSINT in Investigations. Webinar attendees will learn:
- How to find information on the hidden web
- How to find publicly available information in government and private databases
- Dos and don'ts for searching social media effectively
- Tips for remaining anonymous while researching investigation subjects
- Accessing archived information
- How criminals hide, and how to find them
Alternative Set-Theoretic Models. Fuzzy Set Model: a set-theoretic model of document retrieval based on fuzzy set theory. Extended Boolean Model: a set-theoretic model of document retrieval based on an extension of the classic Boolean model; the idea is to interpret partial matches as Euclidean distances in a vector space of index terms.
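For concreteness, the extended Boolean model's p-norm similarity (here with p = 2, the Euclidean case) can be written as follows, where x and y are a document's weights for the two query terms; this is the standard formulation, not quoted from the slides:

```latex
% Extended Boolean similarity for p = 2 (Euclidean case):
\[
\mathrm{sim}(q_{x \lor y}, d) = \sqrt{\frac{x^{2} + y^{2}}{2}}, \qquad
\mathrm{sim}(q_{x \land y}, d) = 1 - \sqrt{\frac{(1-x)^{2} + (1-y)^{2}}{2}}
\]
```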
This document provides an overview of different techniques for hyperparameter tuning in machine learning models. It begins with introductions to grid search and random search, then discusses sequential model-based optimization techniques like Bayesian optimization and Tree-structured Parzen Estimators (TPE). Evolutionary algorithms like CMA-ES and particle-based methods like particle swarm optimization are also covered. Multi-fidelity methods like successive halving and Hyperband are described, along with recommendations on when to use each technique. The document concludes by listing several popular libraries for hyperparameter tuning.
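To make the random-search baseline concrete, here is a minimal sketch using scikit-learn (the toy data and search space are illustrative, not taken from the document):

```python
# Sample hyperparameters at random instead of enumerating a full grid.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-3, 1e3), "gamma": loguniform(1e-4, 1e1)},
    n_iter=20, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```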
Big Data raises challenges about how to process such a vast pool of raw data and how to extract value from it. To address these demands, an ecosystem of tools named Hadoop was conceived.
Data mining is the process of automatically discovering useful information from large data sets. It draws from machine learning, statistics, and database systems to analyze data and identify patterns. Common data mining tasks include classification, clustering, association rule mining, and sequential pattern mining. These tasks are used for applications like credit risk assessment, fraud detection, customer segmentation, and market basket analysis. Data mining aims to extract unknown and potentially useful patterns from large data sets.
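As a small worked example of the association rule mining mentioned above, here is support and confidence for one candidate rule over toy market-basket transactions (the data is invented):

```python
# Compute support and confidence for the rule {diapers} -> {beer}.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"diapers"}, {"beer"}
conf = support(antecedent | consequent) / support(antecedent)
print(f"support={support(antecedent | consequent):.2f}, confidence={conf:.2f}")
```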
Web scraping involves extracting data from human-readable web pages and converting it into structured data. There are several types of scraping including screen scraping, report mining, and web scraping. The process of web scraping typically involves using techniques like text pattern matching, HTML parsing, and DOM parsing to extract the desired data from web pages in an automated way. Common tools used for web scraping include Selenium, Import.io, Phantom.js, and Scrapy.
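Alongside the tools listed, a minimal HTML/DOM-parsing sketch with requests and BeautifulSoup shows the basic pattern (the URL and selectors are placeholders to adapt to the target page):

```python
# Fetch a page and pull structured records out of its DOM.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com")
soup = BeautifulSoup(resp.text, "html.parser")

rows = [{"text": a.get_text(strip=True), "href": a.get("href")}
        for a in soup.find_all("a")]
print(rows)
```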
Many ecommerce companies have extensive logs of user behavior such as clicks and conversions. However, if supervised learning is naively applied, such systems can suffer from poor performance due to bias and feedback loops. Using techniques from counterfactual learning, we can leverage log data in a principled manner to model user behavior and build personalized recommender systems. At Grubhub, a user journey begins with recommendations, and the vast majority of conversions are powered by recommendations. Our recommender policies can drive user behavior to increase orders and/or profit. Accordingly, the ability to rapidly iterate and experiment is very important. Because of our powerful GPU workflows, we can iterate 200% more rapidly than with counterpart CPU workflows. Developers iterate on ideas with notebooks powered by GPUs. Hyperparameter spaces are explored up to 8x faster with multi-GPU Ray clusters. Solutions are shipped from notebooks to production in half the time with nbdev. With our accelerated DS workflows and deep learning on GPUs, we were able to deliver a +12.6% conversion boost in just a few months. In this talk we present modern techniques for industrial recommender systems powered by GPU workflows: first, a brief background on counterfactual learning techniques, followed by practical information and data from our industrial application. By Alex Egg, accepted to the Nvidia GTC 2021 Conference.
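The core counterfactual trick is inverse propensity scoring (IPS): re-weight logged rewards to estimate how a different policy would have performed using only logs from the old one. A minimal sketch with toy numbers (not Grubhub data; the target policy is hypothetical):

```python
import numpy as np

# Logged data: action shown, its propensity under the logging policy,
# and the observed reward (e.g. conversion).
actions = np.array([0, 1, 2, 1, 0])
propensities = np.array([0.5, 0.3, 0.2, 0.3, 0.5])  # P(action | logging policy)
rewards = np.array([1.0, 0.0, 1.0, 1.0, 0.0])

def new_policy_prob(action):
    # Hypothetical target policy: prefers action 1.
    return {0: 0.2, 1: 0.6, 2: 0.2}[action]

# IPS estimate of the new policy's expected reward.
weights = np.array([new_policy_prob(a) for a in actions]) / propensities
print(np.mean(weights * rewards))
```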
This document provides an introduction to searching and seizing computers for computer forensics. It discusses issues with digital evidence being volatile and massive in size. It explains that searching and seizing computers can be done with or without a warrant, depending on the country's constitution and exceptions like consent. Key aspects of searches include who conducted it, what was searched/seized, if it was a legal search/seizure, and if the search was reasonable with a warrant or under an exception. Probable cause and exceptions to warrants like emergencies, vehicles and borders are also outlined. Proper warrant preparation and seizing equipment on site are important parts of legally searching and seizing computer-related evidence.
This is a presentation on how to use Overleaf in institutional settings and the benefits I derived from it.
Introduction to Zooz - Presentation by Oren Levy, Co-Founder & CEO of Zooz at the NOAH 2013 Conference in London, Old Billingsgate on the 13th of November 2013.