SenSec: Mobile Application Security through Passive Sensing

Jiang Zhu and Sean Wang

Dec 5th, 2011

1

•  Monitor and track user behavior on smartphones using various
on-device sensors
•  Convert sensory traces and other context information to Personal
Behavior Features
•  Build Risk Analysis Trees with these features and use them to
calculate Certainty Scores
•  Trigger various Authentication Schemes when certain applications
are launched.

2

•  “The 329 organizations polled had collectively lost more than 86,000
devices … with average cost of lost data at $49,246 per device, worth
$2.1 billion or $6.4 million per organization.”
"The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the
Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.

[Chart: Mobile Device Loss or theft, vertical axis from 0% to 60%]

Strategy One Survey conducted among a U.S. sample of 3017 adults aged 18 years and older on September
21-28, 2010, with an oversample in the top 20 cities (based on population).

5

•  Application: Different applications may have different sensitivities
•  Password: A major source of security vulnerabilities. Easy to guess, reused,
forgotten, shared
•  Usability: Authentication too often or sometimes too loose

6

Application
Access
Control

8

•  MobiSens app collects sensor data
•  Motion sensors
•  GPS and WiFi Scanning
•  In-use applications and their traffic patterns

•  SenSec module builds user behavior models
•  Unsupervised activity segmentation; the resulting sequence is modeled using a
language model
•  Build a Risk Analysis Tree (decision tree, DT) to detect anomalies
•  Combine the above to estimate risk (online): certainty score

•  SenSec broadcasts the certainty score to other applications

•  Application Access Control Module uses a broadcast receiver

9

•  A feature vector calculated from a step window represents the
behavior state within a given time window
•  surrounding environment: GPS location, WiFi signal
•  activity: motions, applications in use
•  communication: network traffic

•  Use a Decision Tree to detect anomalies in behavior
•  Each node represents a feature dimension
•  Leaves can be one of the following
•  Owner Detection: owner [0,1], 0: Anomaly, 1: Normal
•  User Identification: user id [0,1,…, N], the user’s identifier, e.g. IMEI

•  Multiple trees can be built with subsets of the feature space
•  Weighted average
•  Voting
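
The two combination rules above can be sketched as follows. This is a minimal illustration, not the SenSec implementation: the per-tree outputs, the weights, and the function names `weighted_average` and `majority_vote` are all hypothetical.

```python
from collections import Counter


def weighted_average(scores, weights):
    """Combine per-tree owner scores in [0, 1] into a single certainty value."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def majority_vote(labels):
    """Return the label predicted by the most trees (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical outputs of three trees, each trained on a different feature subset.
tree_scores = [1.0, 0.0, 1.0]    # owner detection: 1 = normal, 0 = anomaly
tree_weights = [0.5, 0.2, 0.3]   # assumed weights, e.g. per-tree validation accuracy
certainty = weighted_average(tree_scores, tree_weights)

tree_user_ids = [3, 3, 7]        # user identification outputs of the same trees
user = majority_vote(tree_user_ids)
```

A weighted average suits the owner detection case, where the result is a score; voting suits user identification, where the result is a discrete user id.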

10

•  Convert feature vector series to label streams – dimension reduction

•  Use n-grams to model the sequence of labels in each sensory
dimension – current state and transitions are captured
•  Step window with assigned length

A1 A2 A1 A4

G2 G5 G2 G2

W2 W1 W2

P1 P3 P6 P1

A2 G2G5 W1 P1P3 A1A4 G2 W1W2 P1
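
One possible way to turn a series of feature vectors into a label stream is to assign each window to its nearest cluster centroid. The sketch below assumes centroids are already available (for example from offline clustering) and that the prefix "A" denotes the activity dimension; the centroid values, the prefix meaning, and the function names are illustrative assumptions.

```python
import math


def nearest_label(vector, centroids, prefix):
    """Label a vector with prefix + 1-based index of the closest centroid."""
    distances = [math.dist(vector, c) for c in centroids]
    return f"{prefix}{distances.index(min(distances)) + 1}"


def to_label_stream(vectors, centroids, prefix):
    """Map one feature vector per step window to one label per step window."""
    return [nearest_label(v, centroids, prefix) for v in vectors]


# Hypothetical 2-D centroids for one sensory dimension.
activity_centroids = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
# Hypothetical feature vectors, one per step window.
windows = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.0), (4.8, 5.1)]

stream = to_label_stream(windows, activity_centroids, "A")
# stream is ["A1", "A2", "A1", "A3"]
```

Each multi-dimensional vector collapses to a single symbol, which is the dimension reduction the slide refers to.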

11

•  User behavior at time t depends only on the last n-1 behaviors

•  A sequence of behaviors can be predicted from the n-1 preceding
behaviors

•  Maximum Likelihood Estimation from training data by counting:

•  MLE assigns zero probability to unseen n-grams
Incorporate smoothing function (Katz)
Discount probability for observed grams
Reserve probability for unseen grams
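
The counting estimate divides the count of an n-gram by the count of its (n-1)-label history. The sketch below shows that estimate and the zero-probability problem on a toy label stream. Note that it uses add-k smoothing as a simpler stand-in, NOT the Katz method named on the slide; the training stream and function names are hypothetical.

```python
from collections import Counter


def train_ngram(labels, n):
    """Count n-grams and the (n-1)-label histories that start them."""
    last = len(labels) - n + 1
    ngrams = Counter(tuple(labels[i:i + n]) for i in range(last))
    histories = Counter(tuple(labels[i:i + n - 1]) for i in range(last))
    return ngrams, histories


def mle_prob(ngram, ngrams, histories):
    """Maximum likelihood estimate: count(n-gram) / count(history)."""
    history_count = histories[ngram[:-1]]
    return ngrams[ngram] / history_count if history_count else 0.0


def smoothed_prob(ngram, ngrams, histories, vocab_size, k=1.0):
    """Add-k smoothing (stand-in for Katz): reserves mass for unseen n-grams."""
    return (ngrams[ngram] + k) / (histories[ngram[:-1]] + k * vocab_size)


train = ["A1", "A2", "A1", "A4", "A1", "A2"]   # hypothetical label stream
ngrams, histories = train_ngram(train, n=2)
vocab_size = len(set(train))

seen = mle_prob(("A1", "A2"), ngrams, histories)      # 2 of 3 "A1" histories
unseen = mle_prob(("A2", "A4"), ngrams, histories)    # never observed
unseen_smoothed = smoothed_prob(("A2", "A4"), ngrams, histories, vocab_size)
```

With smoothing, the unseen bigram gets a small non-zero probability, at the cost of discounting the observed ones.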

12

•  Feed the sequence of past behaviors in a stepping window of size
N to the n-gram model for testing
•  For a testing sequence of behavior labels

•  Estimate the average log probability that this sequence is generated
from the n-gram model

•  If this likelihood drops below a threshold, flag an anomaly alert
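
The test described above can be sketched as follows, assuming the n-gram probabilities have already been estimated and smoothed. The probability table, the floor value for unlisted n-grams, and the threshold are hypothetical values chosen for illustration.

```python
import math


def average_log_prob(labels, n, probs, floor=1e-6):
    """Mean log probability of the n-grams in a test window.

    `probs` maps n-gram tuples to probabilities; `floor` is an assumed
    fallback probability for n-grams missing from the table.
    """
    grams = [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]
    if not grams:
        raise ValueError("sequence is shorter than n")
    return sum(math.log(probs.get(g, floor)) for g in grams) / len(grams)


def is_anomaly(labels, n, probs, threshold):
    """Flag the window when its average log probability is below the threshold."""
    return average_log_prob(labels, n, probs) < threshold


# Hypothetical bigram probabilities learned from the owner's data.
probs = {("A1", "A2"): 0.6, ("A2", "A1"): 0.8, ("A1", "A4"): 0.3}
THRESHOLD = math.log(0.05)   # hypothetical threshold

familiar = is_anomaly(["A1", "A2", "A1", "A2"], 2, probs, THRESHOLD)
unfamiliar = is_anomaly(["A4", "A4", "A2", "A4"], 2, probs, THRESHOLD)
```

Averaging over the window keeps the score comparable across window sizes, so one threshold can be reused when N changes.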

13

[Figure: SenSec processing pipeline. Block labels: Sensing, MobiSens Trace,
Preprocessing, Extract Features, Behavior Text Generation, N-gram Model,
Decision Trees, Fusion, Threshold, Anomaly Detection (Anomaly Y/N)]

15

Dataset
Number of users: 50
Device: Android phones
Location: Bay area
Average period: 30 days
Number of data types: 7
Finest sampling interval (motion sensors): 200 ms

•  Total data set size: 4GB
•  Remove 2 heavy users
•  Remove users with very limited data duration
•  Remove users that don’t have application and traffic data due to older
MobiSens version
•  25 users with comparable dataset size
•  Data duration: 4 hours to 2.5 days

17

•  Motion Sensors (100)
•  Used to summarize
acceleration stream
•  Calculated separately for each
dimension [x,y,z,m]

•  GPS: location label via density based clustering (1)

•  WiFi: (SSIDs, RSSIs) pairs ranked by signal strength (6)

•  Applications: Bitmap of well-known applications (60 + 1)

•  Application Traffic Pattern: Tx/Rx traffic vectors (120 + 2)

•  Step Window Size: 5 seconds
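
As an illustration of summarizing the acceleration stream per dimension, the sketch below computes a few statistics over one step window for x, y, z and m. It assumes m is the vector magnitude and picks four statistics arbitrarily; the slide does not list which statistics make up the 100 motion features, so the feature names here are hypothetical.

```python
import math
import statistics


def motion_features(samples):
    """Summarize one window of (x, y, z) samples, separately per dimension.

    Assumption: m is the magnitude sqrt(x^2 + y^2 + z^2) of each sample.
    """
    xs, ys, zs = zip(*samples)
    ms = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    features = {}
    for name, values in (("x", xs), ("y", ys), ("z", zs), ("m", ms)):
        features[f"{name}_mean"] = statistics.fmean(values)
        features[f"{name}_std"] = statistics.pstdev(values)
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
    return features


# Hypothetical window; at a 200 ms interval a 5-second window holds about 25 samples.
window = [(0.0, 0.0, 9.8), (3.0, 4.0, 0.0), (0.0, 0.0, 9.8)]
feats = motion_features(window)
```

Four statistics over four dimensions give 16 values here; a richer statistic set per dimension would be needed to reach the feature count on the slide.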

18

•  User Identification Test and Owner Detection Test on a randomly
selected partial data set (4 users) with a 1:1 training/test split
•  ~ 99% accuracy
•  number of leaves: 56, size of tree: 111

•  Using non-motion attributes yields lower accuracy (96%)
•  Significant tree size reduction, number of leaves: 3, size of tree: 5
•  Cross entropy may be significant: some features distinguish users
easily.

•  Using only motion attributes can distinguish different users
•  ~ 98% accuracy
•  very large tree, number of leaves: 267, size of tree: 533
•  may cause performance issues on mobile platforms

19

•  Apply cross-entropy filter to remove users that could be identified
easily using a small set of features
•  12 users with 210k data instances

•  User identification: train the RAT model on 66% of instances and use
the rest for testing
[Chart: Accuracy and Size Factor for the All, Non-Motion, and Motion-Only
feature sets. Accuracy values shown: 84.8%, 83.5, 79.3. Size Factor values
shown: 7649, 221, 35.]
20

•  Experiments detect anomalous usage with ~80% accuracy using
only days of training data
22

•  Extended data set for feature construction
TCP, UDP traffic; sound; ambient lighting; battery status, etc.

•  Data and Modeling
Gain more insights into the data, features and factorized relationships among
various sensors
Try other classification methods and compare results: LR, SVM, Random
Forest, etc.

•  Enhanced security of SenSec components
Integration with Android security framework and other applications

•  Privacy challenges
Data collection, model training, privacy policy, etc.

•  Energy efficiency

23

•  Data Collection
•  Running app list
•  Per-app traffic pattern

•  IPC Interface
•  Certainty Score
Broadcast mechanism

[Figure: MobiSens architecture. (a) MobiSens Mobile Application, (b) Tier 1,
(c) Tier 2. Block labels: MobiSens Mobile Sensing, MobiSens Backend,
Application Control, Lifelogger, UI, Device Applications, Profile API,
Activity Message Summarization, Activity Segmentation, Pool Interface,
Device Controller, Model Storage, Behavior Modeling, Pushing, WebService,
System Algorithms, Data Aggregator, Data Exchange API, MobiSens Data
Preprocessor, Sensor Widgets, Upload API]

•  Offline-Model Push via Data Exchange API
•  Risk Analysis Tree can be trained using global data on the MobiSens Server
and pushed back to the mobile device

28

•  MobiSens Server
•  Offline Clustering
•  K-means package from Weka Data Mining Toolkit
•  Using aggregated data from all users
•  Offline RAT training
•  Decision Tree package from Weka Data Mining Toolkit
•  Construct training data set and design evaluation strategy

•  MobiSens Client
•  Retrieve RAT model from MobiSens Server
•  On-device n-gram label sequence construction (n=1,2,3; window size =5s)
•  RAT inference using Weka Toolkit on device
•  Status bar notification based on certainty value

29

•  Reactive API to Team Access
API call from Team Access to SenSec to retrieve the current Certainty Score
given the context

getCertaintyScore(SenSecContextType ctx, count)

•  Proactive API to Team Access and other equivalent modules
Broadcast Receiver on Certainty Score

certaintyScore{
CertaintyScoreType scores[];
WindowSizeType window_size;
SenSecContextType ctx;
}
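
The interplay of the two APIs can be sketched in plain Python. This is only a model of the control flow, not the Android API: the class `SenSecStub`, the `publish` and `register_receiver` methods, the policy in `choose_scheme`, and the sensitivity value are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CertaintyScore:
    scores: List[float]   # most recent certainty scores
    window_size: float    # step window size in seconds
    ctx: str              # context the scores apply to


class SenSecStub:
    """Hypothetical stand-in for the SenSec module."""

    def __init__(self) -> None:
        self._receivers: List[Callable[[CertaintyScore], None]] = []
        self._history: List[float] = []

    def register_receiver(self, receiver: Callable[[CertaintyScore], None]) -> None:
        self._receivers.append(receiver)

    def publish(self, score: float, window_size: float = 5.0, ctx: str = "default") -> None:
        """Proactive path: push each new score to every registered receiver."""
        self._history.append(score)
        for receiver in self._receivers:
            receiver(CertaintyScore([score], window_size, ctx))

    def get_certainty_score(self, ctx: str, count: int) -> CertaintyScore:
        """Reactive path: return the last `count` scores (ctx is not filtered here)."""
        return CertaintyScore(self._history[-count:], 5.0, ctx)


def choose_scheme(score: float, sensitivity: float) -> str:
    """Hypothetical policy mapping a certainty score to an authentication scheme."""
    if score >= sensitivity:
        return "none"
    if score >= sensitivity / 2:
        return "pin"
    return "password"


decisions: List[str] = []
sensec = SenSecStub()
sensec.register_receiver(lambda msg: decisions.append(choose_scheme(msg.scores[-1], 0.8)))
for value in (0.9, 0.5, 0.1):
    sensec.publish(value)
latest = sensec.get_certainty_score("default", count=2)
```

The proactive path lets an access control module react as soon as certainty drops, while the reactive path covers the moment an application is launched and needs the current score.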

30
