Pattern mining is an unsupervised machine learning technique used to discover frequent patterns and relationships in log data. It involves finding the top frequent sets of items that occur together in the data at least a minimum number of times. There are two main approaches - candidate generation which generates and filters candidate patterns in multiple passes over the data, and pattern growth which constructs conditional databases to avoid multiple full scans. Pattern mining can be used to find commonly purchased itemsets, extract features from log data, and derive rules for recommendations.
Report
Share
Report
Share
1 of 73
Download to read offline
More Related Content
Pattern Mining: Extracting Value from Log Data
1. Pattern Mining: Getting the
most out of your log data.
Krishna Sridhar
Staff Data Scientist, Dato Inc.
krishna_srd
2. • Background
- Machine Learning (ML) Research.
- Ph.D Numerical Optimization @Wisconsin
• Now
- Build ML tools for data-scientists & developers @Dato.
- Help deploy ML algorithms.
@krishna_srd, @DatoInc
About Me!
6. Creating a model pipeline
Ingest Transform Model Deploy
Unstructured Data
exploration
data
modeling
Data Science Workflow
Ingest Transform Model Deploy
26. ML is not a black-box.
Transparency
Learning is also about understanding.
Interpretability
Whatever can go wrong, will go wrong.
Diagnosis
Moving on
30. Formulating Pattern Mining
Find the top K most frequent sets of length at least L
that occur at least M times.
- max_patterns
- min_length
- min_support
33. Principle 1: What is frequent?
A pattern is frequent if it occurs at least M times.
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
{C, D}: 5 is frequent
M = 4
{A, D}: 5 is not frequent
34. Principle 1: What is frequent?
A pattern is frequent if it occurs at least M times.
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
{C, D}: 5 is frequent
M = 4
{A, D}: 5 is not frequent
min_support
35. Principle 2: Apriori principle
A pattern is frequent only if a subset is frequent
{B, C, D}
{A, C, D}
{A, B, C, D}
{A, D}
{B, C, D}
{B, C, D}
{B, C, D} : 5 is frequent therefore
{C, D} : 5 is frequent
{A} : 3 is not frequent therefore
{A, D} : 3 is not frequent
M = 4
59. Compare & Constrast
• Candidate Generation
+ Better than brute force
+ Filters candidate sets
- Multiple passes over the data
• Pattern Growth
+ Fewer passes over the data
+ Space efficient.
60. Compare & Constrast
• Candidate Generation
+ Better than brute force
+ Filters candidate sets
- Multiple passes over the data
• Pattern Growth
+ Fewer passes over the data
+ Space efficient.
Better choice
69. Creating a model pipeline
Ingest Transform Model Deploy
Unstructured Data
exploration
data
modeling
Data Science Workflow
Ingest Transform Model Deploy
71. Summary
Log Data Mining
≠
Rocket Science
• FP-Growth for finding frequent patterns.
• Find rules from patterns to make predictions.
• Extract features for useful ML in pattern space.