Practical Machine Learning and Rails, Part 2
- 8. BUILDING TRAINING DATA:
NEGATIVE
is upset that he cant update his Facebook by texting it... and might cry as a result School today also. Blah!
I couldnt bear to watch it. And I thought the UA loss was embarrassing
I hate when I have to call and wake people up
POSITIVE
Just woke up. Having no school is the best feeling ever
Im enjoying a beautiful morning here in Phoenix
dropping molly off getting ice cream with Aaron
- 12. FEATURES:
BAG OF WORDS MODEL
split the text into words, create a dictionary,
and replace text with word counts
- 17. BAG OF WORDS
tweets: word vectors:
I ran fast [1 1 1 0 0 0]
Bob ran far [0 1 0 1 1 0]
I ran to Bob [1 1 0 1 0 1]
dictionary = %w{I ran fast Bob far to}
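The counting described above can be sketched in a few lines of plain Ruby (a minimal sketch; `tally` needs Ruby 2.7+; the example data is from the slide):

```ruby
# bag of words: build a dictionary from the corpus, then replace
# each text with a vector of word counts
tweets = ["I ran fast", "Bob ran far", "I ran to Bob"]

# dictionary: every unique word, in order of first appearance
dictionary = tweets.flat_map(&:split).uniq
# => ["I", "ran", "fast", "Bob", "far", "to"]

# each tweet becomes a count of every dictionary word
vectors = tweets.map do |tweet|
  counts = tweet.split.tally
  dictionary.map { |word| counts.fetch(word, 0) }
end
# => [[1, 1, 1, 0, 0, 0], [0, 1, 0, 1, 1, 0], [1, 1, 0, 1, 0, 1]]
```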
- 28-31. WEKA
• open source java app
• contains common ML algorithms
• gui interface
• can access it from jruby
• helps with:
• converting words into vectors
• training/test, cross-validation, metrics
- 36. SENTIMENT CLASSIFICATION EXAMPLE
https://github.com/ryanstout/mlexample
- 37. QUERYING
# assumes: java_import "java.io.FileReader", "weka.core.Instances",
#          "weka.core.SerializationHelper"
arff_path = Rails.root.join("data/sentiment.arff").to_s
arff = FileReader.new(arff_path)

# the model is a serialized java object
model_path = Rails.root.join("models/sentiment.model").to_s
classifier = SerializationHelper.read(model_path)

data = Instances.new(arff, 1).tap do |instances|
  # tell weka which attribute is the class label (here, the last one)
  if instances.class_index == -1
    instances.set_class_index(instances.num_attributes - 1)
  end
end
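The next step, per the editor's notes, is to create a sparse instance, set its dataset, and get the distribution (predicted values for each class). A hedged JRuby sketch, not runnable without weka.jar on the classpath; `SparseInstance` and the snake_cased method names come from Weka's Java API, and the word indices here are illustrative:

```ruby
# continuing from the QUERYING slide (JRuby with weka.jar loaded)
# java_import "weka.core.SparseInstance"

# one slot per attribute in the dataset
instance = SparseInstance.new(data.num_attributes)
instance.set_dataset(data)

# set counts for the words that occur in the tweet
# (indices into the bag-of-words dictionary; illustrative values)
instance.set_value(0, 1.0)
instance.set_value(1, 1.0)

# predicted probability for each class (e.g. [negative, positive])
distribution = classifier.distribution_for_instance(instance)
```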
- 41-43. HOW DO WE IMPROVE?
• bigger dictionary
• bi-grams/tri-grams
• part of speech tagging
• more data
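Bi-grams extend the bag of words with adjacent word pairs, as in the editor's notes ("the cat ran out the door" -> [the cat] [cat ran] [ran out] ...). A minimal Ruby sketch:

```ruby
# bi-grams: every pair of adjacent words in the text
def bigrams(text)
  text.split.each_cons(2).map { |pair| pair.join(" ") }
end

bigrams("the cat ran out the door")
# => ["the cat", "cat ran", "ran out", "out the", "the door"]
```

Tri-grams work the same way with `each_cons(3)`.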
- 46. Feature Generation
think about what information is valuable to an expert
remove data that isn't useful (attribute selection)
- 47. ATTRIBUTE SELECTION
[SHOW ATTRIBUTE SELECTION EXAMPLE]
- 49. DOMAIN PRICE PREDICTION
• predict how much a domain would sell for
- 57-58. FEATURES
• split domain by words
• generate features for each word
• how common the word is
• number of google results for each word
• cpc for the word
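The editor's notes assume a maximum of three words per domain, with 0's when there are fewer. A hedged Ruby sketch of that padding, assuming the domain has already been segmented into words and using a hypothetical `frequency` lookup (the Google-result and CPC numbers would come from external APIs):

```ruby
# one feature slot per word, up to a fixed maximum of three words
MAX_WORDS = 3

def word_features(words, frequency)
  # look up each word's score; unknown words get 0
  feats = words.first(MAX_WORDS).map { |w| frequency.fetch(w, 0) }
  # zero-pad so every domain yields a vector of the same length
  feats + [0] * (MAX_WORDS - feats.size)
end

freq = { "ice" => 120, "cream" => 95 }  # hypothetical frequency table
word_features(["ice", "cream"], freq)   # => [120, 95, 0]
```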
- 63. WHAT WE DIDN’T COVER
• collaborative filtering
• clustering
• theorem proving (classical AI)
- 64. ADDITIONAL RESOURCES
stanford machine learning class: ml-class.org
- 65. TOOLS
• weka
• libsvm, liblinear
• vowpal wabbit (big dictionaries)
• recommendify
• https://github.com/paulasmuth/recommendify
Editor's Notes
- having an example makes it easier to understand the process
- also could use movie/product review data
- bag of words: a way of generating features from text that only looks at which words occur in the text; doesn't look at word order, syntax, grammar, punctuation, etc.
- words in the dictionary array are replaced with the counts in the text
- word vectors/labels
- generated using RARFF
- load the arff; load the model (a serialized java object); load a dataset
- create a sparse instance, set the dataset; get distribution (predicted values for each class)
- "the cat ran out the door" -> [the cat] [cat ran] [ran out] ...
- assume a max of three words; each feature holds three words, with 0's if there are fewer
- clustering: similar documents, related terms
- vowpal: good for large datasets, contains different algorithms (matrix factorization, collab filtering, lda, etc.)
- hopefully this helped you learn the tools and techniques; you can teach yourself; feel free to contact us