Music Recommendation and Discovery in
            the Long Tail


                    Òscar Celma
              Doctoral Thesis Defense
    (Music Technology Group ~ Universitat Pompeu Fabra)
PhD defense // UPF // Feb 16th 2009



Music
     Recommendation
(personalized)

               and Discovery
(explore large music collections)

                        in the Long Tail
(non-obvious, novel, relevant music)
PhD defense // UPF // Feb 16th 2009
“The Paradox of Choice: Why More Is Less”, Barry Schwartz (2004)

               The problem
 Paradox of choice
PhD defense // UPF // Feb 16th 2009


music overload
• Today (August 2007)
       iTunes: 6M tracks
       P2P: 15B tracks
       53% buy music online
• Finding unknown, relevant music is hard!
       Awareness vs. access to content
PhD defense // UPF // Feb 16th 2009


music overload?
• Digital Tracks – Sales data for 2007
       Nearly 1 billion sold in 2007
       1% of tracks account for 80% of sales
       3.6 million tracks sold less than 100 copies, and
       1 million tracks sold exactly 1 copy
• Data from Nielsen Soundscan 'State of the (US) industry' 2007 report
PhD defense // UPF // Feb 16th 2009


the Long Tail of popularity
• Help me find it! [Anderson, 2006]
PhD defense // UPF // Feb 16th 2009


research questions
• 1) How can we evaluate/compare different music
  recommendation approaches?

• 2) How far into the Long Tail do music
  recommenders reach?

• 3) How do users perceive novel (unknown to
  them), non-obvious recommendations?
PhD defense // UPF // Feb 16th 2009




If you like
  The Beatles
    you might like ...
PhD defense // UPF // Feb 16th 2009

                                      • popularity bias
                                      • low novelty
                                        ratio
PhD defense // UPF // Feb 16th 2009




    FACTORS AFFECTING RECOMMENDATIONS:

    Novelty
    Relevance
    Diversity
    Cold start
    Coverage
    Explainability
    Temporal effects
PhD defense // UPF // Feb 16th 2009




    FACTORS AFFECTING RECOMMENDATIONS:

    Novelty
    Relevance
    Diversity
    Cold start
    Coverage
    Explainability
    Temporal effects
PhD defense // UPF // Feb 16th 2009


novelty vs. relevance
PhD defense // UPF // Feb 16th 2009


how can we measure novelty?
• predictive accuracy vs. perceived quality
• metrics
       MAE, RMSE, P/R/F-measure, ...
   



       (figure: Train / Test split)

       Can't measure novelty
PhD defense // UPF // Feb 16th 2009


how can we measure novelty?
• predictive accuracy vs. perceived quality
• metrics
       MAE, RMSE, P/R/F-measure, ...
   




       Can measure novelty
   
PhD defense // UPF // Feb 16th 2009
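
The thesis measures novelty through the popularity of what gets recommended rather than through accuracy metrics. A minimal sketch of one popularity-based novelty score (average self-information of the recommended items); the playcount dictionary and the exact scoring formula are illustrative assumptions, not the thesis' own metric:

import math

def novelty_score(recommended_ids, playcounts):
    # playcounts: dict item -> total plays (popularity proxy); unseen items count as 1 play.
    total_plays = sum(playcounts.values())
    info = 0.0
    for item in recommended_ids:
        p = playcounts.get(item, 1) / total_plays   # relative popularity
        info += -math.log2(p)                       # rare (Long Tail) items score high
    return info / len(recommended_ids)

plays = {"the beatles": 50_422_827, "mike shupp": 577}
print(novelty_score(["mike shupp"], plays))         # high novelty
print(novelty_score(["the beatles"], plays))        # low novelty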


how can we measure relevance?

   "The key utility measure is user happiness. It
     seems reasonable to assume that relevance of
     the results is the most important factor:
     blindingly fast, useless answers do not make a
     user happy."

          "Introduction to Information Retrieval"
         (Manning, Raghavan, and Schutze, 2008)
PhD defense // UPF // Feb 16th 2009


research in music recommendation
• Google Scholar




   Papers that contain “music recommendation” or “music recommender”
   in the title (Accessed October 1st, 2008)
PhD defense // UPF // Feb 16th 2009


research in music recommendation
• ISMIR community
PhD defense // UPF // Feb 16th 2009


music recommendation approaches
• Expert-based
• Collaborative filtering
• Context-based
• Content-based
• Hybrid (combination)
PhD defense // UPF // Feb 16th 2009


music recommendation approaches
• Expert-based
       AllMusicGuide
   

       Pandora
   

• Collaborative filtering
• Context-based
• Content-based
• Hybrid (combination)
PhD defense // UPF // Feb 16th 2009


music recommendation approaches
• Expert-based
• Collaborative filtering
       User-Item matrix
                                     [Resnick, 1994], [Shardanand, 1995], [Sarwar, 2001]




• Context-based
• Content-based
PhD defense // UPF // Feb 16th 2009


music recommendation approaches
• Expert-based
• Collaborative filtering
       User-Item matrix
                                     [Resnick, 1994], [Shardanand, 1995], [Sarwar, 2001]

       Similarity
   

         Cosine

         Adj. cosine

         Pearson

         SVD / NMF: matrix factorization
• Context-based
• Content-based
PhD defense // UPF // Feb 16th 2009


music recommendation approaches
• Expert-based
• Collaborative filtering
       User-Item matrix
                                     [Resnick, 1994], [Shardanand, 1995], [Sarwar, 2001]

       Similarity
   

         Cosine

         Adj. cosine

         Pearson

         SVD / NMF: matrix factorization
       Prediction (user-based)
   

         Avg. weighted
PhD defense // UPF // Feb 16th 2009
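
A minimal sketch of the user-based CF pipeline listed above (cosine similarity over co-rated items, weighted-average prediction); the tiny ratings matrix is made up:

import numpy as np

R = np.array([          # rows = users, columns = items, 0 = unrated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine(u, v):
    mask = (u > 0) & (v > 0)                 # co-rated items only
    if not mask.any():
        return 0.0
    return float(u[mask] @ v[mask] / (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

def predict(user, item):
    # Weighted average of the other users' ratings for this item.
    sims = np.array([cosine(R[user], R[v]) if v != user else 0.0 for v in range(len(R))])
    rated = R[:, item] > 0
    denom = np.abs(sims[rated]).sum()
    return float(sims[rated] @ R[rated, item] / denom) if denom else 0.0

print(predict(user=0, item=2))               # predicted rating for an item user 0 has not rated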


music recommendation approaches
• Expert-based
• Collaborative filtering
• Context-based
       WebMIR [Schedl, 2008]
       Sources: content, reviews, lyrics, blogs, tags, bios, playlists, social data
         [Hu & Downie, 2006] [Celma et al., 2006] [Levy & Sandler, 2007]
         [Baccigalupo, 2008] [Symeonidis, 2008]
       (tag cloud: thrash, heavy metal, edgy, weird, concert, 90s, loud, rock, ...)
• Content-based
• Hybrid (combination)
PhD defense // UPF // Feb 16th 2009


music recommendation approaches
• Expert-based
• Collaborative filtering
• Context-based
• Content-based
       Audio features
         Bag-of-frames (MFCC) [Aucouturier, 2004], Rhythm [Gouyon, 2005],
          Harmony [Gomez, 2006], ...
       Similarity
         KL-divergence: GMM [Aucouturier, 2002]
         EMD [Logan, 2001]
         Euclidean: PCA [Cano, 2005]
         Cosine: mean/var (feature vectors)
         Ad-hoc
• Hybrid (combination)
PhD defense // UPF // Feb 16th 2009
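
A minimal sketch of the "cosine over mean/var feature vectors" similarity option listed above; the MFCC frames are random stand-ins for real audio analysis output:

import numpy as np

def track_descriptor(mfcc_frames):
    # Summarize a bag-of-frames track by the mean and variance of each coefficient.
    return np.concatenate([mfcc_frames.mean(axis=0), mfcc_frames.var(axis=0)])

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
track_a = track_descriptor(rng.normal(size=(500, 13)))   # 500 frames x 13 MFCCs
track_b = track_descriptor(rng.normal(size=(480, 13)))
print(cosine_similarity(track_a, track_b))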


music recommendation approaches
• Expert-based
• Collaborative filtering
• Context-based
• Content-based
• Hybrid (combination)
       Weighted
   

       Cascade
   

       Switching
   
PhD defense // UPF // Feb 16th 2009
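
A minimal sketch of the "weighted" hybrid option: a linear combination of CF and content-based scores. The scores and the alpha weight are illustrative, not values from the thesis:

def weighted_hybrid(cf_scores, cb_scores, alpha=0.7):
    # alpha weighs the collaborative score; (1 - alpha) weighs the content-based score.
    items = set(cf_scores) | set(cb_scores)
    combined = {i: alpha * cf_scores.get(i, 0.0) + (1 - alpha) * cb_scores.get(i, 0.0)
                for i in items}
    return sorted(combined, key=combined.get, reverse=True)

cf = {"artist_x": 0.9, "artist_y": 0.2}
cb = {"artist_y": 0.8, "artist_z": 0.6}
print(weighted_hybrid(cf, cb))               # ranked list combining both sources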




                                      Work done
PhD defense // UPF // Feb 16th 2009


contributions
PhD defense // UPF // Feb 16th 2009


contributions

           1) Network-based evaluation
                Item Popularity + Complex networks
PhD defense // UPF // Feb 16th 2009


contributions

           1) Network-based evaluation
                Item Popularity + Complex networks




                                      2) User-based evaluation
PhD defense // UPF // Feb 16th 2009


contributions

           1) Network-based evaluation
                Item Popularity + Complex networks




                                      2) User-based evaluation
           3) Systems
PhD defense // UPF // Feb 16th 2009


contributions
PhD defense // UPF // Feb 16th 2009


contributions
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• 3 Artist similarity (directed) networks
       CF*: Social-based, incl. item-based CF (Last.fm)
   

         “people who listen to X also listen to Y”
       CB: Content-based Audio similarity
   

         “X and Y sound similar”
       EX: Human expert-based (AllMusicGuide)
   

         “X similar to (or influenced by) Y”
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• 3 Artist similarity (directed) networks
       CF*: Social-based, incl. item-based CF (Last.fm)
   

         “people who listen to X also listen to Y”
       CB: Content-based Audio similarity
   

         “X and Y sound similar”
       EX: Human expert-based (AllMusicGuide)
   

         “X similar to (or influenced by) Y”
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Small-world networks [Watts & Strogatz, 1998]




       The network can be traversed in a few clicks
   
PhD defense // UPF // Feb 16th 2009
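
A minimal sketch, using networkx, of the small-world check in the spirit of [Watts & Strogatz, 1998]: compare clustering and average path length against a random graph of the same size. A synthetic graph stands in for the real artist networks:

import networkx as nx

def avg_path_length(graph):
    # Use the largest connected component, in case the graph is not fully connected.
    comp = max(nx.connected_components(graph), key=len)
    return nx.average_shortest_path_length(graph.subgraph(comp))

G = nx.watts_strogatz_graph(n=1000, k=10, p=0.1, seed=1)   # stand-in artist graph
R = nx.gnm_random_graph(G.number_of_nodes(), G.number_of_edges(), seed=1)

# Small world: clustering much higher than random, path length comparably short.
print("clustering:", nx.average_clustering(G), "random:", nx.average_clustering(R))
print("avg path  :", avg_path_length(G), "random:", avg_path_length(R))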


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation
       r = Pearson correlation
                                     [Newman, 2002]
PhD defense // UPF // Feb 16th 2009
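
A minimal sketch of the indegree vs. average-neighbor-indegree analysis [Newman, 2002] on a directed similarity graph; a synthetic scale-free graph stands in for the real networks:

import networkx as nx
from scipy.stats import pearsonr

G = nx.DiGraph(nx.scale_free_graph(2000, seed=1))     # toy directed similarity graph

kin = dict(G.in_degree())
x, y = [], []
for artist in G:
    similar = list(G.successors(artist))              # its "similar artists"
    if similar:
        x.append(kin[artist])                                      # Kin(artist)
        y.append(sum(kin[s] for s in similar) / len(similar))      # avg Kin of its neighbors

r, _ = pearsonr(x, y)
print("r =", r)    # r > 0: assortative mixing (homophily); r ~ 0: no correlation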


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation


Kin(Bruce Springsteen)=534
=>
avg(Kin(sim(Bruce Springsteen)))=463
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation


Kin(Bruce Springsteen)=534
=>
avg(Kin(sim(Bruce Springsteen)))=463




Kin(Mike Shupp)=14
=>
avg(Kin(sim(Mike Shupp)))=15
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation


Kin(Bruce Springsteen)=534
=>
avg(Kin(sim(Bruce Springsteen)))=463




Kin(Mike Shupp)=14
=>
avg(Kin(sim(Mike Shupp)))=15




Homophily effect!
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Indegree – avg. neighbor indegree correlation
       Last.fm presents assortative mixing (homophily)
   

         Artists with high indegree are connected together,
          and similarly for low indegree artists
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• Last.fm is a scale-free network [Barabasi, 2000]
       power-law exponent for the cumulative indegree distribution [Clauset, 2007]
       A few artists (hubs) control the network
PhD defense // UPF // Feb 16th 2009
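
A minimal sketch of estimating the power-law exponent of the indegree distribution with the [Clauset, 2007] method, here via the Python powerlaw package (the package choice is an assumption; the slide only cites the method):

import networkx as nx
import powerlaw

G = nx.DiGraph(nx.scale_free_graph(5000, seed=1))      # toy graph; use the real network here
indegrees = [d for _, d in G.in_degree() if d > 0]

fit = powerlaw.Fit(indegrees, discrete=True)            # Clauset-style MLE fit
print("alpha =", fit.power_law.alpha, "xmin =", fit.power_law.xmin)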


complex network analysis :: artists
• Summary: artist similarity networks
|-------------|---------|-----|-----------|
|             | Last.fm | CB  | Exp (AMG) |
|-------------|---------|-----|-----------|
| Small World | yes     | yes | yes       |
| Ass. mixing | yes     | no  | no        |
| Scale-free  | yes     | no  | no        |
|-------------|---------|-----|-----------|

        The Last.fm artist similarity network resembles a social
        network (e.g. Facebook)
PhD defense // UPF // Feb 16th 2009


complex network analysis :: artists
• But, still some remaining questions...

       Are the hubs the most popular artists?
   




       How can we navigate along the Long Tail, using
   

       the artist similarity network?
PhD defense // UPF // Feb 16th 2009


contributions


                Long Tail analysis
PhD defense // UPF // Feb 16th 2009


the Long Tail in music
• last.fm dataset (~260K artists)
PhD defense // UPF // Feb 16th 2009


the Long Tail in music
• last.fm dataset (~260K artists)
           the beatles (50,422,827)




               radiohead (40,762,895)
                 red hot chili peppers (37,564,100)


                   muse (30,548,064)
                    death cab for cutie (29,335,085)
                      pink floyd (28,081,366)
                       coldplay (27,120,352)
                        metallica (25,749,442)
PhD defense // UPF // Feb 16th 2009


the Long Tail model                   [Kilkki, 2007]

• F(x) = Cumulative distribution up to x
PhD defense // UPF // Feb 16th 2009
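
A minimal sketch of fitting the Long Tail model of [Kilkki, 2007], F(x) = beta / ((N50/x)^alpha + 1), where F(x) is the cumulative share of plays of the top-x artists, N50 the rank covering 50% of the volume and beta the share covered by the whole catalogue. The play counts below are synthetic; the thesis webpage provides the original model code in R:

import numpy as np
from scipy.optimize import curve_fit

def kilkki(x, alpha, beta, n50):
    return beta / ((n50 / x) ** alpha + 1.0)

plays = np.sort(np.random.default_rng(0).pareto(1.2, size=50_000))[::-1]   # toy Long Tail
ranks = np.arange(1.0, plays.size + 1)
F = np.cumsum(plays) / plays.sum()          # cumulative share of plays up to rank x

(alpha, beta, n50), _ = curve_fit(kilkki, ranks, F, p0=[0.7, 1.0, 1000.0])
print("alpha =", alpha, "beta =", beta, "N50 =", n50)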


the Long Tail model                      [Kilkki, 2007]

• Top-8 artists: F(8)~ 3.5% of total plays




             50,422,827     the beatles
             40,762,895     radiohead
             37,564,100     red hot chili peppers
             30,548,064     muse
             29,335,085     death cab for cutie
             28,081,366     pink floyd
             27,120,352     coldplay
             25,749,442     metallica
PhD defense // UPF // Feb 16th 2009


the Long Tail model                      [Kilkki, 2007]

• Split the curve in three parts




                 (82 artists)         (6,573 artists)     (~254K artists)
PhD defense // UPF // Feb 16th 2009


contributions


                                      +
             Long Tail analysis
PhD defense // UPF // Feb 16th 2009


artist indegree vs. artist popularity
• Are the network hubs the most popular artists?


                                      ???
PhD defense // UPF // Feb 16th 2009


artist indegree vs. artist popularity
       Last.fm: correlation between Kin and playcounts
   

         r = 0.621
PhD defense // UPF // Feb 16th 2009


artist indegree vs. artist popularity
       Audio CB similarity: no correlation
   

         r = 0.032
PhD defense // UPF // Feb 16th 2009


artist indegree vs. artist popularity
       Expert: correlation between Kin and playcounts
   

         r = 0.475
PhD defense // UPF // Feb 16th 2009


navigation along the Long Tail
• “From Hits to Niches”
       # clicks to reach a Tail artist, starting in the Head
   




                                      how many clicks?
PhD defense // UPF // Feb 16th 2009


navigation along the Long Tail
• “From Hits to Niches”
       Audio CB similarity example (VIDEO)
   
PhD defense // UPF // Feb 16th 2009


navigation along the Long Tail
• “From Hits to Niches”
       Audio CB similarity example
   

         Bruce Springsteen (14,433,411 plays)
PhD defense // UPF // Feb 16th 2009


navigation along the Long Tail
• “From Hits to Niches”
       Audio CB similarity example
   

         Bruce Springsteen (14,433,411 plays)
         The Rolling Stones (27,720,169 plays)
PhD defense // UPF // Feb 16th 2009


navigation along the Long Tail
• “From Hits to Niches”
       Audio CB similarity example
   

         Bruce Springsteen (14,433,411 plays)
         The Rolling Stones (27,720,169 plays)
         Mike Shupp (577 plays)
PhD defense // UPF // Feb 16th 2009


artist similarity vs. artist popularity
• navigation in the Long Tail
       Similar artists, given an artist in the HEAD part:
   


       |     | Head   | Mid    | Tail   |
       |-----|--------|--------|--------|
       | CF  | 45.32% | 54.68% | 0%     |
       | CB  | 6.46%  | 64.74% | 28.80% |
       | EXP | 5.82%  | 60.92% | 33.26% |

       This can also be seen as a Markov stochastic
       process...
PhD defense // UPF // Feb 16th 2009


artist similarity vs. artist popularity
• navigation in the Long Tail
       Markov transition matrix
   
PhD defense // UPF // Feb 16th 2009


artist similarity vs. artist popularity
• navigation in the Long Tail
       Markov transition matrix
   
PhD defense // UPF // Feb 16th 2009


artist similarity vs. artist popularity
• navigation in the Long Tail
       Last.fm Markov transition matrix
   
PhD defense // UPF // Feb 16th 2009


artist similarity vs. artist popularity
• navigation in the Long Tail
       From Head to Tail, with P(T|H) > 0.4
   

       Number of clicks needed
   

         CF : 5
         CB : 2
         EXP: 2

       (diagram: HEAD → how many clicks? → TAIL)
PhD defense // UPF // Feb 16th 2009
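
A minimal sketch of the "number of clicks" computation: raise the 3x3 Head/Mid/Tail transition matrix to successive powers until the Head-to-Tail probability exceeds 0.4. The Head row below is taken from the content-based percentages above; the Mid and Tail rows are illustrative placeholders, not the thesis' values:

import numpy as np

P_cb = np.array([
    [0.0646, 0.6474, 0.2880],   # from Head: P(Head), P(Mid), P(Tail)
    [0.01,   0.55,   0.44  ],   # from Mid (illustrative)
    [0.005,  0.30,   0.695 ],   # from Tail (illustrative)
])

def clicks_to_tail(P, threshold=0.4, head=0, tail=2, max_steps=50):
    Pn = np.eye(P.shape[0])
    for n in range(1, max_steps + 1):
        Pn = Pn @ P                      # n-step transition probabilities
        if Pn[head, tail] > threshold:
            return n
    return None

print(clicks_to_tail(P_cb))              # 2 clicks for this toy content-based matrix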


artist popularity
Summary
|-------------------------|---------|-----|-----------|
|                         | Last.fm | CB  | Exp (AMG) |
|-------------------------|---------|-----|-----------|
| Indegree / popularity   | yes     | no  | yes       |
| Similarity / popularity | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
PhD defense // UPF // Feb 16th 2009


summary: complex networks+popularity
|-------------------------|---------|-----|-----------|
|                         | Last.fm | CB  | Exp (AMG) |
|-------------------------|---------|-----|-----------|
| Small World             | yes     | yes | yes       |
| Scale-free              | yes     | no  | no        |
| Ass. mixing             | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
| Indegree / popularity   | yes     | no  | yes       |
| Similarity / popularity | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
| POPULARITY BIAS         | YES     | NO  | FAIRLY    |
|-------------------------|---------|-----|-----------|
PhD defense // UPF // Feb 16th 2009


contributions

           1) Network-based evaluation
                Item Popularity + Complex networks




                                      2) User-based evaluation
           3) Systems
PhD defense // UPF // Feb 16th 2009


contribution #2: User-based evaluation
• How do users perceive novel, non-obvious
  recommendations?
       Survey
   

         288 participants
       Method: blind music recommendation
   

         no metadata (artist name, song title)
         only 30 sec. audio excerpt
PhD defense // UPF // Feb 16th 2009


music recommendation survey
• 3 approaches:
       CF: Social-based Last.fm similar tracks
   

       CB: Pure audio content-based similarity
   

       HYbrid: AMG experts + audio CB to rerank songs
   

         (Not a combination of the two previous approaches)
• User profile:
       last.fm, top-10 artists
   

• Procedure
       Do you recognize the song?
   

         Yes, Only Artist, Both Artist and Song title
       Do you like the song?
   

         Rating: [1..5]
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• Overall results
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• Overall results
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• Familiar recommendations (Artist & Song)
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• Ratings for novel recommendations
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• Ratings for novel recommendations




       one-way ANOVA within subjects (F=29.13,   p<0.05)
   

       Tukey's test (pairwise comparison)
   
PhD defense // UPF // Feb 16th 2009
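
A minimal sketch of the statistical analysis named above (one-way within-subjects ANOVA plus Tukey's pairwise comparison), using statsmodels on synthetic ratings; the column names and rating distributions are illustrative:

import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "subject":  np.repeat(np.arange(288), 3),
    "approach": np.tile(["CF", "CB", "HY"], 288),
    "rating":   np.clip(rng.normal(np.tile([3.4, 2.8, 3.0], 288), 0.8), 1, 5),
})

res = AnovaRM(df, depvar="rating", subject="subject", within=["approach"]).fit()
print(res.anova_table)                                   # F statistic and p-value
print(pairwise_tukeyhsd(df["rating"], df["approach"]))   # pairwise comparison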


music recommendation survey: results
• % of novel recommendations
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• % of novel recommendations




       one-way ANOVA within subjects (F=7.57,   p<0.05)
   

       Tukey's test (pairwise comparison)
   
PhD defense // UPF // Feb 16th 2009


music recommendation survey: results
• Novel recommendations




       Last.fm provides a lower percentage of novel songs,
       but of higher quality
PhD defense // UPF // Feb 16th 2009


contributions

           1) Network-based evaluation
                Item Popularity + Complex networks




                                      2) User-based evaluation
           3) Systems
PhD defense // UPF // Feb 16th 2009




Why?
besides better understanding of music recommendation...
Open questions in the State of the Art in music discovery &
  recommendation:

   Is it possible to create a music discovery engine exploiting the
      music content in the WWW? How to build it? How can we
      describe the available music content?
   => SearchSounds


   Is it possible to recommend, filter and personalize music
      content available on the WWW? How to describe a user
      profile? What can we recommend beyond similar artists?
   => FOAFing the Music
PhD defense // UPF // Feb 16th 2009


contribution #3: two complete systems
• Searchsounds
       Music search engine
   

         keyword based search
         “More like this” (audio CB)
PhD defense // UPF // Feb 16th 2009


contribution #3: two complete systems
• Searchsounds




       Crawl MP3 blogs
   

       > 400K songs analyzed
   
PhD defense // UPF // Feb 16th 2009


contribution #3: two complete systems
• Searchsounds
       Further work: improve song descriptions using
   

         Auto-tagging           [Lamere, 2008] [Turnbull, 2007]
             audio CB similarity [Sordo et al., 2007]
             tags from the text (music dictionary)
         Feedback from the users
             thumbs-up/down
             tag audio content
PhD defense // UPF // Feb 16th 2009


contribution #3: two complete systems
• FOAFing the music
       Music recommendation
   

         constantly gathering music related info via RSS feeds
         It offers:
             artist recommendation
             new music releases (iTunes, Amazon, eMusic, Rhapsody, Yahoo! Shopping)
             album reviews
             concerts close to user's locations
             related mp3 blogs and podcasts
PhD defense // UPF // Feb 16th 2009


contribution #3: two complete systems
• FOAFing the music
       Integrates different user accounts (circa 2005!)
   




       Semantic Web (FOAF, OWL/RDF) + Web 2.0
   

       2nd prize Semantic Web Challenge (ISWC 2006)
   
PhD defense // UPF // Feb 16th 2009
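
A minimal sketch, with rdflib, of reading a FOAF profile to extract a user's name and interests, in the spirit of FOAFing the Music; the profile URL is a hypothetical placeholder, and the real system's ingestion pipeline is more involved:

from rdflib import Graph, Namespace

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

g = Graph()
g.parse("http://example.org/users/alice/foaf.rdf")     # hypothetical FOAF profile

for person in g.subjects(predicate=FOAF.name):
    name = g.value(person, FOAF.name)
    interests = [str(i) for i in g.objects(person, FOAF.interest)]
    print(name, interests)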


contribution #3: two complete systems
• FOAFing the music
       Further work:
   

         Follow Linking Open Data best practices
         Link our music recommendation ontology with
          Music Ontology [Raimond et al., 2007]
         (Automatically) add external information from:
             Myspace
             Jamendo
             Garageband
             ...
PhD defense // UPF // Feb 16th 2009


summary of contributions :: research questions
• 1) How can we evaluate/compare different music
  recommendation approaches?

• 2) How far into the Long Tail do music
  recommenders reach?

• 3) How do users perceive novel (unknown to
  them), non-obvious recommendations?
PhD defense // UPF // Feb 16th 2009


summary of contributions :: research questions
• 1) How can we evaluate/compare different music
  recommendation approaches?
       Objective framework for comparing music recommendation
       approaches (CF, CB, EX) using Complex Network analysis
       Highlights fundamental differences among the approaches


• 2) How far into the Long Tail do music
  recommenders reach?

• 3) How do users perceive novel (unknown to
  them), non-obvious recommendations?
PhD defense // UPF // Feb 16th 2009


summary of contributions :: research questions
• 1) How can we evaluate/compare different music
  recommendation approaches?

• 2) How far into the Long Tail do music
  recommenders reach?
       Combine 1) with the Long Tail model and Markov
       model theory
       Highlights differences in terms of discovery and
       navigation


• 3) How do users perceive novel (unknown to
  them), non-obvious recommendations?
PhD defense // UPF // Feb 16th 2009


summary of contributions :: research questions
• 1) How can we evaluate/compare different music
  recommendation approaches?

• 2) How far into the Long Tail do music
  recommenders reach?

• 3) How do users perceive novel (unknown to
  them), non-obvious recommendations?
       Survey with 288 participants
   

       Still room to improve novelty (3/5 or less...)
   

         To appreciate novelty users need to understand the
          recommendations
PhD defense // UPF // Feb 16th 2009


summary of contributions :: research questions
• 1) How can we evaluate/compare different music
  recommendation approaches?
• 2) How far into the Long Tail do music
  recommenders reach?
• 3) How do users perceive novel (unknown to
  them), non-obvious recommendations?
=>
       Systems that perform best (CF) do not exploit the
       Long Tail, and
       systems that can ease Long Tail navigation (CB) do
       not perform well enough
       Combine (hybrid) different approaches!
PhD defense // UPF // Feb 16th 2009




       Systems that perform best (CF) do not exploit the Long Tail, and
       systems that can ease Long Tail navigation (CB) do not perform well enough
       Combine different approaches!
PhD defense // UPF // Feb 16th 2009


summary of contributions :: systems
• Furthermore...
       2 web systems that improved existing State of the Art
       work in music discovery and recommendation
         Searchsounds: music search engine exploiting music
          related content in the WWW
         FOAFing the Music: music recommender based on a
          FOAF user profile, also offering a number of extra
          features to complement the recommendations
PhD defense // UPF // Feb 16th 2009


further work :: limitations
• 1) How can we evaluate/compare different
  recommendations approaches?
       Dynamic networks
                                     [Leskovec, 2008]

         track item similarity over time
         track user's taste over time
         trend and hype detection
PhD defense // UPF // Feb 16th 2009


further work :: limitations
• 2) How far into the Long Tail do recommendation
  algorithms reach?
       Intercollections
   




       how to detect bad quality music in the tail?
   
PhD defense // UPF // Feb 16th 2009


further work :: limitations
• 3) How do users perceive novel, non-obvious
  recommendations?
       User understanding [Jennings, 2007]
         savant, enthusiast, casual, indifferent
       Transparent, steerable recommendations [Lamere & Maillet, 2008]
         Why? as important as What?
PhD defense // UPF // Feb 16th 2009


summary: articles
• #1) Network-based evaluation for RS
         O. Celma and P. Cano. “From hits to niches? or how
          popular artists can bias music recommendation and
          discovery”. ACM KDD, 2008.
         J. Park, O. Celma, M. Koppenberger, P. Cano, and J. M.
          Buldu. “The social network of contemporary popular
          musicians”. Journal of Bifurcation and Chaos (IJBC),
          17:2281–2288, 2007.
         M. Zanin, P. Cano, J. M. Buldu, and O. Celma. “Complex
          networks in recommendation systems”. WSEAS, 2008
         P. Cano, O. Celma, M. Koppenberger, and J. M. Buldu
          “Topology of music recommendation networks”. Journal
          Chaos (16), 2006.
• #2) User-based evaluation for RS
         O. Celma and P. Herrera. “A new approach to
          evaluating novel recommendations”. ACM RecSys, 2008.
PhD defense // UPF // Feb 16th 2009


summary: articles
• #3) Prototypes
       FOAFing the Music
   

         O. Celma and X. Serra. “FOAFing the music: Bridging
          the semantic gap in music recommendation”. Journal of
          Web Semantics, 6(4):250–256, 2008.
         O. Celma. “FOAFing the music”. 2nd Prize Semantic Web
          Challenge ISWC, 2006.
         O. Celma, M. Ramirez, and P. Herrera. “FOAFing the
          music: A music recommendation system based on rss
          feeds and user preferences”. ISMIR, 2005.
         O. Celma, M. Ramirez, and P. Herrera. “Getting music
          recommendations and filtering newsfeeds from foaf
          descriptions”. Scripting for the Semantic Web, ESWC,
          2005.
PhD defense // UPF // Feb 16th 2009


summary: articles
• #3) Prototypes
       Searchsounds
   

         O. Celma, P. Cano, and P. Herrera. “Search sounds: An
          audio crawler focused on weblogs”. ISMIR, 2006.
         V. Sandvold, T. Aussenac, O. Celma, and P. Herrera.
          “Good vibrations: Music discovery through personal
          musical concepts”. ISMIR, 2006.
         M. Sordo, C. Laurier, and O. Celma. “Annotating music
          collections: how content-based similarity helps to
          propagate labels”. ISMIR, 2007.
PhD defense // UPF // Feb 16th 2009


summary: articles
• Misc. (mainly MM semantics)
         R. Garcia, C. Tsinaraki, O. Celma, and S. Christodoulakis.
          “Multimedia Content Description using Semantic Web
          Languages” book, Chapter 2. Springer–Verlag, 2008.
         O. Celma and Y. Raimond. “Zempod: A semantic web
          approach to podcasting”. Journal of Web Semantics,
          6(2):162–169, 2008.
         S. Boll, T. Burger, O. Celma, C. Halaschek-Wiener, E.
          Mannens. “Multimedia vocabularies on the Semantic
          Web”. W3C Technical report, 2007.
         O. Celma, P. Herrera, and X. Serra. “Bridging the music
          semantic gap”. SAMT, 2006.
         R. Garcia and O. Celma. “Semantic integration and
          retrieval of multimedia metadata”. ESWC, 2005
PhD defense // UPF // Feb 16th 2009


summary: articles
         R. Troncy, O. Celma, S. Little, R. Garcia and C. Tsinaraki.
          “MPEG-7 based multimedia ontologies: Interoperability
          support or interoperability issue?” MARESO, 2007.
         M. Sordo, O. Celma, M. Blech, and E. Guaus. “The quest
          for musical genres: Do the experts and the wisdom of
          crowds agree?”. ISMIR, 2008.
• Music Recommendation Tutorials -- with Paul Lamere
         ACM MM, 2008 (Vancouver, Canada)
         ISMIR, 2007 (Vienna, Austria)
         MICAI, 2007 (Aguascalientes, Mexico)
PhD defense // UPF // Feb 16th 2009


summary: dissemination
• PhD Webpage
       http://mtg.upf.edu/~ocelma/PhD
   

         PDF
         Source code
             Long Tail Model in R
         References
             Citeulike
         Related links
             delicious
PhD defense // UPF // Feb 16th 2009


acknowledgments




     NB: The complete list of acknowledgments can be found in the document
Music Recommendation and Discovery in
            the Long Tail


                    Òscar Celma
              Doctoral Thesis Defense
    (Music Technology Group ~ Universitat Pompeu Fabra)
PICA-PICA
UPF-Tanger, 3rd floor
