Music retrieval technique
Shital Katkar
132011005
Index
What is QBH?
Basic Architecture
Applications
Challenges
File Formats
System Architecture
Parsons code algorithm
Benchmarking MIR Systems
•“I don’t know the name. I don’t know who does it.
•But I can’t get this song out of my head.”
•Well, why not just hum it.
QBH System
Query By Humming
Basic Architecture
Microphone → Extraction → Transcription → Comparison (against DB) → Result List
Fig. Basic system architecture
Applications
Shazam
• Identifies pre-recorded music being broadcast from any source, such as a radio or television
Applications
Sound Hound
• Identifies music by humming, singing, or playing a recorded track
Applications
Midomi
• Identifies music by humming, singing, or playing a recorded track
Applications
Musipedia
• Identifies music by whistling a theme, playing it on a virtual piano keyboard, or tapping the rhythm on the computer keyboard
Challenges
• Users may not make perfect queries.
• Accurately capturing pitches and notes from user hums is
difficult, even if the user manages to submit a perfect
query.
• Similarly, accurately capturing melodic information from a
pre-recorded music file is difficult.
File Formats
WAV File Format
Short for the Waveform Audio File Format
Most commonly used to store uncompressed audio
Files are quite large in size
First-generation files are of high quality
File Formats
MIDI File Format
Musical Instrument Digital Interface
MIDI files are not the same as the typical digital audio formats we
use (WAV, MP3, MP4, etc.)
A MIDI file is made up of information that describes which musical notes are to
be played
MIDI files therefore do not contain any 'real world' recordings
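To make the note numbers used later in the Parsons code example concrete: in the standard MIDI convention, pitch is an integer from 0 to 127 with middle C (C4) at 60. A minimal sketch (the function name is ours, for illustration only):

```python
# Standard MIDI pitch numbering: C4 (middle C) = 60, each semitone = +1.
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

def note_to_midi(name, octave):
    """Convert a note name and octave (C4 = middle C) to a MIDI note number."""
    return 12 * (octave + 1) + NOTE_OFFSETS[name]

print(note_to_midi("C", 5))  # → 72, the C used in the Parsons code example
print(note_to_midi("A", 5))  # → 81
```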
System Architecture
Parsons Code Algorithm
Each note in the input is classified in one of three ways, relative to the previous note:
1. U = "up," if the note is higher than the previous note
2. D = "down," if the note is lower than the previous note
3. r = "repeat," if the note is the same pitch as the previous note
The first tone is written "*" and serves as the reference.
Textual Pattern
Melody ("Twinkle Twinkle Little Star"): C C G G A A G  F F E E D D C
MIDI note numbers: 72 72 79 79 81 81 79  77 77 76 76 74 74 72
Parsons code: * r U r U r D  D r D r D r D
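The classification rules above fit in a few lines of Python. This is a hedged sketch (the function name is ours, not from any particular MIR system), using the slide's convention of uppercase U/D and lowercase r:

```python
def parsons_code(notes):
    """Build a Parsons code string from a sequence of MIDI note numbers.

    The first tone is the reference, written '*'; every later note is
    U (up), D (down), or r (repeat) relative to the previous note.
    """
    code = ["*"]
    for prev, cur in zip(notes, notes[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("r")
    return "".join(code)

# "Twinkle Twinkle Little Star" as MIDI note numbers
twinkle = [72, 72, 79, 79, 81, 81, 79, 77, 77, 76, 76, 74, 74, 72]
print(parsons_code(twinkle))  # → *rUrUrDDrDrDrD
```

Because only the contour (up/down/repeat) is kept, the same code is produced no matter which key the user hums in, which is exactly why QBH systems such as MelodyHound can match imperfect queries.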
Query By humming - Music retrieval technology
Introduction
Music Information Retrieval (MIR)
efficient content-based searching
retrieval of musical information
should be easily operated by users
should be controlled by a simple-to-use graphical 'musical' interface
MIR System
Problem Definition
There are many MIR systems
All have the same task: to enable users to search for music
Very few systems are actually publicly accessible and comparable
Some systems work only with MIDI representations, some with transcriptions
Each system has a different set of files available in its database
Music Information
Retrieval Methods
MIR Systems may be divided into two categories
1. those that search symbolic representations of music
MIDI files or Common Music Notation (CMN)
2. those that search raw audio files
WAV or MP3 file format
Symbolic representations
Consist of a list of instructions as to how the piece should be played
Include the notes, and when and for how long each is played
Typical query:
A search for files with a given sequence of notes, producing a list of
MIDI files from the database
Music Information
Retrieval Methods
Raw Audio Files
Digital representations of an actual recording
Contain a level of complexity that is not found in the symbolic representations
The composition is contaminated by noise
Music Information
Retrieval Methods
Two Approaches
Extraction
Involves finding certain features, such as the mean and variance of the
audio signal
Transcription
Converts the query into a symbolic representation
Music Information
Retrieval Methods
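The feature-extraction approach can be illustrated with a toy sketch, assuming the signal is simply a list of sample values; the mean and variance here are the summary statistics mentioned above, and the function name is illustrative:

```python
def extract_features(signal):
    """Summarize an audio signal by its mean and variance (toy features)."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    return mean, variance

# The query and every database file are reduced to the same small feature
# vector; retrieval then becomes a multidimensional search over these vectors.
print(extract_features([0.0, 1.0, 0.0, -1.0]))  # → (0.0, 0.5)
```

Real systems use many more features than this, but the principle is the same: no attempt is made to relate the numbers to musical qualities such as pitch or loudness.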
Online MIR Systems
CatFind
MelDex
MelodyHound
ThemeFinder
Music Retrieval Demo
CatFind
Searches MIDI files using either a musical transcription or a melodic profile
based on the Parsons Code
It has minimal features
Intended primarily for demonstration
Online MIR Systems
MelDex
 MELody inDEX
allows searching of the New Zealand Digital Library
designed to retrieve melodies from a database on the basis of a few
notes sung into a microphone
It accepts acoustic input from the user, transcribes it into common music
notation, then searches a database for tunes that contain the sung
pattern, or patterns similar to it.
Retrieval is ranked according to the closeness of the match
Online MIR Systems
MelodyHound
developed by Rainer Typke in 1997
originally known as "Tuneserver"
It searches directly on the Parsons Code
was designed initially for Query By Whistling
Returns the song in the database that most closely matches a whistled query
Online MIR Systems
Themefinder
created by David Huron
Allows one to identify common themes in Western classical
music, folksongs, and Latin motets of the sixteenth century
Online MIR Systems
Music Retrieval Demo
Performs similarity searches on raw audio data (WAV files)
No transcription of any kind is applied
It works by calculating the distance between the selected file and all
other files in the database
Online MIR Systems
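The distance-based ranking that the Music Retrieval Demo performs can be sketched as follows. The Euclidean distance over fixed-length feature templates follows the description above; the function names and the tiny database are our own illustrative assumptions:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature templates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_similarity(query, db):
    """Return (name, template) entries sorted so the most similar come first."""
    return sorted(db, key=lambda item: euclidean(query, item[1]))

db = [("song_a", [1.0, 0.0]), ("song_b", [0.0, 1.0]), ("song_c", [0.9, 0.1])]
ranked = rank_by_similarity([1.0, 0.0], db)
print([name for name, _ in ranked])  # → ['song_a', 'song_c', 'song_b']
```

Note that distances are computed between templates (feature representations of the files), not between the audio waveforms themselves.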
Comparison Of MIR Systems
Evaluation Issues
The coverage of the collection, that is, the extent to which the system
includes relevant matter.
The time lag, that is, the average interval between the time the search
request is made and the time an answer is given.
The recall of the system, that is, the proportion of relevant material actually
retrieved in answer to a search request.
The precision of the system, that is, the proportion of retrieved material
that is actually relevant.
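Recall and precision as defined above can be computed directly. A small set-based sketch, assuming unranked result lists (the function name is ours):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    return len(hits) / len(retrieved), len(hits) / len(relevant)

# 3 of the 4 retrieved files are relevant; 3 of the 6 relevant files were found.
p, r = precision_recall({"a", "b", "c", "x"}, {"a", "b", "c", "d", "e", "f"})
print(p, r)  # → 0.75 0.5
```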
Conclusion
In this work, we have laid down a framework for benchmarking of future
MIR systems. There are only a handful of MIR systems available
online, each of which is quite limited in scope. Still, these benchmarking
techniques were applied to five online systems. Proposals were made
concerning future benchmarking of full online audio retrieval systems. It is
hoped that these recommendations will be considered and expanded
upon as such systems become available.
Thank you


Editor's Notes

  1. When we don’t know the song’s name or who performs it, but the song will not go out of our mind, QBH is a powerful tool for searching for that song.
  2. The difference between MP3 and WAV: MP3 is compressed, while a WAV file is uncompressed.
  4. Now that you have some knowledge of these notes and know the Parsons code rules, we will convert this song into a textual pattern. The song is "Twinkle Twinkle Little Star"; the notation begins C C G G A A G. The first note is C (MIDI note 72); we make it the reference note and write *. The second note is also C; since it is repeating, we write r. Next is G; since G is higher than C, we write U (U for up). For the second G we write r.
  5. Goal of this paper is to create an accurate and effective benchmarking system for music information retrieval (MIR) systems. This will serve the multiple purposes of inspiring the MIR community to add additional features and increased speed into existing projects, and to measure the performance of their work and incorporate the ideas of other works. To date, there has been no systematic rigorous review of the field, and thus there is little knowledge of when an MIR implementation might fail in a real world setting. Benchmarking MIR systems is currently hindered by the diversity of the systems, by their relatively new and unrefined nature, and by the limited number of accessible systems. Thus most of what will be described here will be introductory and will lay down the framework for future benchmarking and analysis. Particular attention will be paid to the evaluation issues surrounding retrieval of audio in test collections.
  6. The Music Information Retrieval (MIR) field is primarily concerned with efficient content-based searching and retrieval of musical information from online databases. The musical data may be stored in a variety of formats ranging from encoded scores to digital audio. MIR systems should be easily operated by users with a wide range of musical ability and understanding and should be controlled by a simple-to-use graphical 'musical' interface, both for search queries and for the presentation of results.
  7. There are a lot of MIR systems in various stages of development. These systems all have the same task: to enable users to search for music in a database. But there are very few systems that are actually publicly accessible and comparable. To date, there has been no formal analysis or quantitative comparison methodology (benchmark) of the available preliminary MIR systems. Some systems work only with MIDI representations, some with monophonic transcriptions, and some with scores. In addition, each system has a different set of files available in its database. For example, MIDOMI.com does not have Bollywood songs. To date, there is no online, publicly available system that attempts to search for music based on polyphonic transcriptions. Thus, one goal of this work is to find ways by which these different systems can be compared. A benchmarking of MIR search engines will also provide an effective measure of the progress in the field.
  8. The symbolic representations typically consist of MIDI files or Common Music Notation (CMN). The raw audio files are typically in WAV or MP3 file format.
  9. The symbolic representations usually consist of a list of instructions as to how the piece should be played. These include the notes, when and for how long each is played, the dynamics and the instruments that should be used. Other symbolic representations that may be searched include piano rolls and Parsons notation[3]. A typical query may involve a search for files with a given sequence of notes, and might produce a list of MIDI files from a database. Such queries are pertinent to musicians and musicologists who have a knowledge of musical representations.
  10. Essentially, they are digital representations of an actual recording. Thus they contain a level of complexity that is not found in the symbolic representations. The composition is contaminated by noise and incorporates slight variations in the timing and dynamics of the notes. By comparison, symbolic representations are ambiguous, since they often leave certain characteristics of the piece unspecified. Thus, two performances may have the same MIDI or CMN representations, but differ notably in their audio files.
  11. MIR systems that operate on audio files have followed two approaches, feature extraction[5] and transcription[6]. Feature extraction involves finding certain features, such as the mean and variance, that typify the audio or a portion thereof. The query and all files in the database are classified in terms of these parameters. Retrieval systems then operate as multidimensional searches on these parameters. Fast search methods have been described for such systems.[7] No attempt is made to relate these features to the musical qualities they might represent, e.g., energy to loudness, frequency to pitch. Transcription-based raw audio MIR systems convert the query into a symbolic representation, and seek to match it against symbolic representations of the audio files in the database. Thus such a technique typically uses feature extraction as well, but then has an intermediate step attempting to relate these features to a description of the notes and instruments. This is an exceedingly difficult task, and to date, no system achieves this effectively and accurately over a wide range of music.
  12. For the purposes of this work, we considered five online MIR systems. The systems considered all have certain properties in common. They may all be used online via the World Wide Web. They all are used by entering a query concerning a piece of music, and all may return information about music that matches that query. However, these systems differ greatly in their features, goals and implementation. These differences are discussed in detail below.
  13. CatFind[13] allows one to search MIDI files using either a musical transcription or a melodic profile based on the Parson’s Code. It has minimal features, and was intended primarily for demonstration. Although it seems unlikely that this system will be extended, it is still useful here as a system for comparison.
  14. This allows searching of the New Zealand Digital Library, i.e., it only recognizes songs from the New Zealand collection. The MELody inDEX system[14, 15] is designed to retrieve melodies from a database on the basis of a few notes sung into a microphone. It accepts acoustic input from the user, transcribes it into common music notation, then searches a database for tunes that contain the sung pattern, or patterns similar to it. Thus the query is audio, although the retrieved files are in symbolic representation. Retrieval is ranked according to the closeness of the match. A variety of different mechanisms are provided to control the search, depending on the precision of the input.
  15. MelodyHound: This melody recognition system[16] was developed by Rainer Typke in 1997. It was originally known as "Tuneserver" and hosted by the University of Karlsruhe. It searches directly on the Parsons Code and was designed initially for Query By Whistling. That is, it will return the song in the database that most closely matches a whistled query.
  16. Themefinder[17], created by David Huron et al.,[18] allows one to identify common themes in Western classical music, folksongs, and Latin motets of the sixteenth century. Themefinder provides a web-based interface to the Humdrum thema command[19], which in turn allows searching of databases containing musical themes or incipits (opening note sequences). Themes and incipits available through Themefinder are first encoded in the kern music data format. Groups of incipits are assembled into databases. Currently there are three databases: Classical Instrumental Music, European Folksongs, and Latin Motets from the sixteenth century. Matched themes are displayed on-screen in graphical notation.
  17. The Music Retrieval Demo[20] is notably different from the other MIR systems considered herein. The Music Retrieval Demo performs similarity searches on raw audio data (WAV files). No transcription of any kind is applied. It works by calculating the distance between the selected file and all other files in the database. The other files can then be displayed in a list ranked by their similarity, such that the more similar files are nearer the top. Distances are computed between templates, which are representations of the audio files, not the audio itself. The waveform is Hamming-windowed into overlapping segments; each segment is processed into a spectral representation of Mel-frequency cepstral coefficients. This is a data-reducing transformation that replaces each 20ms window with 12 cepstral coefficients plus an energy term, yielding a 13-valued vector. The next step is to quantize each vector using a specially-designed quantization tree. This recursively divides the vector space into bins, each of which corresponds to a leaf of the tree. Any MFCC vector will fall into one and only one bin. Given a segment of audio, the distribution of the vectors in the various bins characterizes that audio. Counting how many vectors fall into each bin yields a histogram template that is used in the distance measure. For this demonstration, the distance between audio files is the simple Euclidean distance between their corresponding templates (or rather 1 minus the distance, so closer files have larger scores). Once scores have been computed for each audio clip, they are sorted by magnitude to produce a ranked list like other search engines.
  18. In Table 1, we present a comparison of the features of the various MIR systems under investigation. Note first that each of these systems was designed for a different purpose, and none of them can be considered a finished product. This table allows one to get an overview of the state of the MIR systems available, the features that one may wish to include in an MIR system, and the areas where improvement is most necessary. It also highlights the need for a standardized testbed. Each of the MIR systems uses a different database of files for audio retrieval. Both CatFind and the Music Retrieval Demo have databases with fewer than 500 files. Thus, any benchmarking estimates, such as retrieval times and efficiency, are rendered useless. MelDex, MelodyHound and ThemeFinder have databases containing over 10,000 files. This should be sufficient for estimating search efficiency and scalability.