Query By Humming - Music Retrieval Technology
- 2. Index
What is QBH?
Basic Architecture
Application
Challenges
File Formats
System Architecture
Parsons code algorithm
Benchmarking MIR System
- 3. • “I don’t know the name. I don’t know who sings it.
• But I can’t get this song out of my head.”
• Well, why not just hum it?
QBH System
Query By Humming
- 9. Challenges
• Users may not make perfect queries.
• Accurately capturing pitches and notes from user hums is
difficult, even if the user manages to submit a perfect
query.
• Similarly, accurately capturing melodic information from a
pre-recorded music file is difficult.
- 10. File Formats
WAV File Format
short for Waveform Audio File Format
most commonly used to store uncompressed audio
files are quite large in size
first-generation files of high quality
- 11. File Formats
MIDI File Format
Musical Instrument Digital Interface
MIDI files are not exactly the same as the typical digital audio formats we
use (like WAV, MP3, MP4 etc.)
a MIDI file is made up of information that describes which musical notes are to
be played
MIDI files therefore do not contain any 'real world' recordings
- 13. Parsons Code
Algorithm
Each note in the input is classified in one of three ways, relative to the
previous note:
1. U = "up," if the note is higher than the previous note
2. D = "down," if the note is lower than the previous note
3. R = "repeat," if the note is the same pitch as the previous note
The first note serves as the reference and is written as *
- 14. Textual Pattern
Notes:   C  C  G  G  A  A  G  F  F  E  E  D  D  C
MIDI:    72 72 79 79 81 81 79 77 77 76 76 74 74 72
Parsons: *  R  U  R  U  R  D  D  R  D  R  D  R  D
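The conversion shown above can be sketched in a few lines of Python; the function name is our own invention, and the rules are exactly those of the Parsons code slide:

```python
def parsons_code(midi_notes):
    """Convert a sequence of MIDI note numbers into a Parsons code string."""
    code = "*"  # the first tone is the reference note
    for prev, cur in zip(midi_notes, midi_notes[1:]):
        if cur > prev:
            code += "U"   # up: higher than the previous note
        elif cur < prev:
            code += "D"   # down: lower than the previous note
        else:
            code += "R"   # repeat: same pitch as the previous note
    return code

# "Twinkle Twinkle Little Star": C C G G A A G F F E E D D C
twinkle = [72, 72, 79, 79, 81, 81, 79, 77, 77, 76, 76, 74, 74, 72]
print(parsons_code(twinkle))  # → *RURURDDRDRDRD
```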
- 16. Introduction
Music Information Retrieval (MIR)
efficient content-based searching
retrieval of musical information
should be easily operated by users
should be controlled by a simple-to-use graphical 'musical' interface
- 17. MIR System
Problem Definition
There are many MIR systems
All have the same task: to enable users to search for music
Very few systems are actually publicly accessible and comparable
Some systems work only with MIDI representations, some with transcriptions
Each system has a different set of files available in its database
- 18. Music Information
Retrieval Methods
MIR Systems may be divided into two categories
1. those that search symbolic representations of music
MIDI files or Common Music Notation (CMN)
2. those that search raw audio files
WAV or mp3 file format
- 19. Symbolic representations
consist of a list of instructions as to how the piece should be played
include the notes, when and for how long each is played
Typical Query –
Involve a search for files with a given sequence of notes
List of MIDI files from database
Music Information
Retrieval Methods
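A symbolic-representation query of this kind can be sketched as a substring search over note lists. The database contents and function name below are invented for illustration:

```python
def find_matches(query_notes, database):
    """Return names of files whose note sequence contains the queried sequence.

    database: dict mapping filename -> list of note names.
    """
    def contains(seq, sub):
        # check every alignment of the query within the file's note list
        return any(seq[i:i + len(sub)] == sub
                   for i in range(len(seq) - len(sub) + 1))
    return [name for name, notes in database.items()
            if contains(notes, query_notes)]

# Hypothetical database of symbolic (note-level) representations
db = {
    "twinkle.mid": ["C", "C", "G", "G", "A", "A", "G"],
    "scale.mid":   ["C", "D", "E", "F", "G", "A", "B"],
}
print(find_matches(["G", "G", "A"], db))  # → ['twinkle.mid']
```

Real systems index the note sequences rather than scanning linearly, but the matching task is the same.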
- 20. raw audio files
digital representations of an actual recording
contain a level of complexity that is not found in the symbolic representations
composition is contaminated by noise
Music Information
Retrieval Methods
- 23. CatFind
searches MIDI files using either a musical transcription or a melodic profile
based on the Parsons Code
It has minimal features
intended primarily for demonstration
Online MIR Systems
- 24. MelDex
MELody inDEX
allows searching of the New Zealand Digital Library
designed to retrieve melodies from a database on the basis of a few
notes sung into a microphone
It accepts acoustic input from the user, transcribes it into common music
notation, then searches a database for tunes that contain the sung
pattern, or patterns similar to it.
Retrieval is ranked according to the closeness of the match
Online MIR Systems
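MelDex's actual matching algorithm is not specified here; as an illustrative stand-in, ranking retrieval by closeness of match can be sketched with a classic edit (Levenshtein) distance over melodic strings such as Parsons codes:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Hypothetical database of tunes keyed by their Parsons strings
tunes = {
    "tune_a": "*RURURD",
    "tune_b": "*UUDDUU",
}
query = "*RURURD"
ranked = sorted(tunes, key=lambda name: edit_distance(query, tunes[name]))
print(ranked)  # closest match first
```

Edit distance tolerates imperfect queries: a hummed melody with one wrong note still lands near its true match in the ranking.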
- 25. MelodyHound
developed by Rainer Typke in 1997
originally known as "Tuneserver"
It searches directly on the Parsons Code
was designed initially for Query By Whistling
returns the song in the database that most closely matches the query
Online MIR Systems
- 26. Themefinder
created by David Huron
allows one to identify common themes in Western classical
music, folksongs, and Latin motets of the sixteenth century
Online MIR Systems
- 27. Music Retrieval Demo
performs similarity searches on raw audio data (WAV files)
No transcription of any kind is applied
It works by calculating the distance between the selected file and all
other files in the database
Online MIR Systems
- 29. Evaluation Issues
The coverage of the collection, that is, the extent to which the system
includes relevant matter.
The time lag, that is, the average interval between the time the search
request is made and the time an answer is given.
The recall of the system, that is, the proportion of relevant material actually
retrieved in answer to a search request.
The precision of the system, that is, the proportion of retrieved material
that is actually relevant.
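The recall and precision measures can be illustrated with a small worked example (the song names and counts below are made up):

```python
# Relevant items in the collection vs. items the system retrieved
relevant  = {"song1", "song2", "song3", "song4"}
retrieved = {"song1", "song2", "song5"}

hits = relevant & retrieved                 # relevant items actually retrieved
recall    = len(hits) / len(relevant)       # 2 / 4 = 0.5
precision = len(hits) / len(retrieved)      # 2 / 3 ≈ 0.667

print(recall, precision)
```

A system that returns its entire database scores perfect recall but poor precision, which is why both measures are needed.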
- 30. Conclusion
In this work, we have laid down a framework for benchmarking of future
MIR systems. There are only a handful of MIR systems available
online, each of which is quite limited in scope. Still, these benchmarking
techniques were applied to five online systems. Proposals were made
concerning future benchmarking of full online audio retrieval systems. It is
hoped that these recommendations will be considered and expanded
upon as such systems become available.
Editor's Notes
- When we don’t know a song’s name or who performs it, but that particular song will not go out of our head, QBH is a powerful tool for searching for that song.
- The difference between MP3 and WAV is that MP3 is a compressed file while WAV is an uncompressed file.
- Now that you have some knowledge of these notes and know the Parsons code rules, we will convert this song into a textual pattern. The song is “Twinkle Twinkle Little Star”; its notation begins C C G G A A G. The first note is C (MIDI note 72); we make it the reference note and write *. The second note is also C; since it repeats, we write R. Next is G; G is higher than C, so we write U, for “up.” For the second G, we write R.
- Goal of this paper is to create an accurate and effective benchmarking system for music information retrieval (MIR) systems. This will serve the multiple purposes of inspiring the MIR community to add additional features and increased speed into existing projects, and to measure the performance of their work and incorporate the ideas of other works. To date, there has been no systematic rigorous review of the field, and thus there is little knowledge of when an MIR implementation might fail in a real world setting. Benchmarking MIR systems is currently hindered by the diversity of the systems, by their relatively new and unrefined nature, and by the limited number of accessible systems. Thus most of what will be described here will be introductory and will lay down the framework for future benchmarking and analysis. Particular attention will be paid to the evaluation issues surrounding retrieval of audio in test collections.
- The Music Information Retrieval (MIR) field is primarily concerned with efficient content-based searching and retrieval of musical information from online databases. The musical data may be stored in a variety of formats ranging from encoded scores to digital audio. MIR systems should be easily operated by users with a wide range of musical ability and understanding and should be controlled by a simple-to-use graphical 'musical' interface, both for search queries and for the presentation of results.
- There are a lot of MIR systems in various stages of development. These systems all have the same task: to enable users to search for music in a database. But there are very few systems that are actually publicly accessible and comparable. To date, there has been no formal analysis or quantitative comparison methodology (benchmark) of the available preliminary MIR systems. Some systems work only with MIDI representations, some with monophonic transcriptions, and some with scores. In addition, each system has a different set of files available in its database; for example, MIDOMI.com does not have Bollywood songs. To date, there is no online, publicly available system that attempts to search for music based on polyphonic transcriptions. Thus, one goal of this work is to find ways by which these different systems can be compared. A benchmarking of MIR search engines will also provide an effective measure of the progress in the field.
- The symbolic representations typically consist of MIDI files or Common Music Notation (CMN). The raw audio files are typically in WAV or MP3 file format.
- The symbolic representations usually consist of a list of instructions as to how the piece should be played. These include the notes, when and for how long each is played, the dynamics and the instruments that should be used. Other symbolic representations that may be searched include piano rolls and Parsons notation[3]. A typical query may involve a search for files with a given sequence of notes, and might produce a list of MIDI files from a database. Such queries are pertinent to musicians and musicologists who have a knowledge of musical representations.
- Essentially, they are digital representations of an actual recording. Thus they contain a level of complexity that is not found in the symbolic representations. The composition is contaminated by noise and incorporates slight variations in the timing and dynamics of the notes. By comparison, symbolic representations are ambiguous, since they often leave certain characteristics of the piece unspecified. Thus, two performances may have the same MIDI or CMN representations, but differ notably in their audio files.
- MIR systems that operate on audio files have followed two approaches, feature extraction[5] and transcription[6]. Feature extraction involves finding certain features, such as the mean and variance, that typify the audio or a portion thereof. The query and all files in the database are classified in terms of these parameters. Retrieval systems then operate as multidimensional searches on these parameters. Fast search methods have been described for such systems.[7] No attempt is made to relate these features to the musical qualities they might represent, e.g., energy to loudness, frequency to pitch. Transcription-based raw audio MIR systems convert the query into a symbolic representation, and seek to match it against symbolic representations of the audio files in the database. Such a technique typically uses feature extraction as well, but then has an intermediate step attempting to relate these features to a description of the notes and instruments. This is an exceedingly difficult task, and to date, no system achieves this effectively and accurately over a wide range of music.
- For the purposes of this work, we considered five online MIR systems. The systems considered all have certain properties in common. They may all be used online via the World Wide Web. They all are used by entering a query concerning a piece of music, and all may return information about music that matches that query. However, these systems differ greatly in their features, goals and implementation. These differences are discussed in detail below.
- CatFind[13] allows one to search MIDI files using either a musical transcription or a melodic profile based on the Parson’s Code. It has minimal features, and was intended primarily for demonstration. Although it seems unlikely that this system will be extended, it is still useful here as a system for comparison.
- This allows searching of the New Zealand Digital Library; i.e., it recognizes only songs in the New Zealand collection. The MELody inDEX system[14, 15] is designed to retrieve melodies from a database on the basis of a few notes sung into a microphone. It accepts acoustic input from the user, transcribes it into common music notation, then searches a database for tunes that contain the sung pattern, or patterns similar to it. Thus the query is audio, although the retrieved files are in symbolic representation. Retrieval is ranked according to the closeness of the match. A variety of different mechanisms are provided to control the search, depending on the precision of the input.
- MelodyHound: This melody recognition system[16] was developed by Rainer Typke in 1997. It was originally known as "Tuneserver" and hosted by the University of Karlsruhe. It searches directly on the Parsons Code and was designed initially for Query By Whistling. That is, it will return the song in the database that most closely matches a whistled query.
- Themefinder[17], created by David Huron et al.,[18] allows one to identify common themes in Western classical music, folksongs, and Latin motets of the sixteenth century. Themefinder provides a web-based interface to the Humdrum thema command[19], which in turn allows searching of databases containing musical themes or incipits (opening note sequences). Themes and incipits available through Themefinder are first encoded in the kern music data format. Groups of incipits are assembled into databases. Currently there are three databases: Classical Instrumental Music, European Folksongs, and Latin Motets from the sixteenth century. Matched themes are displayed on-screen in graphical notation.
- Music Retrieval Demo: The Music Retrieval Demo[20] is notably different from the other MIR systems considered herein. It performs similarity searches on raw audio data (WAV files). No transcription of any kind is applied. It works by calculating the distance between the selected file and all other files in the database. The other files can then be displayed in a list ranked by their similarity, such that the more similar files are nearer the top. Distances are computed between templates, which are representations of the audio files, not the audio itself. The waveform is Hamming-windowed into overlapping segments; each segment is processed into a spectral representation of Mel-frequency cepstral coefficients. This is a data-reducing transformation that replaces each 20ms window with 12 cepstral coefficients plus an energy term, yielding a 13-valued vector. The next step is to quantize each vector using a specially-designed quantization tree. This recursively divides the vector space into bins, each of which corresponds to a leaf of the tree. Any MFCC vector will fall into one and only one bin. Given a segment of audio, the distribution of the vectors in the various bins characterizes that audio. Counting how many vectors fall into each bin yields a histogram template that is used in the distance measure. For this demonstration, the distance between audio files is the simple Euclidean distance between their corresponding templates (or rather 1 minus the distance, so closer files have larger scores). Once scores have been computed for each audio clip, they are sorted by magnitude to produce a ranked list like other search engines.
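- The final template-distance step of such a pipeline can be sketched as follows. MFCC extraction and the quantization tree are omitted; the bin counts below are invented, and normalizing the histograms is an added assumption (so clips of different lengths stay comparable), not something stated in the description above:

```python
import math

def template_distance(hist_a, hist_b):
    """Euclidean distance between two normalized bin-count histograms."""
    def norm(h):
        # assumed step: scale counts so each histogram sums to 1
        total = sum(h)
        return [c / total for c in h]
    a, b = norm(hist_a), norm(hist_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Invented per-bin MFCC-vector counts for a query clip and two database clips
query_hist = [40, 10, 30, 20]
db = {"clip1": [38, 12, 30, 20], "clip2": [5, 60, 5, 30]}

# score = 1 - distance, so closer files get larger scores
scores = {name: 1 - template_distance(query_hist, h) for name, h in db.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # most similar clip first
```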
- In Table 1, we present a comparison of the features of the various MIR systems under investigation. Note first that each of these systems was designed for a different purpose, and none of them can be considered a finished product. This table allows one to get an overview of the state of the MIR systems available, the features that one may wish to include in an MIR system, and the areas where improvement is most necessary. It also highlights the need for a standardized testbed. Each of the MIR systems uses a different database of files for audio retrieval. Both CatFind and the Music Retrieval Demo have databases with fewer than 500 files. Thus, any benchmarking estimates, such as retrieval times and efficiency, are rendered useless. MelDex, MelodyHound and ThemeFinder have databases containing over 10,000 files. This should be sufficient for estimating search efficiency and scalability.