Optical Music Recognition Virtual Demo

Version 2.1, September 30, 2002

This page is a virtual guided tour of the Optical Music Recognition system being developed by the Levy Digital Sheet Music Project.

This demonstration provides only a basic overview of the process. For more detailed technical information, please see the references on The Gamera webpage.

Introduction

Optical music recognition (OMR) allows pages of sheet music to be interpreted by a computer. Each page is first captured with a flat-bed scanner or digital camera to obtain a digital image. The OMR system then recognizes the individual symbols on the page and interprets their musical meaning. The results are stored in a musical representation language known as GUIDO.

Having a musical representation of the score allows one to do things that would not be possible with the raw graphical image alone. For example, the music can be played on a MIDI synthesiser. Large quantities of music can be stored in a database and then retrieved using a music search engine or analysed with automatic musical analysis tools.

The cover page of our example

Guided Tour

Source image

For the guided tour, we will use an excerpt from "I dreamt my little boy of thee", by Frank W. Green and Alfred Lee. You can see the entire score and associated information on the Levy collection website. The original image, obtained using a digital camera, is shown below.

Original image

Identification of staff lines

One of the difficulties of separating musical symbols using a computer is that the vast majority of them are connected by staff lines. To make it easier to separate the individual symbols, the staff lines are identified and removed. The following image shows the excerpt with the staff lines removed.

Original image with staff lines removed
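
As a rough illustration of this step, the sketch below finds staff lines in a tiny binary image by counting the black pixels in each row (a horizontal projection), then erases those rows except where a symbol crosses them. The projection method, the 0.8 threshold, and the function names are assumptions made for illustration; this page does not say which method the system actually uses.

```python
def find_staff_rows(image, threshold=0.8):
    """Return indices of rows whose fraction of black pixels is at least threshold."""
    width = len(image[0])
    return [y for y, row in enumerate(image) if sum(row) / width >= threshold]


def remove_staff_rows(image, staff_rows):
    """Erase staff-line pixels unless a non-staff black pixel touches them vertically."""
    height = len(image)
    staff = set(staff_rows)
    result = [row[:] for row in image]
    for y in staff_rows:
        for x, pixel in enumerate(image[y]):
            if not pixel:
                continue
            above = y > 0 and (y - 1) not in staff and image[y - 1][x]
            below = y < height - 1 and (y + 1) not in staff and image[y + 1][x]
            if not (above or below):
                result[y][x] = 0  # plain staff line: safe to erase
    return result


# 1 = black, 0 = white. Row 2 is a staff line crossed by a stem in column 3.
STAFF_IMAGE = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

staff_rows = find_staff_rows(STAFF_IMAGE)
cleaned = remove_staff_rows(STAFF_IMAGE, staff_rows)
```

Keeping the pixels where a symbol crosses the line is what leaves the stem intact after the staff line is gone. Real scans are skewed and noisy, so a practical system would need something more tolerant than a whole-row count.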

Removal and recognition of text

Next, the text, such as lyrics, titles and tempo markings, is removed and sent to an external optical character recognition (OCR) application. The following image shows the text that is sent to the OCR program.

The text from the original image

Identification of common symbols using heuristics

Vertical lines (stems and barlines) and black noteheads are the most commonly occurring symbols in most of the music that we deal with. Therefore, they are removed using heuristic rules before the more general symbol identification procedure is applied. A vertical line, for example, is taken to be any black area that is very thin and quite tall. This image shows all of the vertical lines that were identified in the excerpt.

The vertical lines identified in the original image
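
The "very thin and quite tall" rule can be sketched as a test on the bounding box of each connected black region. The thresholds (`max_width`, `min_aspect`) and the function names below are hypothetical values chosen for a toy image, not the ones used by the system.

```python
def connected_components(image):
    """Return bounding boxes (left, top, right, bottom) of 4-connected black regions."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one region, tracking its extent as we go.
            stack = [(x0, y0)]
            seen[y0][x0] = True
            left = right = x0
            top = bottom = y0
            while stack:
                x, y = stack.pop()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes


def is_vertical_line(box, max_width=2, min_aspect=4.0):
    """Heuristic: a region is a vertical line if it is thin and much taller than wide."""
    left, top, right, bottom = box
    box_width = right - left + 1
    box_height = bottom - top + 1
    return box_width <= max_width and box_height / box_width >= min_aspect


# A one-pixel-wide stem in column 1 and a small blob at columns 4 to 6.
STEM_IMAGE = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]

vertical_lines = [b for b in connected_components(STEM_IMAGE) if is_vertical_line(b)]
```

Only the stem passes the test here; the blob is too wide and is left for later stages.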

All of the black noteheads in the score are shown in the following image. Note that some of the black noteheads, particularly those that touch other noteheads, could not be identified using the heuristic rules and will be passed on to the general symbol identification phase.

The noteheads identified in the original image

General symbol classification

Next, each symbol is classified by matching every element on the page as closely as possible to known symbols in a database. A unique feature of our approach is that the database that describes each symbol is not fixed. Instead, the appearance of new symbols is learned by example. Therefore, new symbols can be easily added to the system. This approach also allows for symbols that may look dramatically different, such as handwritten vs. typeset quarter notes, to be recognized. Each rectangle in the following image represents a symbol that was classified.

Symbol recognition
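
To make the idea of learning symbols by example concrete, here is a minimal sketch of a classifier whose "database" is simply a growing list of labelled examples, with each unknown symbol assigned the label of its nearest example. The nearest-neighbour rule, the two features (aspect ratio and fraction of black pixels), and all numbers are assumptions for illustration; this page does not describe the matching method or the features the system uses.

```python
import math


class ExampleClassifier:
    """Classifies feature vectors by the label of the closest stored example."""

    def __init__(self):
        self.examples = []  # list of (feature_tuple, label)

    def learn(self, features, label):
        """Add one labelled example; no retraining step is needed."""
        self.examples.append((tuple(features), label))

    def classify(self, features):
        """Return the label of the stored example nearest to features."""
        if not self.examples:
            raise ValueError("no examples have been learned yet")
        nearest = min(self.examples, key=lambda ex: math.dist(ex[0], features))
        return nearest[1]


# Hypothetical features: (width / height, fraction of black pixels in bounding box).
classifier = ExampleClassifier()
classifier.learn((0.1, 0.9), "stem")
classifier.learn((1.2, 0.8), "black notehead")
classifier.learn((1.2, 0.3), "white notehead")

first_guess = classifier.classify((1.1, 0.75))

# A new kind of symbol is added simply by showing the classifier an example.
classifier.learn((0.5, 0.5), "eighth rest")
second_guess = classifier.classify((0.55, 0.45))
```

Because the examples are data rather than code, handwritten and typeset versions of the same symbol can both be stored under one label.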

Semantic interpretation of symbol relationships

Once the location and identity of all the symbols on the page are known, they can be interpreted musically. The first step is to create relationships between symbols. For example, to determine the pitch of a note, it must be related to a set of staff lines, a clef and a key signature. Durations are determined by relating a notehead to stems, beams and flags. The following diagram shows a note (in red) with all its related symbols highlighted (in yellow and blue).
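
The pitch relationship described above can be sketched as a small calculation: given a notehead's vertical position on the staff, the clef, and the key signature, work out the note name. The position encoding (steps above the bottom staff line), the function name, and the output format are assumptions for illustration only.

```python
LETTERS = "CDEFGAB"

# Pitch of the bottom staff line for each clef, as (letter, octave).
CLEF_BOTTOM_LINE = {"treble": ("E", 4), "bass": ("G", 2)}


def pitch_at(staff_step, clef, key_accidentals=None):
    """Return a pitch name such as 'Bb4'.

    staff_step counts lines and spaces upward from the bottom staff line
    (0 = bottom line, 1 = first space, 2 = second line, negative = below).
    key_accidentals maps letter names to '#' or 'b', as given by the key signature.
    """
    letter, octave = CLEF_BOTTOM_LINE[clef]
    # Count diatonic steps from C0 so that octaves change at C.
    index = octave * 7 + LETTERS.index(letter) + staff_step
    octave, letter_index = divmod(index, 7)
    name = LETTERS[letter_index]
    accidental = (key_accidentals or {}).get(name, "")
    return f"{name}{accidental}{octave}"


f_major = {"B": "b"}  # key signature with one flat
third_line = pitch_at(4, "treble", f_major)
```

The same notehead position gives a different pitch under a different clef or key signature, which is why the note must be related to all three before it can be interpreted.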

Metric correction

Once the relationships between symbols have been determined, it is possible to correct errors made in previous stages by examining the durations of notes. Since, by convention, each measure should have the same number of beats in each part, and the beats should line up vertically between parts, many errors can be corrected by checking for consistency against the relative locations of notes. For instance, in the last measure of the excerpt, the second eighth rest in the right hand of the piano part was not properly identified by the general symbol classifier, since its "stem" had degraded in the printing process. However, the metric correction phase was able to shift the timing of the following eighth notes so that they are in the correct position. The following image shows the results of metric correction. Each blue vertical line represents a beat.
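
The consistency check behind metric correction can be sketched as follows: add up the recognized durations in each part of a measure and flag any part that does not fill the measure. The 6/8 measure, the part names, and the durations are a made-up example rather than data from the excerpt, and the sketch only detects the gap; it does not reproduce how the system repositions the notes.

```python
from fractions import Fraction


def measure_length(durations):
    """Total duration of a list of notes and rests, as an exact fraction."""
    return sum(durations, Fraction(0))


def find_inconsistent_parts(parts, expected):
    """Return the names of parts whose durations do not add up to expected."""
    return [name for name, durations in parts.items()
            if measure_length(durations) != expected]


EIGHTH = Fraction(1, 8)
DOTTED_QUARTER = Fraction(3, 8)

# Hypothetical 6/8 measure in which one eighth rest was missed in the right hand.
expected = Fraction(6, 8)
parts = {
    "voice": [DOTTED_QUARTER, DOTTED_QUARTER],
    "right hand": [EIGHTH, EIGHTH, EIGHTH, EIGHTH, EIGHTH],
    "left hand": [DOTTED_QUARTER, DOTTED_QUARTER],
}

suspect = find_inconsistent_parts(parts, expected)
missing = expected - measure_length(parts["right hand"])
```

Exact fractions are used instead of floating-point numbers so that durations such as triplets would still compare equal when summed.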

Feedback and score colouring

When the score is fully interpreted, the system can give us feedback about its interpretation by colouring the score in various ways. For instance, the figure below shows all the notes coloured by their durations. Eighth notes are green, quarter notes are blue, and dotted quarter notes are beige.

Output

The final musical interpretation of the score is output in GUIDO format. The first measure in GUIDO format is given below.

To listen to the score, the GUIDO file is converted to MIDI using the program gmn2midi by Ludger Martin and Holger Hoos. This MIDI file can then be played using virtually any MIDI synthesis system.

The score can also be re-rendered using any computer-based music notation software that supports the GUIDO format, such as GUIDO NoteServer or NoteAbility Pro. The figure below shows the first two measures of the excerpt re-rendered using GUIDO NoteServer.

Example rendered using GUIDO NoteServer.

Acknowledgments

Funding for this work was provided by the National Science Foundation, the National Endowment for the Humanities, the Institute for Museum and Library Services, and the Levy Family.

This web page was written by Michael Droettboom 09/12/00, 04/15/01, 09/30/02.