P6-15: A Transformer-Based "Spellchecker" for Detecting Errors in OMR Output
de Reuse, Timothy*, Fujinaga, Ichiro
Subjects (starting with primary): Domain knowledge -> representations of music ; MIR tasks -> optical music recognition ; MIR fundamentals and methodology -> symbolic music processing ; Domain knowledge -> machine learning/artificial intelligence for music ; Applications -> music retrieval systems
Presented in person in Bengaluru: 4-minute short-format presentation
The outputs of Optical Music Recognition (OMR) systems require time-consuming human correction. Given that most errors introduced by OMR appear "non-musical" to humans, we propose that correction time can be reduced by marking every symbol on a score that is musically unlikely, allowing the human corrector to focus their attention accordingly. Using a dataset of Romantic string quartets, we train a variant of the Transformer network architecture to classify each symbol of an optically recognized musical piece in symbolic format as correct or erroneous, based on whether a manual correction of the piece would require an insertion, deletion, or replacement of a symbol at that location. Since we have only a limited amount of data containing real OMR errors, we employ extensive data augmentation, injecting errors into the training data in a way that mimics how OMR would modify the score. Our best-performing models achieve 99% recall and 50% precision on this error-detection task.
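The per-symbol correct/erroneous labels described in the abstract can be derived by aligning the OMR output against its manually corrected version with an edit-distance alignment. The following is a minimal sketch using Python's `difflib.SequenceMatcher`; the `label_errors` function and its handling of insertions are illustrative assumptions, not the paper's actual labeling procedure.

```python
from difflib import SequenceMatcher

def label_errors(omr_symbols, corrected_symbols):
    """Mark each symbol in the OMR output as correct (0) or erroneous (1),
    based on whether correcting the piece would require an insertion,
    deletion, or replacement at that position (illustrative sketch)."""
    labels = [0] * len(omr_symbols)
    matcher = SequenceMatcher(a=omr_symbols, b=corrected_symbols, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            # A symbol is missing from the OMR output; flag the symbol at
            # the insertion point (an assumed convention for this sketch).
            idx = min(i1, len(omr_symbols) - 1)
            if idx >= 0:
                labels[idx] = 1
        else:  # "replace" or "delete": flag every affected OMR symbol
            for i in range(i1, i2):
                labels[i] = 1
    return labels

# A replaced symbol is flagged; its neighbors are not.
print(label_errors(["C4", "E4", "Gb4", "C5"], ["C4", "E4", "G4", "C5"]))
```

A classifier trained on such labels only needs the recognized score at inference time; the alignment against a correction is used solely to produce training targets.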