Opening Session

Preeti Rao (IIT Bombay), Hema Murthy (IIT Madras), Ajay Srinivasamurthy (Amazon Alexa India)

2022-12-05 | 09:00 (Asia/Calcutta)

Welcome to ISMIR 2022! Meet your hosts and hear about what is happening at this year's very special conference.

Poster Sessions

Paper Session - 1

Session Chair: Emilia Parada-Cabaleiro (Johannes Kepler University)

2022-12-05 | 10:00 (Asia/Calcutta)

Browse the active poster session's channels, joining calls to ask questions and discuss research with presenters, and leave comments in the channel for asynchronous chatting later.

  • P1-01*: Interpreting Song Lyrics with an Audio-Informed Pre-trained Language Model
    Yixiao Zhang, Junyan Jiang, Gus Xia, Simon Dixon
  • P1-02: Toward postprocessing-free neural networks for joint beat and downbeat estimation
    Tsung-Ping Chen, Li Su
  • P1-03: Music Translation: Generating Piano Arrangements in Different Playing Levels
    Matan Gover, Oded Zewi
  • P1-04: Scaling Polyphonic Transcription with Mixtures of Monophonic Transcriptions
    Ian Simon, Joshua Gardner, Curtis Hawthorne, Ethan Manilow, Jesse Engel
  • P1-05: Attention-based audio embeddings for query-by-example
    Anup Singh, Kris Demuynck, Vipul Arora
  • P1-06: SIATEC-C: Computationally efficient repeated pattern discovery in polyphonic music
    Otso Björklund
  • P1-07: Tailed U-Net: Multi-Scale Music Representation Learning
    Marcel A Vélez Vásquez, John Ashley Burgoyne
  • P1-08: DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer and A Comprehensive Evaluation
    Da-Yi Wu, Wen-Yi Hsiao, Fu-Rong Yang, Oscar D Friedman, Warren Jackson, Scott Bruzenak, Yi-Wen Liu, Yi-Hsuan Yang
  • P1-09: Equivariant self-supervision for musical tempo estimation
    Elio Quinton
  • P1-10: How Music Features and Musical Data Representations Affect Objective Evaluation of Music Composition: A Review of CSMT Data Challenge 2020
    Yuqiang Li, Shengchen Li, George Fazekas
  • P1-11: YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations
    Eunjin Choi, Yoonjin Chung, Seolhee Lee, Jongik Jeon, Taegyun Kwon, Juhan Nam
  • P1-12: Detecting Symmetries of All Cardinalities With Application to Musical 12-Tone Rows
    Anil Venkatesh, Viren Sachdev
  • P1-13: The power of deep without going deep? A study of HDPGMM music representation learning
    Jaehun Kim, Cynthia C. S. Liem
  • P1-14: Pop Music Generation with Controllable Phrase Lengths
    Daiki Naruse, Tomoyuki Takahata, Yusuke Mukuta, Tatsuya Harada
  • P1-15: Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation
    Yen-Tung Yeh, Yi-Hsuan Yang, Bo-Yu Chen
  • P1-16: Modeling the rhythm from lyrics for melody generation of pop songs
    Daiyu Zhang, Ju-Chiang Wang, Katerina Kosta, Jordan B. L. Smith, Shicen Zhou

An asterisk (*) indicates long presentations (paper award candidates)

Paper Session - 2

Session Chair: Chitralekha Gupta (National University of Singapore)

2022-12-05 | 13:30 (Asia/Calcutta)

Browse the active poster session's channels, joining calls to ask questions and discuss research with presenters, and leave comments in the channel for asynchronous chatting later.

  • P2-01*: Visualization for AI-Assisted Composing
    Simeon Rau, Frank Heyen, Stefan Wagner, Michael Sedlmair
  • P2-02: Retrieving musical information from neural data: how cognitive features enrich acoustic ones
    Ellie Bean Abrams, Eva Muñoz Vidal, Claire Pelofi, Pablo Ripollés
  • P2-03: Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention
    Jingwei Zhao, Gus Xia, Ye Wang
  • P2-04: Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-Supervised Learning
    Seungyeon Rhyu, Sarah Kim, Kyogu Lee
  • P2-05: Exploiting Device and Audio Data to Tag Music with User-Aware Listening Contexts
    Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, Gaël Richard
  • P2-06: Jukedrummer: Conditional Beat-aware Audio-domain Drum Accompaniment Generation via Transformer VQ-VAE
    Yueh-Kao Wu, Ching-Yu Chiu, Yi-Hsuan Yang
  • P2-07: Learning Hierarchical Metrical Structure Beyond Measures
    Junyan Jiang, Daniel Chin, Yixiao Zhang, Gus Xia
  • P2-08: Mid-level Harmonic Audio Features for Musical Style Classification
    Francisco C. F. Almeida, Gilberto Bernardes, Christof Weiss
  • P2-09: Distortion Audio Effects: Learning How to Recover the Clean Signal
    Johannes Imort, Giorgio Fabbro, Marco A Martinez Ramirez, Stefan Uhlich, Yuichiro Koyama, Yuki Mitsufuji
  • P2-10: End-to-End Full-Page Optical Music Recognition for Mensural Notation
    Antonio Ríos-Vila, Jose M. Inesta, Jorge Calvo-Zaragoza
  • P2-11: Mel Spectrogram Inversion with Stable Pitch
    Bruno Di Giorgi, Mark Levy, Richard Sharp
  • P2-12: Latent feature augmentation for chorus detection
    Xingjian Du, Huidong Liang, Yuan Wan, Yuheng Lin, Ke Chen, Bilei Zhu, Zejun Ma
  • P2-13: AccoMontage2: A Complete Harmonization and Accompaniment Arrangement System
    Li Yi, Haochen Hu, Jingwei Zhao, Gus Xia
  • P2-14: Supervised and Unsupervised Learning of Audio Representations for Music Understanding
    Matthew C McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas Ehmann
  • P2-15: Generating Coherent Drum Accompaniment with Fills and Improvisations
    Rishabh A Dahale, Vaibhav Vinayak Talwadker, Preeti Rao, Prateek Verma
  • P2-16: Bottlenecks and solutions for audio to score alignment research
    Alia Ahmed Morsi, Xavier Serra

An asterisk (*) indicates long presentations (paper award candidates)

WiMIR Meetups

WiMIR plenary session

Moderators: Xiao Hu (Hong Kong University) (remote), Ranjani H G (Ericsson R&D) (in-person)

2022-12-05 | 16:00 (Asia/Calcutta)

In the WiMIR plenary session, we have invited a few women researchers to present their work and share their journey. The panelists will then be available for an open Q&A with the audience.

Panelist: Dr. Xiao Hu
Title: Music for learning and wellbeing

In this session, I will briefly introduce our recent and ongoing research in the Cultural Computing and Multimodal Information Research (CCMIR) group at the University of Hong Kong, on the broad theme of “leveraging the power of music for learning and wellbeing.” Starting from explorations of music usage among real users, our investigation covers three themes: multimodal analysis of user-music interactions in the lab; remote monitoring of user-music interactions in natural settings; and music recommendations for enhancing learning and wellbeing. Through this series of studies, we aim to broaden the impact of MIR research to related fields such as education, psychology and cognitive science.


Dr. Xiao Hu is an Associate Professor in the Human Communication, Development and Information Science (CDIS) Academic Unit in the Faculty of Education at the University of Hong Kong. Her main research interests lie in the interactions of technology and human users, including music information retrieval, technology-enhanced learning and wellbeing, and digital cultural heritage. Dr. Hu served as a board member of The International Society for Music Information Retrieval (ISMIR) (2011-2017), a program co-chair for ISMIR 2017 and 2018, and a conference co-chair for ISMIR 2014. Her earlier research focused on music emotion recognition, and her studies in recent years have expanded to how music can impact human learning and wellbeing and how to leverage MIR technologies to optimize the positive effects of music.

Panelist: Dr. Emilia Parada-Cabaleiro
Title: Working in MIR with a "diverse" background: A personal view

Computer Science, Psychology, Engineering, Music Theory, Social Sciences, Statistics: The field of MIR involves researchers from many different disciplines. Although this opens up a wide range of possibilities and research directions in principle, given the large diversity of backgrounds, it is sometimes challenging to comprehend each other's terminology. Moreover, in order to exploit synergies in the best way, it is essential to agree upon suitable methods and identify the associated requirements. In this talk, a music therapist and musicologist will share her personal experiences of working in the MIR community. Examples and pitfalls will be discussed, with the goal of laying the foundation for a more fruitful collaboration.


Dr. Emilia Parada-Cabaleiro received her PhD in 2017 from the University of Rome Tor Vergata (Italy). Her formal education includes degrees in Music Education, Musicology, and Music Management as well as professional diplomas in Piano Performance and Music Therapy. Currently, she is a University Assistant at the Institute of Computational Perception at the Johannes Kepler University Linz (Austria). Her research, having a particular focus on Affective Computing, explores the use of computational methods to support some of the aforementioned music-related fields.

Panelist: Dr. Chitralekha Gupta
Title: Automated Singing Quality Analysis - Overview and Challenges

Singing quality assessment refers to the degree to which a particular vocal production meets professional standards of singing excellence. The aim of automated singing quality evaluation is to develop computational techniques for evaluating singing skill in the same way that music experts do. Such methods, therefore, seek to objectively measure musically-relevant perceptual parameters, such as intonation accuracy and rhythm consistency, to provide meaningful feedback to the singers. There have been two broad approaches for automatic singing skill evaluation: reference-dependent and reference-independent. Reference-dependent methods compare a test singing rendition against a template or an ideal singing rendition, while reference-independent methods rely on the inherent characteristics of singing quality, independent of a template singing rendition or song. In this talk, I will present an overview of the field of automatic singing quality evaluation including different quantitative methods applied in both of these approaches, as well as the current challenges and open research questions in this field.


Dr. Chitralekha Gupta is a post-doctoral research fellow at the National University of Singapore (NUS). Her research interests lie in the intersection of speech and music, particularly singing voice analysis, applications of ASR in music, and neural audio synthesis. She received her Ph.D. degree from NUS in 2019, her Master's degree from the Indian Institute of Technology Bombay in 2011 and has worked in the software industry for three years. She has been awarded a start-up grant and is the founder of MuSigPro, a music tech company, in Singapore. She received the NUS Dean's Graduate Research Achievement Award 2018, and the Best Student Paper Award in APSIPA 2017. She was a co-captain at MIREX 2020 and has played an active role in the organizing committees of international conferences such as ISMIR 2022 and 2017, ICASSP 2022, and ASRU 2019.

Panelist: Shahar Elisha
Title: Research on the Industrial Lane

My experience as a researcher has been shaped by the ways of working in industry. I will present a high-level overview of the various MIR projects that I have worked on at Spotify, and I will share my experience as I transitioned from engineering into a research role. I will focus on my own approach to research within industry, and highlight how it differs from academic research, illustrating challenges and successes.


Shahar is a Research Engineer at Spotify and a Research MSc candidate at the Centre for Digital Music at Queen Mary University of London under Dr. Emmanouil Benetos. Shahar completed her bachelor's degree in Computer Science at City, University of London, before joining Spotify as a Backend Engineer. At Spotify, she transitioned to research through work on MIR projects, such as audio identification and content categorisation. She is interested in solving real-life problems using audio-based machine learning models on both music and speech.

Social Events

Performance by Dhaatu Puppet Show

Team Dhaatu

2022-12-05 | 17:30 (Asia/Calcutta)

A performance of Kalidasa's play Mālavikāgnimitram by Dhaatu Puppet Theater will be presented as a part of the welcome reception at the Satish Dhawan Auditorium, IISc, Bengaluru.

Additional details about the performance can be found here:

Welcome Reception

ISMIR 2022 committee

2022-12-05 | 19:00 (Asia/Calcutta)

Given that the conference has returned to a hybrid format after two virtual-only editions, the welcome reception on the Main Guest House lawns is planned to encourage in-person interactions among the seasoned and new ISMIR participants after a brief (long?) hiatus due to the pandemic.

Virtual Special Sessions

Special Session A (Online): Ethics/Code of Conduct for ISMIR

Moderators: Andre Holzapfel (KTH Royal Institute of Technology, Sweden), Fabio Morreale (University of Auckland), Bob Sturm (KTH Royal Institute of Technology, Sweden)

2022-12-05 | 22:00 (Asia/Calcutta)

This special session will discuss an action plan towards a code of ethics for the ISMIR community. A code of ethics represents a specific list of values and behaviors that a research community either endorses or objects to. Codes of ethics have been established on the general level of engineering associations (IEEE, ACM), but also more specifically by research communities such as NIME. Whereas ISMIR has seen a series of tutorials on ethics and values, and guidelines have been proposed, these attempts have not yet resulted in an official code of ethics. Does ISMIR need such a code? What is the function of the code? How can we establish and maintain such a code? What are the main ethical concerns regarding ISMIR research and practice?