Keynote Speaker

Richa Singh / Professor and Head, Dept. of Computer Science and Engineering, Indian Institute of Technology Jodhpur
2022-12-07 | 09:00 (Asia/Calcutta)
The increasing capabilities of machine learning algorithms are enabling the use of ML models for a variety of tasks, including creative ones such as generating new music and modifying existing music. Similar applications exist for other kinds of audio signals, such as voice biometrics and speaker and speech recognition. However, the technologies that support creativity can also be used for malicious purposes. Deepfake audio is one such technology: it enables flawlessly altering existing audio signals or creating new signals from any given text. Audio can also be integrated with video to provide a complete multimodal experience, which can be purely synthetic and fake. While there is significant ongoing research on images and video, detecting these anomalies in audio remains relatively unaddressed. We will discuss some of these possible adventures of machine learning in audio processing and the research efforts that we are undertaking to detect them. In addition, we will discuss bias and fairness issues in audio processing, highlighting the "out of distribution" behavior of popular approaches and some strategies to address them.
Richa Singh received her Ph.D. degree in computer science from West Virginia University, Morgantown, USA, in 2008. She is currently a Professor and Head of the Department of CSE, IIT Jodhpur. She has co-edited the book Deep Learning in Biometrics and has delivered keynote talks/tutorials on deep learning, trusted AI, and domain adaptation at NVIDIA GTC 2021, BIOSIG 2021, ICCV 2017, AFGR 2017, and IJCNN 2017. Her areas of interest are pattern recognition, machine learning, and biometrics. She is a Fellow of IEEE, IAPR and AAIA, and a Senior Member of ACM. She was a recipient of the Kusum and Mohandas Pai Faculty Research Fellowship at IIIT-Delhi, the FAST Award by the Department of Science and Technology, India, and several best paper and best poster awards at international conferences. She is serving or has served as a Program Co-Chair of CVPR 2022, ICMI 2022, IJCB 2020, AFGR 2019 and BTAS 2016, and as a General Co-Chair of FG 2021 and ISBA 2017. She is also the Vice President (Publications) of the IEEE Biometrics Council and an Associate Editor-in-Chief of Pattern Recognition.

Poster Sessions

Paper Session - 5

Session Chair: Rachel Bittner (Spotify)

2022-12-07 | 10:00 (Asia/Calcutta)

Browse the active poster session's channels, joining calls to ask questions and discuss research with presenters, and leave comments in the channel for asynchronous chatting later.

  • P5-01*: Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations
    Jaidev Shriram, Makarand Tapaswi, Vinoo Alluri
  • P5-02: Musika! Fast Infinite Waveform Music Generation
    Marco Pasini, Jan Schlüter
  • P5-03: Symphony Generation with Permutation Invariant Language Model
    Jiafeng Liu, Yuanliang Dong, Zehua Cheng, Xinran Zhang, Xiaobing Li, Feng Yu, Maosong Sun
  • P5-04: MuLan: A Joint Embedding of Music Audio and Natural Language
    Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P W Ellis
  • P5-05: MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks
    Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
  • P5-06: Towards robust music source separation on loud commercial music
    Chang-Bin Jeon, Kyogu Lee
  • P5-07: Towards Quantifying the Strength of Music Scenes Using Live Event Data
    Michael Zhou, Andrew McGraw, Douglas R Turnbull
  • P5-08: Learning Multi-Level Representations for Hierarchical Music Structure Analysis
    Morgan Buisson, Brian McFee, Slim Essid, Hélène C. Crayencour
  • P5-09: Multi-instrument Music Synthesis with Spectrogram Diffusion
    Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Joshua Gardner, Ethan Manilow, Jesse Engel
  • P5-10: DDX7: Differentiable FM Synthesis of Musical Instrument Sounds
    Franco Caspe, Andrew McPherson, Mark Sandler
  • P5-11: Singing beat tracking with Self-supervised front-end and linear transformers
    Mojtaba Heydari, Zhiyao Duan
  • P5-12: EnsembleSet: a new high quality synthesised dataset for chamber ensemble separation
    Saurjya Sarkar, Emmanouil Benetos, Mark Sandler
  • P5-13: End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation
    Tengyu Deng, Eita Nakamura, Kazuyoshi Yoshii
  • P5-14: Contrastive Audio-Language Learning for Music
    Ilaria Manco, Emmanouil Benetos, Elio Quinton, George Fazekas
  • P5-15: MusAV: A dataset of relative arousal-valence annotations for validation of audio models
    Dmitry Bogdanov, Xavier Lizarraga-Seijas, Pablo Alonso-Jiménez, Xavier Serra
  • P5-16: What is missing in deep music generation? A study of repetition and structure in popular music
    Shuqi Dai, Huiran Yu, Roger B Dannenberg
  • P5-17: Heterogeneous Graph Neural Network for Music Emotion Recognition
    Angelo Cesar Mendes Da Silva, Diego F Silva, Ricardo Marcondes Marcacini

An asterisk (*) indicates long presentations (paper award candidates)

Paper Session - 6

Session Chair: Juhan Nam (Korea Advanced Institute of Science and Technology)

2022-12-07 | 13:30 (Asia/Calcutta)

Browse the active poster session's channels, joining calls to ask questions and discuss research with presenters, and leave comments in the channel for asynchronous chatting later.

  • P6-01*: And what if two musical versions don't share melody, harmony, rhythm, or lyrics?
    Mathilde Abrassart, Guillaume Doras
  • P6-02: A diffusion-inspired training strategy for singing voice extraction in the waveform domain
    Genís Plaja-Roglans, Marius Miron, Xavier Serra
  • P6-03: A Model You Can Hear: Audio Identification with Playable Prototypes
    Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu
  • P6-04: An Exploration of Generating Sheet Music Images
    Marcos Acosta, Irmak Bukey, T J Tsai
  • P6-05: HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
    Weixing Wei, Peilin Li, Yi Yu, Wei Li
  • P6-06: Generating music with sentiment using Transformer-GANs
    Pedro L T Neves, José Fornari, João B Florindo
  • P6-07: Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments
    Ke Chen, Hao-Wen Dong, Yi Luo, Julian McAuley, Taylor Berg-Kirkpatrick, Miller Puckette, Shlomo Dubnov
  • P6-08: Ethics of Singing Voice Synthesis: Perceptions of Users and Developers
    Kyungyun Lee, Gladys Hitt, Emily Terada, Jin Ha Lee
  • P6-09: Emotion-driven Harmonisation And Tempo Arrangement of Melodies Using Transfer Learning
    Takuya Takahashi, Mathieu Barthet
  • P6-10: Using Activation Functions for Improving Measure-Level Audio Synchronization
    Yigitcan Özer, Matej Ištvánek, Vlora Arifi-Müller, Meinard Müller
  • P6-11: A deep learning method for melody extraction from a polyphonic symbolic music representation
    Katerina Kosta, Wei Tsung Lu, Gabriele Medeot, Pierre Chanquion
  • P6-12: A Reproducibility Study on User-centric MIR Research and Why it is Important
    Peter Knees, Bruce Ferwerda, Andreas Rauber, Sebastian Strumbelj, Annabel Resch, Laurenz Tomandl, Valentin Bauer, Fung Yee Tang, Josip Bobinac, Amila Ceranic, Riad Dizdar
  • P6-13: Music Separation Enhancement with Generative Modeling
    Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo
  • P6-14: SampleMatch: Drum Sample Retrieval by Musical Context
    Stefan Lattner
  • P6-15: A Transformer-Based "Spellchecker" for Detecting Errors in OMR Output
    Timothy De Reuse, Ichiro Fujinaga
  • P6-16: "More than words": Linking Music Preferences and Moral Values through Lyrics
    Vjosa Preniqi, Kyriaki Kalimeri, Charalampos Saitis

An asterisk (*) indicates long presentations (paper award candidates)

Special Sessions

Special Session 2: Enhancing music creativity with MIR

Moderator: Jan Van Balen (Spotify)

2022-12-07 | 16:00 (Asia/Calcutta)

While audio technology has always had an important role in music production, it is now recognised that MIR tools can provide for workflows that enhance music creativity at every stage of the journey. The panel will discuss the possibilities and challenges of this exciting partnership between music computing and creativity.

Panelists: Georgi Dzhambazov (Smule), Dorien Herremans (SUTD), Oriol Nieto (Adobe), Akira Maezawa (Yamaha), Igor Pereira (

Social Events

ISMIR 2022 Banquet

2022-12-07 | 17:00 (Asia/Calcutta)

Virtual Special Sessions

Special Session - C (Online): TISMIR: the open journal of the ISMIR society

Moderator: Emilia Gómez (Joint Research Centre, European Commission and Universitat Pompeu Fabra)

2022-12-07 | 22:00 (Asia/Calcutta)

Transactions of the International Society for Music Information Retrieval (TISMIR) was established in 2018 to complement the ISMIR conference proceedings and provide a vehicle for the dissemination of the highest quality and most substantial scientific research in MIR. TISMIR retains the Open Access model of the ISMIR Conference proceedings, encourages reproducibility of the published research papers, and maintains a low publication cost. Almost 5 years later, this ISMIR 2022 session is devoted to discussing and brainstorming on the current status and future perspectives of the journal with a group of recent and potential TISMIR authors, reviewers and editors. We will address the following questions, and others proposed by participants: What do you appreciate most about TISMIR? How does it link to and complement the ISMIR conference? What are the main challenges and limitations that need to be addressed? How can TISMIR be made competitive as a journal in the current publication landscape? How can more community members be engaged in order to make TISMIR a success? What are the future avenues for conference versus journal outlets in the ISMIR field?