ISMIR 2022: Day 4

Keynote Speaker

Keynote - 2: Richa Singh on "Adventures of AI: Deepfake and Bias in Audio Processing"

Richa Singh / Professor and Head, Dept. of Computer Science and Engineering, Indian Institute of Technology Jodhpur

2022-12-07 | 09:00 (Asia/Calcutta)


The increasing capabilities of machine learning algorithms are enabling the use of ML models for a variety of tasks, including creative ones such as generating new music and modifying existing music. Similar applications exist for other kinds of audio signals, such as voice biometrics and speaker and speech recognition. However, the technologies that support creativity can also be used for malicious purposes. Deepfake audio is one such technology, which enables flawlessly altering existing audio signals or creating new signals from any given text. Audio can also be integrated with video to provide a complete multimodal experience that can be purely synthetic and fake. While there is significant ongoing research on images and video, the detection of these anomalies in audio processing is relatively unaddressed. We will discuss some of these possible adventures of machine learning in audio processing and the research efforts that we are undertaking to detect them. In addition, we will discuss bias and fairness issues in audio processing, highlighting the "out of distribution" behavior of popular approaches and some strategies to address it.

Bio

Richa Singh received her Ph.D. degree in computer science from West Virginia University, Morgantown, USA, in 2008. She is currently a Professor and Head of the Department of CSE, IIT Jodhpur. She has co-edited the book Deep Learning in Biometrics and has delivered keynote talks and tutorials on deep learning, trusted AI, and domain adaptation at NVIDIA GTC 2021, BIOSIG 2021, ICCV 2017, AFGR 2017, and IJCNN 2017. Her areas of interest are pattern recognition, machine learning, and biometrics. She is a Fellow of IEEE, IAPR, and AAIA, and a Senior Member of ACM. She was a recipient of the Kusum and Mohandas Pai Faculty Research Fellowship at IIIT-Delhi, the FAST Award from the Department of Science and Technology, India, and several best paper and best poster awards at international conferences. She is serving or has served as Program Co-Chair of CVPR 2022, ICMI 2022, IJCB 2020, AFGR 2019, and BTAS 2016, and as General Co-Chair of FG 2021 and ISBA 2017. She is also the Vice President (Publications) of the IEEE Biometrics Council and an Associate Editor-in-Chief of Pattern Recognition.

Poster Sessions

Paper Session - 5

Session Chair: Rachel Bittner (Spotify)

2022-12-07 | 10:00 (Asia/Calcutta)

Browse the active poster session's channels, join calls to ask questions and discuss research with presenters, and leave comments in the channel for asynchronous chatting later.

P5-01*: Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations
Jaidev Shriram, Makarand Tapaswi, Vinoo Alluri
P5-02: Musika! Fast Infinite Waveform Music Generation
Marco Pasini, Jan Schlüter
P5-03: Symphony Generation with Permutation Invariant Language Model
Jiafeng Liu, Yuanliang Dong, Zehua Cheng, Xinran Zhang, Xiaobing Li, Feng Yu, Maosong Sun
P5-04: MuLan: A Joint Embedding of Music Audio and Natural Language
Qingqing Huang, Aren Jansen, Joonseok Lee, Ravi Ganti, Judith Yue Li, Daniel P W Ellis
P5-05: MeloForm: Generating Melody with Musical Form based on Expert Systems and Neural Networks
Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
P5-06: Towards robust music source separation on loud commercial music
Chang-Bin Jeon, Kyogu Lee
P5-07: Towards Quantifying the Strength of Music Scenes Using Live Event Data
Michael Zhou, Andrew McGraw, Douglas R Turnbull
P5-08: Learning Multi-Level Representations for Hierarchical Music Structure Analysis
Morgan Buisson, Brian McFee, Slim Essid, Hélène C. Crayencour
P5-09: Multi-instrument Music Synthesis with Spectrogram Diffusion
Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Joshua Gardner, Ethan Manilow, Jesse Engel
P5-10: DDX7: Differentiable FM Synthesis of Musical Instrument Sounds
Franco Caspe, Andrew McPherson, Mark Sandler
P5-11: Singing beat tracking with Self-supervised front-end and linear transformers
Mojtaba Heydari, Zhiyao Duan
P5-12: EnsembleSet: a new high quality synthesised dataset for chamber ensemble separation
Saurjya Sarkar, Emmanouil Benetos, Mark Sandler
P5-13: End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation
Tengyu Deng, Eita Nakamura, Kazuyoshi Yoshii
P5-14: Contrastive Audio-Language Learning for Music
Ilaria Manco, Emmanouil Benetos, Elio Quinton, George Fazekas
P5-15: MusAV: A dataset of relative arousal-valence annotations for validation of audio models
Dmitry Bogdanov, Xavier Lizarraga-Seijas, Pablo Alonso-Jiménez, Xavier Serra
P5-16: What is missing in deep music generation? A study of repetition and structure in popular music
Shuqi Dai, Huiran Yu, Roger B Dannenberg
P5-17: Heterogeneous Graph Neural Network for Music Emotion Recognition
Angelo Cesar Mendes Da Silva, Diego F Silva, Ricardo Marcondes Marcacini

An asterisk (*) indicates long presentations (paper award candidates)

Paper Session - 6

Session Chair: Juhan Nam (Korea Advanced Institute of Science and Technology)

2022-12-07 | 13:30 (Asia/Calcutta)

Browse the active poster session's channels, join calls to ask questions and discuss research with presenters, and leave comments in the channel for asynchronous chatting later.

P6-01*: And what if two musical versions don't share melody, harmony, rhythm, or lyrics?
Mathilde Abrassart, Guillaume Doras
P6-02: A diffusion-inspired training strategy for singing voice extraction in the waveform domain
Genís Plaja-Roglans, Marius Miron, Xavier Serra
P6-03: A Model You Can Hear: Audio Identification with Playable Prototypes
Romain Loiseau, Baptiste Bouvier, Yann Teytaut, Elliot Vincent, Mathieu Aubry, Loic Landrieu
P6-04: An Exploration of Generating Sheet Music Images
Marcos Acosta, Irmak Bukey, T J Tsai
P6-05: HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription
Weixing Wei, Peilin Li, Yi Yu, Wei Li
P6-06: Generating music with sentiment using Transformer-GANs
Pedro L T Neves, José Fornari, João B Florindo
P6-07: Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments
Ke Chen, Hao-Wen Dong, Yi Luo, Julian McAuley, Taylor Berg-Kirkpatrick, Miller Puckette, Shlomo Dubnov
P6-08: Ethics of Singing Voice Synthesis: Perceptions of Users and Developers
Kyungyun Lee, Gladys Hitt, Emily Terada, Jin Ha Lee
P6-09: Emotion-driven Harmonisation And Tempo Arrangement of Melodies Using Transfer Learning
Takuya Takahashi, Mathieu Barthet
P6-10: Using Activation Functions for Improving Measure-Level Audio Synchronization
Yigitcan Özer, Matej Ištvánek, Vlora Arifi-Müller, Meinard Müller
P6-11: A deep learning method for melody extraction from a polyphonic symbolic music representation
Katerina Kosta, Wei Tsung Lu, Gabriele Medeot, Pierre Chanquion
P6-12: A Reproducibility Study on User-centric MIR Research and Why it is Important
Peter Knees, Bruce Ferwerda, Andreas Rauber, Sebastian Strumbelj, Annabel Resch, Laurenz Tomandl, Valentin Bauer, Fung Yee Tang, Josip Bobinac, Amila Ceranic, Riad Dizdar
P6-13: Music Separation Enhancement with Generative Modeling
Noah Schaffer, Boaz Cogan, Ethan Manilow, Max Morrison, Prem Seetharaman, Bryan Pardo
P6-14: SampleMatch: Drum Sample Retrieval by Musical Context
Stefan Lattner
P6-15: A Transformer-Based "Spellchecker" for Detecting Errors in OMR Output
Timothy De Reuse, Ichiro Fujinaga
P6-16: "More than words": Linking Music Preferences and Moral Values through Lyrics
Vjosa Preniqi, Kyriaki Kalimeri, Charalampos Saitis

An asterisk (*) indicates long presentations (paper award candidates)

Special Sessions

Special Session 2: Enhancing music creativity with MIR

Moderator: Jan Van Balen (Spotify)

2022-12-07 | 16:00 (Asia/Calcutta)

While audio technology has always had an important role in music production, it is now recognised that MIR tools can provide for workflows that enhance music creativity at every stage of the journey. The panel will discuss the possibilities and challenges of this exciting partnership between music computing and creativity.

Panelists: Georgi Dzhambazov (Smule), Dorien Herremans (SUTD), Oriol Nieto (Adobe), Akira Maezawa (Yamaha), Igor Pereira (Moises.ai)

ISMIR 2022 Banquet

2022-12-07 | 17:00 (Asia/Calcutta)

Virtual Special Sessions

Special Session - C (Online): TISMIR: the open journal of the ISMIR society

Moderator: Emilia Gómez (Joint Research Centre, European Commission and Universitat Pompeu Fabra)

2022-12-07 | 22:00 (Asia/Calcutta)

Transactions of the International Society for Music Information Retrieval (TISMIR) was established in 2018 to complement the ISMIR conference proceedings and provide a vehicle for the dissemination of the highest quality and most substantial scientific research in MIR. TISMIR retains the Open Access model of the ISMIR conference proceedings, encourages reproducibility of the published research papers, and maintains a low publication cost. Almost five years later, this ISMIR 2022 session is devoted to discussing and brainstorming on the current status and future perspectives of the journal with a series of recent and potential TISMIR authors, reviewers, and editors. We will address the following questions, and others proposed by participants: What do you appreciate most about TISMIR? What is the link and complementarity to the ISMIR conference? Which are the main challenges and limitations that need to be addressed? How to make TISMIR competitive as a journal in the current publication landscape? How to engage more community members in order to make TISMIR a success? Which are future avenues for conference vs. journal outlets in the ISMIR field?