(1) J. Dettmer, I. Vatolkin, and T. Glasmachers: Weighted Initialisation of Evolutionary Instrument and Pitch Detection in Polyphonic Music. Accepted for Proceedings of the 13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART)
Abstract: Current state-of-the-art methods for instrument and pitch detection in polyphonic music often require large datasets and long training times. Both resources are scarce in the field of music information retrieval, so there is a need for unsupervised alternatives without these prerequisites. We present a modification to an evolutionary algorithm for polyphonic music approximation through synthesis that uses spectral information to initialise populations with probable pitches. This algorithm can perform joint instrument and pitch detection on polyphonic music pieces without any of the aforementioned constraints. Sets of (instrument, style, pitch) tuples are evaluated with a COSH distance fitness function and ultimately determine the algorithm's instrument and pitch labels for a given part of a music piece. Further investigation of this fitness function indicates that it tends to produce false positives, which may conceal the true potential of our modified approach. Nevertheless, our modification shows significantly faster convergence and slightly lower pitch and instrument detection errors than the baseline algorithm in both single-onset and full-piece experiments.
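The abstract names two technical ingredients: the COSH distance used as the fitness function and a population initialisation weighted by spectral information. The sketch below shows one plausible reading of each in Python/NumPy. The COSH distance follows its standard definition as a symmetrised Itakura-Saito divergence between power spectra. The initialisation routine, the function names `pitch_weights` and `init_population`, the nearest-bin pitch salience, and the MIDI pitch range 21 to 108 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def itakura_saito(p, q, eps=1e-12):
    """Itakura-Saito divergence between two power spectra, averaged over bins."""
    r = (p + eps) / (q + eps)
    return float(np.mean(r - np.log(r) - 1.0))


def cosh_distance(p, q):
    """COSH distance: symmetrised Itakura-Saito divergence.

    Equivalent to mean(cosh(log(P / Q)) - 1); zero only when P == Q.
    """
    return 0.5 * (itakura_saito(p, q) + itakura_saito(q, p))


def pitch_weights(spectrum, freqs, pitches, eps=1e-12):
    """Hypothetical pitch salience: magnitude at the bin nearest each
    pitch's fundamental, normalised into a probability distribution."""
    f0 = 440.0 * 2.0 ** ((np.asarray(pitches) - 69) / 12.0)  # MIDI -> Hz
    idx = np.abs(freqs[None, :] - f0[:, None]).argmin(axis=1)
    w = spectrum[idx] + eps  # eps keeps every pitch selectable
    return w / w.sum()


def init_population(spectrum, freqs, instruments, styles,
                    pop_size, notes_per_individual, rng,
                    pitches=np.arange(21, 109)):
    """Hypothetical weighted initialisation: pitches are drawn in proportion
    to spectral salience, instruments and styles uniformly at random.
    Each individual is a list of (instrument, style, pitch) tuples."""
    w = pitch_weights(spectrum, freqs, pitches)
    population = []
    for _ in range(pop_size):
        chosen = rng.choice(pitches, size=notes_per_individual,
                            replace=False, p=w)
        individual = [
            (instruments[rng.integers(len(instruments))],
             styles[rng.integers(len(styles))],
             int(pitch))
            for pitch in chosen
        ]
        population.append(individual)
    return population
```

With a spectrum that has a single strong peak at 440 Hz, nearly all of the probability mass falls on MIDI pitch 69 (A4), so the initial individuals start near the correct pitch instead of being uniformly random. This is the intuition behind the faster convergence reported above.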
(2) L. Fricke, M. Gotham, F. Ostermann, and I. Vatolkin: Adaptation and Optimization of AugmentedNet for Roman Numeral Analysis Applied to Audio Signals. Accepted for Proceedings of the 13th International Conference on Artificial Intelligence in Music, Sound, Art and Design (EvoMUSART)
Abstract: Automatic music harmony analysis has recently been significantly improved by AugmentedNet, a convolutional recurrent neural network for predicting Roman numeral labels. The original network receives perfect note annotations from the digital score as inputs and predicts various tonal descriptors: key, chord root, bass note, harmonic rhythm, etc. However, for many music tracks no score is available. In this study, we first adapted AugmentedNet for direct application to audio signals represented either by chromagrams or by semitone spectra. Second, we implemented and compared further modifications to the network architecture: a preprocessing block designed to learn pitch spellings, an increase in network size, and the addition of dropout layers. A statistical analysis identified the best of all proposed configurations and showed that some of the optimization steps significantly increased classification performance. Moreover, AugmentedNet reaches similar accuracies with audio features as inputs as with the perfect annotations it was originally designed for.
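The two audio input representations mentioned above are closely related: a chromagram can be obtained by folding a semitone spectrum into the 12 pitch classes, discarding octave information. The following NumPy sketch illustrates that relationship only. It assumes a semitone spectrogram of shape (semitones, frames) whose lowest row is MIDI pitch 21 (A0) and uses per-frame max normalisation; the function name and these choices are hypothetical and not necessarily the feature extraction used in the study.

```python
import numpy as np


def semitone_to_chroma(semitone_spec, lowest_midi=21):
    """Fold a (num_semitones, num_frames) semitone spectrogram into a
    (12, num_frames) chromagram; row 0 is pitch class C."""
    n_semitones, n_frames = semitone_spec.shape
    # Pitch class of every semitone row (MIDI 60 = C4 -> class 0).
    pitch_class = (lowest_midi + np.arange(n_semitones)) % 12
    chroma = np.zeros((12, n_frames))
    # Accumulate all octaves of the same pitch class into one row.
    np.add.at(chroma, pitch_class, semitone_spec)
    # Normalise each frame by its maximum; silent frames stay zero.
    norm = chroma.max(axis=0, keepdims=True)
    return chroma / np.where(norm > 0, norm, 1.0)
```

For example, energy at MIDI pitches 57 (A3) and 69 (A4) in the same frame is summed into a single A row (index 9), which is why a chromagram is more compact than a semitone spectrum but cannot distinguish octaves.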