Supervisor
Dr. Muhammad Iqbal
Programme
MSc in Data Analytics
Subject
Computer Science
Abstract
The field of music information retrieval (MIR) is concerned with computational systems that help humans process, search, organize, and access music-related data, and it draws on disciplines such as music theory, computer science, psychology, neuroscience, library science, electrical engineering and machine learning. MIR techniques for audio classification have applications in fields such as speech recognition, automatic bandwidth allocation and audio database indexing, the last of which is of particular relevance to the large audio collections held by broadcasting facilities, the movie industry or music content providers.
These indexing aspects of MIR have been applied to classifying audio into musical genres, identifying individual instruments within a piece of music and automatically transcribing music, but to date none has focused on the identification and cataloguing of the various audio effects that can be applied to music. Establishing which class of audio effect has been applied to an audio file has a variety of potential applications, such as aiding the identification of the underlying instruments in the audio or facilitating library and cataloguing activities.
The novelty of this paper lies in two aspects: 1) no previous studies on the classification of audio effects have been uncovered; 2) the use of a difference file as an input to a neural network is as yet unexplored.
The dataset used in the study consisted of 7,648 files, each processed through eight different audio effects to give 61,184 unique inputs. A total of 18 neural network runs were undertaken with three different sets of inputs: MFCCs generated from subsamples of the audio files, and images generated from the full-length audio files in both MFCC and difference file formats. Variations in the length of the audio file inputs, the number of inputs provided to the model, and the number of epochs run were assessed for their impact on model performance.
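As a minimal sketch of how such inputs could be prepared (not the study's exact pipeline), the following Python assumes the librosa library, a 10-second subsample window, and that a difference file is the sample-wise difference between the effect-processed audio and the original dry audio; the function names and parameters are illustrative assumptions.

# Sketch: preparing MFCC and difference-signal inputs.
# librosa is assumed; the 10-second window, sample rate and 13
# coefficients are illustrative choices, not the study's parameters.
import librosa

def mfcc_from_subsample(path, offset=0.0, duration=10.0, n_mfcc=13):
    # Load only the requested window of the file.
    y, sr = librosa.load(path, sr=22050, offset=offset, duration=duration)
    # Compute the MFCC matrix (n_mfcc x frames).
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)

def difference_signal(dry_path, wet_path):
    # Assumed interpretation of a "difference file": subtract the
    # original (dry) audio from the effect-processed (wet) audio.
    dry, sr = librosa.load(dry_path, sr=22050)
    wet, _ = librosa.load(wet_path, sr=22050)
    n = min(len(dry), len(wet))  # align lengths before subtracting
    return wet[:n] - dry[:n], sr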
The best-performing convolutional recurrent neural network (CRNN) using MFCCs from subsamples of the audio files achieved an accuracy of 97.3%, using 10-second subsamples that generated 100,000 inputs, trained for 30 epochs. The overall best-performing model used 256×256 px full-length difference files as inputs and achieved an accuracy of 98.4% in just 10 epochs. These results demonstrate the effectiveness of CRNNs at classifying audio effects and the potential of difference files as an image input format for neural networks.
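As a hedged illustration of the CRNN model family named above (the abstract does not specify the study's actual architecture), the following Keras sketch accepts 256×256 single-channel images, such as rendered difference files, and classifies them into eight effect classes; all layer sizes and training settings are assumptions.

# Sketch of a CRNN for 256x256 single-channel image inputs and eight
# effect classes; layers and optimizer are illustrative assumptions.
from tensorflow.keras import layers, models

def build_crnn(num_classes=8):
    m = models.Sequential([
        layers.Input(shape=(256, 256, 1)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        # Treat the first spatial axis as time: each of the 64 steps
        # becomes a 64*64-element feature vector for the RNN.
        layers.Reshape((64, 64 * 64)),
        layers.GRU(128),
        layers.Dense(num_classes, activation="softmax"),
    ])
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

The convolutional layers extract local time-frequency features and the recurrent layer models how those features evolve along the time axis, which is the defining combination of a CRNN.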
Date of Award
2025
Full Publication Date
2025
Access Rights
open access
Document Type
Capstone Project
Resource Type
thesis
Recommended Citation
Sneyd, P. (2025) A Comparative Evaluation of the Effectiveness of Mel Frequency Cepstral Coefficients and Difference Files for Audio Effect Identification Using Convolutional Recurrent Neural Networks. CCT College Dublin.