Supervisor
Kislay Raj
Programme
MSc in Data Analytics
Abstract
Bird populations are widely used as indicators of ecosystem health, but traditional monitoring based on manual observation is labour-intensive and difficult to scale. Recent advances in deep learning and low-cost edge hardware offer new opportunities for automated, real-time bird identification in gardens and other local habitats. This thesis investigates whether video-based deep learning models can reliably classify common Irish garden birds from short motion-triggered clips and how temporal modelling compares to image-based models.
A primary dataset of 20-second clips was collected in a private garden in Ireland using a Raspberry Pi with a high-resolution camera and a YOLO-based trigger that recorded only when birds were present. The footage was cropped and manually annotated into track-level sequences for five species: chaffinch, goldfinch, greenfinch, house sparrow, and starling. A small secondary dataset of European greenfinch videos from open-access sources was curated to probe generalisation across hardware and appearance conditions. After exploratory data analysis and a standardised spatial–temporal preprocessing pipeline, three model families were compared under matched training conditions: (i) a frame-based ResNet18 CNN with track-level logit averaging, (ii) a CNN–LSTM hybrid using a frozen ResNet18 backbone and a single-layer LSTM, and (iii) a 3D CNN (r3d_18) with frozen and fine-tuned backbones.
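As an illustration of the track-level logit averaging mentioned for the frame-based model, the following minimal sketch averages per-frame class logits over one track and takes the argmax. It is not the thesis code: the function name `track_prediction`, the class ordering in `SPECIES`, and the logit values are all hypothetical.

```python
from typing import Sequence

# Assumed class order; the thesis does not state the index mapping.
SPECIES = ["chaffinch", "goldfinch", "greenfinch", "house sparrow", "starling"]


def track_prediction(frame_logits: Sequence[Sequence[float]]) -> int:
    """Average per-frame logits over a track and return the argmax class index."""
    if not frame_logits:
        raise ValueError("track has no frames")
    n_frames = len(frame_logits)
    n_classes = len(frame_logits[0])
    # Mean logit per class across all frames of the track.
    mean_logits = [
        sum(frame[c] for frame in frame_logits) / n_frames
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=mean_logits.__getitem__)


# Made-up logits for a three-frame track; the middle frame alone would
# favour greenfinch, but the track-level average favours goldfinch.
example_track = [
    [0.2, 2.1, 0.4, -0.3, 0.0],
    [0.1, 0.3, 1.9, -0.1, 0.2],
    [0.0, 2.4, 0.5, -0.2, 0.1],
]
predicted = SPECIES[track_prediction(example_track)]
```

Averaging before the argmax lets confident frames outweigh a single ambiguous frame, which is one plausible reason a frame-based model can remain competitive at track level.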
On the primary dataset, the best CNN–LSTM model, using packed sequences and grayscale/contrast-enhancing augmentation, achieved 92.5% accuracy and 92.1% balanced accuracy. A fine-tuned 3D CNN slightly outperformed it (94.0% accuracy, 93.4% balanced accuracy), while a frame-based CNN with an unfrozen backbone reached 98% track-level accuracy, misclassifying only one sequence. Adding the secondary greenfinch data did not improve, and sometimes reduced, the performance of the CNN–LSTM models because of domain shift, whereas carefully chosen photometric augmentation clearly improved generalisation on the primary data. Overall, the results show that temporal models can accurately predict bird species from short garden clips, but on this dataset the frame-based 2D CNN was the most accurate model. The findings have implications for the design of practical, low-cost bird monitoring systems for citizen science and local conservation.
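The results above are reported as both accuracy and balanced accuracy. A minimal sketch of the difference, taking balanced accuracy in its usual sense as the mean of per-class recall, is given below. The labels are invented for illustration and are not drawn from the thesis data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of tracks whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each species counts equally."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical imbalanced test set: 8 starling tracks, 2 greenfinch tracks.
y_true = ["starling"] * 8 + ["greenfinch"] * 2
# One greenfinch track is mistaken for a starling.
y_pred = ["starling"] * 8 + ["starling", "greenfinch"]

acc = accuracy(y_true, y_pred)            # 9 of 10 correct
bal = balanced_accuracy(y_true, y_pred)   # (8/8 + 1/2) / 2
```

When the two figures are close, as in the results reported above, errors are not concentrated in the rarer classes.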
Date of Award
2025
Full Publication Date
2025
Access Rights
open access
Document Type
Capstone Project
Resource Type
thesis
Recommended Citation
Dolynenko, A. (2025) Deep Learning for Irish Garden Bird Identification: Exploring the Role of CNN-LSTM in Video-Based Recognition. CCT College Dublin.
DOI: https://doi.org/10.63227/652.299.67