Supervisor

Kislay Raj

Programme

MSc in Data Analytics

Abstract

Bird populations are widely used as indicators of ecosystem health, but traditional monitoring based on manual observation is labour-intensive and difficult to scale. Recent advances in deep learning and low-cost edge hardware offer new opportunities for automated, real-time bird identification in gardens and other local habitats. This thesis investigates whether video-based deep learning models can reliably classify common Irish garden birds from short motion-triggered clips and how temporal modelling compares to image-based models.

A primary dataset of 20-second clips was collected in a private garden in Ireland using a Raspberry Pi with a high-resolution camera and a YOLO-based trigger, so that recording started only when birds were present. The footage was cropped and manually annotated into track-level sequences for five species: chaffinch, goldfinch, greenfinch, house sparrow, and starling. A small secondary dataset of European greenfinch videos from open-access sources was curated to probe generalisation across hardware and appearance conditions. After exploratory data analysis and a standardised spatial–temporal preprocessing pipeline, three model families were compared under matched training conditions: (i) a frame-based ResNet18 CNN with track-level logit averaging, (ii) a CNN–LSTM hybrid using a frozen ResNet18 backbone and a single-layer LSTM, and (iii) a 3D CNN (r3d_18), evaluated with both frozen and fine-tuned backbones.
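Track-level logit averaging, as named for the frame-based CNN above, can be sketched in a few lines: the per-frame class logits for one track are averaged, and the species with the highest mean logit becomes the track's label. This is a minimal sketch assuming the logits are already available as plain lists of floats; the helper `track_prediction` and the `SPECIES` index order are hypothetical and not taken from the thesis.

```python
from typing import List, Sequence

# Hypothetical label order: the five species come from the thesis,
# but the class-index order used in the project is an assumption.
SPECIES: List[str] = ["chaffinch", "goldfinch", "greenfinch", "house_sparrow", "starling"]


def track_prediction(frame_logits: Sequence[Sequence[float]]) -> str:
    """Hypothetical helper: average per-frame logits over one track, return the top species."""
    if not frame_logits:
        raise ValueError("a track needs at least one frame")
    n_classes = len(SPECIES)
    if any(len(frame) != n_classes for frame in frame_logits):
        raise ValueError(f"each frame must have {n_classes} logits")
    n_frames = len(frame_logits)
    # Mean logit for each class across all frames in the track.
    mean_logits = [sum(frame[c] for frame in frame_logits) / n_frames for c in range(n_classes)]
    # The class with the highest averaged logit becomes the track-level label.
    best = max(range(n_classes), key=lambda c: mean_logits[c])
    return SPECIES[best]


# Made-up example: the second frame favours chaffinch, but the track average favours goldfinch.
frames = [
    [0.1, 2.0, 0.3, -1.0, 0.0],
    [1.5, 0.2, 0.1, -0.5, 0.0],
    [0.0, 1.8, 0.2, -1.2, 0.1],
]
print(track_prediction(frames))  # goldfinch
```

Averaging logits rather than taking a per-frame majority vote lets confident frames outweigh a few ambiguous ones, which is one plausible reason to aggregate this way for short clips.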

On the primary dataset, the best CNN–LSTM model, which used packed sequences and grayscale/contrast-enhancing augmentation, achieved 92.5% accuracy and 92.1% balanced accuracy. A fine-tuned 3D CNN slightly outperformed it (94.0% accuracy, 93.4% balanced accuracy), while a frame-based CNN with an unfrozen backbone reached 98% track-level accuracy, misclassifying only one sequence. Adding the secondary greenfinch data did not improve, and sometimes reduced, the performance of the CNN–LSTM models owing to domain shift, whereas carefully chosen photometric augmentation clearly improved generalisation on the primary data. Overall, the results show that temporal models can accurately identify bird species from short garden clips, but on this dataset the frame-based 2D CNN was the most accurate model. The findings have implications for the design of practical, low-cost bird monitoring systems for citizen science and local conservation.
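Balanced accuracy, reported alongside plain accuracy above, is the mean of per-class recall, so a rarely seen species counts as much as a common one. The sketch below is a minimal illustration in plain Python, assuming labels are given as lists of species names; the function names and the example labels are illustrative only and are not the thesis's code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of items whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes that occur in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label sequences of equal length")
    total = Counter(y_true)  # number of true items per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Illustrative, made-up labels (not thesis data): 8 starlings, 2 greenfinches.
y_true = ["starling"] * 8 + ["greenfinch"] * 2
y_pred = ["starling"] * 8 + ["starling", "greenfinch"]
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.75
```

With all eight starling clips correct and one of two greenfinch clips misclassified, plain accuracy is 0.9 but balanced accuracy is 0.75 (starling recall 1.0, greenfinch recall 0.5), which shows why the two figures diverge when classes are imbalanced.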

Date of Award

2025

Full Publication Date

2025

Access Rights

open access

Document Type

Capstone Project

Resource Type

thesis