About the project

Implementation of end-to-end Automatic Speech Recognition (ASR) architectures.

January 25, 2024
My Role

Automatic Speech Recognition

Implemented end-to-end automatic speech recognition architectures based on CTC1, Listen, Attend and Spell (LAS)2, and LAS-CTC3.

The models were trained and tested on a subset of the HarperValleyBank Dataset4, which is hosted here. The dataset is used to train models that predict each spoken character.
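Character-level prediction requires mapping transcripts to integer labels over a fixed character vocabulary. A minimal sketch of such an encoding (the token set and helper names here are illustrative assumptions, not the project's actual vocabulary):

```python
# Hypothetical character vocabulary: a blank for CTC, start/end markers,
# space, lower-case letters, and apostrophe. The real project may differ.
VOCAB = ["<blank>", "<sos>", "<eos>", " "] + list("abcdefghijklmnopqrstuvwxyz'")
CHAR_TO_IDX = {ch: i for i, ch in enumerate(VOCAB)}

def encode(text):
    """Map a lower-cased transcript to integer character labels,
    silently dropping characters outside the vocabulary."""
    return [CHAR_TO_IDX[ch] for ch in text.lower() if ch in CHAR_TO_IDX]

def decode(ids):
    """Inverse mapping; skips the special tokens (blank, <sos>, <eos>)."""
    return "".join(VOCAB[i] for i in ids if i >= 3)
```

For example, `decode(encode("hi there"))` round-trips back to the original lower-case string.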


  • Feature Extraction
    • Uses Librosa to extract log-mel spectrograms from WAV audio
    • Character encoding
  • Training end-to-end ASR
    • Multiple implementations of ASR model architectures, including attention-based models
    • Regularization of attention-based network to respect CTC alignments (LAS-CTC)
    • Utilizes Lightning Trainer API
    • Training-process logging and visualization with Wandb
    • Teacher-forcing
  • Decoding
    • Greedy decoding
    • Imposes a CTC objective on decoding
    • CTC collapsing rules (merge repeated symbols, remove blanks)
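Greedy CTC decoding, as listed above, takes the argmax symbol per frame and then applies the CTC rules: collapse consecutive repeats, then drop blanks. A minimal framework-free sketch (the vocabulary and blank index are illustrative assumptions):

```python
def ctc_greedy_decode(logits, vocab, blank=0):
    """Greedy CTC decoding over per-frame scores.

    logits: sequence of frames, each a list of per-symbol scores.
    vocab:  symbol strings indexed to match the score positions.
    blank:  index of the CTC blank symbol (assumed 0 here).
    """
    # Frame-wise best symbol indices (the greedy path).
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for idx in best:
        # CTC rules: skip repeats of the previous index, then skip blanks.
        if idx != prev and idx != blank:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)
```

For instance, a greedy path `a a _ a b b` collapses to `aab`: the repeated `a` frames merge, the blank separates the two `a`s so both survive, and the repeated `b` frames merge.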
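The LAS-CTC regularization mentioned above follows the multi-task recipe of Kim et al.3: the attention decoder and a CTC head are trained jointly, and their losses are interpolated. A minimal sketch (the weight `lam` and its default value are illustrative assumptions):

```python
def joint_ctc_attention_loss(ctc_loss, attention_loss, lam=0.2):
    """Multi-task objective for LAS-CTC training.

    lam interpolates between the CTC loss (lam=1) and the
    attention loss (lam=0); 0.2 is a hypothetical default.
    """
    return lam * ctc_loss + (1.0 - lam) * attention_loss
```

The CTC term constrains the attention-based decoder toward monotonic, left-to-right alignments, which is the "respect CTC alignments" behavior listed above.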

Model Run Report

Model run report obtained from Wandb

View project on GitHub


  1. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks, A. Graves et al.
  2. Listen, Attend and Spell, W. Chan et al.
  3. Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning, S. Kim et al.
  4. CS224S: Spoken Language Processing