Department

Computer Science

First Advisor

Ruben Gamboa

Description

The field of artificial intelligence (AI) has long recognized that the tasks humans find easiest are often the most difficult to automate. Sound analysis is one example. Humans are quite adept at accurately classifying a speaker from the sounds of their speech alone, judging attributes such as gender, age, and native language. Although this task seems simple to most of us, it is a major challenge for an AI. A program that could perform it would have a number of applications, including phone-based speaker classification and speaker verification. The goal of this project was to use deep learning to train an artificial neural network (ANN) to classify speakers from recorded audio. We trained the network on the Speech Accent Archive, which contains more than 2,300 recordings of speakers reading a paragraph designed to cover all of the sounds of the English language, with metadata labels for each speaker. Our software can load saved ANNs for further training, analysis, or classification.
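The abstract does not specify the network architecture, the audio features, or how saved networks are stored. The following is a minimal, hypothetical Python sketch of the train, save, load, and classify cycle described in the last sentence, assuming each recording has already been reduced to a fixed-length numeric feature vector. `SpeakerNet`, its methods, the JSON file format, and the toy data are illustrative inventions, not the project's software. The model is a single-layer softmax classifier, far smaller than a deep network, chosen only so the example stays short and uses the standard library alone.

```python
import json
import math
import os
import random
import tempfile


class SpeakerNet:
    """Hypothetical single-layer softmax classifier standing in for a speaker ANN.

    Inputs are assumed to be fixed-length feature vectors already extracted
    from audio; the feature-extraction step is not shown.
    """

    def __init__(self, n_features, labels, seed=0):
        rng = random.Random(seed)
        self.labels = list(labels)
        # One weight row and one bias per class label, small random start.
        self.weights = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                        for _ in self.labels]
        self.biases = [0.0 for _ in self.labels]

    def probabilities(self, x):
        # Linear class scores followed by a numerically stable softmax.
        scores = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.biases)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def train(self, samples, epochs=100, lr=0.1):
        # Plain stochastic gradient descent on cross-entropy loss.
        for _ in range(epochs):
            for x, label in samples:
                probs = self.probabilities(x)
                target = self.labels.index(label)
                for k, p in enumerate(probs):
                    grad = p - (1.0 if k == target else 0.0)
                    self.biases[k] -= lr * grad
                    self.weights[k] = [w - lr * grad * xi
                                       for w, xi in zip(self.weights[k], x)]

    def classify(self, x):
        # Return the label with the highest predicted probability.
        probs = self.probabilities(x)
        return self.labels[probs.index(max(probs))]

    def save(self, path):
        # Store labels and parameters so training can resume later.
        with open(path, "w") as f:
            json.dump({"labels": self.labels, "weights": self.weights,
                       "biases": self.biases}, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            state = json.load(f)
        net = cls(len(state["weights"][0]), state["labels"])
        net.weights = state["weights"]
        net.biases = state["biases"]
        return net


if __name__ == "__main__":
    # Toy, made-up feature vectors; not data from the Speech Accent Archive.
    data = [([1.0, 0.1], "female"), ([0.9, 0.2], "female"),
            ([0.1, 1.0], "male"), ([0.2, 0.8], "male")]
    net = SpeakerNet(n_features=2, labels=["female", "male"])
    net.train(data, epochs=200)
    path = os.path.join(tempfile.gettempdir(), "speaker_net.json")
    net.save(path)
    restored = SpeakerNet.load(path)
    restored.train(data, epochs=10)  # resume training from the saved state
    print(restored.classify([0.95, 0.15]))
```

Saving the label list alongside the parameters means a reloaded network can be trained further, inspected, or used for classification without rebuilding it, which mirrors the three uses the abstract lists.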

Comments

Oral Presentation

Included in

Education Commons

Speaker Classification through Deep Learning