Few-shot learning for speaker recognition

Tutkielman tavoitteena on vertailla uusimpia koneoppimiseen pohjautuvia menetelmiä puhujan tunnistamiseen vähäisellä datan määrällä. Puhujan tunnistamisessa tavoitteena on tunnistaa eri puhujat äänidatasta, sen käyttötarkoituksiin sisältyy mm. puhujan diarioiminen ja biometrinen tunnistus äänen avul...

Full description

Bibliographic Details
Main Author: Jokinen, Ville
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language:eng
Published: 2021
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/76866
_version_ 1826225752197038080
author Jokinen, Ville
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Jokinen, Ville Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä Jokinen, Ville Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Jokinen, Ville
datasource_str_mv jyx
description Tutkielman tavoitteena on vertailla uusimpia koneoppimiseen pohjautuvia menetelmiä puhujan tunnistamiseen vähäisellä datan määrällä. Puhujan tunnistamisessa tavoitteena on tunnistaa eri puhujat äänidatasta, sen käyttötarkoituksiin sisältyy mm. puhujan diarioiminen ja biometrinen tunnistus äänen avulla. Tutkielma rajoittuu puhujan tapaukseen, jossa käytettävissä on kaksi lyhyttä nauhoitetta, joko yhdeltä tai kahdelta, ennestään tuntemattomalta puhujalta. Joiden pohjalta pyritään tunnistamaan, sisältävätkö nauhoitteet puhetta samalta puhujalta. Lisäksi tutkielmassa tutkitaan Englanninkielisellä puheella koulutettujen neuroverkkojen tarkkuutta Suomenkieliseen puheeseen sovellettuna. Johon kehitetään sopiva datasetti Suomenkielisen puhekorpuksen pohjalta. Tutkielman tulokset osoittavat uusimpien menetelmien suoriutuvan erinomaisesti. Vaikkakin parhaiden tuloksien saavuttaminen osoittautui vaativan enemmän koulutusdataa kuin mitä tutkielmassa käytetään. Menetelmät yleistyvät hyvin myös suomenkieliselle puheelle siitä huolimatta, että koulutuksessa käytettiin vain englanninkielistä puhetta. Lisäksi tuloksien pohjalta tehdään mielenkiintoisia huomioita vertailuun valittujen muuttujien osalta, joita käytetään neuroverkkojen koulutuksessa. Vertailussa oli menetelmien lisäksi koulutusdatan puhujien määrä, puhe esimerkkin pituus ja äänidatan augmentointi. This thesis sets out to compare recent methods in speaker recognition, from a small amount of data. Speaker recognition aims to distinguish speakers from within audio data containing speech, the use cases include for example speaker diarization and voice biometric authentication. The scope is limited to identification, two samples from one or two distinct previously unknown speakers are provided. With the aim being to identify whether the two samples are spoken by the same speaker. Additionally, the accuracy of networks trained on English speech on Finnish speech is also measured. For which a new dataset, suitable for benchmarking speaker recognition, consisting of Finnish speech was developed from an existing speech recognition dataset. The results show that the latest methods perform very well. However, to achieve the best results it is apparent that more training data is required, than what was used in this thesis. The methods generalized to Finnish speech, despite being trained with English speech. Additionally, interesting observations are made regarding the parameters chosen for training. In addition to comparing different methods, the effects of different number of speakers used for training, various sample lengths and data augmentation are also compared.
first_indexed 2021-06-28T20:01:33Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "K\u00e4rkk\u00e4inen, Tommi", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "H\u00e4m\u00e4l\u00e4inen, Joonas", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Jokinen, Ville", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2021-06-28T13:30:50Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2021-06-28T13:30:50Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2021", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/76866", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Tutkielman tavoitteena on vertailla uusimpia koneoppimiseen pohjautuvia menetelmi\u00e4 puhujan tunnistamiseen v\u00e4h\u00e4isell\u00e4 datan m\u00e4\u00e4r\u00e4ll\u00e4. Puhujan tunnistamisessa tavoitteena on tunnistaa eri puhujat \u00e4\u00e4nidatasta, sen k\u00e4ytt\u00f6tarkoituksiin sis\u00e4ltyy mm. puhujan diarioiminen ja biometrinen tunnistus \u00e4\u00e4nen avulla. Tutkielma rajoittuu puhujan tapaukseen, jossa k\u00e4ytett\u00e4viss\u00e4 on kaksi lyhytt\u00e4 nauhoitetta, joko yhdelt\u00e4 tai kahdelta, ennest\u00e4\u00e4n tuntemattomalta puhujalta. Joiden pohjalta pyrit\u00e4\u00e4n tunnistamaan, sis\u00e4lt\u00e4v\u00e4tk\u00f6 nauhoitteet puhetta samalta puhujalta. Lis\u00e4ksi tutkielmassa tutkitaan Englanninkielisell\u00e4 puheella koulutettujen neuroverkkojen tarkkuutta Suomenkieliseen puheeseen sovellettuna. Johon kehitet\u00e4\u00e4n sopiva datasetti Suomenkielisen puhekorpuksen pohjalta.\n\nTutkielman tulokset osoittavat uusimpien menetelmien suoriutuvan erinomaisesti. Vaikkakin parhaiden tuloksien saavuttaminen osoittautui vaativan enemm\u00e4n koulutusdataa kuin mit\u00e4 tutkielmassa k\u00e4ytet\u00e4\u00e4n. Menetelm\u00e4t yleistyv\u00e4t hyvin my\u00f6s suomenkieliselle puheelle siit\u00e4 huolimatta, ett\u00e4 koulutuksessa k\u00e4ytettiin vain englanninkielist\u00e4 puhetta. Lis\u00e4ksi tuloksien pohjalta tehd\u00e4\u00e4n mielenkiintoisia huomioita vertailuun valittujen muuttujien osalta, joita k\u00e4ytet\u00e4\u00e4n neuroverkkojen koulutuksessa. Vertailussa oli menetelmien lis\u00e4ksi koulutusdatan puhujien m\u00e4\u00e4r\u00e4, puhe esimerkkin pituus ja \u00e4\u00e4nidatan augmentointi.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "This thesis sets out to compare recent methods in speaker recognition, from a small amount of data. Speaker recognition aims to distinguish speakers from within audio data containing speech, the use cases include for example speaker diarization and voice biometric authentication. The scope is limited to identification, two samples from one or two distinct previously unknown speakers are provided. With the aim being to identify whether the two samples are spoken by the same speaker. Additionally, the accuracy of networks trained on English speech on Finnish speech is also measured. For which a new dataset, suitable for benchmarking speaker recognition, consisting of Finnish speech was developed from an existing speech recognition dataset.\n\nThe results show that the latest methods perform very well. However, to achieve the best results it is apparent that more training data is required, than what was used in this thesis. The methods generalized to Finnish speech, despite being trained with English speech. Additionally, interesting observations are made regarding the parameters chosen for training. In addition to comparing different methods, the effects of different number of speakers used for training, various sample lengths and data augmentation are also compared.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Miia Hakanen (mihakane@jyu.fi) on 2021-06-28T13:30:50Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2021-06-28T13:30:50Z (GMT). No. of bitstreams: 0\n Previous issue date: 2021", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "69", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "speaker identification", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "few-shot learning", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Few-shot learning for speaker recognition", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202106284056", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "koneoppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "puhujantunnistus", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "neuroverkot", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "machine learning", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "speaker recognition", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "neural networks (information technology)", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_76866
language eng
last_indexed 2025-02-18T10:54:35Z
main_date 2021-01-01T00:00:00Z
main_date_str 2021
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/fa822f5b-490c-4d01-b13f-5233a7ff9745\/download","text":"URN:NBN:fi:jyu-202106284056.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2021
record_format qdc
source_str_mv jyx
spellingShingle Jokinen, Ville Few-shot learning for speaker recognition speaker identification few-shot learning Tietotekniikka Mathematical Information Technology 602 koneoppiminen puhujantunnistus neuroverkot machine learning speaker recognition neural networks (information technology)
title Few-shot learning for speaker recognition
title_full Few-shot learning for speaker recognition
title_fullStr Few-shot learning for speaker recognition Few-shot learning for speaker recognition
title_full_unstemmed Few-shot learning for speaker recognition Few-shot learning for speaker recognition
title_short Few-shot learning for speaker recognition
title_sort few shot learning for speaker recognition
title_txtP Few-shot learning for speaker recognition
topic speaker identification few-shot learning Tietotekniikka Mathematical Information Technology 602 koneoppiminen puhujantunnistus neuroverkot machine learning speaker recognition neural networks (information technology)
topic_facet 602 Mathematical Information Technology Tietotekniikka few-shot learning koneoppiminen machine learning neural networks (information technology) neuroverkot puhujantunnistus speaker identification speaker recognition
url https://jyx.jyu.fi/handle/123456789/76866 http://www.urn.fi/URN:NBN:fi:jyu-202106284056
work_keys_str_mv AT jokinenville fewshotlearningforspeakerrecognition