A study on the embedding spaces of the BERT language model

This thesis deals with the subfield of artificial intelligence known as natural language processing. It focuses on an AI model named BERT, which is based on the so-called transformer architecture. In particular, the thesis examines the model's embedding vectors, which reflect the model's internal natural...

Bibliographic Details
Main Author: Luisto, Rami
Other Authors: Faculty of Information Technology, Informaatioteknologian tiedekunta, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language: eng
Published: 2024
Online Access: https://jyx.jyu.fi/handle/123456789/97810
_version_ 1826225684603731968
author Luisto, Rami
author2 Faculty of Information Technology Informaatioteknologian tiedekunta Jyväskylän yliopisto University of Jyväskylä
author_facet Luisto, Rami Faculty of Information Technology Informaatioteknologian tiedekunta Jyväskylän yliopisto University of Jyväskylä
author_sort Luisto, Rami
datasource_str_mv jyx
description This thesis considers a subfield of artificial intelligence called Natural Language Processing (NLP). More specifically, we study a language model named BERT, which is based on the so-called transformer architecture, and BERT's internal representation of natural-language text, known as embedding vectors.
first_indexed 2024-10-29T21:00:27Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "\u00c4yr\u00e4m\u00f6, Sami", "language": null, "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Luisto, Rami", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2024-10-29T08:30:49Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2024-10-29T08:30:49Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2024", "language": null, "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/97810", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "T\u00e4ss\u00e4 ty\u00f6ss\u00e4 k\u00e4sitell\u00e4\u00e4n luonnollisen kielen k\u00e4sittelyn\u00e4 tunnettua teko\u00e4lyn osa-aluetta. Ty\u00f6ss\u00e4 keskityt\u00e4\u00e4n niin kutsuttuun transformer-arkkitehtuuriin pohjautuvaan BERT-nimiseen teko\u00e4lymalliin. Erityisesti ty\u00f6ss\u00e4 tarkastellaan t\u00e4m\u00e4n mallin upotusvektoreita, jotka kuvastavat mallin sis\u00e4ist\u00e4 luonnollisen tekstin esitysmuotoa.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "This thesis considers a subfield of artificial intelligence called Natural Language Processing (NLP). 
More specifically we study a language model named BERT based on the so called \\emph{transformer} architecture, and the internal language representation of BERT called embedding vectors.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by jyx lomake-julkaisija (jyx-julkaisija.group@korppi.jyu.fi) on 2024-10-29T08:30:49Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2024-10-29T08:30:49Z (GMT). No. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "95", "language": null, "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "CC BY 4.0", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.title", "value": "A study on the embedding spaces of the BERT language model", "language": null, "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202410296663", "language": null, "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": 
"dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Master's Degree Programme in Computer Science", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietojenk\u00e4sittelytieteen maisteriohjelma", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.copyright", "value": "\u00a9 The Author(s)", "language": null, "element": "rights", "qualifier": "copyright", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://creativecommons.org/licenses/by/4.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}]
id jyx.123456789_97810
language eng
last_indexed 2025-02-18T10:55:43Z
main_date 2024-01-01T00:00:00Z
main_date_str 2024
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/cec568f6-b611-4a8b-a623-bccc74b90079\/download","text":"URN:NBN:fi:jyu-202410296663.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2024
record_format qdc
source_str_mv jyx
spellingShingle Luisto, Rami A study on the embedding spaces of the BERT language model Master's Degree Programme in Computer Science Tietojenkäsittelytieteen maisteriohjelma
title A study on the embedding spaces of the BERT language model
title_full A study on the embedding spaces of the BERT language model
title_fullStr A study on the embedding spaces of the BERT language model
title_full_unstemmed A study on the embedding spaces of the BERT language model
title_short A study on the embedding spaces of the BERT language model
title_sort study on the embedding spaces of the bert language model
title_txtP A study on the embedding spaces of the BERT language model
topic Master's Degree Programme in Computer Science Tietojenkäsittelytieteen maisteriohjelma
topic_facet Master's Degree Programme in Computer Science Tietojenkäsittelytieteen maisteriohjelma
url https://jyx.jyu.fi/handle/123456789/97810 http://www.urn.fi/URN:NBN:fi:jyu-202410296663
work_keys_str_mv AT luistorami astudyontheembeddingspacesofthebertlanguagemodel AT luistorami studyontheembeddingspacesofthebertlanguagemodel