Word sense disambiguation for Finnish with an application to language learning

The task of automatically determining the correct meaning of a word within a natural language utterance is called word sense disambiguation. This master's thesis describes the implementation and evaluation of word sense disambiguation for Finnish, and it is motivated by...

Bibliographic Details
Main Author: Robertson, Frankie
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language: eng
Published: 2020
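Subjects: word sense disambiguation; computer aided language learning; saneiden alamerkitysten yksiselitteistäminen; tietokoneavusteinen kielen oppiminen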
Online Access: https://jyx.jyu.fi/handle/123456789/68477
_version_ 1828193069372014592
author Robertson, Frankie
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Robertson, Frankie Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Robertson, Frankie
datasource_str_mv jyx
description The task of automatically determining the correct meaning of a word within a natural language utterance is referred to as Word Sense Disambiguation (WSD). This thesis describes the implementation and evaluation of WSD for the Finnish language, motivated by its novel application to Computer Aided Language Learning (CALL). To serve as training data for Machine Learning (ML) based WSD techniques, a sense-annotated corpus is automatically created from a collection of bilingual subtitles. Next, several WSD algorithms are adapted to Finnish and evaluated according to their F1-measure. Then, a Lexical Knowledge Base (LKB) is constructed by clustering and aligning existing resources, and tools to extract and analyse complex lexical units are created. Finally, TheWhatNow?! (Finnish: NiinMikäOli?!) is introduced: a CALL tool that uses WSD on this new lexical resource to offer language learners in-context help related to word structure and meaning. The design principles guiding its construction and user interface are also explained.
first_indexed 2024-09-11T08:52:18Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Cochez, Michael", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Robertson, Frankie", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2020-04-07T07:53:06Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2020-04-07T07:53:06Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2020", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/68477", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Teht\u00e4v\u00e4\u00e4 sanan oikean merkityksen m\u00e4\u00e4rit\u00e4miseksi automattisesti jossakin luonnollisen kielen ilmaisussa kutsutaan saneiden alamerkitysten yksiselitteist\u00e4miseksi. T\u00e4m\u00e4 pro gradu -tutkielma kuvaa saneiden alamerkitysten yksiselitteist\u00e4misen itoimeenpanoa ja arviointia suomen kielelle, ja sit\u00e4 motivoi t\u00e4m\u00e4n teht\u00e4v\u00e4n uudenlainen soveltaminen tietokoneavusteiseen kielen oppimiseen. Tutkielmassa kaksikieliseen tekstitysaineistoon pohjaava sanojen alamerkitysten mukaan annotoitu korpus on luotu automattisesti palvelemaan opetusaineistona koneoppimiseen pohjautuville saneiden alamerkitysten yksiselitteist\u00e4misen tekniikoille. Seuravaksi saneiden alamerkitysten yksiselitteist\u00e4misen algoritmeja on muokattu suomen kielelle ja arvioitu niiden F1-mitan mukaan. Sen j\u00e4lkeen on rakennettu sek\u00e4 leksikaalinen tiet\u00e4myskanta klusteroimalla ja tunnistamalla vastaavuuksia ett\u00e4 v\u00e4lineet kompleksisten lekseemien poimimiseen ja analysointiin. 
Lopuksi on esitelty NiinMik\u00e4Oli?!, tietokoneavusteinen kielen oppimisen v\u00e4line, joka k\u00e4ytt\u00e4\u00e4 saneiden alamerkitysten yksiselitteist\u00e4mist\u00e4 uudella leksikaalisella resurssilla tarjotakseen sanojen rakenteeseen ja merkitykseen liittyv\u00e4\u00e4 kontekstisidonaista apua kielenoppijoille. Lis\u00e4ksi on selitetty NiinMik\u00e4Oli?!:n rakentamista ja k\u00e4ytt\u00f6liittym\u00e4\u00e4 ohjaavat suunnittelun periaatteet.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "The task of automatically determining the correct meaning of a word within some natural language utterance is referred to as Word Sense Disambiguation (WSD). This thesis describes the implementation and evaluation of WSD for the Finnish language, motivated by its novel application to Computer Aided Language Learning (CALL). To serve as training data for Machine Learning (ML) based WSD techniques, a sense-annotated corpus is automatically created based on a collection of bilingual subtitles. Next, several WSD algorithms are adapted to Finnish and evaluated according to their F1-measure. Then, a Lexical Knowledge Base (LKB) is constructed by clustering and aligning existing resources, and tools to extract and analyse complex lexical units are created. Finally, TheWhatNow?!, a CALL tool which uses WSD on this new lexical resource to offer in context help related to word structure and meaning to language learners is introduced and the design principles guiding its construction and user interface are expounded.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Paivi Vuorio (paelvuor@jyu.fi) on 2020-04-07T07:53:05Z\nNo. 
of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2020-04-07T07:53:06Z (GMT). No. of bitstreams: 0\n Previous issue date: 2020", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "191", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "word sense disambiguation", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "computer aided language learning", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "saneiden alamerkitysten yksiselitteist\u00e4minen", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "tietokoneavusteinen kielen oppiminen", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Word sense disambiguation for Finnish with an application to language learning", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202004072692", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", 
"element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", 
"language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "muoto-oppi (kielitiede)", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietokoneavusteinen oppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "sanasemantiikka", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "kieli ja kielet", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "kieliteknologia", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietokonelingvistiikka", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "suomen kieli", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "kielen oppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "arviointi", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "toinen kieli", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "morphology (grammar)", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "computer-assisted learning", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "lexical semantics", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": 
"languages", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "language technology", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "computer linguistics", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "Finnish language", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "language learning", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "evaluation", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "second language", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_68477
language eng
last_indexed 2025-03-31T20:03:27Z
main_date 2020-01-01T00:00:00Z
main_date_str 2020
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/5423fd45-b8d1-4335-bf93-aa7d27b17554\/download","text":"URN:NBN:fi:jyu-202004072692.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2020
record_format qdc
source_str_mv jyx
spellingShingle Robertson, Frankie Word sense disambiguation for Finnish with an application to language learning word sense disambiguation computer aided language learning saneiden alamerkitysten yksiselitteistäminen tietokoneavusteinen kielen oppiminen Tietotekniikka Mathematical Information Technology 602 muoto-oppi (kielitiede) tietokoneavusteinen oppiminen sanasemantiikka kieli ja kielet kieliteknologia tietokonelingvistiikka suomen kieli kielen oppiminen arviointi toinen kieli morphology (grammar) computer-assisted learning lexical semantics languages language technology computer linguistics Finnish language language learning evaluation second language
title Word sense disambiguation for Finnish with an application to language learning
title_full Word sense disambiguation for Finnish with an application to language learning
title_fullStr Word sense disambiguation for Finnish with an application to language learning
title_full_unstemmed Word sense disambiguation for Finnish with an application to language learning
title_short Word sense disambiguation for Finnish with an application to language learning
title_sort word sense disambiguation for finnish with an application to language learning
title_txtP Word sense disambiguation for Finnish with an application to language learning
topic word sense disambiguation computer aided language learning saneiden alamerkitysten yksiselitteistäminen tietokoneavusteinen kielen oppiminen Tietotekniikka Mathematical Information Technology 602 muoto-oppi (kielitiede) tietokoneavusteinen oppiminen sanasemantiikka kieli ja kielet kieliteknologia tietokonelingvistiikka suomen kieli kielen oppiminen arviointi toinen kieli morphology (grammar) computer-assisted learning lexical semantics languages language technology computer linguistics Finnish language language learning evaluation second language
topic_facet 602 Finnish language Mathematical Information Technology Tietotekniikka arviointi computer aided language learning computer linguistics computer-assisted learning evaluation kielen oppiminen kieli ja kielet kieliteknologia language learning language technology languages lexical semantics morphology (grammar) muoto-oppi (kielitiede) sanasemantiikka saneiden alamerkitysten yksiselitteistäminen second language suomen kieli tietokoneavusteinen kielen oppiminen tietokoneavusteinen oppiminen tietokonelingvistiikka toinen kieli word sense disambiguation
url https://jyx.jyu.fi/handle/123456789/68477 http://www.urn.fi/URN:NBN:fi:jyu-202004072692
work_keys_str_mv AT robertsonfrankie wordsensedisambiguationforfinnishwithanapplicationtolanguagelearning