Luonnollisen suomen kielen ymmärtäminen koneellisesti

This study investigated how well technologies built for natural language understanding suit the processing of Finnish. The study found that only a few of the technologies support Finnish. The stated level of support for a language appeared to rest entirely on each service provider's own notion of what language support covers.

Bibliographic Details
Main Authors: Lehtomäki, Eerik; Kukkaniemi, Riku
Other Authors: Faculty of Information Technology (Informaatioteknologian tiedekunta); Information Technology (Informaatioteknologia); University of Jyväskylä (Jyväskylän yliopisto)
Format: Master's thesis
Language: Finnish (fin)
Published: 2020
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/69035
_version_ 1826225741879050240
author Lehtomäki, Eerik; Kukkaniemi, Riku
author2 Faculty of Information Technology (Informaatioteknologian tiedekunta); Information Technology (Informaatioteknologia); University of Jyväskylä (Jyväskylän yliopisto)
author_sort Lehtomäki, Eerik
datasource_str_mv jyx
description This study investigated how well technologies built for natural language understanding suit the processing of Finnish. The study found that only a few of the technologies support Finnish, and that the stated level of support for a language appeared to rest entirely on each service provider's own notion of what language support covers. For Finnish, the biggest problem was handling inflected word forms: the technologies could process a word only in the form in which it had been taught to them. Operating in Finnish would therefore require comprehensive training data that covers not only the words to be recognised but also all of their inflected forms. The study found a solution in lemmatisation, which converts words to their base form before the technologies process them.
first_indexed 2020-05-18T20:00:49Z
format Master's thesis (pro gradu)
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Nieminen, Paavo", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "\u00c4yr\u00e4m\u00f6, Sami", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Lehtom\u00e4ki, Eerik", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Kukkaniemi, Riku", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2020-05-18T12:38:31Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2020-05-18T12:38:31Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2020", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/69035", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "T\u00e4ss\u00e4 tutkimuksessa selvitettiin, miten luonnollisen kielen ymm\u00e4rt\u00e4miseen rakennetut teknologiat soveltuvat suomen kielen k\u00e4sittelyyn. Tutkimusosuuksissa selvisi, ett\u00e4 vain harvat teknologioista tukevat suomen kielt\u00e4. Kielten tukitaso vaikutti perustuvan t\u00e4ysin palveluntarjoajien omaan k\u00e4sitykseen kielituen laajuudesta. \n\nTeknologioiden isoimmaksi ongelmaksi muodostui suomen kielen kohdalla taivutusmuodossa olevien sanojen k\u00e4sittely. Teknologiat pystyiv\u00e4t k\u00e4sittelem\u00e4\u00e4n sanoja ainoastaan siin\u00e4 muodossa, jossa sanat oltiin teknologioille opetettu. 
T\u00e4m\u00e4 tarkoittaa sit\u00e4, ett\u00e4 teknologioiden toiminta suomen kielell\u00e4 vaatisi kattavan opetusdatan, jossa tulisi ottaa tunnistettavien sanojen lis\u00e4ksi huomioon kaikki sanojen taivutusmuodot. Tutkimuksessa t\u00e4h\u00e4n ongelmaan l\u00f6ytyi ratkaisu lemmauksesta, jonka avulla sanat pystyttiin muuttamaan perusmuotoon ennen teknologioiden k\u00e4sittely\u00e4.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "This study investigated how technologies built for understanding natural language are applicable to Finnish language processing. The research revealed that only a few technologies support the Finnish language. The level of language support seemed to be based entirely on service providers\u2019 own perception of the scope of language support.\n\nThe biggest problem with technologies in the Finnish language was the processing of inflectional forms of words. Technologies could only handle words in the form in which the words were taught to the technologies. This means that the operation of technologies in the Finnish language would require comprehensive instructional data, which should include not only identifiable words but also any possible inflectional form. The study found a solution to this problem in lemmatisation, which allowed words to be transformed into their basic form before the technologies processed them.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Paivi Vuorio (paelvuor@jyu.fi) on 2020-05-18T12:38:31Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2020-05-18T12:38:31Z (GMT). No. 
of bitstreams: 0\n Previous issue date: 2020", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "71", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "fin", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "luonnollisen kielen k\u00e4sittely", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "luonnollisen kielen ymm\u00e4rt\u00e4minen", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "NLP", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "NLU", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "keskustelubotti", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "entiteetti", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "sanaluokittelu", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "saneistus", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "normalisointi", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "lemmaus", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", 
"value": "stemmaus", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Dialogflow", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Wit.ai", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "LUIS", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Watson Assistant", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Amazon Lex", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Recast.ai", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Rasa", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Snips", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Luonnollisen suomen kielen ymm\u00e4rt\u00e4minen koneellisesti", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202005183289", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": 
"dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "suomen kieli", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", 
"value": "neuroverkot", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "koneoppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "intentio", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "teko\u00e4ly", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "luonnollinen kieli", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_69035
language fin
last_indexed 2025-02-18T10:54:55Z
main_date 2020-01-01T00:00:00Z
main_date_str 2020
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/dd6edafc-4016-4bd7-a013-758b3133cee0\/download","text":"URN:NBN:fi:jyu-202005183289.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2020
record_format qdc
source_str_mv jyx
title Luonnollisen suomen kielen ymmärtäminen koneellisesti
topic natural language processing; natural language understanding; NLP; NLU; chatbot; entity; part-of-speech tagging; tokenisation; normalisation; lemmatisation; stemming; Dialogflow; Wit.ai; LUIS; Watson Assistant; Amazon Lex; Recast.ai; Rasa; Snips; Mathematical Information Technology (Tietotekniikka); 602; Finnish language; neural networks; machine learning; intent; artificial intelligence; natural language
url https://jyx.jyu.fi/handle/123456789/69035 http://www.urn.fi/URN:NBN:fi:jyu-202005183289
work_keys_str_mv AT lehtomäkieerik luonnollisensuomenkielenymmärtäminenkoneellisesti