Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa

Tekstin representaatio on kiinteä osa luonnollisen kielen prosessointia, sillä se mahdollistaa luonnollisten kielten laskennallisen analysoinnin. Yleiset representaatiomenetelmät ovat syntaksiin perustuvia. Luonnolliseen kieleen liittyy kuitenkin olennaisesti tulkinnanvaraisuutta, mikä aiheuttaa syn...

Full description

Bibliographic Details
Main Author: Patron, Anri
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Format: Bachelor's thesis
Language:fin
Published: 2019
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/64166
_version_ 1826225800073969664
author Patron, Anri
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Patron, Anri Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä Patron, Anri Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Patron, Anri
datasource_str_mv jyx
description Tekstin representaatio on kiinteä osa luonnollisen kielen prosessointia, sillä se mahdollistaa luonnollisten kielten laskennallisen analysoinnin. Yleiset representaatiomenetelmät ovat syntaksiin perustuvia. Luonnolliseen kieleen liittyy kuitenkin olennaisesti tulkinnanvaraisuutta, mikä aiheuttaa syntaktisiin representaatioihin vääristymiä. Tutkielmassa tarkastellaan tekstin representointia katkaistulla pääakselihajotelmalla luokitteluongelman näkökulmasta. Pääakselihajotelmalla approksimoimalla tekstiaineistosta voidaan löytää termien ja dokumenttien assosiatiivisten yhteyksien rakenne, jota voidaan käyttää tekstin representointiin. Menetelmällä saatavat tulokset vaikuttavat lupaavilta syntaksiin perustuviin representaatiomentelmiin verrattuna. Text representation is a critical part of natural language processing and a prerequisite for any computational analysis. Popular representational methods are based on syntactic terms. However interpretability of natural language causes noise in syntactic representations. This paper evaluates the use of truncated singular value decomposition as text representation in text categorization. Singular value decomposition is used in transforming original term by document matrix into a subspace where text is represented as associations of terms and documents. Results show truncated singular value decomposition to be promising replacement for syntactic representation methods.
first_indexed 2024-09-11T08:51:28Z
format Kandityö
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Lakanen, Antti-Jussi", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Patron, Anri", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2019-05-24T06:16:00Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2019-05-24T06:16:00Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2019", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/64166", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Tekstin representaatio on kiinte\u00e4 osa luonnollisen kielen prosessointia, sill\u00e4 se mahdollistaa luonnollisten kielten laskennallisen analysoinnin. Yleiset representaatiomenetelm\u00e4t ovat syntaksiin perustuvia. Luonnolliseen kieleen liittyy kuitenkin olennaisesti tulkinnanvaraisuutta, mik\u00e4 aiheuttaa syntaktisiin representaatioihin v\u00e4\u00e4ristymi\u00e4. Tutkielmassa tarkastellaan tekstin representointia katkaistulla p\u00e4\u00e4akselihajotelmalla luokitteluongelman n\u00e4k\u00f6kulmasta. P\u00e4\u00e4akselihajotelmalla approksimoimalla tekstiaineistosta voidaan l\u00f6yt\u00e4\u00e4 termien ja dokumenttien assosiatiivisten yhteyksien rakenne, jota voidaan k\u00e4ytt\u00e4\u00e4 tekstin representointiin. Menetelm\u00e4ll\u00e4 saatavat tulokset vaikuttavat lupaavilta syntaksiin perustuviin representaatiomentelmiin verrattuna.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Text representation is a critical part of natural language processing and a prerequisite for any computational analysis. Popular representational methods are based on syntactic terms. However interpretability of natural language causes noise in syntactic representations. This paper evaluates the use of truncated singular value decomposition as text representation in text categorization. Singular value decomposition is used in transforming original term by document matrix into a subspace where text is represented as associations of terms and documents. Results show truncated singular value decomposition to be promising replacement for syntactic representation methods.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Paivi Vuorio (paelvuor@jyu.fi) on 2019-05-24T06:16:00Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2019-05-24T06:16:00Z (GMT). No. of bitstreams: 0\n Previous issue date: 2019", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "25", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.language.iso", "value": "fin", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "p\u00e4\u00e4akselihajotelma", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "luokittelu", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Tekstin representointi katkaistulla p\u00e4\u00e4akselihajotelmalla luokittelussa", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "bachelor thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-201905242780", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Bachelor's thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Kandidaatinty\u00f6", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_7a1f", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "bachelorThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "representaatio", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "luonnollinen kieli", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "approksimointi", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}]
id jyx.123456789_64166
language fin
last_indexed 2025-02-18T10:56:39Z
main_date 2019-01-01T00:00:00Z
main_date_str 2019
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/f107e64a-a528-425b-91ea-f11dcad52208\/download","text":"URN:NBN:fi:jyu-201905242780.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2019
record_format qdc
source_str_mv jyx
spellingShingle Patron, Anri Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa pääakselihajotelma luokittelu Tietotekniikka Mathematical Information Technology 602 representaatio luonnollinen kieli approksimointi
title Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
title_full Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
title_fullStr Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
title_full_unstemmed Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
title_short Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
title_sort tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
title_txtP Tekstin representointi katkaistulla pääakselihajotelmalla luokittelussa
topic pääakselihajotelma luokittelu Tietotekniikka Mathematical Information Technology 602 representaatio luonnollinen kieli approksimointi
topic_facet 602 Mathematical Information Technology Tietotekniikka approksimointi luokittelu luonnollinen kieli pääakselihajotelma representaatio
url https://jyx.jyu.fi/handle/123456789/64166 http://www.urn.fi/URN:NBN:fi:jyu-201905242780
work_keys_str_mv AT patronanri tekstinrepresentointikatkaistullapääakselihajotelmallaluokittelussa