Tiedonlouhinta rakenteisista dokumenteista

The overall objective of this thesis is to respond to the challenges posed by information overflow using data mining techniques. The general research subject is data mining from structured documents. More precisely, the research problem covers the clustering of XML documents conforming to the same schema and ...

Bibliographic Details
Main Author: Nurminen, Miika
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Tietotekniikan laitos, Department of Mathematical Information Technology, University of Jyväskylä, Jyväskylän yliopisto
Format: Master's thesis
Language: fin
Published: 2005
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/12507
_version_ 1826225712911089664
author Nurminen, Miika
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Tietotekniikan laitos Department of Mathematical Information Technology University of Jyväskylä Jyväskylän yliopisto
author_facet Nurminen, Miika Informaatioteknologian tiedekunta Faculty of Information Technology Tietotekniikan laitos Department of Mathematical Information Technology University of Jyväskylä Jyväskylän yliopisto
author_sort Nurminen, Miika
datasource_str_mv jyx
description The overall objective of this thesis is to consider the challenges posed by information overflow using data mining techniques. The research concentrates on data mining from structured documents. More precisely, the research problem involves information retrieval and clustering from XML documents conforming to the same schema. The potential of combining various search and clustering techniques in order to comprehend document collections is also considered. Various index structures, similarity measures, clustering algorithms and ranking techniques are reviewed in the theoretical part of this thesis. In the empirical part, the ExtMiner application is developed, supporting searching, clustering and visualization for various XML document collections.
first_indexed 2023-03-22T09:58:09Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "K\u00e4rkk\u00e4inen, Tommi", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "\u00c4yr\u00e4m\u00f6, Sami", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Nurminen, Miika", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2008-01-08T09:30:29Z", "language": "", "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2008-01-08T09:30:29Z", "language": "", "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2005", "language": null, "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.other", "value": "oai:jykdok.linneanet.fi:959314", "language": null, "element": "identifier", "qualifier": "other", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/12507", "language": "", "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Tutkielman kokonaistavoite on vastata tietotulvan tuomiin haasteisiin tiedonlouhinnan tekniikoita k\u00e4ytt\u00e4en. Yleisen\u00e4 tutkimuskohteena on tiedonlouhinta rakenteisista dokumenteista. T\u00e4sm\u00e4llisemmin m\u00e4\u00e4riteltyn\u00e4 tutkimusongelma k\u00e4sitt\u00e4\u00e4 samaa skeemaa noudattavien XML-dokumenttien klusteroinnin ja tiedonhaun. Lis\u00e4ksi k\u00e4sitell\u00e4\u00e4n erilaisten haku- ja klusterointitekniikoiden yhdist\u00e4misen tuomia mahdollisuuksia dokumenttikokoelmien hahmottamisessa. Teoreettisessa osuudessa k\u00e4yd\u00e4\u00e4n l\u00e4pi erilaisia indeksirakenteita, samanlaisuusmittoja, klusterointialgoritmeja ja hakumenetelmi\u00e4. 
Empiirisess\u00e4 osuudessa on kehitetty ExtMiner-sovellus, joka tukee hakua, klusterointia ja visualisointia erilaisille XML-dokumenttikokoelmille.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "The overall objective of this thesis is to consider the challenges posed by information overflow using data mining techniques. The research concentrates on data mining from structured documents. More precisely, the research problem involves information retrieval and clustering from XML documents conforming to the same schema. The potential of combining various search and clustering techniques in order to comprehend document collections is considered. Various index structures, similarity measures, clustering algorithms and ranking techniques are reviewed in the theoretical part of this thesis. In the empirical part the ExtMiner-application is developed, supporting searching, clustering and visualization for various XML document collections.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2008-01-08T09:30:29Z (GMT). No. 
of bitstreams: 1\r\nURN_NBN_fi_jyu-200594.pdf: 1429805 bytes, checksum: 0e8b600815fd2880f8f9db57c783b961 (MD5)\r\n Previous issue date: 2005", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "137 sivua", "language": null, "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "fin", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "dokumenttien klusterointi", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "tiedonlouhinta", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Tiedonlouhinta rakenteisista dokumenteista", "language": null, "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-200594", "language": null, "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.dcmitype", "value": "Text", "language": "en", "element": "type", "qualifier": "dcmitype", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, 
{"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Tietotekniikan laitos", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Department of Mathematical Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": "fi", "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": null, "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "bibliometriikka", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "klusterit", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, 
{"key": "dc.subject.yso", "value": "hakuohjelmat", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "monimuuttujamenetelm\u00e4t", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tekstitietokannat", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tiedonhaku", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tiedonhakuj\u00e4rjestelm\u00e4t", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "XML", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_12507
language fin
last_indexed 2025-02-18T10:54:49Z
main_date 2005-01-01T00:00:00Z
main_date_str 2005
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/56e093c7-40d6-491a-891c-c505dfe81df2\/download","text":"URN_NBN_fi_jyu-200594.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2005
record_format qdc
source_str_mv jyx
spellingShingle Nurminen, Miika Tiedonlouhinta rakenteisista dokumenteista dokumenttien klusterointi tiedonlouhinta Tietotekniikka Mathematical Information Technology 602 bibliometriikka klusterit hakuohjelmat monimuuttujamenetelmät tekstitietokannat tiedonhaku tiedonhakujärjestelmät XML
title Tiedonlouhinta rakenteisista dokumenteista
title_full Tiedonlouhinta rakenteisista dokumenteista
title_fullStr Tiedonlouhinta rakenteisista dokumenteista
title_full_unstemmed Tiedonlouhinta rakenteisista dokumenteista
title_short Tiedonlouhinta rakenteisista dokumenteista
title_sort tiedonlouhinta rakenteisista dokumenteista
title_txtP Tiedonlouhinta rakenteisista dokumenteista
topic dokumenttien klusterointi tiedonlouhinta Tietotekniikka Mathematical Information Technology 602 bibliometriikka klusterit hakuohjelmat monimuuttujamenetelmät tekstitietokannat tiedonhaku tiedonhakujärjestelmät XML
topic_facet 602 Mathematical Information Technology Tietotekniikka XML bibliometriikka dokumenttien klusterointi hakuohjelmat klusterit monimuuttujamenetelmät tekstitietokannat tiedonhaku tiedonhakujärjestelmät tiedonlouhinta
url https://jyx.jyu.fi/handle/123456789/12507 http://www.urn.fi/URN:NBN:fi:jyu-200594
work_keys_str_mv AT nurminenmiika tiedonlouhintarakenteisistadokumenteista