Semantic annotation and big data techniques for patent information processing

This thesis analyzes approaches to generate semantic annotations on patent records, as well as on other structured data, by relying on the structure and semantic representation of documents. Information in patent records reflects how real-world technologies evolve, and the approximately 3 million...

Bibliographic Details
Main Author: Mwakyusa, Phesto Enock
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Information Technology, Tietotekniikan laitos, University of Jyväskylä, Jyväskylän yliopisto
Format: Master's thesis
Language: English
Published: 2017
Subjects: Semantic annotation, Data mining, Patent information, Big data, Patents, Annotation
Online Access: https://jyx.jyu.fi/handle/123456789/55667
_version_ 1828193101133381632
author Mwakyusa, Phesto Enock
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Information Technology Tietotekniikan laitos University of Jyväskylä Jyväskylän yliopisto
author_facet Mwakyusa, Phesto Enock Informaatioteknologian tiedekunta Faculty of Information Technology Information Technology Tietotekniikan laitos University of Jyväskylä Jyväskylän yliopisto
author_sort Mwakyusa, Phesto Enock
datasource_str_mv jyx
description This thesis analyzes approaches to generating semantic annotations on patent records, as well as on other structured data, by relying on the structure and semantic representation of documents. Information in patent records reflects how real-world technologies evolve, and the approximately 3 million new patent applications each year capture the global inventive frontier. The volume of this information is too large to be analyzed effectively by human effort alone, which calls for Big data approaches that analyze it with computer-aided tools and techniques. Big data is a term for a volume of structured, semi-structured and unstructured data so large that it is difficult to process using traditional database and software tools and techniques. Currently, technical information such as patents is typically stored in data repositories that do not support advanced Big data methods for structuring and interpreting documents. In emerging semantic technology, annotation, Web search, interpretation and aggregation can be addressed by ontology-based semantic annotation. This thesis examines semantic annotation and other Big data methodologies and their basic requirements, and reviews the current generation of semantic annotation and other Big data systems. As a use case, this thesis demonstrates how semantic annotation and other Big data techniques are employed to enhance the human processes whereby people retrieve information and carry out analysis or discovery within a large collection of patent information.
Semantic annotation and big data methods for patent information processing. This thesis analyzes how to create semantic annotations for patent records, or for other unstructured data, by using the structure or semantic representation of the records. Taken as a whole, patent records contain information on how real-world technologies develop and change, and the approximately 3 million new patent applications published globally each year describe well the development of the global inventive frontier. This information is too large in volume to be analyzed and processed effectively by human effort alone. Its analysis therefore requires dedicated Big data approaches that use computer-aided tools and processes. Big data is a term for a very large volume of structured, semi-structured or unstructured data that is so large that processing it with traditional database or software tools or techniques is laborious. Currently, technical information such as patents is stored in data collections that do not support advanced Big data methods for structuring and interpreting documents. In emerging semantic technology, annotation, web search, interpretation and aggregation are handled with ontology-based semantic annotation. This thesis covers semantic annotation and other Big data methods and their basic requirements, and reviews current systems for semantic annotation and other Big data methods. As a case study, this thesis shows how semantic annotation and other Big data techniques can be used to improve the processes by which people retrieve information and carry out analysis or searches in a very large collection of patent information.
first_indexed 2023-03-22T10:00:38Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Cochez, Michael", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "Terziyan, Vagan", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Mwakyusa, Phesto Enock", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2017-10-23T13:27:04Z", "language": "", "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2017-10-23T13:27:04Z", "language": "", "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2017", "language": null, "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.other", "value": "oai:jykdok.linneanet.fi:1726691", "language": null, "element": "identifier", "qualifier": "other", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/55667", "language": "", "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "This thesis analyzes approaches to generate semantic annotations on patent records,\r\nas well as on other structured data, by relying on the structure and semantic representation\r\nof documents. Information in patent records reflects how real-world technologies evolve,\r\nand the approximately 3 million annual new patent applications capture the global inventive\r\nfrontier. The volume of this information is too big to be effectively analyzed purely with\r\nhuman effort, necessitating Big data approaches to analyze it with computer aided tools and\r\ntechniques. 
Big data is a term that describes a massive volume of structured, semi structured\r\nand unstructured data that is so large to the point that it is difficult to process using tradi-\r\ntional database and software tools and techniques. Currently, technical information, such as\r\npatents, is typically stored in data repositories that do not support advanced Big data methods\r\nto structure and interpret documents. In the emerging Semantic technology, annotation, Web\r\nsearch, as well as interpretation and aggregation can be addressed by ontology-based seman-\r\ntic annotation. This thesis examines semantic annotation and other Big data methodologies,\r\nand their basic requirements, and reviews the current generation of semantic annotation and\r\nother Big data systems. As a use case, this thesis demonstrates how semantic annotation\r\nand other Big data techniques are employed to enhance the human processes whereby peo-\r\nple retrieve information, carry out analysis or discovery within a large collection of patent\r\ninformation.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Semanttinen annotaatio ja big data-menetelmi\u00e4 patentti-informaation prosessointiin. \r\n\r\nT\u00e4m\u00e4 tutkielma analysoi miten luoda semanttisia annotaatioita\r\npatenttietueisiin, tai muuhun ei-strukturoituun dataa, hy\u00f6dynt\u00e4m\u00e4ll\u00e4 tietueiden rakennetta\r\ntai semanttista representaatiota. Patenttitietueet sis\u00e4lt\u00e4v\u00e4t kokonaisuutena informaation\r\nsiit\u00e4, miten reaalimaailman teknologiat kehittyv\u00e4t ja muuttuvat, ja vuosittain globaalisti\r\njulkaistavat noin 3 miljoonaa uutta patenttihakemusta kuvaavat hyvin globaalin keksint\u00f6rintaman\r\nkehityst\u00e4. T\u00e4m\u00e4 informaatio on volyymiltaan liian laaja, jotta sit\u00e4 voisi tehokkasti\r\nanalysoida ja k\u00e4sitell\u00e4 puhtaasti ihmisvoimin. 
T\u00e4st\u00e4 syyst\u00e4 sen analysointiin tarvitaan erityisi\u00e4\r\nBig data l\u00e4hestymistapoja, jotka hy\u00f6dynt\u00e4v\u00e4t tietokoneavusteisia ty\u00f6kaluja ja -prosesseja.\r\nBig data on termi joka kuvaa eritt\u00e4in suurta volyymia strukturoitua, osittain strukturoitua tai\r\nstrukturoimatonta dataa, joka on niin suuri ett\u00e4 sen prosessointi perinteisin tietokanta- tai\r\nohjelmistoteknisin ty\u00f6kaluin tai tekniikoin on vaivalloista. Nykyisin tekninen informaatio,\r\nkuten patentit, s\u00e4ilytet\u00e4\u00e4n datakokoelmissa, jotka eiv\u00e4t tue edistyneit\u00e4 Big data menetelmi\u00e4\r\nstrukturoida ja tulkita dokumentteja. Nousevassa Semanttisessa teknologiassa annotaatio,\r\nweb-haku, sek\u00e4 tulkinta ja koostaminen k\u00e4sitell\u00e4\u00e4n ontologia-pohjaisella semanttisella annotaatiolla.\r\nT\u00e4m\u00e4 tutkielma k\u00e4sittelee semanttista annotaatiota ja muita Big data menetelmi\u00e4\r\nja niiden perusedellytyksi\u00e4, sek\u00e4 tarkastelee nykyaikaisia semanttisen annotaation ja muiden\r\nBig data menetelmien j\u00e4rjestelmi\u00e4. Tapaustutkimuksena t\u00e4m\u00e4 tutkielma osoittaa, miten semanttista\r\nannotaatiota ja muita Big data tekniikoita voidaan hy\u00f6dynt\u00e4\u00e4 parantamaan prosesseja,\r\njoiden avulla ihmiset hakevat tietoa, tekev\u00e4t analyysi\u00e4 tai hakuja eritt\u00e4in suuresta\r\npatentti-informaation kokoelmasta.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted using Plone Publishing form by Phesto Mwakyusa (phenmwak) on 2017-10-23 13:27:03.498517. Form: Master's Thesis publishing form (https://kirjasto.jyu.fi/publish-and-buy/publishing-forms/masters-thesis-publishing-form). 
JyX data: [jyx_publishing-allowed (fi) =True]", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by jyx lomake-julkaisija (jyx-julkaisija.group@korppi.jyu.fi) on 2017-10-23T13:27:04Z\r\nNo. of bitstreams: 2\r\nURN:NBN:fi:jyu-201710234047.pdf: 2171254 bytes, checksum: d089a7c4a26de0b892867d97a46b3c3d (MD5)\r\nlicense.html: 4300 bytes, checksum: 743f3d19b422e15d1e0bbdfbd9d15911 (MD5)", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2017-10-23T13:27:04Z (GMT). No. of bitstreams: 2\r\nURN:NBN:fi:jyu-201710234047.pdf: 2171254 bytes, checksum: d089a7c4a26de0b892867d97a46b3c3d (MD5)\r\nlicense.html: 4300 bytes, checksum: 743f3d19b422e15d1e0bbdfbd9d15911 (MD5)\r\n Previous issue date: 2017", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "1 verkkoaineisto (73 sivua)", "language": null, "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "semanttinen annotointi", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Data Mining", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Semantic annotation", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Patent information", 
"language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Semantic annotation and big data techniques for patent information processing", "language": null, "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-201710234047", "language": null, "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Tietotekniikan laitos", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": 
"Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.date.updated", "value": "2017-10-23T13:27:04Z", "language": "", "element": "date", "qualifier": "updated", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": "fi", "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": null, "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "big data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tiedonlouhinta", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "patentit", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "annotointi", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_55667
language eng
last_indexed 2025-03-31T20:02:43Z
main_date 2017-01-01T00:00:00Z
main_date_str 2017
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/21f5a567-8ceb-449b-aac7-ecd1db154c87\/download","text":"URN:NBN:fi:jyu-201710234047.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2017
record_format qdc
source_str_mv jyx
spellingShingle Mwakyusa, Phesto Enock Semantic annotation and big data techniques for patent information processing semanttinen annotointi Data Mining Semantic annotation Patent information Tietotekniikka Mathematical Information Technology 602 big data tiedonlouhinta patentit annotointi
title Semantic annotation and big data techniques for patent information processing
title_full Semantic annotation and big data techniques for patent information processing
title_fullStr Semantic annotation and big data techniques for patent information processing
title_full_unstemmed Semantic annotation and big data techniques for patent information processing
title_short Semantic annotation and big data techniques for patent information processing
title_sort semantic annotation and big data techniques for patent information processing
title_txtP Semantic annotation and big data techniques for patent information processing
topic semanttinen annotointi Data Mining Semantic annotation Patent information Tietotekniikka Mathematical Information Technology 602 big data tiedonlouhinta patentit annotointi
topic_facet 602 Data Mining Mathematical Information Technology Patent information Semantic annotation Tietotekniikka annotointi big data patentit semanttinen annotointi tiedonlouhinta
url https://jyx.jyu.fi/handle/123456789/55667 http://www.urn.fi/URN:NBN:fi:jyu-201710234047
work_keys_str_mv AT mwakyusaphestoenock semanticannotationandbigdatatechniquesforpatentinformationprocessing