Smart prototype selection for machine learning based on ignorance zones analysis

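The size of databases has grown considerably over recent decades, and machine learning algorithms are not ready to process such large volumes of information. Although the nearest neighbor classifier is one of the most useful algorithms in data mining, it suffers from high storage requirements and slow response times when working with large data sets. Prototype selection methods help alleviate this problem by choosing a smaller subset of the data. This thesis provides an overview of existing instance selection methods and introduces a new approach. Most current methods select a subset experimentally, by checking whether a given point affects classification accuracy. The new approach presented in this thesis instead analyzes the data set instances and chooses prototypes based on the ignorance zones it discovers. The results show that the proposed method can effectively reduce the size of the data set while maintaining the same classification accuracy with the nearest neighbor classifier. In addition, it removes noisy data, which makes the decision boundaries smoother.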
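Most conventional prototype selection methods decide experimentally whether each instance is needed to preserve nearest neighbor accuracy. This record does not describe the ignorance-zone algorithm itself, so the Python sketch below illustrates one classic method of that conventional family instead: Hart's Condensed Nearest Neighbor (CNN) rule, together with the 1-NN classifier it relies on. It is not the thesis's method. The function names `nearest_label` and `condensed_nn`, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical point type: a tuple of numeric features.
Point = Tuple[float, ...]


def nearest_label(x: Point, protos: Sequence[Point], labels: Sequence[int]) -> int:
    """1-NN rule: return the label of the prototype closest to x (Euclidean distance)."""
    best = min(range(len(protos)), key=lambda i: math.dist(x, protos[i]))
    return labels[best]


def condensed_nn(X: Sequence[Point], y: Sequence[int]) -> List[int]:
    """Hart's Condensed Nearest Neighbor (illustrative sketch, assumes X is non-empty).

    Returns indices of a subset S of X such that 1-NN over S correctly
    classifies every training point that was left out of S.
    """
    selected = [0]  # seed S with the first instance
    changed = True
    while changed:  # sweep the data until a full pass adds nothing
        changed = False
        for i in range(len(X)):
            if i in selected:
                continue
            protos = [X[j] for j in selected]
            labels = [y[j] for j in selected]
            # The "experimental" test: keep x_i only if the current subset misclassifies it.
            if nearest_label(X[i], protos, labels) != y[i]:
                selected.append(i)
                changed = True
    return selected


# Toy data (hypothetical): two well-separated one-dimensional classes.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (1.0,), (1.1,), (1.2,), (1.3,)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
S = condensed_nn(X, y)
print(S)  # → [0, 4]
```

Here eight training points shrink to two prototypes while 1-NN still labels every training point correctly. Because CNN only tests consistency on the training set, it tends to keep points near class boundaries and can also retain mislabeled (noisy) instances; editing rules such as Wilson's Edited Nearest Neighbor are commonly used to remove such noise.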

Bibliographic Details
Main Author: Nikulin, Anton
Other Authors: Faculty of Information Technology (Informaatioteknologian tiedekunta), Department of Information Technology (Informaatioteknologia), University of Jyväskylä (Jyväskylän yliopisto)
Format: Master's thesis
Language: English
Published: 2018
Subjects: Prototype selection, Nearest neighbor, Ignorance zones, Data reduction, Classification
Online Access: https://jyx.jyu.fi/handle/123456789/57461
fullrecord [{"key": "dc.contributor.advisor", "value": "Terziyan, Vagan", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Nikulin, Anton", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2018-03-28T14:18:19Z", "language": "", "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2018-03-28T14:18:19Z", "language": "", "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2018", "language": null, "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.other", "value": "oai:jykdok.linneanet.fi:1863670", "language": null, "element": "identifier", "qualifier": "other", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/57461", "language": "", "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "The size of databases has been considerably growing over recent decades and Machine Learning algorithms are not ready to process such large volume of information. Being one of the most useful algorithms in Data Mining the Nearest neighbor classifier suffers from high storage requirements and slow response when working with large data sets. Prototype Selection methods help to alleviate this problem by choosing a subset of data with a smaller size. In this thesis, the overview of existing instance selection methods is provided together with the introduction of a new approach. The majority of current methods select a subset experimentally by checking whether certain point affects classification accuracy or not. The new approach, presented in this thesis, is based on analyzing data set instances and choosing prototypes based on discovered ignorance zones. 
The results obtained from the analysis show that the proposed method can effectively decrease the size of the data set while maintaining the same classification accuracy with the Nearest neighbor classifier. In addition, it allows removing noisy data making the decision boundaries smoother.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted using Plone Publishing form by Anton Nikulin (annikuli) on 2018-03-28 14:18:18.944424. Form: Master's Thesis publishing form (https://kirjasto.jyu.fi/publish-and-buy/publishing-forms/masters-thesis-publishing-form). JyX data: [jyx_publishing-allowed (fi) =True]", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by jyx lomake-julkaisija (jyx-julkaisija.group@korppi.jyu.fi) on 2018-03-28T14:18:19Z\r\nNo. of bitstreams: 2\r\nURN:NBN:fi:jyu-201803281873.pdf: 4622770 bytes, checksum: 6fe843a4f73d1d79365f554d7c286766 (MD5)\r\nlicense.html: 4298 bytes, checksum: bc3b412d4957efce19f26bf0f97dec60 (MD5)", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2018-03-28T14:18:19Z (GMT). No. 
of bitstreams: 2\r\nURN:NBN:fi:jyu-201803281873.pdf: 4622770 bytes, checksum: 6fe843a4f73d1d79365f554d7c286766 (MD5)\r\nlicense.html: 4298 bytes, checksum: bc3b412d4957efce19f26bf0f97dec60 (MD5)\r\n Previous issue date: 2018", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "1 verkkoaineisto (67 sivua)", "language": null, "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "Prototype selection", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Nearest neighbor", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Ignorance zones", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Data reduction", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Classification", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Smart prototype selection for machine learning based on ignorance zones analysis", "language": null, "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-201803281873", "language": null, "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": 
"dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietotekniikka", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Mathematical Information Technology", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.date.updated", "value": "2018-03-28T14:18:20Z", "language": "", "element": "date", "qualifier": "updated", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": 
"dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": "fi", "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "602", "language": null, "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "prototyypit", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "koneoppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/ca483cc4-be77-416d-8983-2f3806a1cfa1\/download","text":"URN:NBN:fi:jyu-201803281873.pdf","source":"jyx","mediaType":"application\/pdf"}
url https://jyx.jyu.fi/handle/123456789/57461 http://www.urn.fi/URN:NBN:fi:jyu-201803281873