Big data: challenges, ecosystems and technologies

Data collection and management have undergone significant changes over the past 50 years, bringing new methods and technologies for managing and storing data. Today we produce enormous amounts of data and use it increasingly in different areas of society. The growing volume of data has...


Bibliographic Details
Main Author: Rautiainen, Wiljam
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language: eng
Published: 2022
Subjects: big data; Apache Hadoop; Apache Spark; big data ecosystems
Online Access: https://jyx.jyu.fi/handle/123456789/81931
author Rautiainen, Wiljam
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Rautiainen, Wiljam Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Rautiainen, Wiljam
datasource_str_mv jyx
description Data collection and management have undergone significant changes over the past 50 years, bringing new methods and technologies for managing and storing data. Today we produce enormous amounts of data and use it increasingly in different areas of society. The growing volume of data has created new problems in using it. Big data has become a broad term for massive datasets that cannot be processed with traditional data processing applications. These massive datasets have given rise to new technologies and ecosystems for processing them. The terms data lake, data warehouse, Apache Hadoop and Apache Spark are often associated with big data. This thesis investigates what big data is and which components its ecosystem consists of. It first examines how data management has evolved over time and how we arrived at the current situation. It then examines how big data is defined in the scientific literature and which parts its ecosystem consists of. Next, it examines the two most common big data technologies, Apache Hadoop and Apache Spark. The purpose of this thesis is to clarify the term big data and to study how its various parts, and the entities it comprises, are defined in the scientific literature.
Data collection and management have undergone significant changes over the past 50 years, introducing new methods and technologies for managing and storing data. Data is increasingly used in many areas of society, and we now generate enormous amounts of it. This growing volume has created new problems for data use. Big data has become a broad term for datasets too large for traditional data processing applications to process. Big data has given rise to new technologies and ecosystems for processing these datasets. The terms data lake, data warehouse, Apache Hadoop, and Apache Spark are often linked with big data applications.
This thesis explores what big data is and what components its ecosystem consists of. The thesis first examines how data management has evolved over time and how we arrived at the current situation. It then examines how big data is defined in the academic literature and what parts its ecosystem consists of. Next, the thesis examines the two most common big data processing technologies, Apache Hadoop and Apache Spark. In sum, this thesis aims to clarify the term big data and to study how its various aspects are defined in the academic literature.
first_indexed 2022-06-21T20:00:30Z
format Pro gradu
fullrecord
dc.contributor.advisor Saarela, Mirka
dc.contributor.advisor Hämäläinen, Joonas
dc.contributor.author Rautiainen, Wiljam
dc.date.accessioned 2022-06-21T10:36:25Z
dc.date.available 2022-06-21T10:36:25Z
dc.date.issued 2022
dc.identifier.uri https://jyx.jyu.fi/handle/123456789/81931
dc.description.provenance Submitted by Miia Hakanen (mihakane@jyu.fi) on 2022-06-21T10:36:25Z. No. of bitstreams: 0
dc.description.provenance Made available in DSpace on 2022-06-21T10:36:25Z (GMT). No. of bitstreams: 0. Previous issue date: 2022
dc.format.extent 56
dc.format.mimetype application/pdf
dc.language.iso eng
dc.rights In Copyright
dc.subject.other big data ecosystems
dc.subject.other Apache Spark
dc.title Big data : challenges, ecosystems and technologies
dc.type master thesis
dc.identifier.urn URN:NBN:fi:jyu-202206213538
dc.type.ontasot Pro gradu -tutkielma (fi); Master’s thesis (en)
dc.contributor.faculty Informaatioteknologian tiedekunta (fi); Faculty of Information Technology (en)
dc.contributor.department Informaatioteknologia (fi); Information Technology (en)
dc.contributor.organization Jyväskylän yliopisto (fi); University of Jyväskylä (en)
dc.subject.discipline Tietotekniikka (fi); Mathematical Information Technology (en)
yvv.contractresearch.funding 0
dc.type.coar http://purl.org/coar/resource_type/c_bdcc
dc.rights.accesslevel restrictedAccess
dc.type.publication masterThesis
dc.subject.oppiainekoodi 602
dc.subject.yso big data
dc.subject.yso Apache Hadoop
dc.format.content fulltext
dc.rights.url https://rightsstatements.org/page/InC/1.0/
dc.rights.accessrights The author has not given permission to make the work publicly available electronically. Therefore the material can be read only at the archival workstation at Jyväskylä University Library (https://kirjasto.jyu.fi/collections/archival-workstation; Finnish page: https://kirjasto.jyu.fi/kokoelmat/arkistotyoasema).
dc.type.okm G2
id jyx.123456789_81931
language eng
last_indexed 2025-02-18T10:55:16Z
main_date 2022-01-01T00:00:00Z
main_date_str 2022
publishDate 2022
record_format qdc
source_str_mv jyx
spellingShingle Rautiainen, Wiljam Big data : challenges, ecosystems and technologies big data ecosystems Apache Spark Tietotekniikka Mathematical Information Technology 602 big data Apache Hadoop
title Big data : challenges, ecosystems and technologies
title_full Big data : challenges, ecosystems and technologies
title_fullStr Big data : challenges, ecosystems and technologies
title_full_unstemmed Big data : challenges, ecosystems and technologies
title_short Big data
title_sort big data challenges ecosystems and technologies
title_sub challenges, ecosystems and technologies
title_txtP Big data : challenges, ecosystems and technologies
topic big data ecosystems Apache Spark Tietotekniikka Mathematical Information Technology 602 big data Apache Hadoop
topic_facet 602 Apache Hadoop Apache Spark Mathematical Information Technology Tietotekniikka big data big data ecosystems
url https://jyx.jyu.fi/handle/123456789/81931 http://www.urn.fi/URN:NBN:fi:jyu-202206213538
work_keys_str_mv AT rautiainenwiljam bigdatachallengesecosystemsandtechnologies