Comparative analysis of data stream processing systems

Big data-käsittelyjärjestelmät ovat tällä hetkellä kehittymässä stream-orientoituneiksi, eli data käsitellään heti saapuessaan. Perinteisemmin data säilöttiin tietokantaan, tiedostopohjaisesti tai muuhun tiedonsäilytysjärjestelmään, ja applikaatiot hakivat datan tarvittaessa. Stream-pohjainen järjes...

Full description

Bibliographic Details
Main Author: Zeb, Mian Shah
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language:eng
Published: 2020
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/67932
_version_ 1826225741864370176
author Zeb, Mian Shah
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Zeb, Mian Shah Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä Zeb, Mian Shah Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Zeb, Mian Shah
datasource_str_mv jyx
description Big data-käsittelyjärjestelmät ovat tällä hetkellä kehittymässä stream-orientoituneiksi, eli data käsitellään heti saapuessaan. Perinteisemmin data säilöttiin tietokantaan, tiedostopohjaisesti tai muuhun tiedonsäilytysjärjestelmään, ja applikaatiot hakivat datan tarvittaessa. Stream-pohjainen järjestelmä käsittelee liikkuvaa dataa, jatkuva-aikaista dataa useasta lähteestä. Sen sijaan, että haetaan ajoittain dataa, stream-pohjaiset frameworkit pystyvät käsittelemään dataa heti kun se on saatavilla, täten vähentäen viivettä. Tässä tutkielmassa tehdään komparatiivinen analyysi eri stream-pohjaisten frameworkien välillä, perustuen valittuihin ominaisuuksiin. Tutkittavat frameworkit ovat Apache Samza, Apache Flink, Apache Storm ja Apache Spark Structured Streaming. Tutkielmassa perehdytään myös Apache Kafkaan, joka on lokiperusteinen tietovarasto, jota laajalti käytetään stream-pohjaisissa frameworkeissa. Big data processing systems are evolving to be more stream oriented where data is processed continuously by processing it as soon as it arrives. Earlier data was often stored in a database, a file system or other form of data storage system. Applications would query the data as needed. Stram processing is the processing of data in motion. It works on continuous data retrieved from different resources. Instead of periodically collecting huge static data, streaming frameworks process data as soon as it becomes available, hence reducing latency. This thesis aims to conduct a comparative analysis of different streaming processors based on selected features. Research focuses on Apache Samza, Apache Flink, Apache Storm and Apache Spark Structured Streaming. Also, this thesis explains Apache Kafka which is a log-based data storage widely used in streaming frameworks.
first_indexed 2020-02-24T21:01:38Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Khriyenko, Oleksiy", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "Terziyan, Vagan", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Zeb, Mian Shah", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2020-02-24T11:33:39Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2020-02-24T11:33:39Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2020", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/67932", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Big data-k\u00e4sittelyj\u00e4rjestelm\u00e4t ovat t\u00e4ll\u00e4 hetkell\u00e4 kehittym\u00e4ss\u00e4 stream-orientoituneiksi, eli data k\u00e4sitell\u00e4\u00e4n heti saapuessaan. Perinteisemmin data s\u00e4il\u00f6ttiin tietokantaan, tiedostopohjaisesti tai muuhun tiedons\u00e4ilytysj\u00e4rjestelm\u00e4\u00e4n, ja applikaatiot hakivat datan tarvittaessa. Stream-pohjainen j\u00e4rjestelm\u00e4 k\u00e4sittelee liikkuvaa dataa, jatkuva-aikaista dataa useasta l\u00e4hteest\u00e4.\n \n Sen sijaan, ett\u00e4 haetaan ajoittain dataa, stream-pohjaiset frameworkit pystyv\u00e4t k\u00e4sittelem\u00e4\u00e4n dataa heti kun se on saatavilla, t\u00e4ten v\u00e4hent\u00e4en viivett\u00e4. T\u00e4ss\u00e4 tutkielmassa tehd\u00e4\u00e4n komparatiivinen analyysi eri stream-pohjaisten frameworkien v\u00e4lill\u00e4, perustuen valittuihin ominaisuuksiin. Tutkittavat frameworkit ovat Apache Samza, Apache Flink, Apache Storm ja Apache Spark Structured Streaming. Tutkielmassa perehdyt\u00e4\u00e4n my\u00f6s Apache Kafkaan, joka on lokiperusteinen tietovarasto, jota laajalti k\u00e4ytet\u00e4\u00e4n stream-pohjaisissa frameworkeissa.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Big data processing systems are evolving to be more stream oriented where data is processed continuously by processing it as soon as it arrives.\n Earlier data was often stored in a database, a file system or other form of data storage system. Applications would query the data as needed. Stram processing is the processing of data in motion. It works on continuous data retrieved from different resources. Instead of periodically collecting huge static data, streaming frameworks process data as soon as it becomes available, hence reducing latency. This thesis aims to conduct a comparative analysis of different streaming processors based on selected features. Research focuses on Apache Samza, Apache Flink, Apache Storm and Apache Spark Structured Streaming. Also, this thesis explains Apache Kafka which is a log-based data storage widely used in streaming frameworks.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Miia Hakanen (mihakane@jyu.fi) on 2020-02-24T11:33:39Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2020-02-24T11:33:39Z (GMT). No. of bitstreams: 0\n Previous issue date: 2020", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "48", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "Stream Processing", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Batch Processing", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Apache Kafka", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Apache Samza", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Streaming Engines", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Comparative analysis of data stream processing systems", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202002242154", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietojenk\u00e4sittelytiede", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Computer Science", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "601", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietojenk\u00e4sittely", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "big data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietoj\u00e4rjestelm\u00e4t", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietotekniikka", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "suoratoisto", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data processing", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "big data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data systems", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "information technology", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "streaming", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_67932
language eng
last_indexed 2025-02-18T10:54:06Z
main_date 2020-01-01T00:00:00Z
main_date_str 2020
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/3995034b-92ba-4cf4-be23-bfbf2d3c1500\/download","text":"URN:NBN:fi:jyu-202002242154.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2020
record_format qdc
source_str_mv jyx
spellingShingle Zeb, Mian Shah Comparative analysis of data stream processing systems Stream Processing Batch Processing Apache Kafka Apache Samza Streaming Engines Tietojenkäsittelytiede Computer Science 601 tietojenkäsittely big data tietojärjestelmät tietotekniikka data suoratoisto data processing data systems information technology streaming
title Comparative analysis of data stream processing systems
title_full Comparative analysis of data stream processing systems
title_fullStr Comparative analysis of data stream processing systems Comparative analysis of data stream processing systems
title_full_unstemmed Comparative analysis of data stream processing systems Comparative analysis of data stream processing systems
title_short Comparative analysis of data stream processing systems
title_sort comparative analysis of data stream processing systems
title_txtP Comparative analysis of data stream processing systems
topic Stream Processing Batch Processing Apache Kafka Apache Samza Streaming Engines Tietojenkäsittelytiede Computer Science 601 tietojenkäsittely big data tietojärjestelmät tietotekniikka data suoratoisto data processing data systems information technology streaming
topic_facet 601 Apache Kafka Apache Samza Batch Processing Computer Science Stream Processing Streaming Engines Tietojenkäsittelytiede big data data data processing data systems information technology streaming suoratoisto tietojenkäsittely tietojärjestelmät tietotekniikka
url https://jyx.jyu.fi/handle/123456789/67932 http://www.urn.fi/URN:NBN:fi:jyu-202002242154
work_keys_str_mv AT zebmianshah comparativeanalysisofdatastreamprocessingsystems