Comparative analysis of data stream processing systems

Big data-käsittelyjärjestelmät ovat tällä hetkellä kehittymässä stream-orientoituneiksi, eli data käsitellään heti saapuessaan. Perinteisemmin data säilöttiin tietokantaan, tiedostopohjaisesti tai muuhun tiedonsäilytysjärjestelmään, ja applikaatiot hakivat datan tarvittaessa. Stream-pohjainen järjes...

Täydet tiedot

Bibliografiset tiedot
Päätekijä:	Zeb, Mian Shah
Muut tekijät:	Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Aineistotyyppi:	Pro gradu
Kieli:	eng
Julkaistu:	2020
Aiheet:	Stream Processing Batch Processing Apache Kafka Apache Samza Streaming Engines Tietojenkäsittelytiede Computer Science 601 tietojenkäsittely big data tietojärjestelmät tietotekniikka data suoratoisto data processing data systems information technology streaming
Linkit:	https://jyx.jyu.fi/handle/123456789/67932

_version_	1833407647993298944
author	Zeb, Mian Shah
author2	Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet	Zeb, Mian Shah Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä Zeb, Mian Shah Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort	Zeb, Mian Shah
datasource_str_mv	jyx
description	Big data-käsittelyjärjestelmät ovat tällä hetkellä kehittymässä stream-orientoituneiksi, eli data käsitellään heti saapuessaan. Perinteisemmin data säilöttiin tietokantaan, tiedostopohjaisesti tai muuhun tiedonsäilytysjärjestelmään, ja applikaatiot hakivat datan tarvittaessa. Stream-pohjainen järjestelmä käsittelee liikkuvaa dataa, jatkuva-aikaista dataa useasta lähteestä. Sen sijaan, että haetaan ajoittain dataa, stream-pohjaiset frameworkit pystyvät käsittelemään dataa heti kun se on saatavilla, täten vähentäen viivettä. Tässä tutkielmassa tehdään komparatiivinen analyysi eri stream-pohjaisten frameworkien välillä, perustuen valittuihin ominaisuuksiin. Tutkittavat frameworkit ovat Apache Samza, Apache Flink, Apache Storm ja Apache Spark Structured Streaming. Tutkielmassa perehdytään myös Apache Kafkaan, joka on lokiperusteinen tietovarasto, jota laajalti käytetään stream-pohjaisissa frameworkeissa. Big data processing systems are evolving to be more stream oriented where data is processed continuously by processing it as soon as it arrives. Earlier data was often stored in a database, a file system or other form of data storage system. Applications would query the data as needed. Stram processing is the processing of data in motion. It works on continuous data retrieved from different resources. Instead of periodically collecting huge static data, streaming frameworks process data as soon as it becomes available, hence reducing latency. This thesis aims to conduct a comparative analysis of different streaming processors based on selected features. Research focuses on Apache Samza, Apache Flink, Apache Storm and Apache Spark Structured Streaming. Also, this thesis explains Apache Kafka which is a log-based data storage widely used in streaming frameworks.
first_indexed	2020-02-24T21:01:38Z
format	Pro gradu
free_online_boolean	1
fullrecord	[{"key": "dc.contributor.advisor", "value": "Khriyenko, Oleksiy", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "Terziyan, Vagan", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Zeb, Mian Shah", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2020-02-24T11:33:39Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2020-02-24T11:33:39Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2020", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/67932", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Big data-k\u00e4sittelyj\u00e4rjestelm\u00e4t ovat t\u00e4ll\u00e4 hetkell\u00e4 kehittym\u00e4ss\u00e4 stream-orientoituneiksi, eli data k\u00e4sitell\u00e4\u00e4n heti saapuessaan. Perinteisemmin data s\u00e4il\u00f6ttiin tietokantaan, tiedostopohjaisesti tai muuhun tiedons\u00e4ilytysj\u00e4rjestelm\u00e4\u00e4n, ja applikaatiot hakivat datan tarvittaessa. Stream-pohjainen j\u00e4rjestelm\u00e4 k\u00e4sittelee liikkuvaa dataa, jatkuva-aikaista dataa useasta l\u00e4hteest\u00e4.\n \n Sen sijaan, ett\u00e4 haetaan ajoittain dataa, stream-pohjaiset frameworkit pystyv\u00e4t k\u00e4sittelem\u00e4\u00e4n dataa heti kun se on saatavilla, t\u00e4ten v\u00e4hent\u00e4en viivett\u00e4. T\u00e4ss\u00e4 tutkielmassa tehd\u00e4\u00e4n komparatiivinen analyysi eri stream-pohjaisten frameworkien v\u00e4lill\u00e4, perustuen valittuihin ominaisuuksiin. Tutkittavat frameworkit ovat Apache Samza, Apache Flink, Apache Storm ja Apache Spark Structured Streaming. Tutkielmassa perehdyt\u00e4\u00e4n my\u00f6s Apache Kafkaan, joka on lokiperusteinen tietovarasto, jota laajalti k\u00e4ytet\u00e4\u00e4n stream-pohjaisissa frameworkeissa.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Big data processing systems are evolving to be more stream oriented where data is processed continuously by processing it as soon as it arrives.\n Earlier data was often stored in a database, a file system or other form of data storage system. Applications would query the data as needed. Stram processing is the processing of data in motion. It works on continuous data retrieved from different resources. Instead of periodically collecting huge static data, streaming frameworks process data as soon as it becomes available, hence reducing latency. This thesis aims to conduct a comparative analysis of different streaming processors based on selected features. Research focuses on Apache Samza, Apache Flink, Apache Storm and Apache Spark Structured Streaming. Also, this thesis explains Apache Kafka which is a log-based data storage widely used in streaming frameworks.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Miia Hakanen (mihakane@jyu.fi) on 2020-02-24T11:33:39Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2020-02-24T11:33:39Z (GMT). No. of bitstreams: 0\n Previous issue date: 2020", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "48", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "Stream Processing", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Batch Processing", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Apache Kafka", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Apache Samza", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Streaming Engines", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Comparative analysis of data stream processing systems", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202002242154", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietojenk\u00e4sittelytiede", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Computer Science", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "601", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietojenk\u00e4sittely", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "big data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietoj\u00e4rjestelm\u00e4t", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tietotekniikka", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "suoratoisto", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data processing", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "big data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data systems", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "information technology", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "data", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "streaming", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}, {"key": "dc.description.accessibilityfeature", "value": "unknown accessibility", "language": "en", "element": "description", "qualifier": "accessibilityfeature", "schema": "dc"}, {"key": "dc.description.accessibilityfeature", "value": "ei tietoa saavutettavuudesta", "language": "fi", "element": "description", "qualifier": "accessibilityfeature", "schema": "dc"}]
id	jyx.123456789_67932
language	eng
last_indexed	2025-05-21T20:05:38Z
main_date	2020-01-01T00:00:00Z
main_date_str	2020
online_boolean	1
online_urls_str_mv	{"url":"https:\/\/jyx.jyu.fi\/bitstreams\/3995034b-92ba-4cf4-be23-bfbf2d3c1500\/download","text":"URN:NBN:fi:jyu-202002242154.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate	2020
record_format	qdc
source_str_mv	jyx
spellingShingle	Zeb, Mian Shah Comparative analysis of data stream processing systems Stream Processing Batch Processing Apache Kafka Apache Samza Streaming Engines Tietojenkäsittelytiede Computer Science 601 tietojenkäsittely big data tietojärjestelmät tietotekniikka data suoratoisto data processing data systems information technology streaming
title	Comparative analysis of data stream processing systems
title_full	Comparative analysis of data stream processing systems
title_fullStr	Comparative analysis of data stream processing systems Comparative analysis of data stream processing systems
title_full_unstemmed	Comparative analysis of data stream processing systems Comparative analysis of data stream processing systems
title_short	Comparative analysis of data stream processing systems
title_sort	comparative analysis of data stream processing systems
title_txtP	Comparative analysis of data stream processing systems
topic	Stream Processing Batch Processing Apache Kafka Apache Samza Streaming Engines Tietojenkäsittelytiede Computer Science 601 tietojenkäsittely big data tietojärjestelmät tietotekniikka data suoratoisto data processing data systems information technology streaming
topic_facet	601 Apache Kafka Apache Samza Batch Processing Computer Science Stream Processing Streaming Engines Tietojenkäsittelytiede big data data data processing data systems information technology streaming suoratoisto tietojenkäsittely tietojärjestelmät tietotekniikka
url	https://jyx.jyu.fi/handle/123456789/67932 http://www.urn.fi/URN:NBN:fi:jyu-202002242154
work_keys_str_mv	AT zebmianshah comparativeanalysisofdatastreamprocessingsystems

Comparative analysis of data stream processing systems

Samankaltaisia teoksia