Computationally efficient tools for big data processing

Data processing is a key ingredient in many disciplines such as image and signal processing, machine learning and data mining. Today, as the data size increases exponentially and massive volumes of data are collected, processing this data is becoming more challenging. This leads to an on-going inter...

Bibliographic Details
Main Author: Shabat, Gil
Format: Doctoral dissertation
Language: eng
Published: 2014
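Subjects: data; big data; analyysimenetelmät; algoritmit; Monte Carlo -menetelmät; matriisilaskenta; laskennallinen vaativuus; Fourier'n muunnos; particle filter; low rank; randomized LU; polar Fourier transform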
Online Access: https://jyx.jyu.fi/handle/123456789/103722
_version_ 1835400262141870080
author Shabat, Gil
author_facet Shabat, Gil
author_sort Shabat, Gil
datasource_str_mv jyx
description Data processing is a key ingredient in many disciplines, such as image and signal processing, machine learning and data mining. Today, as data sizes increase exponentially and massive volumes of data are collected, processing this data is becoming more challenging. This leads to an ongoing interest in algorithms that are capable of dealing efficiently with large datasets. This thesis investigates low rank methods and how they can be utilized in modern data analysis applications. The idea behind successful low rank methods is that in many cases there are dependencies and redundancies within the data. Therefore, the data can be well approximated and processed by exploiting its low rank property, which results in faster processing of smaller data. In this part, an algorithm for efficient object tracking in videos using a particle filter is presented. The algorithm applies matrix decomposition methods to a Gaussian kernel with a low numerical rank in order to select the most representative particles of the probability density function (PDF). Then, a multi-scale function extension method is applied to obtain a fast restoration of the PDF. Another important tool that uses a low rank assumption is matrix completion. Matrix completion algorithms try to complete a matrix with missing entries by minimizing its nuclear norm. In this part, a new and robust algorithm is presented. It extends the ability of a matrix completion algorithm to deal with a variety of constraints, such as the spectral and weighted nuclear norms. Another algorithm presented is a randomized algorithm for low rank LU decomposition. The randomized LU decomposition algorithm has the following advantages: it is faster than other randomized algorithms, it is parallelizable, it consumes little memory because most of it can be done in place, and it processes sparse matrices efficiently. The last algorithm presented in this thesis is related to the pseudo polar Fourier transform (PPFT), which is an important tool in many applications such as computerized tomography. We present a new algorithm that inverts the 3D pseudo polar Fourier transform. Its accuracy to machine precision is guaranteed within a fixed number of steps. In addition, it has low memory requirements, enabling the processing of large 3D datasets.
first_indexed 2025-06-18T20:04:50Z
format Doctoral dissertation
fullrecord [{"key": "dc.contributor.author", "value": "Shabat, Gil", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2025-06-18T09:11:26Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2025-06-18T09:11:26Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2014", "language": null, "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.isbn", "value": "978-952-86-0819-6", "language": null, "element": "identifier", "qualifier": "isbn", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/103722", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Data processing is a key ingredient in many disciplines such as image and signal processing, machine learning and data mining. Today, as the data size increases exponentially and massive volumes of data are collected, processing this data is becoming more challenging. This leads to an on-going interest in algorithms that are capable to deal efficiently with large datasets. This thesis investigates low rank methods and how they can be utilized in modern data analysis applications. The idea behind successful low rank methods is the fact that in many cases there are dependencies and redundancies within the data. Therefore, the data can be well approximated and processed by utilizing its low rank property which results in a faster processing of smaller data. In this part, an algorithm for efficient object tracking in videos using particle filter is presented. The algorithm uses matrix decomposition methods applied to a Gaussian kernel with a low numerical rank, to select the most representative particles of the probability density function (PDF). 
Then, a multi-scale function extension method is applied to obtain a fast restoration of the PDF. Another important tool that uses a low rank assumption is matrix completion. Matrix completion algorithms try to complete a matrix with missing entries by minimizing its nuclear norm. In this part, a new and robust algorithm is presented. This algorithm extends the ability of a matrix completion algorithm to deal with a variety of constraints such as the spectral and weighted nuclear norms. Another algorithm presented is a randomized algorithm for low rank LU decomposition. The randomized LU decomposition algorithm has the following advantages: It is faster than other randomized algorithms, parallelizable and consumes low memory as most of it can be done in place and has an efficient capability to process sparse matrices. The last algorithm presented in this thesis is related to the pseudo polar Fourier transform (PPFT), which is an important tool in many applications such as computerized tomography. We present a new algorithm that inverts the 3D pseudo polar Fourier transform. Its accuracy to machine precision is guaranteed within a fixed number of steps. In addition, it has low memory requirements enabling to process large 3D datasets.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Harri Hirvi (hirvi@jyu.fi) on 2025-06-18T09:11:26Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2025-06-18T09:11:26Z (GMT). No. 
of bitstreams: 0\n Previous issue date: 2014", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.relation.ispartofseries", "value": "Jyv\u00e4skyl\u00e4 studies in computing", "language": null, "element": "relation", "qualifier": "ispartofseries", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": null, "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "data", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "big data", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "analyysimenetelm\u00e4t", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "algoritmit", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Monte Carlo -menetelm\u00e4t", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "matriisilaskenta", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "laskennallinen vaativuus", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Fourier'n muunnos", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "particle filter", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "low rank", "language": null, "element": 
"subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "randomized LU", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "polar Fourier transform", "language": null, "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Computationally efficient tools for big data processing", "language": null, "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "doctoral thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:ISBN:978-952-86-0819-6", "language": null, "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_db06", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.relation.numberinseries", "value": "208", "language": null, "element": "relation", "qualifier": "numberinseries", "schema": "dc"}, {"key": "dc.rights.copyright", "value": "\u00a9 The Author & University of Jyv\u00e4skyl\u00e4", "language": null, "element": "rights", "qualifier": "copyright", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "restrictedAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "doctoralThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.rights.accessrights", "value": "Aineistoon p\u00e4\u00e4sy\u00e4 on rajoitettu tekij\u00e4noikeussyist\u00e4. 
Aineisto on luettavissa Jyv\u00e4skyl\u00e4n yliopiston kirjaston <a href=\"https://www.jyu.fi/fi/osc/kirjasto/tyoskentelytilat/laitteet-ja-tilat#toc-jyx-ty-asema\">arkistoty\u00f6asemalta</a>.", "language": "fi", "element": "rights", "qualifier": "accessrights", "schema": "dc"}, {"key": "dc.rights.accessrights", "value": "<br><br>This material has a restricted access due to copyright reasons. It can be read at the <a href=\"https://www.jyu.fi/fi/osc/kirjasto/tyoskentelytilat/laitteet-ja-tilat#toc-jyx-ty-asema\">workstation</a> at Jyv\u00e4skyl\u00e4 University Library reserved for the use of archival materials.", "language": "en", "element": "rights", "qualifier": "accessrights", "schema": "dc"}, {"key": "dc.date.digitised", "value": "2025", "language": null, "element": "date", "qualifier": "digitised", "schema": "dc"}, {"key": "dc.type.okm", "value": "G4", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_103722
language eng
last_indexed 2025-06-18T20:04:50Z
main_date 2014-01-01T00:00:00Z
main_date_str 2014
publishDate 2014
record_format qdc
source_str_mv jyx
spellingShingle Shabat, Gil Computationally efficient tools for big data processing data big data analyysimenetelmät algoritmit Monte Carlo -menetelmät matriisilaskenta laskennallinen vaativuus Fourier'n muunnos particle filter low rank randomized LU polar Fourier transform
title Computationally efficient tools for big data processing
title_full Computationally efficient tools for big data processing
title_fullStr Computationally efficient tools for big data processing
title_full_unstemmed Computationally efficient tools for big data processing
title_short Computationally efficient tools for big data processing
title_sort computationally efficient tools for big data processing
title_txtP Computationally efficient tools for big data processing
topic data big data analyysimenetelmät algoritmit Monte Carlo -menetelmät matriisilaskenta laskennallinen vaativuus Fourier'n muunnos particle filter low rank randomized LU polar Fourier transform
topic_facet Fourier'n muunnos Monte Carlo -menetelmät algoritmit analyysimenetelmät big data data laskennallinen vaativuus low rank matriisilaskenta particle filter polar Fourier transform randomized LU
url https://jyx.jyu.fi/handle/123456789/103722 http://www.urn.fi/URN:ISBN:978-952-86-0819-6
work_keys_str_mv AT shabatgil computationallyefficienttoolsforbigdataprocessing