Building a scene-specific synthetic data generator with Omniverse Replicator

In today’s world of AI, the amount of training data is a critical factor in the success of model training. Especially in cases where data acquisition is difficult due to rare occurrence of events or annotation cost, synthetic data can be used to supplement data needs. In computer vision, some tasks...


Bibliographic details
Main authors: Kokko, Aaro, Kuhno, Jani
Other authors: Faculty of Information Technology, Informaatioteknologian tiedekunta, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis (Pro gradu)
Language: eng
Published: 2024
Subjects:
Links: https://jyx.jyu.fi/handle/123456789/95348
_version_ 1826225731036774400
author Kokko, Aaro Kuhno, Jani
author2 Faculty of Information Technology Informaatioteknologian tiedekunta Jyväskylän yliopisto University of Jyväskylä
author_facet Kokko, Aaro Kuhno, Jani Faculty of Information Technology Informaatioteknologian tiedekunta Jyväskylän yliopisto University of Jyväskylä
author_sort Kokko, Aaro
datasource_str_mv jyx
description In today’s world of AI, the amount of training data is a critical factor in the success of model training. Especially in cases where data acquisition is difficult due to rare occurrence of events or annotation cost, synthetic data can be used to supplement data needs. In computer vision, some tasks require pixel-wise annotation, which is labor-intensive and error-prone when done by hand. In this study, we use the eDSR methodology to design and evaluate a synthetic data generator that serves as a reference for those who want to start generating synthetic visual data from scratch. A generator combining an Omniverse Replicator Python script and 3D assets is developed, and the quality of its synthetic output is measured by training three different neural networks to predict segmentation masks from a real-world scene. In addition to the generator, a model of a scene-specific synthetic data generation pipeline is presented, complementing the reference generator as a source of knowledge for newcomers to the field. Domain gap bridging and domain randomization are observed to be two major processes in building a synthetic data generator. Domain gap bridging aims to increase the visual similarity between the synthetic scene and the real world, while domain randomization aims to broaden the data distribution. Because the main benefit of synthetic data is minimal annotation cost, optimization of generation speed should be integrated into the development process. The Python code developed is available at: https://github.com/jkuhno/reference-SDGenerator
first_indexed 2024-05-30T20:00:58Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Nurmi, Jarkko", "language": null, "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Kokko, Aaro", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Kuhno, Jani", "language": null, "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2024-05-30T05:38:41Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2024-05-30T05:38:41Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2024", "language": null, "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/95348", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "In today\u2019s world of AI, the amount of training data is a critical factor in the success of model training. Especially in cases where data acquisition is difficult due to rare occurrence of events or annotation cost, synthetic data can be used to supplement data needs. In computer vision, some tasks require pixel-wise annotation which, if done by hand, is labor intensive and error-prone. In this study, we use eDSR methodology to design and evaluate a synthetic data generator, to serve as a reference generator for those who seek to start synthetic visual data generation from scratch. A generator, combining an Omniverse Replicator Python script and 3D assets, is developed and the quality of the synthetic data outputs is measured by training three different neural networks to predict segmentation masks from a real-world scene. 
In addition to the generator, a model of scene-specific synthetic data generation pipeline is presented, to complement the reference generator as a source of knowledge for newcomers in the field. Two major processes in synthetic data generator building are observed to be domain gap bridging and domain randomization. Domain gap bridging aims to increase the visual similarity in the synthetic scene and the real world, while domain randomization aims to increase the data distribution. Because the main benefit of synthetic data is minimal annotation cost, the optimization of generation speed should be integrated in the development process. The Python code developed is available in: https://github.com/jkuhno/reference-SDGenerator", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by jyx lomake-julkaisija (jyx-julkaisija.group@korppi.jyu.fi) on 2024-05-30T05:38:41Z\r\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2024-05-30T05:38:41Z (GMT). No. 
of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "75", "language": null, "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "CC BY-NC-ND 4.0", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.title", "value": "Building a scene-specific synthetic data generator with Omniverse Replicator", "language": null, "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202405304111", "language": null, "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Information Systems Science", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietoj\u00e4rjestelm\u00e4tiede", "language": "fi", 
"element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.copyright", "value": "\u00a9 The Author(s)", "language": null, "element": "rights", "qualifier": "copyright", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://creativecommons.org/licenses/by-nc-nd/4.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}]
id jyx.123456789_95348
language eng
last_indexed 2025-02-18T10:54:55Z
main_date 2024-01-01T00:00:00Z
main_date_str 2024
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/2b54c9fe-0c1d-487b-9424-c46b74e92c89\/download","text":"URN:NBN:fi:jyu-202405304111.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2024
record_format qdc
source_str_mv jyx
spellingShingle Kokko, Aaro Kuhno, Jani Building a scene-specific synthetic data generator with Omniverse Replicator Information Systems Science Tietojärjestelmätiede
title Building a scene-specific synthetic data generator with Omniverse Replicator
title_full Building a scene-specific synthetic data generator with Omniverse Replicator
title_fullStr Building a scene-specific synthetic data generator with Omniverse Replicator
title_full_unstemmed Building a scene-specific synthetic data generator with Omniverse Replicator
title_short Building a scene-specific synthetic data generator with Omniverse Replicator
title_sort building a scene specific synthetic data generator with omniverse replicator
title_txtP Building a scene-specific synthetic data generator with Omniverse Replicator
topic Information Systems Science Tietojärjestelmätiede
topic_facet Information Systems Science Tietojärjestelmätiede
url https://jyx.jyu.fi/handle/123456789/95348 http://www.urn.fi/URN:NBN:fi:jyu-202405304111
work_keys_str_mv AT kokkoaaro buildingascenespecificsyntheticdatageneratorwithomniversereplicator