Evaluating the Effectiveness of LLMs and Prompting Techniques in Generating Data Quality Rules

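This thesis explores the use of generative artificial intelligence (GenAI), specifically large language models (LLMs), to automate the creation of data quality (DQ) rules. Traditional rule-based systems are difficult to scale in large and dynamic data environments. To address this, the study evaluates three LLMs: GPT-4 Turbo, Gemini 1.5 Pro, and Claude 3.7 Sonnet, using three prompting strategies: zero-shot, few-shot, and prompt-chaining. A total of 216 rule sets were generated from metadata and profiling inputs and evaluated by domain experts. Results show that prompt-chaining significantly improves rule quality over standalone prompting strategies, while model choice has a minor impact. The best-performing combination (Claude with prompt-chaining) achieved high-quality outputs. These findings demonstrate that GenAI can support scalable and adaptive DQ rule generation when paired with effective prompt design, offering a practical solution for enterprise data monitoring.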
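The study compares zero-shot, few-shot and prompt-chaining prompting across three models. This record does not include the thesis's actual prompts. The Python sketch below is therefore only a hypothetical illustration of how the three strategies generally differ when the input is column metadata plus profiling statistics.

Several names are invented for illustration: `ColumnProfile`, `describe`, `FEW_SHOT_EXAMPLES`, the three chain steps and `make_stub_llm`. `llm` stands for any callable that maps a prompt string to a response string, and no real model API is called. The sketch also enumerates the 3 × 3 model/strategy grid. The abstract does not say the 216 rule sets were split evenly, but an even split would give 24 per combination.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Tuple

# Models and prompting strategies named in the abstract.
MODELS = ["GPT-4 Turbo", "Gemini 1.5 Pro", "Claude 3.7 Sonnet"]
STRATEGIES = ["zero-shot", "few-shot", "prompt-chaining"]

# Every model/strategy combination: 3 x 3 = 9 configurations.
CONFIGS: List[Tuple[str, str]] = list(product(MODELS, STRATEGIES))


@dataclass
class ColumnProfile:
    """Hypothetical metadata and profiling input for a single column."""
    name: str
    dtype: str
    null_ratio: float
    distinct_count: int


def describe(col: ColumnProfile) -> str:
    """Render metadata and profiling statistics as prompt text."""
    return (f"Column '{col.name}' (type {col.dtype}): "
            f"{col.null_ratio:.0%} nulls, {col.distinct_count} distinct values.")


def zero_shot_prompt(col: ColumnProfile) -> str:
    """Zero-shot: the task description only, no worked examples."""
    return f"{describe(col)}\nPropose data quality rules for this column."


# Hypothetical worked example: (input description, expected rules).
FEW_SHOT_EXAMPLES = [
    ("Column 'email' (type string): 0% nulls, 980 distinct values.",
     "email IS NOT NULL; email matches an e-mail address pattern"),
]


def few_shot_prompt(col: ColumnProfile) -> str:
    """Few-shot: worked input/output pairs precede the new input."""
    shots = "\n".join(f"Input: {i}\nRules: {o}" for i, o in FEW_SHOT_EXAMPLES)
    return f"{shots}\nInput: {describe(col)}\nRules:"


def prompt_chain(col: ColumnProfile, llm: Callable[[str], str]) -> List[str]:
    """Prompt-chaining: each step's response is embedded in the next prompt."""
    risks = llm(f"{describe(col)}\nList likely data quality risks for this column.")
    draft = llm(f"Quality risks:\n{risks}\nDraft one data quality rule per risk.")
    final = llm(f"Draft rules:\n{draft}\nRemove duplicates and make each rule testable.")
    return [risks, draft, final]


def make_stub_llm():
    """Stand-in for a real model: records prompts and returns placeholder text."""
    calls: List[str] = []

    def llm(prompt: str) -> str:
        calls.append(prompt)
        return f"step-{len(calls)}-output"

    return llm, calls


if __name__ == "__main__":
    col = ColumnProfile("order_date", "date", 0.05, 365)
    print(len(CONFIGS), 216 // len(CONFIGS))  # 9 configurations; 24 each if balanced
    print(zero_shot_prompt(col))
    llm, calls = make_stub_llm()
    print(prompt_chain(col, llm)[-1])
```

In a real pipeline, `make_stub_llm` would be replaced by a client for one of the evaluated models. The stub only makes the chain's data flow visible: each recorded prompt contains the previous step's placeholder output.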

Full details

Bibliographic details
Main author:	Siyam, Sohag
Other authors:	Informaatioteknologian tiedekunta, Faculty of Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Material type:	Master's thesis (pro gradu)
Language:	eng
Published:	2025
Subjects:	Master's Degree Programme in Artificial Intelligence
Links:	https://jyx.jyu.fi/handle/123456789/102965

author	Siyam, Sohag
author2	Informaatioteknologian tiedekunta Faculty of Information Technology Jyväskylän yliopisto University of Jyväskylä
datasource_str_mv	jyx
description	This thesis explores the use of generative artificial intelligence (GenAI), specifically large language models (LLMs), to automate the creation of data quality (DQ) rules. Traditional rule-based systems are difficult to scale in large and dynamic data environments. To address this, the study evaluates three LLMs: GPT-4 Turbo, Gemini 1.5 Pro, and Claude 3.7 Sonnet, using three prompting strategies: zero-shot, few-shot, and prompt-chaining. A total of 216 rule sets were generated from metadata and profiling inputs and evaluated by domain experts. Results show that prompt-chaining significantly improves rule quality over standalone prompting strategies, while model choice has a minor impact. The best-performing combination (Claude with prompt-chaining) achieved high-quality outputs. These findings demonstrate that GenAI can support scalable and adaptive DQ rule generation when paired with effective prompt design, offering a practical solution for enterprise data monitoring.
first_indexed	2025-06-02T20:00:55Z
format	Pro gradu (master's thesis)
fullrecord	(Dublin Core fields)
dc.contributor.advisor	Khriyenko, Oleksiy
dc.contributor.author	Siyam, Sohag
dc.date.accessioned	2025-06-02T12:06:50Z
dc.date.available	2025-06-02T12:06:50Z
dc.date.issued	2025
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/102965
dc.description.abstract	This thesis explores the use of generative artificial intelligence (GenAI), specifically large language models (LLMs), to automate the creation of data quality (DQ) rules. Traditional rule-based systems are difficult to scale in large and dynamic data environments. To address this, the study evaluates three LLMs: GPT-4 Turbo, Gemini 1.5 Pro, and Claude 3.7 Sonnet, using three prompting strategies: zero-shot, few-shot, and prompt-chaining. A total of 216 rule sets were generated from metadata and profiling inputs and evaluated by domain experts. Results show that prompt-chaining significantly improves rule quality over standalone prompting strategies, while model choice has a minor impact. The best-performing combination (Claude with prompt-chaining) achieved high-quality outputs. These findings demonstrate that GenAI can support scalable and adaptive DQ rule generation when paired with effective prompt design, offering a practical solution for enterprise data monitoring.
dc.description.provenance	Submitted by jyx lomake-julkaisija (jyx-julkaisija.group@korppi.jyu.fi) on 2025-06-02T12:06:50Z. No. of bitstreams: 0
dc.description.provenance	Made available in DSpace on 2025-06-02T12:06:50Z (GMT). No. of bitstreams: 0
dc.format.extent	37
dc.format.mimetype	application/pdf
dc.language.iso	eng
dc.rights	In Copyright
dc.title	Evaluating the Effectiveness of LLMs and Prompting Techniques in Generating Data Quality Rules
dc.type	master thesis
dc.identifier.urn	URN:NBN:fi:jyu-202506024774
dc.contributor.faculty	Informaatioteknologian tiedekunta (fi); Faculty of Information Technology (en)
dc.contributor.organization	Jyväskylän yliopisto (fi); University of Jyväskylä (en)
dc.subject.discipline	Master's Degree Programme in Artificial Intelligence
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc
dc.rights.copyright	© The Author(s)
dc.rights.accesslevel	restrictedAccess
dc.type.publication	masterThesis
dc.format.content	fulltext
dc.rights.url	https://rightsstatements.org/page/InC/1.0/
dc.rights.accessrights	The author has not given permission to make the work publicly available electronically. Therefore the material can be read only at the archival workstation at Jyväskylä University Library (https://www.jyu.fi/en/osc/library/workspaces/facilities-and-equipment#toc-jyx-workstation).
dc.description.accessibilityfeature	unknown accessibility
id	jyx.123456789_102965
language	eng
last_indexed	2025-06-02T20:02:29Z
main_date	2025-01-01T00:00:00Z
main_date_str	2025
publishDate	2025
record_format	qdc
source_str_mv	jyx
title	Evaluating the Effectiveness of LLMs and Prompting Techniques in Generating Data Quality Rules
topic	Master's Degree Programme in Artificial Intelligence
url	https://jyx.jyu.fi/handle/123456789/102965 http://www.urn.fi/URN:NBN:fi:jyu-202506024774

