Abstract: This thesis explores the use of generative artificial intelligence (GenAI), specifically large
language models (LLMs), to automate the creation of data quality (DQ) rules. Traditional
rule-based systems are difficult to scale in large, dynamic data environments. To
address this, the study evaluates three LLMs (GPT-4 Turbo, Gemini 1.5 Pro, and Claude
3.7 Sonnet) across three prompting strategies (zero-shot, few-shot, and prompt-chaining).
A total of 216 rule sets were generated from metadata and profiling inputs and evaluated
by domain experts. Results show that prompt-chaining significantly improves rule quality
over standalone prompting strategies, while the choice of model has only a minor impact. The
best-performing combination, Claude 3.7 Sonnet with prompt-chaining, achieved the highest-quality
outputs.
These findings demonstrate that GenAI can support scalable and adaptive DQ rule
generation when paired with effective prompt design, offering a practical solution for
enterprise data monitoring.