Discovering Business Processes from Unstructured Text

Processing documents manually consumes a great deal of a knowledge worker's resources. This also applies to business process management professionals, whose work may require reading many business process descriptions. The aim of this thesis was to find solutions that reduce the knowledge...

Bibliographic Details
Main Author: Pietikäinen, Sampo
Other Authors: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language: eng
Published: 2020
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/69993
_version_ 1826225755537801216
author Pietikäinen, Sampo
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Pietikäinen, Sampo Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Pietikäinen, Sampo
datasource_str_mv jyx
description Processing documents manually consumes a great deal of a knowledge worker's resources. This also applies to business process management professionals, whose work may require reading many business process descriptions. The aim of this thesis was to find solutions that reduce the time a knowledge worker spends reading documents by applying natural language processing methods to discovering business processes in documents. The research method was design science research, which comprised several iterative steps. Named entity recognition was used to design the first solution. It did not, however, produce the desired results, so the research moved on to assess better candidate solutions through an analysis applying genre theory. Based on this analysis, four solutions that classify document headings were developed to identify business processes. The classification solutions were evaluated with cross-validation. The first classification solution performed promisingly in cross-validation with random splits. In the validation that assessed recognizing processes in new documents, however, the solution did not perform well. The second classification solution applied part-of-speech tagging in classification. The third classification solution used a list of verbs used in business processes. The fourth classification solution took as input, in addition to the heading, its context, that is, the sentences in which the headings appeared in the document text. These classification solutions did not, however, produce results significantly better than the first solution.
Manual processing of documents can be a time-consuming task for a knowledge worker. This workload can be familiar to Business Process Management professionals, who may have to go through multiple process descriptions in their work. This thesis attempts to find a way to mitigate the workload of the knowledge worker by proposing a natural language processing solution for discovering Business Processes from Business Process description documents. The research applied the design science research method and took several steps to produce the solution. The named entity recognition solution provided weak results, and instead of improving that solution, the research used genre analysis methods to seek an alternative approach. Classifying the headings of the document was deemed a possibly viable solution. Four classification pipelines were built for classifying the headings and evaluated with cross-validation. The results of the first pipeline were somewhat promising; however, it performed poorly in the cross-validation that was meant to evaluate the ability to retrieve processes with previously unknown words. The subsequent pipelines were created to improve on the baseline set by the first pipeline. The second pipeline used part-of-speech tagging, the third used a list of verbs relevant to business processes, and the fourth used the context in which process names appeared. These pipelines did not, however, make substantial improvements.
first_indexed 2020-06-17T20:02:25Z
format Master's thesis (Pro gradu)
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Pulkkinen, Mirja", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Pietik\u00e4inen, Sampo", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2020-06-17T04:11:06Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2020-06-17T04:11:06Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2020", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/69993", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Asiakirjojen k\u00e4sittely manuaalisesti kuluttaa paljon tietoty\u00f6ntekij\u00e4n resursseja. T\u00e4m\u00e4 koskee my\u00f6s liiketoimintaprossien johtamisen asiantuntijoita, joiden ty\u00f6 voi vaatia useiden liiketoimintaprosessien kuvausten lukemista. T\u00e4m\u00e4n tutkielman tavoitteena oli l\u00f6yt\u00e4\u00e4 ratkaisuja, jotka v\u00e4hent\u00e4v\u00e4t tietoty\u00f6l\u00e4isen asiakirjojen lukemiseen k\u00e4ytt\u00e4m\u00e4\u00e4 aikaa soveltamalla luonnollisen kielen k\u00e4sittelyn menetelmi\u00e4 liiketoimintaprosessien etsimiseen asiakirjoista. Tutkimusmenetelm\u00e4n\u00e4 oli suunnittelutieteellinen tutkimus, joka sis\u00e4lsi useita iteratiivisia vaiheita. Nimetyn kohteen tunnistamista k\u00e4ytettiin ensimm\u00e4isen ratkaisun suunnittelemiseen. Se ei kuitenkaan tuottanut toivottuja tuloksia, joten tutkimus siirtyi arvioimaan parempia mahdollisia ratkaisuja genre-teoriaa soveltavalla analyysill\u00e4. 
T\u00e4m\u00e4n analyysin perusteella kehitettiin nelj\u00e4 asiakirjojen otsikkojen luokittelevaa ratkaisua tunnistamaan liiketoimintaprosesseja. Luokitteluratkaisut arvioitiin ristiinvalidoinnilla. Ensimm\u00e4inen luokitteluratkaisu suoriutui sattumanvaraisesti jaetusta ristiinvalidoinnista lupaavasti. Validoinnissa, jossa arvioitiin prosessien tunnistamista uusista asiakirjoista, ratkaisu ei kuitenkaan suoriutunut hyvin. Toinen luokitteluratkaisu sovelsi luokittelussa sanaluokkien tunnistamista. Kolmas luokitteluratkaisu hy\u00f6dynsi listaa joka sis\u00e4lsi liiketoimintaprosesseissa k\u00e4ytett\u00e4vi\u00e4 verbej\u00e4. Nelj\u00e4s luokitteluratkaisu k\u00e4ytti sy\u00f6tteen\u00e4 otsikon lis\u00e4ksi kontekstia eli lauseita joissa otsikot esiintyiv\u00e4t asiakirjan tekstiss\u00e4. N\u00e4m\u00e4 luokitteluratkaisut eiv\u00e4t kuitenkaan tuottaneet merkitt\u00e4v\u00e4sti ensimm\u00e4ist\u00e4 ratkaisua parempia tuloksia.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Manual processing of the documents can be a time-taking task for a knowledge worker. This workload can be familiar to Business Process Management professionals who may have to go through multiple process descriptions in their work. This thesis attempts to find a way to mitigate the workload of the knowledge worker by proposing a natural language processing solution for discovering Business Processes from Business Process description documents. The research applied the design science research method and took several steps to produce the solution. The named entity recognition solution provided weak results, and instead of improving the solution, the research utilized genre analysis methods to seek an alternative approach. The classification of the headings of the document was deemed as a possibly viable solution. 
Four classification pipelines were built for classification of the headings and evaluated with cross-validation. The results of the first pipeline were somewhat promising; however, the cross-validation that was supposed to evaluate the ability to retrieve processes with previously unknown words had a poor performance. The following pipelines were created to improve from the baseline set up by the first pipeline. The second pipeline used part-of-speech tagging, the third used list of verbs relevant to business processes and the fourth pipeline used the context where process names appeared. These pipelines did not, however, make substantial improvements.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Miia Hakanen (mihakane@jyu.fi) on 2020-06-17T04:11:06Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2020-06-17T04:11:06Z (GMT). No. 
of bitstreams: 0\n Previous issue date: 2020", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "109", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "Business Process Management", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Natural Language Processing", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Information Extraction", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Information Retrieval", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Design Science Research", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Discovering Business Processes from Unstructured Text", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202006174226", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": 
"type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tietoj\u00e4rjestelm\u00e4tiede", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Information Systems Science", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "601", 
"language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tiedonhaku", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "liiketoimintaprosessit", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "prosessijohtaminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "luonnollinen kieli", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "information retrieval", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "business processes", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "process management", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "natural language", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_69993
language eng
last_indexed 2025-02-18T10:54:20Z
main_date 2020-01-01T00:00:00Z
main_date_str 2020
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/13927661-34e0-4fc2-94b6-71a0b9840528\/download","text":"URN:NBN:fi:jyu-202006174226.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2020
record_format qdc
source_str_mv jyx
spellingShingle Pietikäinen, Sampo Discovering Business Processes from Unstructured Text Business Process Management Natural Language Processing Information Extraction Information Retrieval Design Science Research Tietojärjestelmätiede Information Systems Science 601 tiedonhaku liiketoimintaprosessit prosessijohtaminen luonnollinen kieli information retrieval business processes process management natural language
title Discovering Business Processes from Unstructured Text
title_full Discovering Business Processes from Unstructured Text
title_fullStr Discovering Business Processes from Unstructured Text
title_full_unstemmed Discovering Business Processes from Unstructured Text
title_short Discovering Business Processes from Unstructured Text
title_sort discovering business processes from unstructured text
title_txtP Discovering Business Processes from Unstructured Text
topic Business Process Management Natural Language Processing Information Extraction Information Retrieval Design Science Research Tietojärjestelmätiede Information Systems Science 601 tiedonhaku liiketoimintaprosessit prosessijohtaminen luonnollinen kieli information retrieval business processes process management natural language
topic_facet 601 Business Process Management Design Science Research Information Extraction Information Retrieval Information Systems Science Natural Language Processing Tietojärjestelmätiede business processes information retrieval liiketoimintaprosessit luonnollinen kieli natural language process management prosessijohtaminen tiedonhaku
url https://jyx.jyu.fi/handle/123456789/69993 http://www.urn.fi/URN:NBN:fi:jyu-202006174226
work_keys_str_mv AT pietikäinensampo discoveringbusinessprocessesfromunstructuredtext