Predicting high-growth firms with machine learning methods

Interest in high-growth firms has recently increased among policymakers and investors. In this master's thesis, I study whether machine learning methods are useful for predicting future high-growth firms. I study this question with a large panel of 13,602 Finn...


Bibliographic Details
Main Author: Virtanen, Joosua
Other Authors: Kauppakorkeakoulu, School of Business and Economics, Taloustieteet, Business and Economics, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language: eng
Published: 2019
Subjects: high growth firms; Finland; forecasts; predictability; growth; enterprises; machine learning
Online Access: https://jyx.jyu.fi/handle/123456789/63260
_version_ 1826225764745347072
author Virtanen, Joosua
author2 Kauppakorkeakoulu School of Business and Economics Taloustieteet Business and Economics Jyväskylän yliopisto University of Jyväskylä
author_facet Virtanen, Joosua Kauppakorkeakoulu School of Business and Economics Taloustieteet Business and Economics Jyväskylän yliopisto University of Jyväskylä Virtanen, Joosua Kauppakorkeakoulu School of Business and Economics Taloustieteet Business and Economics Jyväskylän yliopisto University of Jyväskylä
author_sort Virtanen, Joosua
datasource_str_mv jyx
description Motivated by the recent growth in political and commercial interest in high-growth firms (HGFs), in this master's thesis I study whether common machine learning (ML) techniques are useful in predicting which privately owned companies will become HGFs in the near future. I employ the Eurostat-OECD definition of HGFs and study this question with a high-dimensional 2005–2016 panel data set of 13,602 unique Finnish firms, of which roughly 5% are classified as HGFs. I also study which of the 24 predictors matter most for prediction. Finally, I examine whether an alternative definition of HGFs, additional predictors carrying expert information, or restricting the sample to young firms makes a difference in predictive performance. I approach these questions with a predictive scheme that resembles a real forecasting scenario: classifiers are trained on past values and then used to predict unknown future outcomes. Predictive performance is assessed in a separate test sample. My findings indicate that most ML methods offer moderate but statistically significant improvements over benchmarks, depending on the measure of interest. The best-performing ML classifier, random forest (RF), reaches an out-of-sample area under the ROC curve (AUC) of 0.6422 (equivalent to a 9.4% improvement over the benchmark) and identifies 17.07% of the HGFs with only a 2.19% chance of misclassifying a non-HGF as an HGF. My analysis of variable importance and partial dependence suggests that the current values of and past changes in firm size indicators, together with firm age, contribute the most to predictive performance. Measuring the target variable in turnover rather than in employment improves prediction accuracy, whereas adding indicators of expert investor information (capital investments and business subsidies) as predictors does not yield any improvement. Finally, the prediction task appears to be considerably more difficult in a sample of young firms. In conclusion, ML methods should be considered for the challenging task of identifying HGFs when prediction accuracy is the primary goal; if computational costs and model interpretability carry weight, ML methods are not necessarily superior in this context.
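The performance figures quoted in the abstract (AUC, share of HGFs identified, false-alarm rate) can be illustrated with a small sketch. This is not the thesis's code: the function names, the toy labels and scores, and the threshold are hypothetical. Two assumptions are also made when working with the reported numbers: that the "9.4% improvement" is relative to the benchmark AUC, and that HGFs make up about 5% of the test sample, as they do in the full data set.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (HGF, non-HGF) pairs in which the HGF gets
    the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rates_at_threshold(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) when every firm
    with score >= threshold is flagged as an HGF."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def implied_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Share of flagged firms that are true HGFs, by Bayes' rule."""
    flagged_hgf = prevalence * tpr
    flagged_non_hgf = (1.0 - prevalence) * fpr
    return flagged_hgf / (flagged_hgf + flagged_non_hgf)


# Hypothetical toy data: 1 = HGF, 0 = non-HGF, with classifier scores.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.35, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_tpr, toy_fpr = rates_at_threshold(toy_labels, toy_scores, threshold=0.75)

# Figures reported in the abstract, combined under the stated assumptions.
benchmark_auc = 0.6422 / 1.094  # assumes a relative 9.4% improvement
rf_precision = implied_precision(tpr=0.1707, fpr=0.0219, prevalence=0.05)

print(f"toy AUC={toy_auc:.3f}, TPR={toy_tpr:.3f}, FPR={toy_fpr:.3f}")
print(f"implied benchmark AUC={benchmark_auc:.4f}")
print(f"implied precision at 5% prevalence={rf_precision:.3f}")
```

Under these assumptions, the reported rates would imply a benchmark AUC of roughly 0.587, and roughly 29% of the firms the random forest flags would actually be HGFs. Both numbers are back-of-the-envelope readings of the abstract, not results stated in the thesis.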
first_indexed 2019-08-19T08:21:16Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Hyytinen, Ari", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Virtanen, Joosua", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2019-03-25T11:13:44Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2019-03-25T11:13:44Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2019", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/63260", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Kiinnostus nopeakasvuisia yrityksi\u00e4 kohtaan on viime aikoina kasvanut politiikantekij\u00f6iden sek\u00e4 sijoittajien keskuudessa. T\u00e4ss\u00e4 maisterin tutkielmassa tutkin, ovatko koneoppimismenetelm\u00e4t hy\u00f6dyllisi\u00e4 tulevaisuuden nopeakasvuisten yrityksien ennustamisessa. Tutkin t\u00e4t\u00e4 kysymyst\u00e4 laajalla 13602:n suomalaisen liikeyrityksen paneeliaineistolla vuosilta 2005\u20132016 hy\u00f6dynt\u00e4en Eurostat-OECD:n nopeakasvuisen yrityksen m\u00e4\u00e4ritelm\u00e4\u00e4. T\u00e4ll\u00e4 m\u00e4\u00e4ritelm\u00e4ll\u00e4 aineistossa noin 5% yrityksist\u00e4 sijoittuu nopeakasvuisiksi. Tutkin my\u00f6s, mitk\u00e4 yhteens\u00e4 24:st\u00e4 ennustavasta muuttujasta my\u00f6t\u00e4vaikuttavat ennusteisiin eniten. Viimeiseksi tarkastelen, onko vaihtoehtoisella nopean kasvun m\u00e4\u00e4ritelm\u00e4ll\u00e4, asiantuntijainformaatiota sis\u00e4lt\u00e4vill\u00e4 lis\u00e4muuttujilla tai vain nuorten yrityksien aineiston k\u00e4ytt\u00e4misell\u00e4 vaikutusta ennustetarkkuuteen. 
L\u00e4hestyn kysymyksi\u00e4 soveltamalla kehikkoa, joka muistuttaa todellista ennustusskenaariota, miss\u00e4 historiatietoihin perustuvalla aineistolla pyrit\u00e4\u00e4n ennustamaan tulevaisuuden lopputulemia. Ennustetarkkuutta arvioidaan erillisess\u00e4 testiaineistossa. Tuloksieni perusteella useimmat koneoppimismenetelm\u00e4t mahdollistavat lievi\u00e4 ja tilastollisesti merkitsevi\u00e4 parannuksia ennustetarkkuudessa verrattuna tavanomaisiin menetelmiin. Random forest (RF) -algoritmin opettama luokittelija toimii t\u00e4ss\u00e4 kontekstissa parhaiten opetusaineiston ulkopuolisella AUC (ROC k\u00e4yr\u00e4n rajaaman pinta-alan) -arvolla 0,6422 (mik\u00e4 vastaa 9,4% parannusta vertailuarvoon) ja tunnistaa 17,07% nopeakasvuisista yrityksist\u00e4 vain 2,19% riskill\u00e4 luokitella ei-nopeakasvuinen yritys nopeakasvuiseksi. Yrityksen koon nykyisen hetken ja menneen muutoksen indikaattorit yrityksen i\u00e4n kanssa my\u00f6t\u00e4vaikuttavat eniten ennusteiden muodostamisessa. Kasvun mittaaminen k\u00e4ytt\u00e4en liikevaihdon kasvua henkil\u00f6st\u00f6n kasvun sijasta parantaa ennustetarkkuutta. Toisaalta p\u00e4\u00e4omasijoituksien ja yritystukien informaatiota sis\u00e4lt\u00e4vien muuttujien lis\u00e4\u00e4minen malliin ei paranna tuloksia. Viimeiseksi ennustusongelma osoittautuu vaikeammaksi nuorten yrityksien aineistossa. Yhteenvetona koneoppimismenetelmien soveltamista tulisi harkita nopeakasvuisten yrityksien ennustamisen haastavaan teht\u00e4v\u00e4\u00e4n, kun ennustetarkkuus on ensisijainen tavoite. 
Mik\u00e4li laskennallisilla kustannuksilla ja mallin tulkittavuudella on painoarvoa, koneoppimismenetelm\u00e4t eiv\u00e4t v\u00e4ltt\u00e4m\u00e4tt\u00e4 ole ylivertaisia t\u00e4ss\u00e4 kontekstissa.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Motivated by the recently grown political and commercial interest in high-growth firms (HGF)\u2014in this master\u2019s thesis\u2014I study whether common machine learning (ML) techniques are useful in predicting which privately owned companies become HGFs in the near future. I employ the Eurostat-OECD definition of HGFs and study this question with a high-dimensional 2005\u20132016 panel data set of 13,602 unique Finnish firms, of which roughly 5% are defined as HGFs. I also study, which of the 24 predictors included matter the most for prediction. Finally, I examine whether an alternative definition of HGFs, predictors of expert information or studying a sample of young firms only will make a difference in predictive performance. I tackle the questions by developing a predictive scheme similar to a real forecasting scenario, where past values are used to train a set of classifiers, that can be employed to predict unknown future outcomes. Predictive performance is assessed in a separate test sample. My findings indicate that most ML methods offer moderate but statistically significant improvements over benchmarks, depending on the measure of interest. With an out-of-sample area under the ROC curve (AUC) of 0.6422 (equivalent to a 9.4% improvement over benchmark), the best working ML classifier\u2014random forest (RF)\u2014identifies 17.07% of the HGFs with only a 2.19% chance of misclassifying a non-HGF as an HGF. My analysis on variable importance and partial dependence suggests that the current values and past changes in firm size indicators alongside with firm age, contribute the most to predictive performance. 
Measuring the target variable in turnover rather than in employment improves prediction accuracy, where adding indicators of expert investor information as predictors does not yield any improvements. Finally, the prediction task seems to be considerably more difficult in a sample of young firms. In conclusion, ML methods should be considered for the challenging task of identifying HGFs, when computational costs and model interpretation are of secondary interest to prediction accuracy.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Paivi Vuorio (paelvuor@jyu.fi) on 2019-03-25T11:13:44Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2019-03-25T11:13:44Z (GMT). No. of bitstreams: 0\n Previous issue date: 2019", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "67", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "high growth firms", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "Finland", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Predicting high-growth firms with machine learning methods", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": 
"master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-201903251944", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Jyv\u00e4skyl\u00e4 University School of Business and Economics", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Jyv\u00e4skyl\u00e4n yliopiston kauppakorkeakoulu", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Taloustieteet", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Business and Economics", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Taloustiede", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Economics", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", 
"value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "2041", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "ennusteet", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "ennustettavuus", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "kasvu", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "yritykset", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "koneoppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "forecasts", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "predictability", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "growth", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "enterprises", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "machine learning", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": 
"https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_63260
language eng
last_indexed 2025-02-18T10:54:38Z
main_date 2019-01-01T00:00:00Z
main_date_str 2019
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/af1ab0a1-63b0-4798-88bd-c250858e616f\/download","text":"URN:NBN:fi:jyu-201903251944.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2019
record_format qdc
source_str_mv jyx
spellingShingle Virtanen, Joosua Predicting high-growth firms with machine learning methods high growth firms Finland Taloustiede Economics 2041 ennusteet ennustettavuus kasvu yritykset koneoppiminen forecasts predictability growth enterprises machine learning
title Predicting high-growth firms with machine learning methods
title_full Predicting high-growth firms with machine learning methods
title_fullStr Predicting high-growth firms with machine learning methods Predicting high-growth firms with machine learning methods
title_full_unstemmed Predicting high-growth firms with machine learning methods Predicting high-growth firms with machine learning methods
title_short Predicting high-growth firms with machine learning methods
title_sort predicting high growth firms with machine learning methods
title_txtP Predicting high-growth firms with machine learning methods
topic high growth firms Finland Taloustiede Economics 2041 ennusteet ennustettavuus kasvu yritykset koneoppiminen forecasts predictability growth enterprises machine learning
topic_facet 2041 Economics Finland Taloustiede ennusteet ennustettavuus enterprises forecasts growth high growth firms kasvu koneoppiminen machine learning predictability yritykset
url https://jyx.jyu.fi/handle/123456789/63260 http://www.urn.fi/URN:NBN:fi:jyu-201903251944
work_keys_str_mv AT virtanenjoosua predictinghighgrowthfirmswithmachinelearningmethods