Fitting Generalized Linear Latent Variable Models using the method of Extended Variational Approximation


Bibliographic Details
Main Author: Korhonen, Pekka
Other Authors: Matemaattis-luonnontieteellinen tiedekunta, Faculty of Sciences, Matematiikan ja tilastotieteen laitos, Department of Mathematics and Statistics, Jyväskylän yliopisto, University of Jyväskylä
Format: Master's thesis
Language: eng
Published: 2020
Subjects:
Online Access: https://jyx.jyu.fi/handle/123456789/72890
_version_ 1828193076641792000
author Korhonen, Pekka
author2 Matemaattis-luonnontieteellinen tiedekunta Faculty of Sciences Matematiikan ja tilastotieteen laitos Department of Mathematics and Statistics Jyväskylän yliopisto University of Jyväskylä
author_sort Korhonen, Pekka
datasource_str_mv jyx
description In community ecology, researchers are often interested in the co-occurrence relationships among plant or animal species across different sampling sites or ecosystems. Research questions of this kind naturally lead to the collection of multivariate abundance data. The ecological abundance of a plant or animal species in a given ecosystem can be described, for example, directly as the number of individuals or as a binary presence indicator. The type of abundance response must be taken into account when fitting a statistical model. Generalized linear latent variable models offer a flexible way to model multivariate abundance by assuming the existence of one or more latent variables. The latent variables are random and unobserved by nature. They can be interpreted, for example, as representing unobserved environmental factors. Latent variables are useful because they allow the correlation structure between species to be modelled. Fitting latent variable models is, however, not particularly straightforward, because the latent variables are unobserved. The marginal likelihood function of a latent variable model contains an integral that in general has no analytical solution, so some approximate method must be used in model fitting. One viable option is the so-called variational approximation method, which is introduced at the beginning of this thesis. Its advantages are both estimation accuracy and computational efficiency. A clear weakness of the variational approximation is its poor generality: it is not directly applicable to all common response distribution and link function pairs. For this reason, this thesis presents a method called extended variational approximation. The proposed extension is compared with both the standard variational approximation and a competing method based on the Laplace approximation through data-based simulation experiments. In addition, the use of the extended variational approximation is demonstrated in the ordination of mire data. The mire data originate from the Department of Biological and Environmental Science at the University of Jyväskylä. The extended variational approximation was implemented in R and C++ for a few of the most typical latent variable models.
Generalized Linear Latent Variable Models (GLLVMs), a family of statistical models developed in recent years, have gained considerable attention in applications, in particular in the field of community ecology. Ecologists are often concerned with the relationships between two or more species across multiple test sites. Such situations naturally lead to the collection of multivariate abundance data and call for appropriate statistical methods to analyze such data. GLLVMs offer a model-based approach for such analyses that is also flexible in terms of the type of abundance response in question, e.g., species counts, presence/absence, biomass, and so on. As their name implies, GLLVMs generally assume the presence of unobserved, latent variables as predictors. These latent variables are useful, for example, in modelling between-species correlation, but they also introduce computational challenges into model fitting. In its general form, the GLLVM marginal likelihood involves an integral over these latent variables. Under the standard assumptions, this integral cannot be solved analytically unless the response variables are normally distributed. Thus some form of numerical approximation technique is often needed. This thesis starts by introducing a variational approximation (VA) approach for fitting GLLVMs, which has been shown to be an attractive choice in terms of both computational efficiency and estimation precision.
From there we introduce a recently proposed method of extended variational approximation (EVA), which extends the standard VA approach by allowing a wider set of response distributions and link functions to be used in modelling. The comparative performance of these two approaches and a popular alternative, the Laplace approximation (LA), is then assessed in simulation studies. Additionally, an example study on the use of EVA in the ordination of plant cover data is conducted. Lastly, we discuss some ideas for further development of the EVA approach. The VA and LA approaches to estimating GLLVMs are readily available in the R package gllvm, which is used in this thesis. An implementation of the EVA approach for a few common response distributions was developed as part of this thesis in R and C++ using the TMB package.
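The sketch below is a rough illustration of why VA needs closed-form expectations while EVA does not. It works in the scalar case and assumes the usual GLLVM linear predictor eta_ij = beta_0j + x_i'beta_j + u_i'theta_j. It also assumes a Gaussian variational distribution q(u_i) = N(a_i, A_i). Under these assumptions, eta_ij is normal with mean m (eta_ij evaluated at u_i = a_i) and variance v = theta_j' A_i theta_j.

VA requires E_q[log f(y_ij | eta_ij)] in closed form. That expectation exists for a Poisson response with a log link, but not for a Bernoulli response with a logit link. An EVA-style approximation instead expands log f to second order around m, giving log f(y | m) + 0.5 * v * (d^2/deta^2) log f(y | m).

This is a hedged reading of the general idea, not the thesis's R/C++/TMB implementation. All function names are hypothetical.

```python
"""Scalar sketch: exact VA term vs. a second-order (EVA-style) approximation.

Assumption: under a Gaussian variational distribution q(u_i) = N(a_i, A_i),
the linear predictor eta_ij is N(m, v) with m = eta_ij at u_i = a_i and
v = theta_j^T A_i theta_j. All function names here are hypothetical.
"""
import math
import random


def va_poisson_term(y, m, v):
    """Exact E_q[log f(y | eta)] for a Poisson response with log link."""
    # E[y * eta] = y * m and E[exp(eta)] = exp(m + v / 2) for eta ~ N(m, v).
    return y * m - math.exp(m + v / 2.0) - math.lgamma(y + 1)


def eva_term(loglik, d2loglik, y, m, v):
    """Second-order Taylor approximation of E_q[log f(y | eta)] around eta = m.

    The first-order term has zero expectation, leaving
    log f(y | m) + 0.5 * (d^2/deta^2) log f(y | m) * v.
    """
    return loglik(y, m) + 0.5 * d2loglik(y, m) * v


def pois_loglik(y, eta):
    # Poisson log-likelihood with log link, as a function of eta.
    return y * eta - math.exp(eta) - math.lgamma(y + 1)


def pois_d2(y, eta):
    # Second derivative of the Poisson log-likelihood with respect to eta.
    return -math.exp(eta)


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bern_logit_loglik(y, eta):
    # Bernoulli with logit link; its Gaussian expectation has no closed form.
    return y * eta - softplus(eta)


def bern_logit_d2(y, eta):
    # Second derivative with respect to eta is -p * (1 - p), p = sigmoid(eta).
    p = 1.0 / (1.0 + math.exp(-eta))
    return -p * (1.0 - p)


def mc_term(loglik, y, m, v, n=100_000, seed=1):
    """Monte Carlo reference value for E_q[log f(y | eta)]."""
    rng = random.Random(seed)
    sd = math.sqrt(v)
    return sum(loglik(y, rng.gauss(m, sd)) for _ in range(n)) / n


if __name__ == "__main__":
    y, m, v = 3, 0.5, 0.2
    print("Poisson VA :", va_poisson_term(y, m, v))
    print("Poisson EVA:", eva_term(pois_loglik, pois_d2, y, m, v))
    print("Bernoulli-logit EVA:", eva_term(bern_logit_loglik, bern_logit_d2, 1, 0.3, 0.2))
    print("Bernoulli-logit MC :", mc_term(bern_logit_loglik, 1, 0.3, 0.2))
```

In the Poisson case the two terms differ only through exp(v/2) versus its expansion 1 + v/2, so they agree closely for small v. In the Bernoulli-logit case no closed-form VA term is available, and the Monte Carlo average serves as a reference for the approximation.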
first_indexed 2020-12-01T21:04:27Z
format Pro gradu (Master's thesis)
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Taskinen, Sara", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.advisor", "value": "Niku, Jenni", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Korhonen, Pekka", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2020-12-01T10:33:13Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2020-12-01T10:33:13Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2020", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/72890", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Yhteis\u00f6ekologian alalla tutkijat ovat usein kiinnostuneita yhden tai useamman kasvi- tai el\u00e4inlajin v\u00e4lisist\u00e4 esiintyvyyssuhteista eri mittauspaikoilla tai ekosysteemeiss\u00e4. T\u00e4m\u00e4nkaltaiset tutkimuskysymykset johtavat luonnostaan moniulotteisen runsausdatan ker\u00e4\u00e4miseen. Kasvi- tai el\u00e4inlajin ekologista runsautta tietyss\u00e4 ekosysteemiss\u00e4 voidaan kuvata esimerkiksi suoraan lajiyksil\u00f6iden lukum\u00e4\u00e4r\u00e4n\u00e4 tai bin\u00e4\u00e4risen\u00e4 esiintyvyysindikaattorina. Runsausvasteen tyyppi on otettava huomioon tilastollista mallia sovittaessa. Yleistetyt lineaariset latenttimuuttujamallit tarjoavat joustavan tavan mallintaa moniulotteista runsautta olettamalla yhden tai useamman latentin muuttujan olemassaolon. Latentit muuttujat ovat luonteeltaan satunnaisia ja havaitsemattomia. 
Niiden voidaan tulkita kuvaavan esimerkiksi havaitsematta j\u00e4\u00e4neit\u00e4 ymp\u00e4rist\u00f6tekij\u00f6it\u00e4. Latentit muuttujat ovat hy\u00f6dyllisi\u00e4, sill\u00e4 niiden avulla voidaan mallintaa eri lajien v\u00e4list\u00e4 korrelaatiorakennetta. Latenttimuuttujamallien sovittaminen ei kuitenkaan ole erityisen suoraviivaista latenttien muuttujien havaitsemattomuudesta johtuen. \n\nLatenttimuuttujamallia vastaava marginaalinen uskottavuusfunktio sis\u00e4lt\u00e4\u00e4 integraalin, jolla ei yleisess\u00e4 tapauksessa ole analyyttist\u00e4 ratkaisua. Mallin sovituksessa joudutaan t\u00e4m\u00e4n vuoksi k\u00e4ytt\u00e4m\u00e4\u00e4n jotakin approksimatiivista menetelm\u00e4\u00e4. Er\u00e4s varteenotettava vaihtoehto on niin sanottu variaatiomenetelm\u00e4, joka esitell\u00e4\u00e4n t\u00e4m\u00e4n tutkielman alussa. Menetelm\u00e4n etuna on sek\u00e4 estimointitarkkuus ett\u00e4 laskennallinen tehokkuus. Variaatiomenetelm\u00e4n selv\u00e4n\u00e4 heikkoutena on sen huono yleistyvyys, sill\u00e4 se ei suoraan sovellu k\u00e4ytett\u00e4v\u00e4ksi kaikkien tavanomaisten vastejakauma-linkkifunktio -parien yhteydess\u00e4. T\u00e4m\u00e4n vuoksi t\u00e4ss\u00e4 tutkielmassa esitet\u00e4\u00e4n nyt laajennettuksi variaatiomenetelm\u00e4ksi nimetty menetelm\u00e4. Esitetty\u00e4 laajennosta verrataan sek\u00e4 tavanomaiseen variaatiomenetelm\u00e4\u00e4n ett\u00e4 Laplace-approksimaatioon perustuvaan kilpailevaan menetelm\u00e4\u00e4n aineistopohjaisten simulointikokeiden avulla. Lis\u00e4ksi esitell\u00e4\u00e4n laajennetun variaatiomenetelm\u00e4n k\u00e4ytt\u00f6\u00e4 suoaineistolle teht\u00e4v\u00e4ss\u00e4 ordinaatiossa. Suoaineisto on per\u00e4isin Jyv\u00e4skyl\u00e4n yliopiston Bio- ja ymp\u00e4rist\u00f6tieteen laitokselta. 
Laajennettu variaatiomenetelm\u00e4 implementoitiin ohjelmointikieli\u00e4 R ja C++ k\u00e4ytt\u00e4en muutaman tyypillisimm\u00e4n latenttumuuttujamallin tapauksessa.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Generalized Linear Latent Variable Models (GLLVM), a family of statistical models developed on recent years, has gained a lot of attraction in applications, in particular in the field of community ecology. Ecologists are often concerned with the relationships between two or more species across a multiple test sites. Such situations naturally lead to the collection of multivariate abundance data and call for appropriate statistical methods to analyze such data. GLLVMs offer a model-based approach for such analyses that is also flexible in the terms of the type of abundance response at question, i.e., species count, presence/absence, biomass, and such. As their namesake implies, GLLVMs generally assume the presence of some unobserved, latent variables as predictors. These latent variables are useful, for example in the modelling of the between-species correlation, but they also introduce some computational challenges into the model fitting.\n\nIn its general form, the GLLVM marginal likelihood involves an integral over the aforementioned latent variables. Under the standard assumptions this integral cannot be solved analytically, when dealing with other than normally distributed response variables. Thus some form of numerical approximation technique is often needed. This thesis starts by introducing a variational approximation (VA) approach for fitting GLLVMs, which has shown to be an attractive choice in terms of both the computational efficiency and estimation precision. 
From there we introduce a recently proposed method of extended variational approximation (EVA), which extends upon the standard VA approach by allowing a wider set of response distributions and link functions to be used in modelling. Then the comparative performance of these two approaches and a popular alternative, Laplace approximation (LA), is addressed in simulation studies. Additionally, an example study concerning the use of EVA in ordination of plant cover data is conducted. Lastly we discuss some ideas for further development regarding the EVA approach.\n\nThe VA and LA approaches to estimation of GLLVMs are readily available in the R package gllvm, which has been used in this thesis. An implementation of the EVA approach for a few types of common response distributions was developed as a part of this thesis in R and C++ using the package TMB.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Paivi Vuorio (paelvuor@jyu.fi) on 2020-12-01T10:33:13Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2020-12-01T10:33:13Z (GMT). No. 
of bitstreams: 0\n Previous issue date: 2020", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "46", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "generalized linear latent variable models", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "variational inference", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "abundance data", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "ordination", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Fitting Generalized Linear Latent Variable Models using the method of Extended Variational Approximation", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202012016851", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": 
"Matemaattis-luonnontieteellinen tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Sciences", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Matematiikan ja tilastotieteen laitos", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Department of Mathematics and Statistics", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Tilastotiede", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Statistics", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "4043", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": 
"dc.subject.yso", "value": "simulointi", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "tilastomenetelm\u00e4t", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "simulation", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "statistical methods", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_72890
language eng
last_indexed 2025-03-31T20:01:24Z
main_date 2020-01-01T00:00:00Z
main_date_str 2020
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/25c2034e-568f-414d-a0c3-b876f4ef6f6b\/download","text":"URN:NBN:fi:jyu-202012016851.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2020
record_format qdc
source_str_mv jyx
title Fitting Generalized Linear Latent Variable Models using the method of Extended Variational Approximation
topic generalized linear latent variable models variational inference abundance data ordination Tilastotiede Statistics 4043 simulointi tilastomenetelmät simulation statistical methods
url https://jyx.jyu.fi/handle/123456789/72890 http://www.urn.fi/URN:NBN:fi:jyu-202012016851
work_keys_str_mv AT korhonenpekka fittinggeneralizedlinearlatentvariablemodelsusingthemethodofextendedvariationalapp