Machine learning based ISA detection for short shellcodes

Hyökkäyskoodi (engl. shellcode) on usein käytössä kyberrikollisuudessa, kun tarkoituksena on tunkeutua erilaisiin tietoteknisiin järjestelmiin. Koodi-injektio on yhä toimiva hyökkäysmenetelmä, sillä ohjelmistohaavoittuvuudet eivät ole kadonneet mihinkään. Tyypillisesti tällainen koodi kirjoitetaan k...

Täydet tiedot

Bibliografiset tiedot
Päätekijä: Niiranen, Antti
Muut tekijät: Informaatioteknologian tiedekunta, Faculty of Information Technology, Informaatioteknologia, Information Technology, Jyväskylän yliopisto, University of Jyväskylä
Aineistotyyppi: Pro gradu
Kieli:eng
Julkaistu: 2021
Aiheet:
Linkit: https://jyx.jyu.fi/handle/123456789/76761
_version_ 1826225755991834624
author Niiranen, Antti
author2 Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_facet Niiranen, Antti Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä Niiranen, Antti Informaatioteknologian tiedekunta Faculty of Information Technology Informaatioteknologia Information Technology Jyväskylän yliopisto University of Jyväskylä
author_sort Niiranen, Antti
datasource_str_mv jyx
description Hyökkäyskoodi (engl. shellcode) on usein käytössä kyberrikollisuudessa, kun tarkoituksena on tunkeutua erilaisiin tietoteknisiin järjestelmiin. Koodi-injektio on yhä toimiva hyökkäysmenetelmä, sillä ohjelmistohaavoittuvuudet eivät ole kadonneet mihinkään. Tyypillisesti tällainen koodi kirjoitetaan konekielellä. Perinteisesti näitä hyökkäyskoodeja on analysoitu takaisinmallintamalla, mutta menetelmän vaikeuden takia on ryhdytty turvautumaan koneoppimiseen, jotta prosessista tulisi helpompi. Tutkielmassa tehdyn kirjallisuuskatsauksen avulla hankittiin tietoa hyökkäyskoodeista, tekoälystä ja koneoppimisesta. Tässä tutkielmassa selvitettiin, kuinka tarkasti viimeisintä tekniikkaa edustava koneoppimispohjainen sovellus havaitsee hyökkäyskoodin käskykanta-arkkitehtuurin. Tutkimus oli kokeellinen ja se suoritettiin virtuaaliympäristössä muun muassa turvallisuuden takia. Työssä rakennettiin reaalimaailmaan perustuva hyökkäyskooditietokanta, joka sisältää noin 20000 hyökkäyskooditiedostoa 15 eri arkkitehtuurille. Koodit hankittiin kolmesta eri lähteestä, jotka ovat Exploit Database, Shell-Storm ja MSFvenom. Näistä koodeista koostettiin pienempi joukko testaamista varten. Tutkimuksen rajoituksia pohdittaessa todettiin, että testitietokanta saattaa olla liian suppea, mutta sen avulla kuitenkin pystyttiin kartoittamaan sovelluksen tämänhetkinen toiminta. Testeissä selvisi, että sovellus ei tällä hetkellä kykene havaitsemaan hyökkäyskoodin käskykanta-arkkitehtuuria riittävällä tarkkuudella. Kahta eri skannausasetusta testattiin, joista molemmat saavuttivat noin 30% tarkkuuden. Sovelluksen luokittelijat testattiin myös, niistä satunnaismetsä toimi parhaiten. Shellcodes are often used by cybercriminals in order to breach computer systems. Code injection is still a viable attack method because software vulnerabilities have not ceased to exist. Typically these codes are written in assembly language. Traditional method of analysis has been reverse engineering, but as it can be difficult and time-consuming, machine learning has been utilized to make the process easier. A literature review was performed to gain an understanding about shellcodes, artificial intelligence and machine learning. This thesis explores how accurately a state-of-the-art machine learning ISA detection tool can detect the instruction set architecture from short shellcodes. The used method was experimental research, and the research was conducted in a virtual environment mainly for safety reasons. Using three different sources which were Exploit Database, Shell-Storm and MSFvenom, approximately 20000 shellcodes for 15 different architectures were collected. Using these files, a smaller set of shellcodes was created in order to test the performance of a machine learning based ISA detection tool. When limitations were identified, it was noted that the test set may not be diverse or large enough. Nevertheless, with this set it was possible to gain an understanding on how the program currently handles shellcodes. The study found that with the current training, the program is not able to reliably detect ISA from the shellcodes of the database. Two different detection options were used and they both achieved the accuracy of approximately 30%. The different classifiers were tested as well and random forest had the best performance.
first_indexed 2024-09-11T08:51:22Z
format Pro gradu
free_online_boolean 1
fullrecord [{"key": "dc.contributor.advisor", "value": "Costin, Andrei", "language": "", "element": "contributor", "qualifier": "advisor", "schema": "dc"}, {"key": "dc.contributor.author", "value": "Niiranen, Antti", "language": "", "element": "contributor", "qualifier": "author", "schema": "dc"}, {"key": "dc.date.accessioned", "value": "2021-06-21T11:04:43Z", "language": null, "element": "date", "qualifier": "accessioned", "schema": "dc"}, {"key": "dc.date.available", "value": "2021-06-21T11:04:43Z", "language": null, "element": "date", "qualifier": "available", "schema": "dc"}, {"key": "dc.date.issued", "value": "2021", "language": "", "element": "date", "qualifier": "issued", "schema": "dc"}, {"key": "dc.identifier.uri", "value": "https://jyx.jyu.fi/handle/123456789/76761", "language": null, "element": "identifier", "qualifier": "uri", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Hy\u00f6kk\u00e4yskoodi (engl. shellcode) on usein k\u00e4yt\u00f6ss\u00e4 kyberrikollisuudessa, kun tarkoituksena on tunkeutua erilaisiin tietoteknisiin j\u00e4rjestelmiin. Koodi-injektio on yh\u00e4 toimiva hy\u00f6kk\u00e4ysmenetelm\u00e4, sill\u00e4 ohjelmistohaavoittuvuudet eiv\u00e4t ole kadonneet mihink\u00e4\u00e4n. Tyypillisesti t\u00e4llainen koodi kirjoitetaan konekielell\u00e4. Perinteisesti n\u00e4it\u00e4 hy\u00f6kk\u00e4yskoodeja on analysoitu takaisinmallintamalla, mutta menetelm\u00e4n vaikeuden takia on ryhdytty turvautumaan koneoppimiseen, jotta prosessista tulisi helpompi. Tutkielmassa tehdyn kirjallisuuskatsauksen avulla hankittiin tietoa hy\u00f6kk\u00e4yskoodeista, teko\u00e4lyst\u00e4 ja koneoppimisesta. T\u00e4ss\u00e4 tutkielmassa selvitettiin, kuinka tarkasti viimeisint\u00e4 tekniikkaa edustava koneoppimispohjainen sovellus havaitsee hy\u00f6kk\u00e4yskoodin k\u00e4skykanta-arkkitehtuurin. Tutkimus oli kokeellinen ja se suoritettiin virtuaaliymp\u00e4rist\u00f6ss\u00e4 muun muassa turvallisuuden takia. Ty\u00f6ss\u00e4 rakennettiin reaalimaailmaan perustuva hy\u00f6kk\u00e4yskooditietokanta, joka sis\u00e4lt\u00e4\u00e4 noin 20000 hy\u00f6kk\u00e4yskooditiedostoa 15 eri arkkitehtuurille. Koodit hankittiin kolmesta eri l\u00e4hteest\u00e4, jotka ovat Exploit Database, Shell-Storm ja MSFvenom. N\u00e4ist\u00e4 koodeista koostettiin pienempi joukko testaamista varten. Tutkimuksen rajoituksia pohdittaessa todettiin, ett\u00e4 testitietokanta saattaa olla liian suppea, mutta sen avulla kuitenkin pystyttiin kartoittamaan sovelluksen t\u00e4m\u00e4nhetkinen toiminta. Testeiss\u00e4 selvisi, ett\u00e4 sovellus ei t\u00e4ll\u00e4 hetkell\u00e4 kykene havaitsemaan hy\u00f6kk\u00e4yskoodin k\u00e4skykanta-arkkitehtuuria riitt\u00e4v\u00e4ll\u00e4 tarkkuudella. Kahta eri skannausasetusta testattiin, joista molemmat saavuttivat noin 30% tarkkuuden. Sovelluksen luokittelijat testattiin my\u00f6s, niist\u00e4 satunnaismets\u00e4 toimi parhaiten.", "language": "fi", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.abstract", "value": "Shellcodes are often used by cybercriminals in order to breach computer systems. Code injection is still a viable attack method because software vulnerabilities have not ceased to exist. Typically these codes are written in assembly language. Traditional method of analysis has been reverse engineering, but as it can be difficult and time-consuming, machine learning has been utilized to make the process easier. A literature review was performed to gain an understanding about shellcodes, artificial intelligence and machine learning. This thesis explores how accurately a state-of-the-art machine learning ISA detection tool can detect the instruction set architecture from short shellcodes. The used method was experimental research, and the research was conducted in a virtual environment mainly for safety reasons. Using three different sources which were Exploit Database, Shell-Storm and MSFvenom, approximately 20000 shellcodes for 15 different architectures were collected. Using these files, a smaller set of shellcodes was created in order to test the performance of a machine learning based ISA detection tool. When limitations were identified, it was noted that the test set may not be diverse or large enough. Nevertheless, with this set it was possible to gain an understanding on how the program currently handles shellcodes. The study found that with the current training, the program is not able to reliably detect ISA from the shellcodes of the database. Two different detection options were used and they both achieved the accuracy of approximately 30%. The different classifiers were tested as well and random forest had the best performance.", "language": "en", "element": "description", "qualifier": "abstract", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Submitted by Miia Hakanen (mihakane@jyu.fi) on 2021-06-21T11:04:43Z\nNo. of bitstreams: 0", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.description.provenance", "value": "Made available in DSpace on 2021-06-21T11:04:43Z (GMT). No. of bitstreams: 0\n Previous issue date: 2021", "language": "en", "element": "description", "qualifier": "provenance", "schema": "dc"}, {"key": "dc.format.extent", "value": "94", "language": "", "element": "format", "qualifier": "extent", "schema": "dc"}, {"key": "dc.format.mimetype", "value": "application/pdf", "language": null, "element": "format", "qualifier": "mimetype", "schema": "dc"}, {"key": "dc.language.iso", "value": "eng", "language": null, "element": "language", "qualifier": "iso", "schema": "dc"}, {"key": "dc.rights", "value": "In Copyright", "language": "en", "element": "rights", "qualifier": null, "schema": "dc"}, {"key": "dc.subject.other", "value": "shellcode", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.subject.other", "value": "code analysis", "language": "", "element": "subject", "qualifier": "other", "schema": "dc"}, {"key": "dc.title", "value": "Machine learning based ISA detection for short shellcodes", "language": "", "element": "title", "qualifier": null, "schema": "dc"}, {"key": "dc.type", "value": "master thesis", "language": null, "element": "type", "qualifier": null, "schema": "dc"}, {"key": "dc.identifier.urn", "value": "URN:NBN:fi:jyu-202106213954", "language": "", "element": "identifier", "qualifier": "urn", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Pro gradu -tutkielma", "language": "fi", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.type.ontasot", "value": "Master\u2019s thesis", "language": "en", "element": "type", "qualifier": "ontasot", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Informaatioteknologian tiedekunta", "language": "fi", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.faculty", "value": "Faculty of Information Technology", "language": "en", "element": "contributor", "qualifier": "faculty", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Informaatioteknologia", "language": "fi", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.department", "value": "Information Technology", "language": "en", "element": "contributor", "qualifier": "department", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "Jyv\u00e4skyl\u00e4n yliopisto", "language": "fi", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.contributor.organization", "value": "University of Jyv\u00e4skyl\u00e4", "language": "en", "element": "contributor", "qualifier": "organization", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Kyberturvallisuus", "language": "fi", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "dc.subject.discipline", "value": "Kyberturvallisuus", "language": "en", "element": "subject", "qualifier": "discipline", "schema": "dc"}, {"key": "yvv.contractresearch.funding", "value": "0", "language": "", "element": "contractresearch", "qualifier": "funding", "schema": "yvv"}, {"key": "dc.type.coar", "value": "http://purl.org/coar/resource_type/c_bdcc", "language": null, "element": "type", "qualifier": "coar", "schema": "dc"}, {"key": "dc.rights.accesslevel", "value": "openAccess", "language": null, "element": "rights", "qualifier": "accesslevel", "schema": "dc"}, {"key": "dc.type.publication", "value": "masterThesis", "language": null, "element": "type", "qualifier": "publication", "schema": "dc"}, {"key": "dc.subject.oppiainekoodi", "value": "601", "language": "", "element": "subject", "qualifier": "oppiainekoodi", "schema": "dc"}, {"key": "dc.subject.yso", "value": "kyberturvallisuus", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "koneoppiminen", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "teko\u00e4ly", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "cyber security", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "machine learning", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.subject.yso", "value": "artificial intelligence", "language": null, "element": "subject", "qualifier": "yso", "schema": "dc"}, {"key": "dc.format.content", "value": "fulltext", "language": null, "element": "format", "qualifier": "content", "schema": "dc"}, {"key": "dc.rights.url", "value": "https://rightsstatements.org/page/InC/1.0/", "language": null, "element": "rights", "qualifier": "url", "schema": "dc"}, {"key": "dc.type.okm", "value": "G2", "language": null, "element": "type", "qualifier": "okm", "schema": "dc"}]
id jyx.123456789_76761
language eng
last_indexed 2025-02-18T10:56:38Z
main_date 2021-01-01T00:00:00Z
main_date_str 2021
online_boolean 1
online_urls_str_mv {"url":"https:\/\/jyx.jyu.fi\/bitstreams\/baf29b0e-1b58-40de-b6ab-3f7ee7c9dd1a\/download","text":"URN:NBN:fi:jyu-202106213954.pdf","source":"jyx","mediaType":"application\/pdf"}
publishDate 2021
record_format qdc
source_str_mv jyx
spellingShingle Niiranen, Antti Machine learning based ISA detection for short shellcodes shellcode code analysis Kyberturvallisuus 601 kyberturvallisuus koneoppiminen tekoäly cyber security machine learning artificial intelligence
title Machine learning based ISA detection for short shellcodes
title_full Machine learning based ISA detection for short shellcodes
title_fullStr Machine learning based ISA detection for short shellcodes Machine learning based ISA detection for short shellcodes
title_full_unstemmed Machine learning based ISA detection for short shellcodes Machine learning based ISA detection for short shellcodes
title_short Machine learning based ISA detection for short shellcodes
title_sort machine learning based isa detection for short shellcodes
title_txtP Machine learning based ISA detection for short shellcodes
topic shellcode code analysis Kyberturvallisuus 601 kyberturvallisuus koneoppiminen tekoäly cyber security machine learning artificial intelligence
topic_facet 601 Kyberturvallisuus artificial intelligence code analysis cyber security koneoppiminen kyberturvallisuus machine learning shellcode tekoäly
url https://jyx.jyu.fi/handle/123456789/76761 http://www.urn.fi/URN:NBN:fi:jyu-202106213954
work_keys_str_mv AT niiranenantti machinelearningbasedisadetectionforshortshellcodes