Machine learning based ISA detection for short shellcodes

Bibliographic details
Main author:	Niiranen, Antti
Other authors:	Faculty of Information Technology, Information Technology, University of Jyväskylä
Material type:	Master's thesis (Pro gradu)
Language:	English
Published:	2021
Subjects:	shellcode, code analysis, cyber security (discipline; subject code 601), machine learning, artificial intelligence
Links:	https://jyx.jyu.fi/handle/123456789/76761

Abstract:	Shellcodes are often used by cybercriminals to breach computer systems. Code injection remains a viable attack method because software vulnerabilities have not ceased to exist. Such code is typically written in assembly language. The traditional method of analyzing shellcodes has been reverse engineering, but because it can be difficult and time-consuming, machine learning has been adopted to make the process easier. A literature review was performed to gain an understanding of shellcodes, artificial intelligence and machine learning. This thesis explores how accurately a state-of-the-art machine-learning-based ISA detection tool can detect the instruction set architecture (ISA) of short shellcodes. The research was experimental and was conducted in a virtual environment, mainly for safety reasons. A real-world shellcode database of approximately 20,000 files covering 15 different architectures was collected from three sources: Exploit Database, Shell-Storm and MSFvenom. From these files, a smaller test set was compiled to evaluate the performance of the tool. When the limitations of the study were considered, it was noted that the test set may not be diverse or large enough. Nevertheless, it made it possible to map how the program currently handles shellcodes. The study found that, with its current training, the program cannot reliably detect the ISA of the shellcodes in the database. Two different detection options were tested, and both achieved an accuracy of approximately 30%. The tool's classifiers were also tested, and random forest performed best.
Advisor:	Costin, Andrei
Extent:	94 pages
Date available:	2021-06-21
URN:	URN:NBN:fi:jyu-202106213954
Rights:	In Copyright (https://rightsstatements.org/page/InC/1.0/), open access
Full text (PDF):	URN:NBN:fi:jyu-202106213954.pdf, https://jyx.jyu.fi/bitstreams/baf29b0e-1b58-40de-b6ab-3f7ee7c9dd1a/download
Persistent link:	http://www.urn.fi/URN:NBN:fi:jyu-202106213954
