Abstract
The increasing influence of artificial intelligence (AI), the availability of textual data, and large language models (LLMs) over the past decade is evident in the growth of scholarly work on identifying skills from job advertisements. In this work, we examine the detection of sentences that express skills as well as the explainability of model decisions with respect to their dependence on skill-related tokens. We compare traditional machine learning (ML) approaches with a pretrained multilingual model and domain-adapted models for the task of English skill identification, and we assess the role of skill tokens in the classification process. We also investigate the ability of these models to generalize from English (EN) to Danish (DA) in both few-shot and zero-shot settings. Our findings indicate that both models perform well in sentence classification, achieving an F1-score of 94% for EN and an overall accuracy of 93%–94% for both EN and DA. The results show that traditional ML methods can remain relevant under certain circumstances, reinforcing the importance of realistic baselines in the context of skill identification.
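The abstract reports both an F1-score and an overall accuracy for sentence-level skill classification. As a minimal sketch of how the two metrics relate, the snippet below computes them for a binary task (1 = sentence expresses a skill, 0 = it does not). The function name `binary_metrics` and the toy labels are illustrative assumptions, not the authors' data or evaluation code, and the abstract does not state which F1 averaging scheme was used; the positive-class F1 shown here is one common choice.

```python
def binary_metrics(y_true, y_pred):
    """Return precision, recall, F1 (positive class) and accuracy.

    Hypothetical helper for illustration; labels are 1 (skill sentence)
    or 0 (non-skill sentence).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Toy, made-up labels: 2 skill sentences out of 10, one of them missed.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced toy data like this, accuracy stays high while F1 drops, which is one reason the two figures are usually reported together.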
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 8th International Conference on Natural Language and Speech Processing (ICNLSP-2025) |
| Place of publication | Odense, Denmark |
| Publisher | Association for Computational Linguistics |
| Pages | 410-415 |
| ISBN (electronic) | 979-8-89176-297-8 |
| Status | Published - 2025 |
| MoE publication type | A4 Article in conference proceedings |
| Event | International Conference on Natural Language and Speech Processing - Odense, Denmark. Duration: 25 Aug 2025 → 27 Aug 2025. Conference number: 8th. https://www.icnlsp.org/ |
Conference
| Conference | International Conference on Natural Language and Speech Processing |
|---|---|
| Abbreviated title | ICNLSP |
| Country/Territory | Denmark |
| City | Odense |
| Period | 25/08/25 → 27/08/25 |
| Internet address | https://www.icnlsp.org/ |
UN SDGs
This output contributes to the following Sustainable Development Goals:
- SDG 4 – Quality Education
- SDG 8 – Decent Work and Economic Growth
Fingerprint
Dive into the research topics of "Cross-Lingual Sentence-Level Skill Identification in English and Danish Job Advertisements". Together they form a unique fingerprint.