YEAR 2024 / VOL. 1 / NO. 1

Natural Language Processing

Published on June 15, 2024

5 articles
Automatic Speech Recognition for Spontaneous Czech Speech

By Jan Novák

Abstract
We evaluate end-to-end ASR architectures (wav2vec 2.0, Whisper) on a corpus of spontaneous spoken Czech, including telephone conversations and broadcast news. We report word error rates and analyse common failure modes in disfluent speech.
Coreference Resolution in Cross-Lingual Settings

By Jan Novák

Abstract
This study extends a neural coreference resolver trained on English OntoNotes to Czech and Slovak through zero-shot and few-shot transfer. We show that language-specific morphological features significantly reduce errors in pronoun resolution.
Named Entity Recognition in Czech Legal Documents

By Jan Novák

Abstract
We introduce a domain-specific NER corpus of 12 000 annotated Czech legal sentences and fine-tune a Czech BERT model to recognize person names, organization names, dates, and legal references. The fine-tuned model achieves an F1 score of 91.3% on the held-out test set.
Sentiment Analysis of Parliamentary Debates Using Contextual Embeddings

By Jan Novák

Abstract
We apply RoBERTa-based contextual embeddings to classify sentiment in transcripts of Czech parliamentary debates (2010–2023). The model identifies polarization trends and correlates sentiment shifts with key legislative events.
Transformer Models for Low-Resource Language Translation

By Jan Novák

Abstract
This paper presents a transfer-learning framework for adapting large multilingual transformer models (mBERT, XLM-R) to low-resource language pairs with fewer than 50 000 parallel sentences. Experiments on the Czech–Slovak and Slovak–Polish pairs demonstrate an improvement of 4.2 BLEU points over the baseline.