YEAR 2024 / VOL. 1 / NO. 1
Natural Language Processing
Published on June 15, 2024
5 articles
Automatic Speech Recognition for Spontaneous Czech Speech
By Jan Novák
Abstract
We evaluate end-to-end ASR architectures (Wav2Vec 2.0, Whisper) on a corpus of spontaneous spoken Czech including telephone conversations and broadcast news. We report word error rates and analyse common failure modes in disfluent speech.
Coreference Resolution in Cross-Lingual Settings
By Jan Novák
Abstract
This study extends a neural coreference resolver trained on English OntoNotes to Czech and Slovak through zero-shot and few-shot transfer. We show that language-specific morphological features significantly reduce errors in pronoun resolution.
Named Entity Recognition in Czech Legal Documents
By Jan Novák
Abstract
We introduce a domain-specific NER corpus of 12 000 annotated Czech legal sentences and fine-tune a Czech BERT model to recognize person names, organization names, dates, and legal references. The model achieves an F1 score of 91.3% on the held-out test set.
Sentiment Analysis of Parliamentary Debates Using Contextual Embeddings
By Jan Novák
Abstract
We apply RoBERTa-based contextual embeddings to classify sentiment in transcripts of Czech parliamentary debates (2010–2023). The model identifies polarization trends and correlates sentiment shifts with key legislative events.
Transformer Models for Low-Resource Language Translation
By Jan Novák