De grootste kennisbank van het HBO

Inspiratie op jouw vakgebied

Vrij toegankelijk

Terug naar zoekresultatenDeel deze publicatie

Samenvatting

In this paper we present the first freely available corpus of Dutch text messages containing data originating from the Netherlands and Flanders. This corpus has been collected in the framework of the SoNaR project and constitutes a viable part of this 500-million-word corpus. About 53,000 text messages were collected on a large scale, based on voluntary donations. These messages will be distributed as such. In this paper we focus on the data collection processes involved and after studying the effect of media coverage we show that especially free publicity in newspapers and on social media networks results in more contributions. All SMS are provided with metadata information. Looking at the composition of the corpus, it becomes visible that a small number of people have contributed a large amount of data, in total 272 people have contributed to the corpus during three months. The number of women contributing to the corpus is larger than the number of men, but male contributors submitted larger amounts of data. This corpus will be of paramount importance for sociolinguistic research and normalisation studies.

Toon meer
Jaar2012
TypeArtikel
TaalNederlands

Op de HBO Kennisbank vind je publicaties van 26 hogescholen

De grootste kennisbank van het HBO

Inspiratie op jouw vakgebied

Vrij toegankelijk