Main Page
Welcome to the wiki of the scientific network "Building & Annotating CMC Corpora". The network focuses on developing schemata for annotating discourse from genres of computer-mediated communication (CMC) for use in corpora, and on improving NLP tools for the automatic processing of CMC-related phenomena. The network was launched at an international workshop held at TU Dortmund University in February 2013. This page documents the talks given at the workshop. Access to other parts of the wiki is restricted to members of the network.
Wed 19:00- | Pre-workshop warm-up (Restaurant Bodega del Sol, Tryp Hotel, Dortmund) |
Thu 09:00-09:10 | Opening / Introduction |
[ Section I ] Overview of CMC Corpus Projects
Thu 09:10-09:30 | Computer-Mediated Communication in SoNaR: Design & Collection |
Thu 09:30-09:50 | CMC Data in Learning and Teaching (LETEC) Corpora |
Thu 09:50-10:10 | The Project DIDI (Digital Natives and Digital Immigrants): Writing on Social Network Sites – A Corpus-based Observation of the Current Language Use in South Tyrol, with Particular Consideration of the Writers' Age |
Thu 10:10-10:30 | Project Cybercreole/RomWeb: An Overview |
Thu 10:30-11:00 | Coffee Break |
Thu 11:00-11:20 | Web2Corpus_it: A Balanced Pilot Corpus of Conversational Computer-Mediated Communication |
Thu 11:20-11:40 | Building a Reference Corpus of German Computer-Mediated Communication: Introducing the DeRiK Project |
Thu 11:40-12:00 | Building and Annotating Corpora of Collaborative Authoring in Wikipedia |
Thu 12:00-12:20 | Wikipedia as a Linguistic Resource |
Thu 12:20-14:00 | Lunch (Restaurant La Calla, TU Dortmund) |
[ Section II ] Special Topics in Building and Annotating CMC Corpora
Thu 14:00-14:35 | Technical Aspects in Harvesting Data from Social Network Sites |
Thu 14:35-15:10 | Anonymising CMC Corpora: A Reasonable Way to Share Corpora |
Thu 15:10-15:45 | Challenges and Solutions in Automatically Annotating CMC Data |
Thu 15:45-16:15 | Coffee Break |
Thu 16:15-16:50 | A TEI Schema for the Annotation of CMC Genres |
Thu 16:50-17:25 | Web Forum Corpora on Globalized Varieties: Notes from the User Perspective |
Thu 19:00- | Workshop dinner (Restaurant Bodega del Sol, Tryp Hotel, Dortmund) |
Fri 09:00-09:35 | Spelling Variation in Social Media |
Fri 09:35-10:10 | Experiments with Tokenization and Part-of-speech Tagging for German CMC Discourse |
Fri 10:10-10:45 | Tackling Diachronic Network Representation of Wikipedia and Preprocessing |
Fri 10:45-11:15 | Coffee Break |
Fri 11:15-13:00 | Discussion: issues of joint interest / perspectives for further cooperation |
Fri 13:00 | End of workshop |