This page gives an overview of the 34 spoken language corpora of the Archive for Spoken German which are available online.

...
AD
Australian German

The corpus "Australian German" was created as part of a project at Monash University in Melbourne. The project manager was Michael Clyne. The corpus contains 220 sound recordings from the period 1966 to 1973, with a total duration of 64 hours and 19 minutes. The recordings, restored by the AGD, were made in South Australia and Victoria. They contain narratives, interviews and visual descriptions by 333 older women and men, some of whose families have lived in South Australia for three generations. Transcripts had been made for 168 of these recordings. The transcripts (orthographic transcription, orthographic normalization, lemmatization, POS tagging) were revised and aligned to the audio at the AGD. Based on the metadata, a list of topics was created. Also available are word and lemma lists, ordered alphabetically and by frequency. The corpus AD-- is made available through the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD.

220 Events 220 Speech events 333 Speakers
...
BB
German Dialects: Böblingen district

The corpus "German Dialects: Böblingen district" (BB--) was created by Ulrich Engel. Based on the data he collected, he investigated, among other things, dialect stratification and the dissolution of the dialect. The corpus BB-- comprises 73 sound recordings from the period 1965 to 1967 with a total duration of 42 hours and 28 minutes. These are recordings of narratives, conversations and readings by dialect speakers, women and men of different ages from the Böblingen district. The recordings have been digitized in the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). Transcripts are not archived. A list of topics was created based on the metadata. The sound recordings of the corpus BB-- that are archived by the AGD are made available through the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD.

73 Events 73 Speech events 0 Speakers
...
BETV
Belgian TV debates

The recordings of the corpus "Belgian TV debates" were handed to the Archive for Spoken German in November 2016 by Prof. Dr. Kurt Feyaerts (KU Leuven), who had previously obtained permission from the Belgische Rundfunk broadcasting station to use them for his own research. The corpus consists of ten one-hour video recordings of pre-election debates, which were televised by the TV of the German-speaking community (DG) before the municipal elections in 2012. All of the German-speaking municipalities (Amel, Büllingen, Burg Reuland, Bütgenbach, Eupen, Kelmis, Lontzen, Raeren, Sankt Vith) are represented in this series. In addition, one debate with German-speaking candidates for the provincial council (province of Lüttich/Liège) is included. For all 10 recordings, transcripts were generated using automatic speech recognition.

10 Events 10 Speech events 46 Speakers
...
BR
Biographic and Travel Narratives

The corpus Biographic and Travel Narratives (BR--) comprises 7 sound recordings from the period 1985 to 1990 with a total duration of 5 hours and 30 minutes. They are recordings of narratives of, and interviews with, 24 mostly young women and men from East Germany, Poland and Czechoslovakia. The recordings were made in the German Democratic Republic [then East Germany] under the direction of Katharina Meng (Central Institute for Linguistics of the Academy of Sciences of the GDR) and digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). 7 transcripts of different types are archived. The version of the corpus BR-- that is archived at the IDS is made available via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

7 Events 7 Speech events 24 Speakers
...
BW
Berlin Wende Corpus

The Berlin Wende Corpus (BW--) was compiled in the project "Kollektives Gedächtnis – sozialer und sprachlicher Wandel in der Nachwendezeit" ("Collective memory - social and linguistic change after the peaceful revolution") at the Institute for German and Dutch philology at the Freie Universität Berlin. Norbert Dittmar was the principal investigator. The project's aim was to document the social upheaval after the fall of the Berlin wall as a collection of individual and group-specific experiences. The investigation focused on narrations of East and West Berliners about the fall of the wall and about individual, social and economic aspects of daily life between 1992 and 1995. The interviews were conducted by participants of a continuing education course for elementary school teachers from East Berlin in their respective circles of friends and acquaintances. Typical questions were "How did you experience the fall of the wall?" - "How are you doing today, X years later?". Often, experiences specific to East and West Berlin are shared and worked on together. The recordings document the 'typical' Berlin variety of the Eastern part of the city (at the end of the GDR era) and colloquial language with moderate Berlin influence (of West German speakers who moved to Berlin during the time of the "wall"). Urban spoken language patterns coexist with supraregional colloquial constructions. Among many other research questions, the corpus lends itself especially to the description of discursive contrasts with which two groups in social conflict (stereotypes "Ossis" vs. "Wessis") perceive themselves during a time of upheaval and crisis. Speakers of both groups contextualize their social identities in all discourses with many variants. The corpus archived at the IDS consists of 50 audio recordings from the time between 1992 and 1996 with a total duration of 26 hours and 15 minutes. 
30 speakers from East Berlin and 26 speakers from West Berlin (women and men), aged between 19 and 55 years, were interviewed. The recordings were digitised by the Archive for Spoken German. Transcripts of all recordings (literary transcription with prosodic annotation, orthographic normalisation, lemmatization and POS tagging), aligned with the recordings, are available. The corpus BW-- is made available via the Database for Spoken German (DGD); individual recordings can also be ordered through the archive's personal service.

50 Events 50 Speech events 56 Speakers
...
DH
German today

The corpus "German today" (DH--) was recorded in the years 2006-2009 within the project "Variation in spoken German" (Principal Investigator: Nina Berend), supported by SAW third-party funds of the Leibniz Association. The recordings took place in 195 towns across the entire area where German is an official language and the language of instruction (Germany, Austria, Switzerland, South Tyrol, Luxembourg, East Belgium, Liechtenstein). Most of the recorded participants were pupils in senior classes in secondary schools (late teenagers), while a smaller portion of participants were aged between 50 and 60. Altogether there were 671 pupils (usually four per recording place, with balanced numbers according to sex), as well as 158 persons from the middle-aged generation (mostly two, rarely one person per recording place, again with balanced numbers according to sex). For each participant there are approximately 90 minutes of recorded speech. One half of each recording consists of reading tasks, including a wordlist of 1000 words, texts (the fable "The North Wind and The Sun", a popular science text, and constructed sentences) as well as picture naming and a translation task (English-German). The other half consists of a speech-biographic interview (approx. 30 minutes long) and a MapTask (approx. 15 minutes long), where two pupils interact with each other using speech. The corpus data are currently being analysed in the project "Spoken German" in terms of regional pronunciation variation. Since 2011, the results have been continuously published online in the Atlas of everyday standard German pronunciation (http://prowiki.ids-mannheim.de/bin/view/AADG/).

250 Events 7012 Speech events 834 Speakers
...
© Christian Zimmer
DNAM
German in Namibia

The corpus German in Namibia (DNam) was collected within the project "Namdeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias" ["Namdeutsch: The Dynamics of German in the Multilingual Context of Namibia"] (2016 - 2020), which was jointly run by the University of Potsdam, Humboldt University of Berlin (HU), the Free University of Berlin (FU) and the University of Namibia, Windhoek. The project's leaders were Heike Wiese (Potsdam, HU) and Horst Simon (FU). Their cooperation partners were Marianne Zappen-Thomson (Windhoek) and Hans Boas (Austin). Research staff members were Christian Zimmer, Janosch Leugner, Laura Perlitz, Yannic Bracke and Britta Stuhl. The corpus documents the language use and language attitudes of the German-speaking minority in Namibia. The recordings were made in July/August as well as November 2017 in classrooms and boarding schools of partly German-speaking schools, on farms, in private buildings and public spaces in Namibia. The corpus contains 227 recordings featuring 110 participating speakers. Its total length is 18 hours and 39 minutes. The recordings were made in three different set-ups: semi-structured interviews (on language biography, attitudes, perceptual dialectological aspects etc.), free conversations (involving two to five people in the absence of the researchers) and "language situations" (simulations of a formal or informal communication situation). The interviews comprise seven recordings with 15 participating speakers, running for a total length of 4 hours and 42 minutes. The free conversations cover 22 recordings with 65 participating speakers, with a total length of 9 hours and 15 minutes. The "language situations" include 198 recordings with 103 participating speakers, for a total length of 4 hours and 42 minutes. All recordings are available as audio files and in transcribed form. 
The transcriptions were made according to the cGAT conventions and using the Partitur-editor that is part of the EXMARaLDA tool suite. The transcripts are provided with four levels of annotation: orthographic normalization, lemmatization, part-of-speech tagging (following STTS 2.0), identification of contact language tokens. The speakers participating in the corpus are young people aged 14 to 18 and adults aged 26 to 75. They are first language speakers of German who were born in Namibia or who had (in some few cases) immigrated to Namibia in early childhood. For further information on the speakers, numerous items of metadata are available, which were collected by questionnaire. Some of the metadata categories can be used as filters (year of birth, gender, place of birth, etc.) when querying the corpus via the DGD. The remaining metadata (e.g. language biographical information) is available as additional material to the corpus in the form of a table. Beyond the availability of the corpus in the Database for Spoken German (DGD), individual sample audio recordings are available for download. Further information can be found in the following publication. Please quote this article if you use data from the DNam corpus: Zimmer, Christian, Heike Wiese, Horst J. Simon, Marianne Zappen-Thomson, Yannic Bracke, Britta Stuhl & Thomas Schmidt. (2020): Das Korpus Deutsch in Namibia (DNam): Eine Ressource für die Kontakt-, Variations- und Soziolinguistik. [The corpus German in Namibia (DNam): A resource for contact, variation and sociolinguistics.] [On the Internet at: www.geisteswissenschaften.fu-berlin.de/v/namdeutsch/Publikationen/]

179 Events 227 Speech events 117 Speakers
...
DR
German Dialects: GDR

The corpus DR-- was compiled by staff members of the Institute for German Language and Literature of the Academy of Sciences of the German Democratic Republic [then East Germany]. The project's manager was Hans-Joachim Schädlich. Following the recording campaign of the then German Language Archive (DSAv) in the Federal Republic of Germany [then West Germany] (cf. corpus ZW--, Deutsche Mundarten: Zwirner-Korpus), samples of the dialects of East Germany were to be recorded and a corpus of comparable material, obtained according to uniform criteria, was to be compiled. Further information about the project is published in: Hans-Joachim Schädlich, Heinrich Eras (1965): Bericht über die Tonbandaufnahmen der deutschen Mundarten in der Deutschen Demokratischen Republik [Report on tape recordings of German dialects in the German Democratic Republic]. In: Berichte über dialektologische Forschungen in der Deutschen Demokratischen Republik. Berlin, S. 24-27. The corpus DR-- comprises 1642 sound recordings from the period 1960 to 1968 with a total duration of 385 hours and 13 minutes. These recordings feature narratives, conversations and standard texts (comparative texts, word lists) by 1582 speakers from the GDR and former eastern territories of Germany. The comparative texts are based on the word material of the Wenkersätze. The speakers were free to choose the words and word order for the comparative texts. In many cases, written translations of the texts were made, occasionally also of parts of the narratives. The recordings were digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). Through a cooperation between the MPI for Evolutionary Anthropology (Leipzig) and the Archive for Spoken German (AGD), 117 transcripts (with standard orthographic transcription and punctuation marks according to orthography; explanatory commentaries; partly prosodic annotation; annotation of incomplete words) were produced. 
These transcripts are archived at the AGD. Based on the metadata, a list of topics, a list of linguistic peculiarities and a list of the professions of the speakers were created. Also available are written versions of the comparative texts and the word lists. The corpus DR-- is available online via the Database for Spoken German (DGD). Individual sound recordings can also be obtained for download or on physical media through the personal service of the AGD.

444 Events 1642 Speech events 1580 Speakers
...
DS
Dialog Structures

The corpus "Dialog structures" (DS--) was created within a joint project involving the following institutions: IDS Research Centre Freiburg; University of Freiburg, German Department; University of Giessen, Chair of Psychology. The project leader was Hugo Steger. The project continued on with questions of spoken language research, as they had arisen, for example, in the work of the project "Basic Structures of the German Language", from which the corpus "Grundstrukturen: Freiburger Korpus" ["Basic Structures: Freiburg Corpus"] (FR--) had emerged. By analyzing the organization of natural dialogues, the aim was now to describe the rules and regularities of how conversations are organized, both with respect to individual dialogues but also, in general, for dialogue types. On an exploratory basis, attempts were also made to ascertain to what extent and with what functions non-verbal behavioural elements are used in communication. Further project information is published in: Franz-Josef Berens, Karl-Heinz Jäger, Gerd Schank, Johannes Schwitalla (1976): Projekt Dialogstrukturen [Project Dialog Structures]. Ein Arbeitsbericht. Heutiges Deutsch I/12. München: Hueber. The DS-- corpus comprises 70 audio recordings from various sources, made in the years 1960 to 1977 with a total duration of 15 hours and 18 minutes. 51 of the recordings were taken from the corpus "Grundstrukturen: Freiburger Korpus" (FR--) and re-transcribed according to project-specific conventions. The recordings in DS-- involve 152 speakers (women and men) of the standard language or of colloquial language close to the standard in public and non-public communication. The recordings cover speech events of various kinds (registration, questioning, consultation, discussion, explanation, interview, examination, conversation, appointment). Some of them took place in the context of radio broadcasts. The recordings were digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). 
70 transcripts (with orthographic wording, additional notation, lemmatization, POS tagging) are held by the AGD. These transcripts were synchronized (aligned) with the audio by the AGD. Based on the metadata, a list of topics and a list of the speakers' professions were created. Also available are word and lemma lists ordered alphabetically and by frequency. The corpus DS-- is provided online via the Database for Spoken German (DGD). Individual sound recordings can also be obtained for download or on physical media through the personal service of the AGD.

70 Events 70 Speech events 152 Speakers
...
DTRK
Deutsch von Türkeirückkehrern

The corpus "Deutsch von Türkeirückkehrern" (DTRK) was developed within an internally funded project conducted by the Department of German Language and Literature at the Faculty of Natural Sciences and Philosophy of Marmara University in Istanbul, with support from the Archiv für Gesprochenes Deutsch. The project's manager was Serap Devran. The data was mainly collected through so-called "autobiographical-narrative interviews", supplemented by ethnographic-historical material where necessary. The aim of the project was the biographical and interactional analysis of the migration experiences of students of German Studies at Marmara University in Istanbul who had been born in Germany or come to Germany as infants and grown up there, and who had passed through the German education system, either completely or in part. The project focused on the different life stories and the social and linguistic experiences of the students in Germany and in Turkey. The aim was to analyze the linguistic means that the informants used to represent, evaluate and phrase the experiences and events in their old and new environments. These are linguistic actions and communicative practices with which the narrators represent themselves, their narrated self and other persons of their lived history in order to position them as socially determinable persons.

12 Events 12 Speech events 13 Speakers
...
EK
Elicited Conflict Talk between Mothers and their Adolescent Daughters

The corpus Elicited Conflict Talk between Mothers and their Adolescent Daughters (EK--) was created in subproject C2, "Argumente in Konfliktgesprächen zwischen Eltern und Jugendlichen" ["Arguments in conflict talk between parents and adolescents"], of Collaborative Research Centre (Sonderforschungsbereich) 245, "Sprechen und Sprachverstehen im sozialen Kontext" ["Speaking and language comprehension in social context"] (Mannheim, Heidelberg). The project leader was Manfred Hofer. In the first project phase, the work focused primarily on describing the arguments put forward in conflict talk between mothers and their adolescent daughters and how these arguments unfold in the course of the conversation. For this purpose, an integrated psycholinguistic and linguistic category system for classifying conversations was developed. With the help of this system, it was possible to identify differences between mothers and daughters in the frequency with which individual elements of argumentation occur, to determine different "levels" of argumentation and their dependence on the age of the daughters, and to establish a typology of whole conversations. In the following two years, the focus was on explanatory questions. On the one hand, sequential argumentative dependencies in the course of the conversation were identified; on the other hand, the relationship between the situational intentions (motivational tendencies) of the two partners and the arguments they put forward in the conversation was examined. Further project information is contained in: Manfred Hofer, Birgit Pikowsky, Thomas Spranz-Fogasy (1992): Projekt "Argumente in Konfliktgesprächen zwischen Eltern und Jugendlichen." Abschlussbericht an die Deutsche Forschungsgemeinschaft [Project "Arguments in conflict talk between parents and adolescents." Final report to the German Research Foundation]. The corpus EK-- comprises 138 sound recordings with 214 female speakers from the years 1988 and 1990, with a total duration of 12 hours and 23 minutes. At the time of recording, the mothers were between 31 and 58 years old and the daughters between 12 and 24. The recordings were digitized at the Archive for Spoken German (AGD) (formerly: Deutsches Spracharchiv). 
138 transcripts (with orthographic wording, additional notations) are archived. Based on the metadata, a list of topics was created. The corpus EK-- is provided via the Database for Spoken German (DGD); individual sound recordings can also be obtained through the personal service of the AGD.

107 Events 138 Speech events 0 Speakers
...
FOLK
Research and Teaching Corpus of Spoken German

The Research and Teaching Corpus of Spoken German (FOLK) has been under construction in the pragmatics department of the IDS Mannheim since 2008. FOLK primarily addresses researchers, teachers and students in conversation analysis, corpus linguistics and related fields. The overall aim of the project is to provide the scientific community with a corpus of interactions in German-speaking countries which covers a maximally broad spectrum of interaction types. To this end, audio and video recordings of verbal interactions in different private (e.g. table talk, game interactions), institutional (e.g. classroom discourse, professional communication) and public (e.g. panel discussions, public arbitrations) contexts are made. Additional stratification parameters like regional provenance, age or education level of speakers are taken into account in corpus compilation in order to enable the creation of virtual subcorpora which are balanced with respect to these parameters. Using the editor FOLKER, recordings are transcribed in modified orthography ('literal transcription') according to the cGAT conventions for minimal transcripts. The transcripts are time-aligned in segments no longer than 5 seconds. To optimize searchability of the corpus, three annotation levels are added to the literary transcription: an orthographic normalisation, a lemmatisation and a part-of-speech tagging according to a version of the Stuttgart-Tübingen-Tagset (STTS) optimized for interaction data. FOLK comprises data collected by the project itself as well as data from external collaborators. The current version of FOLK (version 2.14 from April 2020) comprises audio and video recordings and transcripts of 332 interactions with 1131 documented speakers. The overall duration of the recordings is 285 hours and 39 minutes. The transcripts amount to 2,719,948 verbal tokens. 
Out of the 332 interactions, 105 were recorded on video with a total duration of 124 hours and 27 minutes; the remaining 227 interactions (161 hours, 11 minutes) were recorded on audio only. The corpus also contains relevant additional materials: information on interaction setting and course of events, word lists ordered alphabetically and by frequency, transcription conventions, documentation of metadata systematics and further interaction-specific materials. The corpus is extended continuously. New data are published via the Database for Spoken German (DGD) at regular intervals. For further information, see also the project's website at http://agd.ids-mannheim.de/folk.shtml. Instructions on proper citation of the FOLK corpus are provided via the Help menu of the DGD.

332 Events 332 Speech events 1131 Speakers
...
FR
Basic Structures: Freiburg Corpus

The corpus "Basic Structures: Freiburg Corpus" (FR--) was created by the former Freiburg-based Research Unit of the IDS. The project's leader was Hugo Steger. Within the framework of the project "Basic Structures of the German Language", the Freiburg Research Unit had the task of describing grammatical and stylistic features of the spoken standard language. Towards this end, the project staff created an extensive archive of audio recordings and transcripts of about 500,000 words. These materials were used to carry out grammatical and stylistic analyses of the peculiarities of spoken language, which, among other things, were also intended to enable statements about the connection between speech constellations and the use of specific means of expression. The areas of subjunctive and mood, passive, future and present tenses, past tenses, morphology and word length were studied. Further project information is published in: Gesprochene Sprache. Bericht der Forschungsstelle Freiburg [Spoken Language. Report of the Freiburg Research Centre] (Forschungsberichte des Instituts für deutsche Sprache 7) 2. Auflage, 1975, Tübingen: Narr. The FR-- corpus comprises 222 sound recordings from the period 1960 to 1974 with a total duration of 68 hours and 6 minutes. These are recordings with 812 speakers (women and men) of the standard language or of colloquial language close to the standard in public and non-public communication. Speech events of various kinds were recorded, among them consultations, reports, meetings, discussions, explanations, narrations, interviews, sermons, press conferences, conversations and lectures. Some of these took place in the context of radio broadcasts. The recordings were digitized at the Archive for Spoken German (AGD) (formerly: Deutsches Spracharchiv). 221 transcripts (with orthographic wording, additional notations, partly intonation notations, lemmatization, POS tagging) are held by the AGD. 
The transcripts were also synchronized (aligned) with the audio at the AGD. Based on the metadata, a list of topics and a list of the speakers' professions were created. Also available are word and lemma lists ordered alphabetically and by frequency. The corpus FR-- is available online via the Database for Spoken German (DGD). Individual sound recordings can also be obtained for download or on physical media through the personal service of the AGD.

222 Events 222 Speech events 812 Speakers
...
© Christian Zimmer
GDSA
Spoken German in Southern Africa

The project "Spoken German in Southern Africa" aimed at systematically documenting the varieties of spoken German in Namibia and South Africa, making them accessible via the Internet and describing them using examples. Towards this end, sound recordings were made in Namibia and South Africa of German speakers producing various text types and interacting in various communicative settings. The recordings are archived in the Language Archive of the Institute for the German Language (IDS) and made accessible in the archive's database. During an exploratory and networking trip in April 2005, an initial set of sound recordings was made in Namibia as a pre-test. Most of the material was collected during recording trips in South Africa in February/March 2012 and in Namibia in February/March 2013. The recording locations in South Africa are various places in KwaZulu-Natal, Mpumalanga, Gauteng, Western Cape. The content of the recordings includes biographical information, various language data questionnaires, as well as the reading aloud of texts and lists of words and sentences. The materials used for reading aloud were "Nordwind und Sonne", Wenkersätze, the "Deutsch heute" word list, "Deutsch heute" sentence list, "Niederdeutsche Phonologie" word list and "Niederdeutsche Phonologie" sentence list, Niederdeutscher Wortatlas. (These materials are available as additional materials with the exception of the "Niederdeutsche Phonologie" word list and "Niederdeutsche Phonologie" sentence list). The recording locations in Namibia are various places, regionally scattered. The recordings consist of biographical information and several language data questionnaires: "Nordwind und Sonne", Wenkersätze, "Deutsch heute" word list and "Deutsch heute" sentence list. A total of 66 different speakers were recorded. As requested, the material of one speaker who was a minor at the time of recording will not be made publicly available. 
Accordingly, 65 speakers are represented in the GDSA corpus as presented in the Database for Spoken German.

65 Events 155 Speech events 75 Speakers
...
GWSS
Spoken Academic Language

GeWiss is a research project on spoken academic language. It provides a multilingual (German/English/Polish/Italian) corpus of audio recordings and transcriptions of academic communication as an empirical foundation for comparative research. To this end, the GeWiss corpus focuses on two main genres of spoken academic language, talks (including discussions) and oral exams, and it explicitly distinguishes between L1 and L2 subcorpora. The corpus is enlarged and developed continuously.

417 Events 436 Speech events 735 Speakers
...
HL
German Standard Pronunciation

The corpus "German Standard Pronunciation" (HL--) was produced at the German Language Archive (DSAv) as part of the project "Hochlautung" (Standard pronunciation). The project leaders were Gerold Ungeheuer, Werner Besch and Edeltraud Knetschke. The externally accessible version of the corpus HL-- comprises 27 audio recordings (partly in variants of different lengths) from the period 1971 to 1975, with a total duration of 1 hour and 57 minutes. These are recordings of 9 television journalists, news presenters and government spokespersons (women and men) in speech events of various kinds (including report, discussion contribution, interview, commentary, moderation, report, statement), which took place in the context of radio broadcasts and press conferences. The recordings were digitized at the Archive for Spoken German (AGD) (formerly: Deutsches Spracharchiv). The transcripts of the corpus HL-- are published in: Edeltraud Knetschke, Margret Sperlbaum (1987): Zur Orthoepie der Plosiva in der deutschen Hochsprache [On the Orthoepy of Plosives in the German Standard Language]. Phonai vol. 33. The Archive for Spoken German holds 27 transcripts (with orthographic transcription, lemmatization, POS tagging) that were digitized at the AGD and synchronized (aligned) with the audio. Also available are word and lemma lists ordered alphabetically and by frequency. The parts of the corpus HL-- that can be made accessible to external users are provided via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

27 Events 27 Speech events 9 Speakers
...
© Anne Betten
IS
Emigrant German in Israel

Most of the corpus Emigrant German in Israel (IS--) was produced in the framework of a DFG project on the language and cultural identity of emigrants 50 to 60 years after their immigration to Palestine/Israel. The project's leader was Anne Betten (initially at the University of Eichstätt, later at the University of Salzburg). In the first phase of the study, the main focus of analysis lay on syntactic-stylistic studies of the largely written-oriented "Bildungsbürgerdeutsch" (the German of the educated citizenry) of the interviewees and on sociolinguistic studies of the variables that influenced linguistic competence in German and in the second languages Hebrew and English, as well as on the forms and functions of code-switching. This first phase was followed by a large number of predominantly conversation-analytical studies on language and identity, interaction and metaphorization by a larger circle of other conversation researchers. Starting in 2000, the first part of the collection was supplemented by some video recordings with former interviewees, a video interview with a new interviewee, some mostly thematic additional interviews (e.g. on the subject of childhood by Michaela Metz / Salzburg), additional audio recordings with former interviewees (20 years after the first interviews), further initial interviews with new interviewees conducted by Johannes Schwitalla (University of Würzburg), among others, and a discussion group with participants from all three Israel Corpora (IS, ISW, ISZ). References to further project information can be found on the homepage of Anne Betten, German Studies, University of Salzburg: http://www.uni-salzburg.at/ger/anne.betten (link under "Research areas - Information on Israel projects"). The version of the corpus Emigrant German in Israel (IS--) that is archived at the IDS includes audio and video recordings of 178 conversational events involving 181 documented speakers. 
The recordings date from the years 1989 to 2011 and have a total duration of 284 hours and 40 minutes. They are mainly recordings of argumentative-narrative autobiographical interviews with many monologic, but also strongly dialogical passages with 181 Jewish emigrants ("Jeckes") from all German-speaking regions of Central Europe. The interviews were mostly conducted in the private homes of the interviewees. Themes running through all the interviews are childhood and youth in Germany/Austria, experiences of anti-Semitism, flight/emigration, new beginnings, cultural reorientation; in addition, there are many individual reports. Most of the interviewees speak standard German with at best slight regional touches. The recordings were digitized and technically re-engineered at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). 104 transcripts of various types (including uncorrected transcripts) are held at the archive. 16 of these transcripts were synchronized (aligned) with the audio in the AGD. Based on the metadata, a list of linguistic peculiarities was created. Also available are detailed tables of contents, linguistic comments and original questionnaires. The recordings and 22 transcripts are available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or in physical form through the AGD's personal service. 82 uncorrected transcripts are only accessible via the AGD's personal service.

187 Events 187 Speech events 185 Speakers
...
© Anne Betten
ISW
Emigrant German in Israel: Viennese in Jerusalem

The recordings collected for the corpus "Emigrantendeutsch in Israel: Wiener in Jerusalem" ["Emigrant German in Israel: Viennese in Jerusalem"] (ISW-) were intended to serve as a supplement to those collected for the earlier corpus "Emigrantendeutsch in Israel" ["Emigrant German in Israel"] (IS--). As in the case of the corpus IS--, the collection of recordings for ISW was directed by Anne Betten from the University of Salzburg, Austria. The largest part of the recordings making up the ISW corpus was collected during an excursion to Israel in December 1998 by students and teachers of German Studies. One exception is Anne Betten's one-on-one interview with Ari Rath, which was begun during the aforementioned excursion but continued in later sessions in Salzburg and Jerusalem. In 2010/11 the corpus was expanded by Michaela Metz (Salzburg) with three additional interviews with partners who had already been interviewed in 1998. The version of the corpus "Emigrant German in Israel: Viennese in Jerusalem" (ISW-) that is archived at the Institute for the German Language comprises 28 sound recordings from 1998 to 2011. The speakers on the recordings are 24 Jewish men and women who were between 69 and 90 years old at the time of recording, were born or had grown up in Austria (mostly in Vienna), and lived in Jerusalem. The majority of the speakers left Austria after the "Anschluss" (the annexation of Austria into Nazi Germany in 1938) without their parents, with the support of the Jugendalija organization (Youth Aliyah). All of the speakers speak standard German (if more or less strongly Austrian-coloured), using dialect mostly only in quotations or in personal remarks addressed to their Austrian interviewers. The narrative biographical interviews focus on the biographies of the speakers before and after emigration and the associated change of language and culture, but also allow for spontaneous thematic developments.
On the model of the recordings in the corpus IS--, some of the recordings were made in the homes of these former Austrians. The recordings were digitized and/or sound-engineered at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). 20 transcripts (orthographic raw transcripts, additional notations) and 4 uncorrected transcripts are archived. Detailed tables of contents are available as well. The recordings and the 20 transcripts are made available in the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD. The uncorrected (raw) transcripts are only accessible via the personal service of the AGD.

28 Events 28 Speech events 24 Speakers
...
© Anne Betten
ISZ
Second generation German-speaking Migrants in Israel

The corpus Second generation German-speaking Migrants in Israel (ISZ-) was launched following the projects “Language preservation after emigration - German in the 20s in Israel” (corpus IS--) and “Austrian emigrants in Jerusalem” (corpus ISW-). The recordings contained in the ISZ- corpus were made in two recording phases, 1999/2000 and 2004-2006, by Anne Betten in Israel. In 2010-2012 the material was further expanded by two additional interviews recorded by Michaela Metz (Salzburg) and a discussion round with interview partners. The recordings in the ISZ- corpus are interviews with 63 descendants of German-speaking Jews ("Jeckes"), especially the children of the interview partners represented in the corpora IS and ISW. The majority of the 67 interviews were conducted entirely or largely in German, but a smaller part was conducted mainly in English. In some cases, the matrix language changes several times between German and English, with the interviewer often trying to initiate a switch back to German, which often succeeds when the conversation is about memories of the childhood home, visits to German-speaking countries and similar topics related to the German language. Thematically, the interviews focused on the question of how the interviewees felt as children of German-speaking Jews (“Jeckes”) from their childhood through the present and how growing up in two cultures affected the formation of their identity. The ISZ- corpus provides material for research on the connection between language skills, language attitudes and social experiences as well as for work on functional code-switching. The version of the corpus ISZ- that is archived at the IDS comprises 70 sound and video recordings made in the period from 1999 to 2012, with a total duration of 125 hours and 27 minutes. The recordings were digitized and technically re-engineered at the Archive for Spoken German (AGD) (formerly: German Language Archive). 64 uncorrected transcripts are stored in the archive.
In addition, there are detailed tables of contents available as well as a list of linguistic peculiarities. The recordings and additional materials are provided online through the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD. The uncorrected transcripts are only accessible via the personal service of the AGD.

68 Events 68 Speech events 63 Speakers
...
© Projekt 'Jugend, Kommunikation, Medien'
JK
Youth communication

The corpus of youth communication was collected in the Rhine-Main area of Germany as part of a project at the University of Frankfurt. The project's manager was Klaus Neumann-Braun. The aim of the project was to investigate the everyday communication culture of young people by means of an ethnographic conversation analysis study. The focus of interest was on linguistic-interactive in-group methods used by young people to form their group into a community and the forms of social categorization that they use to make sense of themselves and their social environment. Both formal and functional aspects of communicative practices were to be analysed. The ethnographic perspective aimed to gain insight into the variation and changes in practices over time and their dependence on different participation structures and situational contexts. The part of the corpus JK-- that is currently accessible to external users comprises 6 sound recordings with a total duration of 4 hours and 42 minutes from the period 1996 to 1999. These recordings involve adolescents or (in the later phases) young adults who lived in a small rural town in the Rhine-Main area and regularly visited the youth centre there. The speakers were accompanied during some of their activities by the recording managers, who simultaneously also acted as supervisors. This resulted in recordings of conversations in which plans were made as well as of informal conversations. The recordings were digitized at the Archive for Spoken German. The parts of the corpus JK-- that are available to external parties are made available online via the Database for Spoken German (DGD). Individual recordings may also be obtained for download or on physical media through the personal service of the AGD.

6 Events 6 Speech events 17 Speakers
...
KN
German standard language: König-Korpus

The corpus "German standard language: König-Korpus" (KN--) was created by Werner König (University of Augsburg). His analyses of the corpus are published, among others, in: Werner König (1989): Atlas zur Aussprache des Schriftdeutschen in der Bundesrepublik Deutschland. 2 Bde. Ismaning: Hueber. [Werner König (1989): Atlas on the pronunciation of Standard German in the Federal Republic of Germany. 2 vols. Ismaning: Hueber.] For legal reasons, only a small section of the KN-- corpus can be made available to external parties. 43 audio recordings from 1975 with a total duration of 5 hours and 48 minutes are accessible. These are recordings of 43 students and academics (women and men) aged 17 to 27, from 43 locations that were relatively evenly spaced across the old federal states of what was then West Germany. The speakers were born and raised in the selected locations. Moreover, in all cases, at least one parent of the speaker comes from the same location or its vicinity. On the recordings which are accessible to external parties, the reading language of the informants can be heard as they read out a section of Germany's Basic Law (its constitution). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). The recordings were transcribed and the transcripts synchronized (aligned) with the audio. There are also word and lemma lists available. The parts of the KN-- corpus for which dissemination rights could be secured are made available in the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD.

43 Events 43 Speech events 70 Speakers
...
© IFM, LMU München
MEKI
Multilingual Children at Pre-school Age (MEKI)

The Corpus Multilingual Children at Pre-school Age (MEKI) was created as part of a study that accompanied the implementation of a language support program. The aim of the study was to examine the linguistic development of children aged 5-7 years who had not yet entered primary school under the conditions of a language support program over a period of nine months. The data was collected over a period of 10 months in the context of language didactic offerings made at day-care centres or as part of a preparatory course at a primary school. The groups of children observed consisted of eight to twelve children. The group events were documented with video cameras. As a participating observer, the researcher was marginally involved in the events. The video equipment was set up and disassembled together with the children, so that the children experienced the recording situation as part of the overall situation. In addition to the events in the language support group, stories were recorded in elicitation settings. The version of the MEKI corpus that is archived at the IDS contains 85 recordings with a total duration of 3 hours and 8 minutes. 82 transcripts are available as well as word and lemma lists ordered alphabetically and according to frequencies. The recordings have been sound-edited and anonymized at the Archive for Spoken German (AGD). The transcripts (orthographic transcription; orthographic normalization; lemmatization; POS tagging) were checked in the AGD, amended as necessary, corrected and synchronized with the audio signal (aligned). The version of the corpus MEKI archived at the IDS is available online via the Database for Spoken German (DGD).

55 Events 86 Speech events 28 Speakers
...
© Aaron Schmidt-Riese
MEND
Mennonite Plautdietsch in North and South America

The MEND corpus, which was collected by Göz Kaufmann in the years from 1999 to 2002, consists of the Plautdietsch translations of 46 stimulus sentences by 321 Mennonite informants. In total, there are about 14,500 usable sentence translations with a total recording time of about 40 hours. This corpus was prepared by the AGD in collaboration with Göz Kaufmann and Aaron Schmidt-Riese. As a rule, Spanish stimuli were used in Mexico, Paraguay and Bolivia, while Portuguese stimuli were used in Brazil and English stimuli in the US. The number of interviews per colony fluctuates considerably: Mexico: 103 informants from the area around Ciudad Cuauhtémoc / Chihuahua --- Paraguay: 42 informants from the Menno colony with its central town of Loma Plata; 37 informants from the Fernheim colony with its central town of Filadelfia; 2 informants from the colony Neuland --- Bolivia: 8 informants from Colonia Canadiense in the area of Santa Cruz de la Sierra --- Brazil: 56 informants from Colônia Nova / Rio Grande do Sul --- USA: 67 informants from Seminole / Texas and 6 informants who lived in Seminole / Texas at the time of recording but had lived there for less than 5 years. These informants had mainly lived in Canada (USA-9 / USA-23 / USA-24 / USA-45), Mexico (USA-26) or in other parts of the USA (USA-18). The 46 stimulus sentences (archived as part of the additional material) cover different sentence types. In addition to six main clauses (sentences 41-46), the following further sentence types were queried: ten complement clauses in post-position (sentences 1-10), ten conditional clauses in pre-position (sentences 11-20), ten causal clauses in post-position (sentences 21-30) and ten sentence-medial or sentence-final relative clauses (sentences 31-40). All subordinate clauses require the additional translation of a matrix clause. Main clause 42 also contains a preceding temporal clause.
Eighteen stimulus sentences aim at translations with one (particle) verb (sentences 1-4, 11-14, 21-24, 31-34, 41 + 42), eighteen stimulus sentences at translations with two verbs (modal verb + infinitive, or tense auxiliary + participle; sentences 5-8, 15-18, 25-28, 35-38, 43 + 44) and ten stimulus sentences at translations with three verbs (9 sentences with counterfactual proposition and modal verb and 1 sentence (sentence 9) with an epistemic modal verb with infinitive perfect; sentences 9 + 10, 19 + 20, 29 + 30, 39 + 40, 45 + 46). Naturally, deviations from these expected productions were common (especially tun-periphrases and four-part verb complexes in counterfactual propositions). The main verbs in the sentences almost always govern a direct object, which allows for easier determination of the positions of the finite and non-finite verbal elements. Adverbs or markers of negation were also included in some sentences. The sentences were read to the respective informant individually and then immediately translated without the help of a written version. This oral translation of sentences, some of which are of considerable complexity, naturally requires the informants to have good competence in the source and target languages. This automatically excluded some groups of people, namely those who have little command of the majority language of their home country (especially the (older) women in Mexico and Bolivia), and those who have largely lost Plautdietsch (some younger Mennonites in the United States and Brazil). In the event of translation problems, the stimulus in question was repeated either immediately or at the end of the interview.

321 Events 321 Speech events 322 Speakers
...
MV
Domestic German Varieties: Varia

The corpus MV-- was established by the German Language Archive (DSAv) as an archive corpus under the name "Domestic and foreign German dialects: Varia" with the code VII to store sound recordings of varied provenance which did not originate from the DSAv's own projects. Among these were some larger recording collections, many small collections and numerous individual recordings that were transferred to the DSAv by external scholars, e.g. from collaborators in the USA or Australia, for publication in the DSAv's PHONAI series, from projects developed in cooperation with the DSAv, as a gift or from the estate of researchers. Between about 1955 and 1985, a total of 360 sound recordings or sound carriers were placed in corpus VII. In the main, these were dialect and colloquial language recordings from the entire German-speaking area, from German-speaking enclaves in Europe and from the non-European German language enclaves that were hardly represented in the large variety corpora held by the DSAv (e.g. ZW, OS). 184 sound recordings, whose processing status corresponded to the then common standard of the DSAv, were published in 1992 in the DSAv's general catalogue of sound recordings (PHONAI Vol. 38/39). The same recordings were included in the first Database for Spoken German (DGD) of the DSAv. 109 recordings remained in the corpus, which was henceforth labelled MV. 75 recordings with speakers from the USA, Canada and Mexico were moved to a new corpus "Deutsch in Nordamerika" (NA), which was later to be expanded to include further recordings from this area. These plans were not pursued further when the Archive for Spoken German (AGD, the successor organization of DSAv) developed DGD2, the updated second version of the Database for Spoken German. Accordingly, the recordings from the NA corpus were reintegrated into the MV corpus. In order to be able to present the Archive's recordings from linguistic enclaves, two new corpora were split off from the MV corpus in 2019.
The first corpus, "Extraterritoriale Varietäten Varia" (MVEX), consists of 46 re-engineered recordings from linguistic enclaves that were previously part of MV as well as 48 other recordings from linguistic enclaves that until then had been part of the old corpus VII and which were newly worked up for inclusion in DGD2. The second corpus split off from MV, "German in Wisconsin" (WISC), consists of a closed set of 64 recordings from Wisconsin/USA. At this time, only 72 standard language recordings from the contiguous German-speaking area in Europe remain as part of MV. In the future, these are to form a new corpus together with further domestic German variety recordings from the old corpus VII which are still to be worked up. The 72 recordings currently in MV have a total duration of 20 hours and 8 minutes. They were collected by the US-American scientist Carol Tokosh in 1972 for an investigation of the current language in Germany. 6 speakers (3 women/3 men, younger and middle-aged) were recorded in each of eight smaller cities in different linguistic regions of the old Federal Republic of Germany [then West Germany]. In the same way, 6 speakers were recorded in each of two cities in Austria and two cities in Switzerland, for a total of 72 speakers. Each speaker produces a list of words and a narrative on prespecified topics in standard and colloquial language, respectively, for a total of 72 speech events. Copies of the sound recordings were transferred to the DSAv and digitized later by the AGD. Digital transcripts are not available. The sound recordings of the corpus MV-- are available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

72 Events 72 Speech events 72 Speakers
...
MVEX
Extraterritorial Varieties: Varia

The corpus "Extraterritorial varieties: Varia" (MVEX) consists of 94 recordings representing varieties from German language enclaves in Europe (Poland, Ukraine, Romania, Serbia, Italy) and outside of Europe (Canada, USA, Mexico, South Africa, Australia, New Zealand). The speakers represented in the corpus are mainly Mennonite Germans, Pennsylvania Germans, Australian Germans and Danube Swabians. These sound recordings from the period between 1958 and 1983 were taken in by the Deutsches Spracharchiv (DSAv) from foreign scholars and archived in the corpus "Inland and foreign German dialects: Varia" (VII, later MV). 46 of the recordings that are now in MVEX were selected in 1992 for the DSAv general catalogue (PHONAI vol. 38/39) and included in the Database for Spoken German (DGD) of the DSAv. 9 of the 46 recordings were integrated into the corpus "Deutsch in Nordamerika" (NA). In the second major version of the Database for Spoken German, DGD2, the 46 recordings were initially accessible as part of the MV-- corpus (see there). They have since been technically re-engineered and transferred to the corpus MVEX (while retaining the numeric part of the ID they had as part of MV). The remaining group of 48 recordings in MVEX are accessible via the DGD for the first time, after technical editing was performed and metadata compiled. The total duration of the recordings in MVEX is 28 hours and 31 minutes. The recordings are with 126 speakers (women and men) from Australia, Germany, Italy, Canada, Mexico, New Zealand, Austria, Poland, South Africa and the USA. The speech events captured are of various kinds, mainly narratives, conversations, translations and standard texts (control sentences, Wenker sentences, word lists, numbers, weekdays, months). The recordings have been digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv).
The transcripts of 17 recordings from Australia, Italy, Canada and the USA were published in volumes 6, 10, 18, 21 and 31 of the PHONAI series. No digital transcripts are held by the AGD. The sound recordings of the MVEX corpus are available via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media by way of the personal service of the AGD. Similar recordings to those in MVEX can be found in the DGD as part of the corpora "Australiendeutsch" (AD), "Mennonitenplautdietsch in North and South America" (MEND), "Spoken German in Southern Africa" (GDSA), "Russlanddeutsche Dialekte" (RUDI) and "German in Wisconsin" (WISC). (WISC also originated from the old corpus MV--.) Data that is relevant for German linguistic enclaves in Europe can also be found in the Zwirner corpus (ZW) as well as in the corpus "German dialects: former German Eastern territories" (OS). Further similar recordings are part of AGD corpora which are not (yet) prepared for distribution through the DGD, in particular "German in New Zealand" (NZ).

94 Events 94 Speech events 126 Speakers
...
OS
German Dialects: Former German Eastern Territories

The corpus "Deutsche Mundarten: ehemalige deutsche Ostgebiete" ["German Dialects: Former German Eastern Territories"] (OS--) was compiled by the German Language Archive (DSAv) in cooperation with the Research Institute for the German Language "German Language Atlas" (Marburg). In order to complement the corpus "Deutsche Mundarten: Zwirner-Korpus" ["German Dialects: Zwirner-Korpus"] (ZW--), the two institutions undertook a data collection effort with the aim of documenting, to the extent possible, the dialects of the contiguous German language area in the former German East as well as the German language enclaves in Eastern and South-Eastern Europe. Additional project information is published in: Bellmann, Günter / Göschel, Joachim (1970): Tonbandaufnahme ostdeutscher Mundarten 1962-1965. Gesamtkatalog. Marburg (= DDG 73). [Bellmann, Günter / Göschel, Joachim (1970): Tape recording of East German dialects 1962-1965, complete catalogue. Marburg (= DDG 73).] The corpus OS-- comprises 981 sound recordings made in the years from 1962 to 1965 with a total duration of 462 hours and 5 minutes. These are recordings with 987 elderly resettled ethnic Germans (women and men) from the former German Eastern territories who are speakers of East and Southeast German dialects, representing the state of their language varieties of German before 1945. Recordings were made of various types of speech events, especially narratives, conversations and standard texts (weekdays, numbers, phrases of remembrance). The recordings were digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). The AGD archive holds 280 transcripts that it digitized and synchronized (aligned) to the audio. The textual transcriptions hew close to the standard language and follow the old orthography. They come with additional notes by the transcribers, lemmatization, and POS tagging. 
Based on the metadata, a list of topics, a list of linguistic peculiarities and a list of the professions of the speakers were created. Also available are word and lemma lists ordered alphabetically and by frequency. The corpus OS-- can be accessed via the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD.

981 Events 981 Speech events 987 Speakers
...
PF
German colloquial languages: Pfeffer corpus

The corpus "Deutsche Umgangssprachen: Pfeffer-Korpus" ["German colloquial languages: Pfeffer corpus"] (PF--) was created as part of a project undertaken by the Institute for Basic German at Stanford University in collaboration with the German Language Archives (DSAv), the Institute for German Language and Literature at the Academy of Sciences of the German Democratic Republic, and further institutions. The project's leaders were Alan Pfeffer and Walter F. W. Lohnes. The PF-- corpus comprises 398 sound recordings made in 1961 with a total duration of 79 hours and 15 minutes. These are recordings of reports, stories and lectures by 402 speakers (women and men) of different ages, with different levels of education and different professions from the Federal Republic of Germany [then West Germany], the German Democratic Republic [then East Germany], Austria and Switzerland. The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). The transcripts of the corpus PF-- and further project information are published in: J. Alan Pfeffer, Walter F. W. Lohnes (Hrsg.)(1984): Grunddeutsch. Texte zur gesprochenen deutschen Gegenwartssprache. (Phonai, Bde. 28-30) Tübingen: Niemeyer. [J. Alan Pfeffer, Walter F. W. Lohnes (ed.) (1984): Basic German. Texts from the contemporary spoken German language. (Phonai, Vol. 28-30) Tübingen: Niemeyer]. The template for the publication was created by the former Linguistic Data Processing Unit of the Institute for the German Language (IDS). The corpus as archived is made up of 398 transcripts, with text spelled according to the old German orthography rules, punctuation marks following old standard punctuation practices, additional notations, lemmatization, and POS tagging. The transcripts were also synchronized (aligned) to the audio by the AGD. 
Along with the transcripts, further resources are available: namely a list of topics, a list of linguistic peculiarities, a list of speakers' professions, created on the basis of the metadata, as well as alphabetical and frequency-based word and lemma lists. The corpus PF-- is provided in the database for spoken German (DGD). Individual sound recordings and transcripts can also be obtained for download or on physical media via the personal service of the AGD.

398 Events 398 Speech events 402 Speakers
...
RUDI
Russian-German Dialects

The corpus Russian-German Dialects was created in the context of several projects at the universities of Tomsk and Omsk (Siberia) in the period from 1959 to 1989. The particular value of the corpus lies in the fact that it captures authentic speech recordings from the German language enclaves in the eastern part of the former Soviet Union, which no longer exist today. The recordings are narratives, interviews and recollections of Russian Germans. The corpus represents the language use of competent dialect speakers in language enclaves that were still intact at the time and illustrates the seven main types of Russian-German dialects: Hessian, Swabian, Bavarian, South Franconian, Volhynian German, Low German and Palatine. The corpus is of interest not only from a linguistic perspective on German dialects in these enclaves; it also reflects many facets of the everyday life and culture of the Russian Germans in this period.

20 Events 286 Speech events 20 Speakers
...
SA
Children's language: Saarbrücken Corpus

The corpus "Children's language: Saarbrücken" (SA--) was created under the auspices of a project at the University of Saarbrücken. The project's leaders were Rainer Rath, Hubert Immesberger and Josef Schu. The corpus was used to investigate the late-stage, undirected language acquisition of children of Turkish and Italian descent. The corpus SA-- comprises 65 sound recordings from the period between 1982 and 1984 with a total duration of 4 hours and 33 minutes. The recordings were made in situations of participatory observation in Saarland. The recordings are of child-adult interactions. The recordings focus on two Turkish, two Italian and two German children aged 9 to 13 years. Recordings were made of various types of speech events (among them descriptions, narratives, summaries, retellings, planning, games, conversations, appointments and directions). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: Deutsches Spracharchiv). The transcripts of the corpus SA-- are published in book form in: Rainer Rath, Hubert Immesberger, Josef Schu (1987): Kindersprache - Texte italienischer und türkischer Kinder zum ungesteuerten Zweitspracherwerb. Mit Vergleichstexten deutscher Kinder. [Children's language - Texts of Italian and Turkish children with relation to undirected second language acquisition. With comparative texts of German children]. Phonai, vol. 32. Tübingen: Niemeyer. Digital transcripts of the recordings are not available. The recordings of the corpus SA-- are made available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

48 Events 48 Speech events 46 Speakers
...
SR
Slavic Dialects in the Ruhr Valley

The Corpus Slavic Dialects in the Ruhr Valley (SR--) was created within a project of the Department of Slavic Studies at the University of Bochum in cooperation with the German Language Archive (DSAv). The project's manager was Christian A. van den Berk. The corpus SR-- comprises 31 audio recordings from 1969. 23 recordings with a total duration of 6 hours and 40 minutes are made accessible. These are recordings of narratives by 23 men and women between 17 and 78 years of age. In addition to the Slavic languages of interest (Polish, Slovenian, Ukrainian, Russian), German was also spoken during the sessions. The recordings, which were made in Bochum, Bottrop, Essen, Herne and Recklinghausen, have been digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). No transcripts are available for the recordings. We do, however, make available a list of topics created on the basis of the metadata. The corpus SR-- is available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

23 Events 23 Speech events 23 Speakers
...
SV
German Dialects: Southwest Germany and Vorarlberg

The corpus "German Dialects: Southwest Germany and Vorarlberg" (SV--) was compiled within the framework of a project at the Tübingen Centre for Language in Southwest Germany. (At the time, the Centre was a branch of the German Language Archive). The project's leaders were Arno Ruoff and Eugen Gabriel (the latter in charge of Vorarlberg). The sound recordings were intended to supplement the corpus of "German dialects: Zwirner-Korpus" (ZW--) so that the combination of SV and ZW would yield denser spatial coverage of Southwest Germany and Vorarlberg than ZW alone did. Accordingly, the recording campaign was based on the recordings made for the corpus ZW-- in southwest Germany. Further information about the project is published in: Ruoff, Arno (1973): Grundlagen und Methoden der Untersuchung gesprochener Sprache [Fundamentals and Methods of the Study of Spoken Language]. Tübingen. (Idiomatica Vol. 1). The SV-- corpus comprises 242 sound recordings made in the years from 1966 to 1970 with a total duration of 72 hours and 6 minutes. The recordings feature 242 different speakers, both women and men. Some of them were informants for the linguistic atlas of Vorarlberg and Liechtenstein and some of them were persons from Bessarabia (Ukraine) who had settled in Württemberg. Tales and standard texts (weekdays, numbers) were recorded. The recordings were digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). Based on the metadata, a list of topics and a list of the professions of the speakers were created. The corpus SV-- is available online via the Database for Spoken German (DGD). In addition, individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

242 Events 242 Speech events 242 Speakers
...
SW
German Dialects: Black Forest

The corpus "German dialects: Black Forest" (SW--) was created as part of a project at the Tübingen Center for the Language of Southwestern Germany. (At the time, the Center was a branch of the German Language Archives DSAv.) The project's manager was Arno Ruoff. The recordings for SW were intended to supplement the corpus "German dialects: Zwirner corpus" (ZW--), leading to denser spatial coverage of the Black Forest area. The recording campaign thus took the recordings made for the ZW-- corpus as a point of reference. The data were intended, among other things, to allow analyses of the local languages of three hamlets (Schönmünz, Romishorn and St. Roman). Further information about the project is published in: Ruoff, Arno (1973): Grundlagen und Methoden der Untersuchung gesprochener Sprache [Fundamentals and Methods of the Investigation of Spoken Language]. Tübingen. (Idiomatica vol. 1). The corpus SW-- is made up of 126 audio recordings made in the years 1964 and 1974 with a total duration of 37 hours and 31 minutes. It consists of recordings with 126 speakers (women and men) from the districts of Freudenstadt and Wolfach. Recordings were made of readings, narratives and standard texts (weekdays, numbers). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). Based on the metadata, a list of topics and a list of the professions of the speakers were created. The corpus SW-- is available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

126 Events 126 Speech events 122 Speakers
...
WISC
German in Wisconsin

The corpus "German in Wisconsin" (WISC) consists of 120 sound recordings covering different German varieties found in the state of Wisconsin in the Midwest of the US. The recordings were collected in 1968/1969 by Jürgen Eichhoff from Madison, Wisconsin, who had moved to the US from Germany. The recordings represent mainly Low German varieties of German. The speakers recorded are mostly third-generation descendants of German immigrants from Pomerania, Mecklenburg, Schleswig-Holstein, and Lower Saxony. In addition, Central German ("Dane County Kölsch") and Upper German varieties (Bavarian/Austrian) and Schwyzerdytsch, as well as various standard language speakers figure in the corpus. Copies of 64 sound recordings were made by the Deutsches Spracharchiv (DSAv). These were later incorporated into the corpus "Binnen- und auslandsdeutsche Mundarten: Varia" (VII, later MV), which also held further recordings by other researchers from outside Germany. The 64 copied recordings were selected in 1992 for the DSAv catalogue (PHONAI vol. 38/39) and made publicly available through the Database for spoken German (DGD) of the DSAv as part of the corpus "Deutsch in Nordamerika" (NA). In the second version of the Database for spoken German, DGD2, these recordings were until recently accessible as part of the corpus MV-- (see there). After technical re-editing they were subsequently separated out from MV as a new corpus WISC. The remaining 56 recordings, which had not been copied earlier, were finally taken over as digital copies from the Max Kade Institute at Madison in 2004. These 56 recordings are accessible for the first time via the DGD. The WISC corpus in the DGD consists of 65 recording sessions with 99 speech events, with a total duration of 79 hours. The corpus features recordings of 63 distinct speakers (women and men, mostly from the older generation) who come from different regions of Wisconsin.
The recordings cover speech events of various kinds: narratives, standard texts (phrases, word list, numbers, weekdays) and free interviews. 64 recordings (narratives, Wenker sentences) have been digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv); 56 recordings (word lists and interviews) were taken in as digital versions from Madison/Wisconsin and technically edited in the AGD. No digital transcripts are available for any of the recordings. The audio recordings of the corpus WISC are made available via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD. Similar recordings to those in WISC can be found in the DGD as part of the corpus "Extraterritoriale Varietäten: Varia" (MVEX).

65 Events 99 Speech events 63 Speakers
...
ZW
German dialects: Zwirner corpus

The corpus "Deutsche Mundarten: Zwirner-Korpus" (German dialects: Zwirner corpus (ZW--)) was created as part of a project whose goal it was to document the German dialects as completely as possible. This project was carried out at the Deutsches Spracharchiv (German Language Archive (DSAv)), a forerunner of today's Archive for Spoken German, and managed by Eberhard Zwirner, after whom the corpus later came to be named. Data collection took place mainly in the period from 1955 to 1961 by way of a survey that was conducted in the states of then West Germany as well as in German-speaking areas of France, Liechtenstein, the Netherlands and Austria. Within the individual regions, the selection of informants and the collection of data were performed by dialectologists with relevant expertise or by members of the project teams for the respective regional dialect dictionaries. While these researchers were responsible for the preparation and realization of the recordings, the sound engineering was carried out by a DSAv sound engineer. For the selection of recording locations, a square grid with a side length of sixteen kilometers was laid over the area and at least one location was selected in each grid square. As a rule, three autochthonous speakers were interviewed at each location, if possible one each from the younger, the middle and the older generation (around 20 years, 40 years and over 60 years of age, respectively). Thanks to the inclusion of refugees, displaced persons and immigrants from the former German eastern territories and from Eastern and Southeastern Europe, numerous samples of dialects from these areas could be recorded, which in the 1950s, shortly after the speakers' displacement, were still largely unaffected by the new language environments that the speakers found themselves in. In addition to dialectal contributions, colloquial and standard-language contributions were also included.
The ZW-- corpus comprises 5796 audio recordings (including approximately 90 recordings in Dutch) from the period between 1955 and 1972, with a total duration of 1077 hours and 15 minutes. 5887 persons (women and men) are documented as speakers in the transcripts. Speech events of various types were recorded, among them narratives and standard texts (days of the week, numbers, Wenker sentences, sentences from the Pfälzisches Wörterbuch (Palatinate Dictionary), dialect-geographical test sentences by Theodor Baader). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). At the time of writing, there are 2495 digital transcripts archived as part of the Zwirner corpus. The transcripts contain a standard language word-by-word transcription, transcribers' notes, lemmatisation, and POS tagging. (Note that some of the transcripts follow older German orthographic conventions while others follow the newer rules that have been in place since the mid-1990s.) All transcripts were synchronized (aligned) with the audio. Based on the metadata, a list of topics, a list of notable linguistic features and a list of the professions of the speakers were created. There are also word and lemma lists, arranged alphabetically and according to frequency. The corpus ZW-- is provided for use via the online platform Database for Spoken German (DGD), but individual sound recordings may also be obtained through the personal service of the AGD.

5796 Events 5796 Speech events 5887 Speakers