This page gives an overview of the 41 spoken language corpora of the Archive for Spoken German which are available online.

...
AD
Australian German

The corpus Australian German ("Australiendeutsch") consists of audio recordings from several projects at Monash University in Melbourne on the state and use of the German language among ethnic German immigrants and their descendants. The project leader in each case was the Australian linguist Michael Clyne. The audio recordings and other materials such as metadata and transcripts were transferred from Michael Clyne's estate to the Archive for Spoken German (AGD) at the IDS in 2012/13, where they were restored and digitized. 220 sound recordings with a total duration of 64 hours and 19 minutes from the period 1966 to 1973 have been made accessible externally in four subcorpora. The recordings in three of the four subcorpora originate from rural areas in South Australia (Barossa Valley et al.: 46 recordings; AD--_E_00001 - AD--_E_00049; 1967) and Victoria (Western District of Victoria: 54 recordings; AD--_E_00051 - AD--_E_00106; 1966, 1970; Wimmera: 91 recordings; AD--_E_00108 - AD--_E_00198; 1969, 1972, 1973). The speakers are mostly third-generation Australian residents; their ancestors emigrated from Silesia or Mecklenburg in the mid-19th century, mostly for religious reasons. In the fourth subcorpus, the so-called "Pre-War Speakers" represent the metropolitan, educated middle-class population of Melbourne (AD--_E_00199 - AD--_E_00227; 29 recordings; 1969). These speakers are first-generation immigrants (or their descendants) who immigrated to Australia before World War II, primarily from large German and Austrian cities. The recordings contain narratives, interviews, and picture descriptions by or with 333 mostly older women and men. 168 of these recordings have been transcribed. The transcripts (orthographic transcription; orthographic normalization; lemmatization; POS tagging) were revised and aligned to the audio at the AGD. A list of topics was created based on the metadata. Word and lemma lists are available, arranged alphabetically or by frequency.
The corpus AD-- is made available via the Database for Spoken German (DGD), and individual sound recordings can also be obtained through the AGD's personal service. Additional recordings from German language islands in Australia (Barossa Valley and Queensland) are available in the DGD as part of the corpus "Extraterritorial Varieties: Varia" (MVEX).

220 Events 220 Speech events 333 Speakers
...
BB
German Dialects: Böblingen district

The corpus "German Dialects: Böblingen district" (BB--) was created by Ulrich Engel. Based on the collected data, he investigated, among other things, dialect stratification and the dissolution of the dialect. The corpus BB-- comprises 73 sound recordings from the period 1963 to 1967 with a total duration of 42 hours and 28 minutes. These are recordings of narratives, conversations and readings by dialect speakers, women and men of different ages, from 35 places in the Böblingen district (within its boundaries before the local government reform). In 1994, Ulrich Engel handed the recordings over to the German Speech Archive (DSAv), where they were copied. The recordings have been digitized in the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). Three recordings are transcribed. A list of topics was created based on the metadata. The sound recordings and transcripts of the corpus BB-- that are archived by the AGD are made available through the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD.

73 Events 73 Speech events 193 Speakers
...
BETV
Belgian TV debates

The recordings of the corpus "Belgian TV debates" were handed to the Archive for Spoken German in November 2016 by Prof. Dr. Kurt Feyaerts (KU Leuven), who had previously obtained permission from the broadcaster Belgischer Rundfunk to use them for his own research. The corpus consists of ten one-hour video recordings of pre-election debates, which were broadcast by the television service of the German-speaking Community (DG) before the municipal elections in 2012. All of the German-speaking municipalities (Amel, Büllingen, Burg Reuland, Bütgenbach, Eupen, Kelmis, Lontzen, Raeren, Sankt Vith) are represented in this series. In addition, one debate with German-speaking candidates for the provincial council (province of Lüttich/Liège) is included. For all 10 recordings, transcripts were generated using automatic speech recognition.

10 Events 10 Speech events 46 Speakers
...
BR
Biographic and Travel Narratives

The corpus Biographic and Travel Narratives (BR--) comprises 7 sound recordings from the period 1985 to 1990 with a total duration of 5 hours and 30 minutes. They are recordings of narratives by, and interviews with, 24 mostly young women and men from East Germany, Poland and Czechoslovakia. The recordings were made in the German Democratic Republic [then East Germany] under the direction of Katharina Meng (Central Institute for Linguistics of the Academy of Sciences of the GDR). Some recordings were made before and others after the fall of the Berlin Wall and German reunification. The recordings made prior to 1989 are reports about travel within the GDR and to the so-called sister countries Czechoslovakia, Poland, and Hungary. The narratives and interviews recorded in 1989 and 1990 contain personal recollections of the period surrounding German reunification in the GDR, recollections of first trips to West Germany and of German reunification on October 3, 1990, and further retrospective material about experiences with politics in the GDR. Katharina Meng gave copies of the audio recordings, transcripts and further materials to the Archive for Spoken German (AGD) [successor of the former German Speech Archive], where they were digitized. 7 transcripts of different types are archived at the AGD. These transcripts were reworked by the AGD so as to provide orthographic transcription, additional notations, lemmatization, and POS tagging. In addition, the transcripts were aligned to the audio recordings. The version of the corpus BR-- that is archived at the IDS is made available via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

7 Events 7 Speech events 24 Speakers
...
BW
Berlin Wende Corpus

The Berlin Wende Corpus (BW--) was compiled in the project "Kollektives Gedächtnis – sozialer und sprachlicher Wandel in der Nachwendezeit" ("Collective memory - social and linguistic change after the peaceful revolution") at the Institute for German and Dutch Philology at the Freie Universität Berlin. Norbert Dittmar was the principal investigator. The project's aim was to document the social upheaval after the fall of the Berlin Wall as a collection of individual and group-specific experiences. The investigation focused on narratives by East and West Berliners about the fall of the Wall and about individual, social and economic aspects of daily life between 1992 and 1995. The interviews were conducted by participants of a continuing education course for elementary school teachers from East Berlin in their respective circles of friends and acquaintances. Typical questions were "How did you experience the fall of the Wall?" and "How are you doing today, X years later?". Often, experiences specific to East and West Berlin are shared and worked through together. The recordings document the 'typical' Berlin variety of the eastern part of the city (at the end of the GDR era) and colloquial language with moderate Berlin influence (of West German speakers who moved to Berlin during the time of the Wall). Urban spoken language patterns coexist with supraregional colloquial constructions. Among many other research questions, the corpus lends itself especially to describing the discursive contrasts through which two groups in social conflict (stereotypes "Ossis" vs. "Wessis") perceive themselves during a time of upheaval and crisis. Speakers of both groups contextualize their social identities in all discourses with many variants. The corpus archived at the IDS consists of 50 audio recordings from the time between 1992 and 1996 with a total duration of 26 hours and 15 minutes.
30 speakers from East Berlin and 26 speakers from West Berlin (women and men), aged between 19 and 55 years, were interviewed. The recordings were digitised by the Archive for Spoken German. Transcripts of all recordings (literary transcription with prosodic annotation, orthographic normalisation, lemmatization and POS tagging), aligned with the recordings, are available. The corpus BW-- is made available via the Database for Spoken German (DGD); individual recordings can also be ordered through the archive's personal service.

50 Events 50 Speech events 56 Speakers
...
DH
German today

The corpus "German today" (DH--) was recorded in the years 2006-2009 within the project "Variation in spoken German" (Principal Investigator: Nina Berend), supported by SAW third-party funds of the Leibniz Association. The recordings took place in 194 towns across the entire area where German is an official language and the language of instruction (Germany, Austria, Switzerland, South Tyrol, Luxembourg, East Belgium, Liechtenstein). Most of the recorded participants were pupils in senior classes in secondary schools (late teenagers), while a smaller portion of participants were aged between 50 and 60. Altogether there were 671 pupils (usually four per recording place, with balanced numbers according to sex), as well as 155 persons from the middle-aged generation (mostly two, rarely one person per recording place, again with balanced numbers according to sex). For each participant there are approximately 90 minutes of recorded speech. One half of each recording consists of reading tasks, including a word list of 1000 words, texts (the fable "The North Wind and The Sun", a popular science text, and constructed sentences) as well as picture naming and a translation task (English-German). The other half consists of a speech-biographic interview (approx. 30 minutes long) and a MapTask (approx. 15 minutes long), in which two pupils interact with each other using speech. The corpus data are currently being analysed in the project "Spoken German" with regard to regional pronunciation variation. Since 2011, the results have been published continuously online in the Atlas of everyday standard German pronunciation (http://prowiki.ids-mannheim.de/bin/view/AADG/).

249 Events 6988 Speech events 831 Speakers
...
DNAM
German in Namibia

The corpus German in Namibia (DNam) was collected within the project "Namdeutsch: Die Dynamik des Deutschen im mehrsprachigen Kontext Namibias" ["Namdeutsch: The Dynamics of German in the Multilingual Context of Namibia"] (2016 - 2020), which was jointly run by the University of Potsdam, Humboldt University of Berlin (HU), the Free University of Berlin (FU) and the University of Namibia, Windhoek. The project's leaders were Heike Wiese (Potsdam, HU) and Horst Simon (FU). Their cooperation partners were Marianne Zappen-Thomson (Windhoek) and Hans Boas (Austin). Research staff members were Christian Zimmer, Janosch Leugner, Laura Perlitz, Yannic Bracke and Britta Stuhl. The corpus documents the language use and language attitudes of the German-speaking minority in Namibia. The recordings were made in July/August as well as November 2017 in classrooms and boarding schools of partly German-speaking schools, on farms, in private buildings and public spaces in Namibia. The corpus contains 227 recordings featuring 110 participating speakers. Its total length is 18 hours and 39 minutes. The recordings were made in three different set-ups: semi-structured interviews (on language biography, attitudes, perceptual dialectological aspects etc.), free conversations (involving two to five people in the absence of the researchers) and "language situations" (simulations of a formal or informal communication situation). The interviews comprise seven recordings with 15 participating speakers, running for a total length of 4 hours and 42 minutes. The free conversations cover 22 recordings with 65 participating speakers, with a total length of 9 hours and 15 minutes. The "language situations" include 198 recordings with 103 participating speakers, for a total length of 4 hours and 42 minutes. All recordings are available as audio files and in transcribed form.
The transcriptions were made according to the cGAT conventions, using the Partitur-Editor that is part of the EXMARaLDA tool suite. The transcripts are provided with four levels of annotation: orthographic normalization, lemmatization, part-of-speech tagging (following STTS 2.0), and identification of contact language tokens. The speakers participating in the corpus are young people aged 14 to 18 and adults aged 26 to 75. They are first language speakers of German who were born in Namibia or who had (in a few cases) immigrated to Namibia in early childhood. For further information on the speakers, numerous items of metadata are available, which were collected by questionnaire. Some of the metadata categories can be used as filters (year of birth, gender, place of birth, etc.) when querying the corpus via the DGD. The remaining metadata (e.g. language biographical information) is available as additional material to the corpus in the form of a table. Beyond the availability of the corpus in the Database for Spoken German (DGD), individual sample audio recordings are available for download. Further information can be found in the following publication. Please cite this article if you use data from the DNam corpus: Zimmer, Christian, Heike Wiese, Horst J. Simon, Marianne Zappen-Thomson, Yannic Bracke, Britta Stuhl & Thomas Schmidt (2020): Das Korpus Deutsch in Namibia (DNam): Eine Ressource für die Kontakt-, Variations- und Soziolinguistik. [The corpus German in Namibia (DNam): A resource for contact, variation and sociolinguistics.] [Available online at: www.geisteswissenschaften.fu-berlin.de/v/namdeutsch/Publikationen/]
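The recording counts and durations given for the three DNam set-ups can be checked against the stated corpus totals (227 recordings, 18 hours and 39 minutes). The following is only a minimal sketch of that arithmetic; the dictionary and variable names are illustrative and are not part of the DGD or any corpus tooling.

```python
from datetime import timedelta

# Figures as stated in the DNam description: (recordings, duration) per set-up.
# The labels and variable names are illustrative only.
subcorpora = {
    "interviews": (7, timedelta(hours=4, minutes=42)),
    "free conversations": (22, timedelta(hours=9, minutes=15)),
    "language situations": (198, timedelta(hours=4, minutes=42)),
}

total_recordings = sum(count for count, _ in subcorpora.values())
total_duration = sum((dur for _, dur in subcorpora.values()), timedelta())

# Convert the summed duration into whole hours and minutes.
hours, remainder = divmod(int(total_duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{total_recordings} recordings, {hours} h {minutes} min")
```

The sums match the totals given in the description, so the three set-ups account for the whole corpus.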

179 Events 227 Speech events 117 Speakers
...
DR
German Dialects: GDR

The corpus DR-- was created by staff of the Institute for German Language and Literature of the Academy of Sciences of the GDR. The project leader was Hans-Joachim Schädlich. Following the recording campaign of the then Deutsches Spracharchiv (DSAv) (cf. corpus ZW--, Deutsche Mundarten: Zwirner-Korpus), samples of the dialects and colloquial speech in the GDR were to be recorded and a body of material compiled according to uniform criteria. A cooperation of the Berlin Academy with the DSAv and Eberhard Zwirner did not materialize, and Zwirner's efforts to collect recordings in the GDR himself were not successful (see also Ehlers, Klaas-Hinrich (2022): The "Tape Recordings of German Dialects" in the Context of the (Low German) Dialectology of the GDR. (= IDSopen 3). Mannheim: IDS-Verlag). The regional dictionary offices in Leipzig, Berlin, Rostock, Greifswald and Jena, which were subordinate to the Academy of Sciences, carried out the recordings between 1960 and 1964 in their respective areas; in 1966 and 1968 there were supplements and special surveys. For the technical realization, a recording car of German Democratic Broadcasting, the GDR's state broadcaster, was available at times, along with a sound engineer. For the most part, however, the Academy's own recording equipment was used. From 1962 on, Heinrich Eras was in charge of the technical side of the recordings. The DSAv's grid square division and the recording of speakers from three different generations were adopted from the ZW-- corpus. In the DR-- corpus, the speakers were prepared for the recording several days in advance. Refugees, expellees and resettlers were not officially included. Nevertheless, in the working area of the Pomeranian dictionary, speakers born in Hinterpommern were included. In 6 places of Saxon Upper Lusatia, speakers with sufficient knowledge of Sorbian were additionally recorded with the same content in Sorbian.
Further information about the project is published in: Hans-Joachim Schädlich, Heinrich Eras (1965): Bericht über die Tonbandaufnahmen der deutschen Mundarten in der Deutschen Demokratischen Republik. In: Berichte über dialektologische Forschungen in der Deutschen Demokratischen Republik. Berlin, pp. 24-27. The corpus DR-- comprises 1642 sound recordings made in 440 places in the GDR from 1960 to 1968 with a total duration of 385 hours and 13 minutes. These are recordings of narrations, conversations and standard texts (comparative texts, word lists) with 1580 speakers from the GDR and former German eastern territories. The Comparison Text I, developed in Leipzig based on the linguistic material of the Wenker sentences, was elicited in the entire recording area. In addition, regionally further comparison texts were elicited as well as word lists by the regional dictionary offices. In reading the comparison texts, the speakers could proceed freely in the choice of words and word order. In many cases written transcriptions of the texts were made, occasionally also of parts of the narratives. A systematic transcription of the narratives was not undertaken by the Berlin Academy. The recordings were transferred to the IDS in Mannheim in 1992, where they were digitized at the Archive for Spoken German (AGD) (formerly the German Language Archive). In cooperation between the MPI for Evolutionary Anthropology (Leipzig) and the Archive for Spoken German (AGD), 117 transcripts (standard orthographic transcription with punctuation according to orthography, explanatory comments, partial prosodic annotation, annotation of incomplete words) were produced in 2013-2016 for recordings from the Central German-speaking area. These transcripts are archived in the AGD. In the meantime, 9 more transcripts from the Low German language area have been produced, partly in literary transcription, resulting now in a total of 126 available transcripts. 
Based on the metadata, a list of topics, a list of linguistic peculiarities, and a list of the speakers' occupations have been produced. There are also written versions of the comparative texts and the word lists. The corpus DR-- is made available via the Database for Spoken German (DGD). Individual sound recordings can also be obtained through the personal service of the AGD.

444 Events 1642 Speech events 1580 Speakers
...
DS
Dialogue Structures

The Dialogue Structures Corpus (DS--) was created within a joint project involving the following institutions: IDS Research Center Freiburg; University of Freiburg, German Seminar; University of Giessen, Department of Psychology. The project leader was Hugo Steger. The project took up questions of spoken language research that had arisen, for example, in the work of the project "Basic Structures of the German Language", from which the corpus "Grundstrukturen: Freiburger Korpus" ["Basic Structures: Freiburg Corpus"] (FR--) had emerged. By analyzing the organization of natural dialogues, the project aimed to describe regularities and rules of conversational organization for individual dialogues and to generalize them for dialogue types. On a trial basis, it also set out to clarify in what proportions and with what functions nonverbal behavioral elements are used in communication. Further project information is published in: Franz-Josef Berens, Karl-Heinz Jäger, Gerd Schank, Johannes Schwitalla (1976): Projekt Dialogstrukturen. Ein Arbeitsbericht. Heutiges Deutsch I/12. Munich: Hueber. The corpus DS-- comprises 72 transcribed recordings. 70 of these are audio recordings from the period 1960 to 1977 with a total duration of 15 hours and 18 minutes from various sources. These 70 recordings are accessible externally. 27 recordings were taken from the corpus Basic Structures: Freiburg Corpus (FR--) and newly transcribed according to project-specific conventions. A further 11 recordings also originate from the inventory created for the corpus FR-- but were transcribed for the first time for the corpus DS--. The remaining 34 recordings come from an additional stock of video recordings created from 1974 to 1977 (recordings of television broadcasts and recordings by the project participants themselves). Of these recordings, only the audio tracks were used for the corpus DS--.
The recordings in DS-- involve 152 speakers (women and men) of the standard language or of colloquial language close to the standard in public and non-public communication. The recordings cover speech events of various kinds (registration, questioning, consultation, discussion, explanation, interview, examination, conversation, appointment). Some of them took place in the context of radio broadcasts. The recordings were digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). 70 digital transcripts (with orthographic wording, additional notation, lemmatization, POS tagging) are held by the archive. These transcripts were synchronized (aligned) with the audio by the AGD. Based on the metadata, a list of topics and a list of the speakers' professions were created. Word and lemma lists, ordered alphabetically and by frequency, are also available. The corpus DS-- is provided online via the Database for Spoken German (DGD). Individual sound recordings can also be obtained for download or on physical media through the personal service of the AGD.

70 Events 70 Speech events 152 Speakers
...
DTRK
Deutsch von Türkeirückkehrern

The corpus "Deutsch von Türkeirückkehrern" (DTRK) was developed within an internally funded project conducted by the Department of German Language and Literature at the Faculty of Natural Sciences and Philosophy of Marmara University in Istanbul, with support from the Archiv für Gesprochenes Deutsch. The project's manager was Serap Devran. The data was mainly collected through so-called "autobiographical-narrative interviews", supplemented by ethnographic-historical material where necessary. The aim of the project was the biographical and interactional analysis of the migration experiences of students of German Studies at Marmara University in Istanbul who had been born in Germany or had come to Germany as infants and grown up there, and who had passed through the German education system, either completely or in part. The project focused on the students' different life stories and their social and linguistic experiences in Germany and in Turkey. The aim was to analyze the linguistic means that the informants used to represent, evaluate and phrase the experiences and events in their old and new environments. These are linguistic actions and communicative practices with which the narrators represent themselves, their narrated self and other persons of their lived history in order to position them as socially determinable persons.

12 Events 12 Speech events 13 Speakers
...
EK
Elicited Conflict Talk between Mothers and their Adolescent Daughters

The corpus Elicited Conflict Talk between Mothers and their Adolescent Daughters (EK--) was created in subproject C2, "Argumente in Konfliktgesprächen zwischen Eltern und Jugendlichen" ["Arguments in conflict talk between parents and adolescents"], of Collaborative Research Centre 245, "Sprechen und Sprachverstehen im sozialen Kontext" ["Speaking and language comprehension in social context"] (Mannheim, Heidelberg). The project leader was Manfred Hofer. In the first project phase, the work focused primarily on describing the arguments put forward in conflict talks between mothers and their adolescent daughters and the way these arguments unfold in the course of the conversation. To this end, an integrated psycholinguistic and linguistic category system for classifying conversations was developed. With this system it was possible to identify differences between mothers and daughters in how often individual argumentation elements occur, to establish different "levels" of argumentation and their dependence on the daughters' age, and to draw up a typology of whole conversations. In the following two years, explanatory questions took centre stage. On the one hand, sequential argumentative dependencies in the course of the conversation were determined; on the other, the relationship between the situational intentions (motivational tendencies) of the two partners and the arguments they put forward in the conversation was examined. Further project information can be found in: Manfred Hofer, Birgit Pikowsky, Thomas Spranz-Fogasy (1992): Projekt "Argumente in Konfliktgesprächen zwischen Eltern und Jugendlichen." Abschlussbericht an die Deutsche Forschungsgemeinschaft. The corpus EK-- comprises 138 sound recordings with 214 female speakers from the years 1988 and 1990, with a total duration of 12 hours and 23 minutes. At the time of the recordings, the mothers were between 31 and 58 years old, the daughters between 12 and 24. The project provided copies of the recordings to the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv), where they were digitized.
138 transcripts (orthographic wording, additional notation) are archived. A list of topics was created based on the metadata. The corpus EK-- is made available in the Database for Spoken German (DGD); individual sound recordings can also be obtained through the personal service of the AGD.

107 Events 138 Speech events 0 Speakers
...
FEGB
Flucht und Emigration nach Großbritannien

The corpus "Flucht und Emigration nach Großbritannien" (FEGB) ["Flight and Emigration to Great Britain"] was compiled by Eva-Maria Thüne (University of Bologna, Italy) during a research stay in Great Britain, where she was a Fellow at Cambridge (Clare Hall) from January to July 2017. There and in the Greater London area she recorded most of the interviews on the language and cultural identity of the emigrants, 70 to 80 years after their immigration to Great Britain. Conceptually and methodologically, the corpus follows on from the corpus "Emigrantendeutsch in Israel" (IS). 42 narrative interviews were conducted with mostly Jewish migrants in Great Britain who had emigrated from Nazi Germany, Austria and what was then Czechoslovakia in the 1930s. The majority of the interviewees had emigrated on the "Kindertransport"; the rest had found refuge in Great Britain in the 1930s as children or adolescents independently of the Kindertransport. In addition, 10 men and women of the second generation were asked about their experiences. These interviews were held in English and are not documented in FEGB. The conversations with the emigrants were mostly held in the interviewees' private homes and vary in length (from 45 minutes to more than 2 hours). They are dialogic in design but also contain longer monologic passages. The conversations dealt mainly with questions of language shift, language acquisition, language maintenance and language tradition in the family, but much was also touched on that went far beyond this range of topics. Shared topics are childhood memories, experiences of antisemitism, flight/emigration, starting anew, cultural reorientation, and contact with and travel to German-speaking countries after World War II. In addition, there are many individual accounts.
Some of the interviewees speak Standard German, but most speak German with a more or less strong influence from English, with frequent code-switching and code-mixing. In 2019, on the occasion of the 80th anniversary of the first Kindertransport in 1938, some of the conversations were published in the book "Gerettet. Berichte von Kindertransport und Migration nach Großbritannien" (see below). Further project information can be found on the website for this project and the accompanying book: http://www.gerettet2019.wordpress.com. The total duration of the interviews is 3876 minutes, i.e. approx. 64 hours (2189 minutes with former children of the Kindertransport and 1687 minutes with persons who did not come on the Kindertransport). Word-for-word transcripts exist for the interviews available at the AGD. The original questionnaires are also available.
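The FEGB duration figures can be converted from minutes to hours as a quick consistency check. This is a minimal sketch of the arithmetic only; the variable names are illustrative and do not come from the corpus documentation.

```python
# Minute totals as stated in the FEGB description (variable names are illustrative).
kindertransport_minutes = 2189
non_kindertransport_minutes = 1687

total_minutes = kindertransport_minutes + non_kindertransport_minutes

# Split the total into whole hours and remaining minutes.
hours, minutes = divmod(total_minutes, 60)

print(f"{total_minutes} minutes = {hours} h {minutes} min")
```

The two partial figures add up to the stated 3876 minutes, which is 64 hours and 36 minutes, consistent with the approximate figure of 64 hours in the description.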

37 Events 37 Speech events 37 Speakers
...
FGOP
Fluchtgeschichten aus Ostpreußen

The corpus consists of narrative autobiographical interviews (interviewer: Lucia Cinato, University of Turin) with three survivors from East Prussia (today Poland and Russia), who report on events of the final months of World War II. All three belong to the same family but went through diametrically opposed traumatic experiences that are emblematic of this period. On the one hand, there is the flight, with a devastating number of deaths resulting from extreme weather conditions, pursuit by tanks and Russian soldiers, and bombing raids (for example in the lagoon of the Frische Nehrung). On the other hand, there is the impossibility of flight, with the resulting fear of revenge and retaliation, first by Russian soldiers and later by Polish units, which took possession of the parts of Germany east of the Oder and Neisse (except northern East Prussia) as compensation for the territorial losses in eastern Poland suffered as a result of the Hitler-Stalin Pact, as had been laid down at the conferences of Tehran and Yalta. The interviewees were allowed to tell their stories freely, and their recollections were focused, by means of direct questions and requests for clarification, on the period between January 1945 and April 1956, during which the three siblings were separated from one another. The narrative interviews last between 12 minutes and 3 hours and were conducted over several sessions. At the time of the recordings, the interviewees were 86 (Otto) and 81 years old (Hedwig and Gertrud, twins). Members of the close family circle who were present at the interviews also intervene in these recordings in order to fill in the content of the story. The narration in the interviews does not follow any particular temporal or spatial order.
It is shaped by mental associations that gradually arise through the interaction between the persons present. More general information also emerges from the interviews: the geographical and political situation of certain places and territories in the pre-war and post-war period, the conditions of the flight westward, and the sinking of the Wilhelm Gustloff by a Soviet submarine in January 1945. The recordings aim to reconstruct particular paths of the family history. They bear witness to the psychological difficulty of telling these stories of suffering, which were kept silent for years, even when, as in this case, the protagonists were still children or adolescents at the time. The purpose of the project and of the analysis of the recordings is, besides tracing the interviewees' family memory as mentioned above, to examine the use of language in oral narratives from various thematic and language-related research perspectives. The latter include the analysis of typical structures of spoken language, the language used to structure the narrated space, and the language of emotional expression. These stories also lend themselves to the analysis of identity construction, i.e. the search for one's own identity through narration.

4 Events 7 Speech events 9 Speakers
...
FOLK
Research and Teaching Corpus of Spoken German

The Research and Teaching Corpus of Spoken German (FOLK) has been under construction in the pragmatics department of the IDS Mannheim since 2008. FOLK primarily addresses researchers, teachers and students in conversation analysis, corpus linguistics and related fields. The overall aim of the project is to provide the scientific community with a corpus of interactions in German-speaking countries which covers as broad a spectrum of interaction types as possible. To this end, audio and video recordings of verbal interactions in different private (e.g. table talk, game interactions), institutional (e.g. classroom discourse, professional communication) and public (e.g. panel discussions, public arbitrations) contexts are made. Additional stratification parameters like regional provenance, age or education level of speakers are taken into account in corpus compilation in order to enable the creation of virtual subcorpora which are balanced with respect to these parameters. Using the editor FOLKER, recordings are transcribed in modified orthography ('literary transcription') according to the cGAT conventions for minimal transcripts. The transcripts are time-aligned in segments no longer than 5 seconds. To optimize searchability of the corpus, three annotation levels are added to the literary transcription: an orthographic normalisation, a lemmatisation and a part-of-speech tagging according to a version of the Stuttgart-Tübingen-Tagset (STTS) optimized for interaction data. FOLK comprises data collected by the project itself as well as data from external collaborators. The current version of FOLK (version 2.20 from June 2023) comprises audio and video recordings, transcripts and metadata of 414 interactions with 1,317 documented speakers. The overall duration of the recordings is 347 hours and 6 minutes. The transcripts amount to 3,301,696 verbal tokens.
Of the 414 interactions, 142 were recorded on video with a total duration of 156 hours and 57 minutes; the remaining 272 interactions (190 hours and 5 minutes) were recorded on audio only. The corpus also contains relevant additional materials: information on interaction setting and course of events, word lists ordered alphabetically and by frequency, transcription conventions, documentation of the metadata system and further interaction-specific materials. The corpus is extended continuously. New data are published via the Database for Spoken German (DGD) at regular intervals. For further information, see also the project's website at http://agd.ids-mannheim.de/folk.shtml. Instructions on proper citation of the FOLK corpus are provided via the Help menu of the DGD.

414 Events 414 Speech events 1317 Speakers
...
FR
Basic Structures: Freiburg Corpus

The corpus "Basic Structures: Freiburg Corpus" (FR--) was created by the former Freiburg-based Research Unit of the IDS. The project's leader was Hugo Steger. Within the framework of the project "Basic Structures of the German Language", the Freiburg Research Unit had the task of describing grammatical and stylistic features of the spoken standard language in order to lay new linguistic foundations for the teaching of German as a foreign language, similar to the "Grunddeutsch-Projekt" (Basic German Project) of the Institute for Basic German (see Pfeffer-Korpus PF--), which, however, focused on studies of the word inventory. The FR-- project team considered the already existing linguistic corpus collections on a dialectal (Zwirner-Corpus ZW--) or colloquial basis (Pfeffer-Corpus PF--) as unsuitable for their investigations because of their limitation to a single, elicited type of communication. Therefore, between 1966 and 1974, the project created its own extensive sound archive of more than 800 recordings for this purpose (consisting of the research center's own recordings, recordings of television and radio broadcasts, in part also of older recordings from the archives of broadcasting companies as well as material provided by cooperation partners) and produced transcripts amounting to approximately 500,000 words. These materials were used to carry out grammatical and stylistic analyses of the peculiarities of spoken language, which, among other things, were also intended to enable statements about the connection between speech constellations and the use of specific means of expression. The areas of subjunctive and mood, passive, future and present tenses, past tenses, morphology and word length were studied. Further project information is published in: Gesprochene Sprache. Bericht der Forschungsstelle Freiburg [Spoken Language. Report of the Freiburg Research Centre] (Forschungsberichte des Instituts für deutsche Sprache 7) 2. Auflage, 1975, Tübingen: Narr. 
The FR-- corpus comprises 222 transcribed sound recordings from the period 1955 to 1974 with a total duration of 68 hours and 6 minutes. These are recordings with 812 speakers (women and men) of the standard language or of colloquial language close to the standard in public and non-public communication. Speech events of various kinds were recorded, among them consultations, reports, meetings, discussions, explanations, narrations, interviews, sermons, press conferences, conversations and lectures. Some of these took place in the context of radio broadcasts. Three recordings were taken from the Pfeffer corpus and re-transcribed. The recordings were digitized at the Archive for Spoken German (AGD) (formerly: Deutsches Spracharchiv). 221 digital transcripts (with orthographic wording, additional notations, partly intonation notations, lemmatization, POS tagging) are held by the AGD. The transcripts were also synchronized (aligned) with the audio at the AGD. Based on the metadata, a list of topics and a list of the speakers' professions were created. Word and lemma lists ordered alphabetically and by frequency are also available. The corpus FR-- is available online via the Database for Spoken German (DGD). Individual sound recordings can also be obtained for download or on physical media through the personal service of the AGD.

222 Events 222 Speech events 812 Speakers
...
© Christian Zimmer
GDSA
Spoken German in Southern Africa

The project "Spoken German in Southern Africa" aimed at systematically documenting the varieties of spoken German in Namibia and South Africa, making them accessible via the Internet and describing them using examples. Towards this end, sound recordings were made in Namibia and South Africa of German speakers producing various text types and interacting in various communicative settings. The recordings are archived in the Language Archive of the Institute for the German Language (IDS) and made accessible in the archive's database. During an exploratory and networking trip in April 2005, an initial set of sound recordings was made in Namibia as a pre-test. Most of the material was collected during recording trips in South Africa in February/March 2012 and in Namibia in February/March 2013. The recording locations in South Africa are various places in KwaZulu-Natal, Mpumalanga, Gauteng, Western Cape. The content of the recordings includes biographical information, various language data questionnaires, as well as the reading aloud of texts and lists of words and sentences. The materials used for reading aloud were "Nordwind und Sonne", Wenkersätze, the "Deutsch heute" word list, "Deutsch heute" sentence list, "Niederdeutsche Phonologie" word list and "Niederdeutsche Phonologie" sentence list, Niederdeutscher Wortatlas. (These materials are available as additional materials with the exception of the "Niederdeutsche Phonologie" word list and "Niederdeutsche Phonologie" sentence list). The recording locations in Namibia are various places, regionally scattered. The recordings consist of biographical information and several language data questionnaires: "Nordwind und Sonne", Wenkersätze, "Deutsch heute" word list and "Deutsch heute" sentence list. A total of 66 different speakers were recorded. As requested, the material of one speaker who was a minor at the time of recording will not be made publicly available. 
Accordingly, 65 speakers are represented in the GDSA corpus as presented in the Database for Spoken German.

65 Events 155 Speech events 75 Speakers
...
GWSS
Spoken Academic Language

GeWiss is a research project on spoken academic language. It provides a multilingual (German/English/Polish/Italian) corpus of audio recordings and transcriptions of academic communication as an empirical foundation for comparative research. To this end, the GeWiss corpus focuses on two main genres of spoken academic language: talks (including discussions) and oral exams. It explicitly distinguishes between L1 and L2 subcorpora. The corpus is enlarged and developed continuously.

417 Events 436 Speech events 733 Speakers
...
HL
German Standard Pronunciation

The corpus Deutsche Hochlautung (HL--) was created at the Deutsches Spracharchiv (DSAv) as part of the project "Hochlautung" (Standard pronunciation). The project leaders were Gerold Ungeheuer, Werner Besch and Edeltraud Knetschke. To enable the project to investigate pronunciation in standard German, ARD, ZDF and the Press and Information Office of the German Federal Government provided the DSAv with 186 recordings (sound recordings only) made in a public capacity by 14 television journalists (newscasters, political and business journalists, reporters, foreign correspondents) and three spokespersons for the German Federal Government. The version of the corpus HL-- accessible to external parties includes 27 transcribed audio recordings (some in versions of varying length; not all audio recordings were transcribed in their entirety) from 1971 to 1975 with a total duration of 1 hour and 57 minutes. These are recordings of 9 television journalists, newscasters, and government spokespersons (women and men) in speech events of various types (including news reading, report, discussion piece, interview, commentary, moderation, reportage, statement) that took place during radio broadcasts and press conferences. The recordings were digitized at the Archive for Spoken German (AGD) (formerly the German Language Archive). The transcripts of corpus HL-- are published in: Edeltraud Knetschke, Margret Sperlbaum (1987): Zur Orthoepie der Plosiva in der deutschen Hochsprache. Phonai vol. 33; Tübingen: Niemeyer. The associated audio recordings were released by ARD and ZDF for use by the project. Also archived are 27 transcript versions (orthographic transcription, lemmatization, POS tagging) digitized at AGD and synchronized (aligned) with the audio, as well as word and lemma lists arranged alphabetically and by frequency. 
The components of the corpus HL-- accessible to external users are made available in the Database of Spoken German (DGD); individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

27 Events 27 Speech events 9 Speakers
...
HMAT
Hamburg Maptask Corpus (HaMaTaC)

The Hamburg Map Task Corpus was created between October 2009 and September 2010 in the project Z2 "Computer assisted methods for the creation and analysis of multilingual data" of the Research Centre on Multilingualism at the University of Hamburg. In June 2013, the corpus was extended with two recordings on video. The main motivation for creating the corpus was to provide a set of data for testing and demonstrating the capabilities of the EXMARaLDA system, in particular with respect to annotation and data sharing. The map task designed for the corpus "Deutsch Heute" was chosen as the basic experiment for the corpus. The map task was performed by 25 learners of German with varying proficiency and one L1 speaker. The speakers' L1s cover a broad spectrum of languages, including Romance languages (French, Galician, Spanish), Slavic languages (Russian, Polish, Bulgarian), Iranian languages (Farsi/Dari) and diverse languages from non-Indo-European families (Turkish, Arabic, Chinese, Japanese, Thai, Vietnamese). Since speakers were selected and contacted by student assistants in the project, most of them are between 17 and 40 years old and have a higher education. Earlier versions of the corpus were archived at the Hamburger Zentrum für Sprachkorpora and are also available via the Zentrum für Nachhaltiges Forschungsdatenmanagement at the University of Hamburg.

26 Events 26 Speech events 28 Speakers
...
HMOT
Hamburg Modern Times Corpus (HaMoTiC)

The Hamburg Modern Times Corpus (HaMoTiC) consists of audio recordings of retellings of an excerpt of the Charlie Chaplin movie "Modern Times". The task is the same as the one used for the ESF corpus (Klein/Perdue). 24 L2 speakers of German with various L1s and varying competence in German were recorded. The 25th recording is a reference recording with an L1 speaker of German. Additionally, an interview on the language acquisition biography was conducted with each speaker. These interviews are archived, but are not available via the DGD. Earlier versions of the corpus were archived at the Hamburger Zentrum für Sprachkorpora and are also available via the Zentrum für Nachhaltiges Forschungsdatenmanagement at the University of Hamburg.

25 Events 25 Speech events 29 Speakers
...
IPER
Interaction profile and personality

The corpus was collected in 2020 in the context of the project "Interaction Profile and Personality" at the Lucerne University of Applied Sciences and Arts. The project leader was Sylvia Bendel Larcher. The project was funded by the Swiss National Science Foundation and the Department of Economics of the Lucerne University of Applied Sciences and Arts. The aim of the project was to test whether there is a correlation between a person's interaction profile and their personality. For this purpose, the test persons who were recorded also filled in a personality questionnaire following the HEXACO personality inventory. The corpus comprises six group interactions of 15 minutes, each consisting of four students in business administration accomplishing a group task. The students had to analyse a transcript with a customer complaint and formulate recommendations for the receptionist. The students speak Swiss German, most of them a dialect from central Switzerland; some students represent the areas of Zurich, Solothurn, Berne and Freiburg. The transcripts and the audio recordings are available at the Database for Spoken German (DGD). The transcripts were further annotated for the specific purposes of the project. These annotations comprise the type of turn-taking, the interactional moves and several stylistic features; these annotations are not available in the DGD. The contributions of the person handling the recording equipment are not transcribed.

6 Events 6 Speech events 24 Speakers
...
© Anne Betten
IS
Emigrant German in Israel

Most of the corpus „Emigrant German in Israel“ (IS--) was collected between 1989 and 1994 in the framework of a DFG (German Research Foundation)-project on the language and cultural identity of German speaking Jewish emigrants 50 to 60 years after their immigration to Palestine/Israel. The project leader was Anne Betten (until 1995 University of Eichstätt, then University of Salzburg), other interviewers were Kristine Hecker (University of Bologna), Miryam Du-nour (Jerusalem and Bar Ilan University), and Eva Eylon (Tel Aviv, also an interviewee herself). The interview partners stem from many German speaking areas of Central Europe, but most of the interviewees speak standard German, with at most slight regional touches. Most of the interviews were conducted in the private homes of the interviewees. The narrative-autobiographical interviews often show a tendency towards monologue, but they also include lively dialogic parts. Persistent topics are childhood and youth in Central Europe, experiences of antisemitism, flight/emigration, new beginnings, cultural breaks and reorientation; they are supplemented by a lot of individual reports. In the first working period the focus lay on syntactic-stylistic studies of the “Bildungsbürgerdeutsch” (the German of the educated classes) of the interviewees which most of them used even in spontaneous oral communication, and in addition on sociolinguistic studies of the variables that influenced their linguistic competence in German and in their second languages Hebrew and English, as well as on the forms and functions of code-switching. This first phase of analyses was followed by a large number of predominantly conversation-analytical studies on language and identity, metaphorization, emotion and interaction. After 2000, this collection was supplemented by a video interview with a new interviewee, some video recordings of meetings with former interviewees, supplementary thematic interviews (e.g. 
on the subject of childhood by Michaela Metz / University of Salzburg, 20 years after the first interviews), some more first recordings with new interviewees (conducted by Johannes Schwitalla / University of Würzburg and Michaela Metz) and a panel discussion with participants from all three Israel Corpora (IS, ISW, ISZ) moderated by Anne Betten. The version of the corpus IS-- archived at the IDS comprises 188 audio and video recordings with 185 speakers, collected in the period from 1989 to 2019, with a total duration of 290 hours and 44 minutes. The recordings were digitized and technically re-engineered at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). Furthermore 104 transcripts of various types (including uncorrected transcripts) are held at the archive; 16 of these transcripts were aligned with the audio in the AGD. Available online are also the original questionnaires, detailed tables of contents, linguistic comments, a list of linguistic peculiarities, and information about the degree of acquaintance between the interaction partners at the time of the interview. The recordings and 22 transcripts are available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or in physical form through the AGD’s personal service. 82 uncorrected transcripts are only accessible via the AGD’s personal service.

188 Events 188 Speech events 185 Speakers
...
© Anne Betten
ISW
Emigrant German in Israel: Viennese in Jerusalem

The recordings of the corpus “Emigrantendeutsch in Israel: Wiener in Jerusalem” [“Emigrant German in Israel: Viennese in Jerusalem”] (ISW-) were intended as a supplement to the collection of the earlier corpus “Emigrantendeutsch in Israel” [“Emigrant German in Israel”] (IS--). As in the case of corpus IS--, the interview project was directed by Anne Betten (University of Salzburg). Most of the interviews were conducted by students and staff of the Institute of German Studies at Salzburg University during an excursion to Israel in December 1998. The interview partners are 24 Jewish men and women who were born or had grown up in Austria (mostly in Vienna) and lived in Jerusalem; at the time of the recordings they were 69-90 years old. The majority of the speakers left Austria after the “Anschluss” (annexation of Austria into Nazi Germany in 1938) without their parents, with the support of the Jugendalija organization (Youth Aliyah). The narrative autobiographical interviews focus on the biographies of the interviewees before and after their emigration and the associated problems of changing language and culture, but also allow for spontaneous thematic developments. All the interviewees speak standard German (more or less strongly Austrian-coloured), using dialect mostly only in quotations or in personal remarks addressed to their Austrian interviewers. Two interviewers visited each interviewee at his/her home; sometimes other family members were present. An exception is Anne Betten’s one-on-one interview with Ari Rath, which began during the excursion but was continued in 3 later sessions in Salzburg and Jerusalem. In 2010/11 the corpus was supplemented by 3 additional interviews by Michaela Metz (University of Salzburg) with partners who had already been interviewed in 1998. The corpus “Emigrant German in Israel: Viennese in Jerusalem” (ISW-) comprises 28 audio recordings from 1998 to 2011. 
The recordings were digitized and/or sound-engineered at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv). They are made available in the Database for Spoken German (DGD), together with 20 transcripts (orthographic raw transcripts with additional notations) and detailed tables of contents of all interviews. Individual audio recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD. The uncorrected transcripts are only accessible via the personal service of the AGD. For 19 speakers, scans of their personal questionnaires are available as well.

28 Events 28 Speech events 24 Speakers
...
© Anne Betten
ISZ
Second generation German-speaking Migrants in Israel

The corpus “Second generation German-speaking Migrants in Israel” (ISZ-) was launched following the projects “Emigrant German in Israel” (IS--) and “Emigrant German in Israel: Viennese in Jerusalem” (ISW-). It consists of interviews with 66 descendants of German-speaking Jews (“Jeckes”), especially the children of interview partners represented in the corpora IS-- and ISW-. The recordings were made in various phases: the core consists of 65 interviews conducted by Anne Betten in 1999-2000 and 2004-2006; in 2010-2012 the material was expanded by two interviews conducted by Michaela Metz (University of Salzburg) and a (video) panel discussion between Anne Betten and 6 former interview partners. In 2017 and 2018, Anne Betten conducted additional interviews with 11 of her old interview partners as well as one new interview. In 2019, Rita Luppi (University of Milan) re-interviewed 18 former interview partners and additionally recorded 2 interviews with new partners. The majority of the interviews were conducted entirely or largely in German, but some of Betten’s interviewees preferred to speak English. In some cases, the matrix language switched several times between German and English. The interviewer tried to trigger a switch back to German, which mainly succeeded when the conversation was about memories of the childhood home, visits to German-speaking countries and similar topics related to the family background. Thematically, Betten’s first 65 interviews and the panel discussion focus on the question of how the interviewees felt as children of German-speaking Jews (“Jeckes”) from their childhood through the present and how growing up in two cultures affected the formation of their identity; the interviews from 2017/18 concentrated on the question of how the death of their parents influenced and perhaps changed their relationship to the parents’ native countries, language and cultural heritage. 
The interviews by Michaela Metz again focus, in more detail, on childhood experiences (as in her additional interviews for IS-- and ISW-). For Rita Luppi, the decisive factor in repeating interviews with partners who had already been interviewed 15-20 years earlier was the topic of her PhD thesis, “Re-telling”; for this purpose she chose speakers with a good command of German and tried to elicit once more narrations from the first interview. The ISZ corpus can thus also provide material for research on the connection between language skills, language attitudes and social experiences as well as for analyses of functional code-switching. Special research opportunities are offered by the fact that several speakers were interviewed up to 4 times by 2 or even 3 different interviewers at different stages. The version of the corpus ISZ- archived at the IDS comprises 102 audio and video recordings made in the period from 1999 to 2019, with a total duration of 165 hours and 3 minutes. The recordings were digitized and technically re-engineered at the Archive for Spoken German (AGD) (formerly: German Language Archive). 65 uncorrected transcripts (ISZ-_E_00001 to ISZ-_E_00065) are stored in the archive. For one recording (ISZ-_E_00073) a corrected transcript is available within the Database for Spoken German, in both MS Word and PDF formats. In addition, detailed tables of contents are available as well as a list of linguistic peculiarities. The recordings and additional materials are provided online through the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD. The uncorrected transcripts are only accessible via the personal service of the AGD.

100 Events 100 Speech events 66 Speakers
...
© Projekt 'Jugend, Kommunikation, Medien'
JK
Youth communication

The corpus of youth communication was collected in the Rhine-Main area of Germany as part of a project at the University of Frankfurt. The project's manager was Klaus Neumann-Braun. The aim of the project was to investigate the everyday communication culture of young people by means of an ethnographic conversation analysis study. The focus of interest was on the linguistic-interactive in-group methods that young people use to form their group into a community and on the forms of social categorization that they use to make sense of themselves and their social environment. Both formal and functional aspects of communicative practices were to be analysed. The ethnographic perspective aimed to gain insight into the variation and changes in practices over time and their dependence on different participation structures and situational contexts. The part of the corpus JK-- that is currently accessible to external users comprises 6 sound recordings with a total duration of 4 hours and 42 minutes from the period 1996 to 1999. These recordings involve adolescents or (in the later phases) young adults who lived in a small rural town in the Rhine-Main area and regularly visited the youth centre there. The speakers were accompanied during some of their activities by the recording managers, who also acted as supervisors. This resulted in recordings of conversations in which plans were made as well as of informal conversations. The recordings were digitized at the Archive for Spoken German. The parts of the corpus JK-- that are available to external parties are made available online via the Database for Spoken German (DGD). Individual recordings may also be obtained for download or on physical media through the personal service of the AGD.

6 Events 6 Speech events 17 Speakers
...
KN
German standard language: König-Korpus

The corpus "Deutsche Standardsprache: König-Korpus" (KN--) was created by Werner König (University of Freiburg, later University of Augsburg). For his research on the pronunciation of German and the compilation of a pronunciation atlas, he elicited readings of texts and word lists and conducted short language-biographical interviews with 74 persons from 62 larger and smaller cities spread over the old Federal Republic of Germany, mostly students and young academics. His analyses are published, among others, in: Werner König (1989): Atlas zur Aussprache des Schriftdeutschen in der Bundesrepublik Deutschland. 2 vols. Ismaning: Hueber. After his retirement, Werner König handed over the audio recordings and other material to the IDS. For legal reasons, only a small part of the corpus KN-- can be made available to external users at present. Accessible are 43 audio recordings from 1975 with a total duration of 5 hours and 48 minutes. These are recordings with 43 students and academics (women and men) between the ages of 17 and 27 from 43 relatively evenly distributed locations in the old federal states. The speakers were born and raised in the selected locations. In all cases, at least one parent comes from the same place or from its vicinity. The recordings accessible to external parties contain the informants' read-aloud speech: an excerpt from the Basic Law was read aloud. The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). The recordings have been transcribed, and the transcripts have been synchronized (aligned) with the audio at the AGD. There are also word and lemma lists available. The parts of the corpus KN-- accessible to external users are made available via the Database for Spoken German (DGD). Individual audio recordings can also be shared via the personal service of the AGD.

43 Events 43 Speech events 70 Speakers
...
© IFM, LMU München
MEKI
Multilingual Children at Pre-school Age (MEKI)

The Corpus Multilingual Children at Pre-school Age (MEKI) was created as part of a study that accompanied the implementation of a language support program. The aim of the study was to examine the linguistic development of children aged 5-7 years who had not yet entered primary school under the conditions of a language support program over a period of nine months. The data was collected over a period of 10 months in the context of language didactic offerings made at day-care centres or as part of a preparatory course at a primary school. The groups of children observed consisted of eight to twelve children. The group events were documented with video cameras. As a participating observer, the researcher was marginally involved in the events. The video equipment was set up and disassembled together with the children, so that the children experienced the recording situation as part of the overall situation. In addition to the events in the language support group, stories were recorded in elicitation settings. The version of the MEKI corpus that is archived at the IDS contains 85 recordings with a total duration of 3 hours and 8 minutes. 82 transcripts are available as well as word and lemma lists ordered alphabetically and according to frequencies. The recordings have been sound-edited and anonymized at the Archive for Spoken German (AGD). The transcripts (orthographic transcription; orthographic normalization; lemmatization; POS tagging) were checked in the AGD, amended as necessary, corrected and synchronized with the audio signal (aligned). The version of the corpus MEKI archived at the IDS is available online via the Database for Spoken German (DGD).

55 Events 86 Speech events 28 Speakers
...
© Aaron Schmidt-Riese
MEND
Mennonite Plautdietsch in North and South America

The MEND corpus, which was collected by Göz Kaufmann in the years from 1999 to 2002, consists of the Plautdietsch translations of 46 stimulus sentences by 321 Mennonite informants. In total, there are about 14,500 usable sentence translations with a total recording time of about 40 hours. This corpus was prepared by the AGD in collaboration with Göz Kaufmann and Aaron Schmidt-Riese. As a rule, Spanish stimuli were used in Mexico, Paraguay and Bolivia, while Portuguese stimuli were used in Brazil and English stimuli in the US. The number of interviews per colony fluctuates considerably: Mexico: 103 informants from the area around Ciudad Cuauhtémoc / Chihuahua --- Paraguay: 42 informants from the Menno colony with its central town of Loma Plata; 37 informants from the Fernheim colony with its central town of Filadelfia; 2 informants from the colony Neuland --- Bolivia: 8 informants from Colonia Canadiense in the area of Santa Cruz de la Sierra --- Brazil: 56 informants from Colônia Nova / Rio Grande do Sul --- USA: 67 informants from Seminole / Texas and 6 informants who lived in Seminole / Texas at the time of recording but had lived there for less than 5 years. These informants had mainly lived in Canada (USA-9 / USA-23 / USA-24 / USA-45), Mexico (USA-26) or in other parts of the USA (USA-18). The 46 stimulus sentences (archived as part of the additional material) cover different sentence types. In addition to six main clauses (sentences 41-46), the following further sentence types were queried: ten complement clauses in post-position (sentences 1-10), ten conditional clauses in pre-position (sentences 11-20), ten causal clauses in post-position (sentences 21-30) and ten sentence-medial or sentence-final relative clauses (sentences 31-40). All subordinate clauses require the additional translation of a matrix clause. Main clause 42 also contains a preceding temporal clause. 
Eighteen stimulus sentences aim at translations with a single (particle) verb (sentences 1-4, 11-14, 21-24, 31-34, 41 + 42), eighteen at translations with two verbs (modal verb + infinitive, or tense auxiliary + participle; sentences 5-8, 15-18, 25-28, 35-38, 43 + 44) and ten at translations with three verbs (9 sentences with counterfactual proposition and modal verb and 1 sentence (sentence 9) with an epistemic modal verb with infinitive perfect; sentences 9 + 10, 19 + 20, 29 + 30, 39 + 40, 45 + 46). Naturally, deviations from these expected productions were common (especially tun-periphrases and four-part verb complexes in counterfactual propositions). The main verbs in the sentences almost always govern a direct object, which allows for easier determination of the positions of the finite and non-finite verbal elements. Adverbs or markers of negation were also included in some sentences. The sentences were read to the respective informant individually and then immediately translated without the help of a written version. This oral translation of sentences, some of which are of considerable complexity, naturally requires the informants to have good competence in the source and target languages. This automatically excluded some groups of people, namely those who have little command of the majority language of their home country (especially the (older) women in Mexico and Bolivia), and those who have largely lost Plautdietsch (some younger Mennonites in the United States and Brazil). In the event of translation problems, the stimulus in question was repeated either immediately or at the end of the interview.
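
The cross-classification of the 46 stimulus sentences by clause type and expected number of verbs can be sketched as a small lookup table. The following Python snippet is only an illustration: the names CLAUSE_TYPE_RANGES, VERB_COUNT_RANGES and build_design are hypothetical and not part of the corpus or the DGD; the sentence ranges are those given in the description above.

```python
from collections import Counter

# Clause type per sentence range (first, last), as listed in the description.
CLAUSE_TYPE_RANGES = {
    "complement clause": (1, 10),
    "conditional clause": (11, 20),
    "causal clause": (21, 30),
    "relative clause": (31, 40),
    "main clause": (41, 46),
}

# Expected number of verbs in the translation, per sentence range.
VERB_COUNT_RANGES = {
    1: [(1, 4), (11, 14), (21, 24), (31, 34), (41, 42)],
    2: [(5, 8), (15, 18), (25, 28), (35, 38), (43, 44)],
    3: [(9, 10), (19, 20), (29, 30), (39, 40), (45, 46)],
}


def build_design():
    """Map each sentence number to its clause type and expected verb count."""
    design = {}
    for clause_type, (first, last) in CLAUSE_TYPE_RANGES.items():
        for number in range(first, last + 1):
            design[number] = {"clause_type": clause_type}
    for verbs, ranges in VERB_COUNT_RANGES.items():
        for first, last in ranges:
            for number in range(first, last + 1):
                design[number]["expected_verbs"] = verbs
    return design


DESIGN = build_design()
verb_totals = Counter(entry["expected_verbs"] for entry in DESIGN.values())
print(len(DESIGN), dict(sorted(verb_totals.items())))
# prints: 46 {1: 18, 2: 18, 3: 10}
```

Looking up DESIGN[42], for example, returns the main clause with one expected verb. The table records only the expected productions; actual translations in the corpus often deviate from them, as noted above.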

321 Events 321 Speech events 322 Speakers
...
MIKO
Note-taking in Lectures: A Multimodal Corpus of Academic Language

MIKO (Note-taking in Lectures: A Multimodal Corpus of Academic Language) is a multimodal, academic lecture-note corpus dedicated to the study of note-taking in academic lectures on subjects covered by exams during the introductory phase of study programs. MIKO contains twelve lectures from compulsory courses in Medicine (Functional Anatomy, Physics for Medical Practitioners), German as a Foreign Language (Fundamentals of Lexicology of Contemporary German) and Economics (Civil Law for Economists) compiled as a linguistic corpus. Anonymized lecture notes taken by students with German as L2 (international students) and L1 (domestic students), as well as the associated metadata collected in the context of the SpraStu project, are also available. The lecture notes can be requested through the AGD: agd-service@ids-mannheim.de.

12 Events 12 Speech events 5 Speakers
...
MV
Domestic German Varieties: Varia

The corpus MV-- was established by the German Language Archive (DSAv) as an archive corpus under the name "Domestic and foreign German dialects: Varia" with the code VII to store sound recordings of varied provenance which did not originate from the DSAv's own projects. Among these were some larger recording collections, many small collections and numerous individual recordings that were transferred to the DSAv by external scholars, e.g. from collaborators in the USA or Australia, for publication in the DSAv's PHONAI series, from projects developed in cooperation with the DSAv, as a gift or from the estate of researchers. Between about 1955 and 1985, a total of 360 sound recordings or sound carriers were placed in corpus VII. In the main, these were dialect and colloquial language recordings from the entire German-speaking area, from German-speaking enclaves in Europe and from the non-European German language enclaves that were hardly represented in the large variety corpora held by the DSAv (e.g. ZW, OS). 184 sound recordings, whose processing status corresponded to the then common standard of the DSAv, were published in 1992 in the DSAv's general catalogue of sound recordings (PHONAI Vol. 38/39. Tübingen: Niemeyer). The same recordings were included in the first Database for Spoken German (DGD) of the DSAv. 109 recordings remained in the corpus, which was henceforth labelled MV. 75 recordings with speakers from the USA, Canada and Mexico were moved to a new corpus "Deutsch in Nordamerika" (NA), which was later to be expanded to include further recordings from this area. These plans were not pursued further when the Archive for Spoken German (AGD, the successor organization of DSAv) developed DGD2, the updated second version of the Database for Spoken German. Accordingly, the recordings from the NA corpus were reintegrated into the MV corpus.
In order to be able to present the Archive's recordings from linguistic enclaves, two new corpora were split off from the MV corpus in 2019. The first corpus, "Extraterritoriale Varietäten Varia" (MVEX), consists of 46 re-engineered recordings from linguistic enclaves that were previously part of MV as well as 48 other recordings from linguistic enclaves that until then had been part of the old corpus VII and which were newly processed for inclusion in DGD2. The second corpus split off from MV, "German in Wisconsin" (WISC), consists of a closed set of 64 recordings from Wisconsin/USA. At this time, only 72 standard language recordings from the contiguous German-speaking area in Europe remain as part of MV. In the future, these are to form a new corpus together with further domestic German variety recordings from the old corpus VII which are still to be processed. The 72 recordings currently in MV have a total duration of 20 hours and 8 minutes. They were collected by the US-American scientist Carol Tokosh in 1972 for an investigation of the current language in Germany. 6 speakers (3 women/3 men, younger and middle-aged) were recorded in each of eight smaller cities in different linguistic regions of the old Federal Republic of Germany [then West Germany]. In the same way, 6 speakers each were recorded in two cities in Austria and two cities in Switzerland, for a total of 72 speakers. Each speaker produces a list of words and a narrative on prespecified topics in standard and colloquial language, respectively, for a total of 72 speech events. Copies of the sound recordings were transferred to the DSAv and later digitized by the AGD. Digital transcripts are not available. The sound recordings of the corpus MV-- are available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

72 Events 72 Speech events 72 Speakers
...
MVEX
Extraterritorial Varieties: Varia

The corpus "Extraterritorial varieties: Varia" (MVEX) consists of 94 recordings representing varieties from German language enclaves in Europe (Poland, Ukraine, Romania, Serbia, Italy) and outside of Europe (Canada, USA, Mexico, South Africa, Australia, New Zealand). The speakers represented in the corpus are mainly Mennonite Germans, Pennsylvania Germans, Australian Germans and Danube Swabians. These sound recordings from the period between 1958 and 1983 were taken in by the Deutsches Spracharchiv (DSAv) from foreign scholars and archived in the corpus "Inland and foreign German dialects: Varia" (VII, later MV). In 1992, 46 recordings which had been processed according to the state of the art then in use were selected for the DSAv general catalogue (1992; PHONAI Bd. 38/39. Tübingen: Niemeyer) and included in the Database for Spoken German (DGD) of the DSAv. 9 of the 46 recordings were integrated into the corpus "Deutsch in Nordamerika" (NA). In the second major version of the Database for Spoken German, DGD2, the 46 recordings were initially accessible as part of the MV-- corpus (see there). They have since been technically re-engineered and transferred to the corpus MVEX (while retaining the numeric part of the ID they had as part of MV). The remaining group of 48 recordings in MVEX is accessible via the DGD for the first time, after technical editing was performed and metadata compiled. The total duration of the recordings in MVEX is 28 hours and 31 minutes. The recordings feature 126 speakers (women and men) from Australia, Germany, Italy, Canada, Mexico, New Zealand, Austria, Poland, South Africa and the USA. The speech events captured are of various kinds, mainly narratives, conversations, translations and standard texts (control sentences, Wenker sentences, word lists, numbers, weekdays, months). The recordings have been digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv).
The transcripts of 17 recordings from Australia, Italy, Canada and the USA were published in volumes 6, 10, 18, 21 and 31 of the PHONAI series edited by the DSAv. For two recordings from Australia, transcripts synchronized (aligned) with the audio were created and corrected at the AGD on this basis. The sound recordings and currently two transcripts of the MVEX corpus are available via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media by way of the personal service of the AGD. Recordings similar to those in MVEX can be found in the DGD as part of the corpora "Australian German" (AD), "Mennonitenplautdietsch in North and South America" (MEND), "Spoken German in Southern Africa" (GDSA), "Russian-German dialects" (RUDI) and "German in Wisconsin" (WISC). (WISC also originated from the old corpus MV--.) Data that is relevant for German linguistic enclaves in Europe can also be found in the Zwirner corpus (ZW) as well as in the corpus "German dialects: former German Eastern territories" (OS). Further similar recordings are part of AGD corpora which are not (yet) prepared for distribution through the DGD, in particular "German in New Zealand" (NZ).

94 Events 94 Speech events 126 Speakers
...
OS
German Dialects: Former German Eastern Territories

The corpus "Deutsche Mundarten: ehemalige deutsche Ostgebiete" ["German Dialects: Former German Eastern Territories"] (OS--) was created by the German Language Archive (DSAv) in cooperation with the Research Institute for the German Language "Deutscher Sprachatlas" (Marburg). Project leaders were Eberhard Zwirner (DSAv) and Ludwig Erich Schmidt (Deutscher Sprachatlas). In order to complement the corpus "Deutsche Mundarten: Zwirner-Korpus" ["German Dialects: Zwirner-Korpus"] (ZW--), the two institutions undertook a data collection effort with the aim of documenting, as comprehensively as possible, the dialects of the contiguous German language area in the former German East as well as the German language enclaves in Eastern and South-Eastern Europe. In the corpus ZW-- as it existed, these dialects were represented only incidentally, wherever their speakers happened to be present at the recording locations. The survey took place (i) throughout the old Federal Republic of Germany (often in refugee camps and the newly established so-called "refugee towns"), (ii) in Austria (around Salzburg), and (iii) in the northern Bohemian area around Gablonz (Jablonec nad Nisou) with speakers remaining in their original settlement area within the state that had by then become Czechoslovakia. The DSAv took on the technical supervision of the recordings, providing its own sound engineers and a recording truck. The Deutscher Sprachatlas was responsible for the linguistic and content-related supervision.
Accordingly, the focus of the corpus OS--, in contrast to the thematically freer corpus ZW--, was on the pre-specified topics of the rural working environment and everyday life as well as regional customs. In addition, the Wenker sentences were recorded by almost all speakers in three regionally modified versions. Further project information is published in: Bellmann, Günter / Göschel, Joachim (1970): Tape recording of East German dialects 1962-1965. Complete catalog. Marburg (= DDG 73). The corpus OS-- comprises 984 sound recordings. Of these, 981 sound recordings from the period 1962 to 1965 with a total duration of 462 hours and 5 minutes are externally accessible. These are recordings with 987 mostly older resettled ethnic Germans ["Übersiedler"] (women and men) from the former German eastern territories, who were speakers of German dialects from eastern and southeastern European states, representing the language status before 1945. For comparative purposes, some children born after 1945 as well as locally-born spouses of refugees in Schleswig-Holstein and Swabia were included, as well as specific speakers of the local dialect. Speech events of various kinds were recorded, especially narratives, conversations and standard texts (days of the week, numbers, Wenker sentences). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). Standard-language transcripts as well as phonetic and phonemic transcripts for 4 recordings were published in volumes 9 and 19 and in supplement 4 of the PHONAI series edited by the DSAv. The AGD archive holds 281 transcripts that it digitized and synchronized (aligned) to the audio. The textual transcriptions stay close to the standard language and follow the old orthography. They come with additional notes by the transcribers, lemmatization, and POS tagging.
Based on the metadata, a list of topics, a list of linguistic peculiarities and a list of the professions of the speakers were created. Also available are word and lemma lists ordered alphabetically and by frequency. The corpus OS-- can be accessed via the Database for Spoken German (DGD). Individual sound recordings and transcripts can also be provided for download or on physical media via the personal service of the AGD.

981 Events 981 Speech events 987 Speakers
...
PF
German colloquial languages: Pfeffer corpus

The corpus "Deutsche Umgangssprachen: Pfeffer-Korpus" ["German Colloquial Languages: Pfeffer Corpus"] (PF--) was created as part of a project at the Institute for Basic German (IBG) at the University of Pittsburgh (USA) on spoken contemporary German and transregional colloquialisms. The project's leaders were J. Alan Pfeffer and Walter F. W. Lohnes. Similarly to the project "Grundstrukturen der deutschen Sprache" ["Basic Structures of the German Language"] at the former Freiburg Research Center of the IDS (see Freiburg Corpus FR--), the aim of the Pittsburgh project was to develop new linguistic foundations for teaching German, especially as a foreign language. Within the project "Basic German", the focus was on establishing a basic stock of German colloquial language as spoken at the time, in part by using empirical methods. By contrast, in the Basic Structures project, the main focus was on syntactic investigations. For Basic German, the entire German-speaking area was considered; the recordings were made in 37 cities of the Federal Republic of Germany [FRG, then West Germany], 10 cities in the German Democratic Republic [GDR, then East Germany], 6 cities in Austria and 4 cities in Switzerland. Regional cooperation partners of the IBG were: the Deutsches Spracharchiv (DSAv) for the Federal Republic of Germany; the Institut für deutsche Sprache und Literatur der Akademie der Wissenschaften der DDR for the German Democratic Republic; the Österreichische Akademie der Wissenschaften and the Phonogrammarchiv Wien for Austria; and the Universities of Basel and Zurich, among others, for Switzerland. The partner institutions organized the recordings in their areas and provided the recording equipment. In the FRG, the DSAv cooperated not only with university institutes and regional lexicographic offices, as was the case during the collection of the Zwirner corpus (ZW), but also with Volkshochschulen [adult education centers].
The PF-- corpus consists of 398 sound recordings from 1961 with a total duration of 79 hours and 15 minutes. These are recordings of reports, narratives and lectures from a list of pre-specified topics by 402 speakers (women and men) of different ages, with different levels of education and different professions from the FRG, the GDR, Austria and Switzerland. Two originals of each recording session were produced. One was received by the IBG, the other remained with the respective cooperation partner. Copies of the sound recordings from the GDR were transferred to the DSAv in 1965, and copies of the recordings from Austria and Switzerland were passed by the IBG to the DSAv in 1985. The recordings were later digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv) based on the holdings available there. After completion of the data collection, the recordings were transcribed by the IBG and the individual cooperation partners in order to be able to analyze them for the purposes of the Basic German project. Standard linguistic, phonetic, and phonemic transcripts for 10 recordings from four cities were published in volumes 16 and 35/36 of the PHONAI series published by the DSAv. Transcripts for another 26 abbreviated colloquial samples were published in volume 17 of PHONAI. The transcripts were revised and some of the recordings were re-transcribed during the period 1978-1982 as part of a project conducted jointly by the IBG, which had been relocated to Stanford University in 1976, the DSAv at the IDS, and the Goethe-Institut in Munich. The transcripts of the PF-- corpus were published with additional project information in: J. Alan Pfeffer, Walter F. W. Lohnes (eds.) (1984): Grunddeutsch. Texts on spoken contemporary German. (PHONAI vols. 28-30) Tübingen: Niemeyer. The template for the publication was created by the former Linguistic Data Processing Unit of the IDS. 
This template also served as the basis for the digitization of the corpus data and the transcripts. At this time, the AGD's holdings for PF-- include 414 digital transcripts for all 398 recordings. 398 of the transcripts are the digital versions of the transcripts published in PHONAI vols. 29/30 (wording in old orthography; punctuation according to old standard punctuation; additional notations; lemmatization; POS tagging), which were synchronized (aligned) with the audio at the AGD. A set of 16 recordings from Tübingen and Stuttgart were newly transcribed (high-level transcription oriented to the new orthography (token by token), transcribers' explanations, lemmatization, POS tagging) by the Arno Ruoff Archive/Ludwig Uhland Institute of the University of Tübingen (formerly: Tübinger Arbeitsstelle Sprache in Südwestdeutschland) as part of a joint project with the AGD during 2015-2021. A list of topics, a list of linguistic peculiarities, and a list of speakers' occupations are also provided. These were derived from the metadata. Also available are word and lemma lists arranged alphabetically and by frequency. The PF-- corpus is made available in the Database for Spoken German (DGD), and individual recordings can also be obtained by way of the AGD's personal service.

398 Events 398 Speech events 402 Speakers
...
RUDI
Russian-German Dialects

The corpus "Russian-German Dialects" was created in the course of several projects at the universities of Tomsk and Omsk (Siberia) between 1959 and 1989. The particular value of the corpus lies in the fact that it captures authentic speech recordings from German language enclaves in the eastern part of the former Soviet Union that no longer exist today. The recordings are narratives, interviews and reminiscences of Russian Germans. The corpus represents the language use of competent dialect speakers in language enclaves that were intact at the time, and it illustrates the seven main types of Russian-German dialects: Hessian, Swabian, Bavarian, South Franconian, Volhynian German, Low German and Palatine. The corpus is of interest not only from a linguistic perspective on German dialects in these enclaves; it also reflects many facets of the everyday life and culture of Russian Germans during this period.

20 Events 286 Speech events 20 Speakers
...
SA
Children's language: Saarbrücken Corpus

The corpus "Children's language: Saarbrücken" (SA--) was created under the auspices of a project at the University of Saarbrücken. The project's leaders were Rainer Rath, Hubert Immesberger and Josef Schu. The corpus was used to investigate the late-stage, undirected language acquisition of children of Turkish and Italian descent. The corpus SA-- comprises 65 sound recordings from the period between 1982 and 1984 with a total duration of 4 hours and 33 minutes. The recordings were made in situations of participant observation in Saarland. They are recordings of child-adult interactions and focus on two Turkish, two Italian and two German children aged 9 to 13 years. Recordings were made of various types of speech events (among them descriptions, narratives, summaries, retellings, planning, games, conversations, appointments and directions). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: Deutsches Spracharchiv). The transcripts of the corpus SA-- are published in book form in: Rainer Rath, Hubert Immesberger, Josef Schu (1987): Kindersprache - Texte italienischer und türkischer Kinder zum ungesteuerten Zweitspracherwerb. Mit Vergleichstexten deutscher Kinder. [Children's language - Texts of Italian and Turkish children with relation to undirected second language acquisition. With comparative texts of German children]. Phonai, vol. 32. Tübingen: Niemeyer. Copies of the corresponding recording excerpts were made available by the project to the German Language Archive (DSAv) at IDS and prepared as material accompanying the printed volume. No digital transcripts of the recordings are available. The audio recordings of the corpus SA-- are made available online via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD.

48 Events 48 Speech events 46 Speakers
...
SR
Slavic Dialects in the Ruhr Valley

The corpus Slavic Dialects in the Ruhr Area (SR--) was created within a project of the Seminar for Slavic Studies at the University of Bochum in cooperation with the German Language Archive (DSAv) to record Slavic languages as spoken in the Ruhr Area (Polish, Slovenian, Ukrainian, Russian). The project leader was Christian A. van den Berk. The corpus SR-- comprises 30 sound recordings with 31 speakers of different Slavic languages from 1969. 23 recordings with a total duration of 6 hours and 40 minutes are made accessible. These are recordings of narrations by 23 women and men between the ages of 17 and 78. In addition to a section in a Slavic language, these recordings contain a section in German with the same content. In the remaining 7 sound recordings, no German section was recorded, either because the speakers' knowledge of German was insufficient or because it was deliberately omitted. Most of the speakers came from the Polish-speaking areas of the former German Reich (Upper Silesia, Poznan Province) to the Ruhr Area before the First World War to work in the mining industry, some of them as children with their parents. In some cases, the speakers are descendants who had already been born in the Ruhr Area. Other speakers came during the Second World War or as late resettlers in the 1950s from Poland and Ukraine or in the 1960s as guest workers from Slovenia in what was then Yugoslavia. The recordings, which were made in Bochum, Bottrop, Essen, Herne and Recklinghausen, have been digitized by the Archive for Spoken German (AGD) (formerly the German Language Archive). No transcripts are available for the recordings. However, a list of topics was created based on the metadata. The corpus SR-- is made available via the Database for Spoken German (DGD). Individual recordings can also be obtained through the personal service of the AGD.

23 Events 23 Speech events 23 Speakers
...
SV
German Dialects: Southwest Germany and Vorarlberg

The corpus "German Dialects: Southwest Germany and Vorarlberg" (SV--) was compiled within the framework of a project at the Tübingen Centre for Language in Southwestern Germany. (At the time, the Centre was a branch of the German Language Archive [Deutsches Spracharchiv, DSAv]). The project's leaders were Arno Ruoff and Eugen Gabriel (the latter in charge of Vorarlberg). The sound recordings were intended to supplement the corpus of "German dialects: Zwirner-Korpus" (ZW--) so that the combination of SV and ZW would yield denser spatial coverage of Southwest Germany and Vorarlberg than ZW alone did. Accordingly, the recording campaign was based on the recordings made for the corpus ZW-- in southwest Germany. Further information about the project is published in: Ruoff, Arno (1973): Grundlagen und Methoden der Untersuchung gesprochener Sprache [Fundamentals and Methods of the Study of Spoken Language]. Tübingen. (Idiomatica Vol. 1). Up until the organizational separation of the Tübinger Arbeitsstelle [Tübingen branch] from the Deutsches Spracharchiv in 1970, 250 recordings were transferred to the DSAv, after which the corpus was continued by the Tübingen branch alone with further recordings. As part of a joint transcription project, in 2021 another 25 recordings from the part of the corpus whose collection the Tübingen branch had directed were transferred to the Archive for Spoken German (AGD; successor to the DSAv). Following this last expansion, the corpus SV-- comprises 275 sound recordings. 267 recordings from the period 1963 to 1987 with a total duration of 78 hours and 45 minutes are accessible externally. The recordings feature 267 speakers (women and men), some of whom were informants for the language atlas of Vorarlberg and Liechtenstein and some of whom were persons from Bessarabia (Ukraine) who were resettled in Württemberg after the Second World War. Recorded were narratives and standard texts (days of the week, numbers). 
The first 250 recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive); the remaining 25 recordings were digitized at Tübingen and transferred to the AGD in this form. Literary transcripts of 17 recordings were published as part of the two volumes of Alltagstexte [Everyday texts] compiled by the Tübinger Arbeitsstelle (Ruoff, Arno (ed.) (1984/1985) Alltagstexte I und II. IDIOMATICA vol. 10 and vol. 11; Tübingen: Niemeyer). 93 recordings from the Baden-Württemberg area were newly transcribed during the period 2015-2021 by the Arno Ruoff Archive/Ludwig Uhland Institute of the University of Tübingen (successor to the Tübinger Arbeitsstelle) as part of a cooperative project with the AGD. These transcripts cover the 25 recordings newly transferred from Tübingen to the AGD and 68 recordings from the older holdings (high-level transcription oriented to the new orthography (token for token), transcribers' explanations, lemmatization, POS tagging). The 93 transcripts were synchronized (aligned) with the audio. Based on the metadata, a list of topics and a list of speakers' occupations were generated. The corpus SV-- is made available in the Database for Spoken German (DGD). Individual recordings can also be shared by way of the AGD's personal service.

267 Events 267 Speech events 267 Speakers
...
SW
German Dialects: Black Forest

The corpus "German Dialects: Black Forest" (SW--) was created within the framework of a project of the Tübingen Center for the Language of Southwestern Germany. (At the time, the Center was a branch of the German Language Archive [DSAv]). The project's leader was Arno Ruoff. The recordings for SW were intended to supplement the corpus "German dialects: Zwirner corpus" (ZW--), leading to denser spatial coverage of the Black Forest area. The recording campaign thus took the recordings made for the ZW-- corpus as a point of reference. The data were intended, among other things, to enable analyses of the local languages of three hamlets (Schönmünz, Romishorn and St. Roman). These more remote places were chosen deliberately, because it was suspected that local dialects were still preserved there in much purer form. Further information about the project is published in: Ruoff, Arno (1973): Grundlagen und Methoden der Untersuchung gesprochener Sprache [Fundamentals and Methods of the Investigation of Spoken Language]. Tübingen. (Idiomatica vol. 1). The corpus SW-- is made up of 130 sound recordings. 126 of these, collected during the period 1964 to 1974 with a total duration of 36 hours and 31 minutes, are externally accessible. These are recordings with 122 speakers (women and men) from the former districts of Freudenstadt and Wolfach. For 5 speakers, a second recording was made in 1974, 10 years after the first. Recorded were read-aloud speech, narrations and standard texts (days of the week, numbers). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive). Literary transcripts of 10 recordings were published in the two volumes of Alltagstexte [Everyday texts] compiled by the Tübinger Arbeitsstelle (Ruoff, Arno (ed.) (1984/1985) Alltagstexte I und II. IDIOMATICA vols. 10 and 11; Tübingen: Niemeyer).
126 recordings were newly transcribed (high-level transcription oriented to the new orthography (token for token), transcribers' explanations, lemmatization, POS tagging) by the Arno-Ruoff-Archiv/Ludwig-Uhland-Institut of the University of Tübingen (successor to the Tübinger Arbeitsstelle) in a joint project with the AGD during the period 2015-2021. The transcripts were synchronized (aligned) with the audio. A list of topics and a list of speakers' occupations were created based on the available metadata. The corpus SW-- is made available as part of the Database for Spoken German (DGD). Individual recordings can also be shared through the personal service of the AGD.

126 Events 126 Speech events 122 Speakers
...
UNSD
Rabaul Creole German

The Unserdeutsch (UNSD) corpus was created as part of the German Research Foundation (DFG)-funded project "Unserdeutsch (Rabaul Creole German): Documentation of a Highly Endangered Creole Language in Papua New Guinea" (2015-2021) (project number 275623802), which was initiated at the University of Augsburg and later continued at the University of Bern. Péter Maitz and Werner König served as project leaders. The cooperating partners of the project included Craig A. Volker (James Cook University Cairns), Peter Mühlhäusler (University of Adelaide), and Ludwig Eichinger (IDS). Siegwalt Lindenfelser served as a member of the research staff. The corpus documents the language use, sociobiographical background, and language attitudes of the last surviving speakers of the German-based creole language Unserdeutsch in Papua New Guinea and on the east coast of Australia. The language emerged in the early 20th century in the boarding school environment of the Catholic mission station Vunapope on the island of New Britain in the Bismarck Archipelago, which was part of the colonial territory German New Guinea from 1884 to 1914. The recordings provided were made during several fieldwork trips between September 2014 and July 2018. The speech recordings collected include 68 interviews (over 61 hours) and 12 question book translations (approximately 18 hours), for a total of nearly 80 hours of audio material. The interviews are primarily semi-structured narrative group conversations, and the question book translations are based on approximately 320 stimuli in English or Tok Pisin. Nearly 48 hours of interview material are transcribed according to cGAT conventions using EXMARaLDA's Partitur editor. The transcripts carry three levels of annotation: orthographic normalization, lemmatization, and part-of-speech tagging (STTS 2.0 with slight modifications). The interviewed speakers are of advanced age, almost all of them older than 65.
All speakers are at least trilingual with Unserdeutsch, Tok Pisin, and English as part of their language repertoire. The vast majority of them migrated to metropolitan areas on the east coast of Australia (Brisbane, Cairns, Gold Coast, Sydney) after Papua New Guinea's independence in 1975. The corpus also covers the acute linguistic endangerment of Unserdeutsch by documenting varying degrees of loss of competence (attrition) from speakers to semi-speakers and rememberers. For further information, extensive metadata is available, which was collected by questionnaire. Some of this metadata can also be used as filters for search queries in the corpus. As part of release 2.21 in January 2024, 55 audio recordings and 52 transcripts for 53 events as well as metadata for 49 speakers are available for research and teaching in the Database for Spoken German (DGD). Additionally, a few older Unserdeutsch audio recordings, collected by Craig Volker in 1979 and 1980, are available through the Archive for Spoken German (AGD) as part of the corpus OZ ("German in Oceania").

53 Events 55 Speech events 49 Speakers
...
WISC
German in Wisconsin

The corpus "German in Wisconsin" (WISC) consists of 120 sound recordings covering different German varieties found in the state of Wisconsin in the Midwest of the US. The recordings were collected in 1968/1969 by Jürgen Eichhoff from Madison, Wisconsin, who had moved to the US from Germany. The recordings represent mainly Low German varieties of German. The speakers recorded are mostly 3rd generation descendants of German immigrants from Pomerania, Mecklenburg, Schleswig-Holstein, and Lower Saxony. In addition, Central German ("Dane County Kölsch") and Upper German varieties (Bavarian/Austrian) and Schwyzerdytsch, as well as various standard language speakers, figure in the corpus. Copies of 64 sound recordings were made by the Deutsches Spracharchiv (DSAv). These were later incorporated into the corpus "Binnen- und auslandsdeutsche Mundarten: Varia" (VII, later MV), which also held further recordings by other researchers from outside Germany. Because their processing was up to the state of the art in use at the DSAv at the time, these 64 copied recordings were selected in 1992 for the DSAv catalogue (1992; PHONAI Bd. 38/39. Tübingen: Niemeyer) and made publicly available through the Database for Spoken German (DGD) of the DSAv as part of the corpus "Deutsch in Nordamerika" (NA). In the second version of the Database for Spoken German, DGD2, these recordings were accessible as part of the corpus MV-- until 2019 (see there). After technical re-editing they were subsequently separated out from MV as a new corpus WISC. The remaining batch of 56 recordings that had not been copied earlier were finally taken over as digital copies from the Max Kade Institute for German-American Studies at the University of Wisconsin-Madison in 2004. These 56 recordings are accessible for the first time via the DGD. The WISC corpus in the DGD consists of 65 recording sessions with 99 speech events, with a total duration of 79 hours.
The corpus features recordings of 63 distinct speakers (women and men, mostly from the older generation) who come from different regions of Wisconsin. The recordings cover speech events of various kinds: narratives, standard texts (phrases, word list, numbers, weekdays) and free interviews. 64 recordings (narratives, Wenker sentences) were digitized at the Archiv für Gesprochenes Deutsch (AGD) (formerly: Deutsches Spracharchiv); 56 recordings (word lists and interviews) were taken in as digital versions from Madison/Wisconsin and technically edited at the AGD. No digital transcripts are available for any of the recordings. The audio recordings of the corpus WISC are made available via the Database for Spoken German (DGD). Individual recordings can also be obtained for download or on physical media through the personal service of the AGD. Recordings similar to those in WISC can be found in the DGD as part of the corpus "Extraterritoriale Varietäten: Varia" (MVEX).

65 Events 99 Speech events 63 Speakers
...
ZW
German dialects: Zwirner corpus

The corpus German Dialects: Zwirner Corpus (ZW--) was created as part of a project conducted by the German Language Archive (DSAv). The project leader was Eberhard Zwirner. The goal of the project was to document the dialects in Germany as completely as possible and to use the recordings in the DSAv for phonometric studies. The survey was carried out mainly in the period from 1955 to 1961 in the states of the Federal Republic of Germany [then West Germany], as well as in German-speaking areas of Austria (Vorarlberg), Liechtenstein and France (Alsace). Between 1964 and 1972, supplementary surveys and special surveys were carried out in some areas, e.g. in the Netherlands and in almost all towns in the Westphalian district of Herford. Within the individual regions, the selection of informants and the collection of data were performed by dialectologists with relevant expertise or by members of the project teams for the respective regional dialect dictionaries. While these researchers were responsible for the preparation and realization of the recordings, a sound engineer provided by the DSAv was in charge of the technical realization, using a recording truck provided by the DSAv. For the selection of recording locations, a square grid with a side length of sixteen kilometers was laid over the area and at least one location was selected in each grid square. As a rule, three autochthonous speakers were interviewed at each location, if possible one each from the younger, the middle and the older generation (around 20 years, 40 years and over 60 years of age, respectively).
Thanks to the inclusion of refugees, displaced persons and resettlers from the former German eastern territories, from the Soviet occupation zone or GDR and from the states of Eastern and Southeastern Europe, it was possible to record numerous dialects from these areas which, in the 1950s, a short time after displacement or resettlement, were still spoken by the relevant speakers in a form largely uninfluenced by their new environment. Here, too, speakers from different generations were selected to the extent that this was possible and such speakers were available at the recording locations. In addition to dialectal contributions, colloquial and standard language contributions were included. Regional languages and language minorities were taken into account, such as North Frisian (mainland and islands) and Platt Danish in Schleswig-Holstein, Sater Frisian in northwestern Lower Saxony, the Romance-French dialect in parts of the Vosges in Alsace (Pays Welche), among displaced persons Water Polish from Upper Silesia, in the Netherlands West Frisian, Yiddish and the languages of the former Dutch colonies in South America and Southeast Asia. The corpus ZW-- comprises 5809 sound recordings, of which 5796 sound recordings (including about 90 recordings in Dutch) from the period 1955 to 1972 with a total duration of 1077 hours and 15 minutes are accessible externally. 5887 persons (women and men) are documented as speakers in the transcripts. Speech events of various types were recorded, among them narratives and standard texts (days of the week, numbers, Wenker sentences, sentences from the Pfälzisches Wörterbuch (Palatinate Dictionary), dialect-geographical test sentences by Theodor Baader). The recordings were digitized at the Archive for Spoken German (AGD) (formerly: German Language Archive).
130 recordings from the district of Herford (Westphalia) were transcribed in cooperation between the DSAv and the Kreisheimatverein Herford and were presented as an independent "Herforder Korpus" (HE) in the first version of the Datenbank für Gesprochenes Deutsch (DGD) of the DSAv. This set of recordings was later reintegrated into the corpus ZW-- along with the relevant transcripts when the second generation of the Datenbank für Gesprochenes Deutsch was launched. According to Zwirner's conception, the audio recordings were to be transcribed orthographically, literarily, and phonetically. This goal could only be partially accomplished due to the large number of recordings. Standard linguistic, phonetic and literary or phonological or phonemic transcripts for about 100 recordings were published in numerous volumes of the series PHONAI and Lautbibliothek der deutschen Mundarten (LDM) edited by the DSAv, as well as in other publications. Currently, 2944 digital transcripts for 2922 recordings are held by the AGD. Of these, 2396 were digitized at the DSAv/AGD (high-level transcription following the rules of the old orthography, transcriber notes, lemmatization, POS tagging). A set of 85 Low German recordings from Schleswig-Holstein, Lower Saxony, and Westphalia was newly transcribed in 2013-2016 in a joint project of the AGD and the University of Oldenburg. A further 14 transcripts from Bavarian Swabia and Upper Bavaria were produced in 2017 as part of a course on transcription at the University of Augsburg. Most recently, 449 recordings from the Alemannic language area were transcribed during 2015-2021 by the Arno Ruoff Archive/Ludwig Uhland Institute of the University of Tübingen (formerly: Tübinger Arbeitsstelle Sprache in Südwestdeutschland) in cooperation with the AGD. 427 of these are digitally accessible for the first time (high-level transcription oriented to the new orthography (token by token), transcribers' explanations, lemmatization, POS tagging).
All transcripts were synchronized (aligned) with the audio. Based on the metadata, a list of topics, a list of linguistic peculiarities, and a list of speakers' professions were produced. In addition, frequency lists for words and lemmas are made available, arranged alphabetically or by frequency. The ZW-- corpus is made available in the Database for Spoken German (DGD), and individual recordings can also be obtained by way of the AGD's personal service.

5796 Events 5796 Speech events 5887 Speakers