Pre-conference workshops

Four pre-conference workshops will be held on 1 July.

Agnieszka Lenko-Szymanska

Corpus-Based Approaches
to Contrastive Discourse Analysis
Explorations of native and learner corpora

Agnieszka Lenko-Szymanska

Warsaw University, Poland

Keywords: contrastive discourse analysis, contrastive rhetoric, rhetorical strategies, learner corpora, writing instruction

Learner corpora have recently become an important source of data in second language acquisition studies. Samples of learners’ written and spoken L2 production are collected with the aim of describing various characteristics of interlanguage as accurately as possible. The main interests of researchers revolve around the differences between native and non-native linguistic systems. Thus, investigations focus on such linguistic features as lexical, grammatical and syntactic factors. However, the differences between native and non-native production also occur at the macrolinguistic level and are related to the ways discourse is structured by both groups of language users. Such discrepancies reflect broadly understood cultural differences and are studied within the framework of contrastive discourse analysis or contrastive rhetoric. The claims of this framework can be summarised as follows:

“Contrastive rhetoric maintains that language and writing are cultural phenomena. As a direct consequence, each language has rhetorical conventions unique to it.” (Connor 1996:5)

When writing in a foreign language learners show a tendency to transfer not only the linguistic features of their native tongue but also its rhetorical conventions. These conventions pertain to such factors as the structure or units of texts, information structure, the use of metadiscourse or intertextuality. As a result, native speakers of a language may find learners’ written discourse ineffective or even incomprehensible.

So far, research in contrastive rhetoric has rarely applied corpus-based and quantitative methods to support its claims. However, a growing number of recent studies recognise the value of such approaches for the analysis of discourse. The aim of this workshop is to present how research in contrastive rhetoric can benefit from the application of corpus linguistics methodology and how the simplest methods, such as frequency counts, analyses of concordance lines and keyword analysis, can shed light on the differences in the choice of rhetorical strategies by different groups of writers. The results of such research have important implications for L2 writing instruction.
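
As a rough illustration of two of the methods named above, frequency counts and keyword analysis, the following Python sketch compares a "learner" sample with a "native" reference sample and ranks the words that are unusually frequent in the former. The two sample texts are invented, the function names are hypothetical, and the log-likelihood statistic is only one commonly used keyness measure; this is not the procedure used in the workshop.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood keyness for one word; a and b are its counts in each corpus."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_a)
    if b > 0:
        ll += b * math.log(b / expected_b)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        # Keep only words over-represented in the study corpus.
        if a / n_study > b / n_ref:
            scored.append((log_likelihood(a, b, n_study, n_ref), word))
    scored.sort(reverse=True)
    return [(word, round(score, 2)) for score, word in scored[:top]]

# Invented toy samples, for illustration only.
learner = tokenize("Moreover I think that moreover it is important moreover we must act")
native = tokenize("It is important that we act and it is clear that we must think")

print(Counter(learner).most_common(3))   # plain frequency counts
print(keywords(learner, native, top=3))  # keyword analysis
```

With real corpora the same comparison would be run over many texts per group, and a minimum frequency threshold would normally be applied before ranking.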

The workshop will focus on the exploration and comparison of the rhetorical strategies employed by learners of different L1 backgrounds writing in English as well as by native professional and novice writers. The analysis will cover such issues as the differences in the choice of argumentation, the use of textual metadiscourse resources (e.g. linking adverbials), writer and reader visibility or the ways of formulating writer stance. The data will be drawn mainly from the FLOB and FROWN Corpora, ICLE (International Corpus of Learner English), LOCNESS (a corpus of essays produced by British and American university students) and PELCRA learner resources containing essays written in English by Polish advanced learners of English and compositions written by Polish and American students in their mother tongues. Such a range of corpora used in the analysis will help to tease apart the multiple factors influencing the foreign language texts such as L1 transfer, L2 instruction or the lack of expertise in writing. The pedagogical implications of these studies will also be discussed.

The workshop is addressed to scholars interested in undertaking research in corpus-based discourse analysis. It should also be relevant to researchers studying various aspects of learner language such as interlanguage lexis or syntax, since it will be demonstrated how the differences in genres across languages have consequences for purely linguistic choices concerning lexis or grammar. Finally, the workshop should be of interest to teachers involved in writing instruction, especially at the tertiary level.

Participants are assumed to have some background knowledge of discourse analysis and learner corpora. However, the first part of the workshop will be devoted to a quick overview of these two areas. Next, the participants will be taken step by step through a few studies exploring the differences in the choice of rhetorical strategies by different groups of writers, with the opportunity to discuss the selection of data, data processing procedures, the interpretation of the results and their pedagogical implications. Finally, the participants will have the opportunity to pursue their own mini-investigation into a chosen aspect of contrastive discourse analysis.

Michael Barlow

Contrastive Studies and Translation
Using ParaConc: a Parallel Concordancer

Michael Barlow
University of Auckland – New Zealand

Keywords: translation, concordancer, parallel corpora, contrastive studies, frequency

Parallel concordance software allows a wide range of investigations of translated texts, from the analysis of bilingual terminology and phraseology to the study of alternative translations of a single text. A parallel concordancer can be used to provide information about translation "equivalences" on demand and can provide a much richer picture than that presented in a bilingual dictionary. It is also possible to use parallel corpora to investigate specialized or technical usage information or to examine usage in particular genres. The software can present the user with (i) several instances of the search term and (ii) a large context for each instance of the search term, thereby allowing a thorough analysis of usage, either in terms of the equivalences between two languages or the ways in which specific translation problems have been handled by individual translators. In other words, the software can either be used to analyse millions of words of translated texts or to examine one or more translations of a particular text.
A parallel concordancer can also be used to analyse larger corpora, in which case the influence of the individual translator is backgrounded, enabling the user to investigate similarities or contrasts between two or more languages. Thus some linguists use the software for what might be called corpus-based contrastive analyses.
The analysis of parallel corpora can be very revealing, but it is first necessary to create or acquire the necessary parallel corpora. In the workshop we cover these issues, along with the important topic of alignment of parallel texts, which is a prerequisite for analysis using the software. ParaConc contains an alignment utility and we will step through an alignment exercise as part of the workshop session. Workshop participants can bring sample files to align.
The bulk of the workshop will be based on worksheets related to the operation of the software. We perform simple text searches for words or phrases and sort the resulting concordance lines according to the alphabetical order of the words surrounding the search term. We examine frequency information of various kinds, including the frequency of collocates of the search term. More complex searches are also possible, including context searches, searches based on regular expressions, and word/part-of-speech searches (assuming that the corpus is tagged for POS). Corpus frequency and collocate frequency information can be obtained. The program includes features for highlighting potential translations, including an automatic component. Both the “Translation” and “Hot words” functions use frequency data to provide information about possible translations of the search term. Because translated texts differ in structure from non-translated texts, parallel concordancers cannot be used uncritically. In the workshop we address issues of “translationese,” the direction of translation, and the combined use of parallel and monolingual corpora.
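
The kinds of operation described above (a regular-expression search over aligned segments, sorting hits by neighbouring words, and using frequency data to suggest possible translations) can be approximated in a few lines of Python. The sketch below is only an illustration under stated assumptions: the aligned English-French pairs are invented, all function names are hypothetical, and the ranking heuristic is a crude stand-in that makes no claim to reproduce how ParaConc computes its suggestions.

```python
import re
from collections import Counter

# Invented, already-aligned English-French segment pairs (illustrative only).
ALIGNED = [
    ("The house is old.", "La maison est vieille."),
    ("She bought a house.", "Elle a acheté une maison."),
    ("The garden is large.", "Le jardin est grand."),
    ("The house has a garden.", "La maison a un jardin."),
]

def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def parallel_search(pairs, pattern):
    """Return aligned pairs whose source segment matches the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if rx.search(src)]

def sort_by_right_context(hits, pattern):
    """Sort hits alphabetically by the first word to the right of the match."""
    rx = re.compile(pattern, re.IGNORECASE)

    def key(hit):
        right = words(hit[0][rx.search(hit[0]).end():])
        return right[0] if right else ""

    return sorted(hits, key=key)

def translation_candidates(pairs, hits, top=3):
    """Rank target words by the share of their total frequency found in the hits."""
    overall = Counter(w for _, tgt in pairs for w in words(tgt))
    in_hits = Counter(w for _, tgt in hits for w in words(tgt))
    ranked = sorted(in_hits, key=lambda w: (-in_hits[w] / overall[w], -in_hits[w], w))
    return ranked[:top]

hits = parallel_search(ALIGNED, r"\bhouse\b")
for src, tgt in sort_by_right_context(hits, r"\bhouse\b"):
    print(f"{src:<28}{tgt}")
print(translation_candidates(ALIGNED, hits))
```

On a sample this small, function words in the target language rank almost as highly as the likely translation, which is one reason such frequency-based suggestions need a reasonably large corpus and critical inspection by the user.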

Guy Aston & Lou Burnard

Introducing XAIRA: an XML-aware concordance program

Guy Aston
Lou Burnard

This workshop will introduce participants to the latest version of the XAIRA system, originally developed for use with the British National Corpus but now enhanced as a general-purpose, open-source, cross-platform software architecture.

Participants will learn how this software can take advantage of all the XML markup in the new XML edition of the British National Corpus. They will also learn how to use Xaira with their own corpora. Xaira can operate on a simple collection of plain text files, with no markup at all. It can also operate on a collection of texts with very sophisticated embedded linguistic markup, provided this is expressed in some dialect of XML. Participants will learn how to customize the program for either kind of material.

We will provide a series of exploratory exercises, designed to show off the searching capabilities of the system when used with the BNC. These will include the production of concordances, word lists, collocation lists etc. in the usual way, but with an emphasis on the kind of application for such capabilities likely to be of most use in a language teaching environment. Particular attention will be paid to issues of integration and portability, in order to show how results obtained with Xaira can be integrated into other teaching material. We will also present and discuss strategies for encouraging students' own exploration of corpus resources using the program.

Adam Kilgarriff

Build your own corpus

Adam Kilgarriff

Lexical Computing Ltd
Lexicography MasterClass Ltd
University of Sussex

For a corpus lesson to work in the classroom, it has to be the right corpus. If the topic is football, then the corpus needs to be about football. Moreover, students’ expectations, from using the web, are that all the data should be there, instantly.

In the workshop we shall use a web tool, WebBootCaT, for producing instant corpora, and explore those corpora in a corpus query tool, the Sketch Engine. Students will have the opportunity to build and explore their own corpus, in the language of their choice. (For some languages – English, French, German, Italian, Spanish – there is also the option of part-of-speech-tagging the corpus.)
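
Tools of this kind typically start from a handful of user-supplied seed words, combine them into small tuples, and send each tuple to a search engine as a query; the pages returned are then downloaded and cleaned to form the corpus. The Python sketch below illustrates only the first step, query generation. The seed words, the function name and the default parameter values are assumptions made for the example, and no web access is performed.

```python
import itertools
import random

# Hypothetical seed words for a football corpus (illustrative only).
SEEDS = ["goalkeeper", "penalty", "offside", "striker", "midfield", "referee"]

def seed_tuples(seeds, tuple_size=3, n_queries=5, rng_seed=0):
    """Build search queries by randomly sampling combinations of seed words."""
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    rng = random.Random(rng_seed)  # fixed seed so the sketch is repeatable
    chosen = rng.sample(combos, min(n_queries, len(combos)))
    return [" ".join(combo) for combo in chosen]

for query in seed_tuples(SEEDS):
    print(query)
```

Combining several seeds in each query tends to pull in pages that are about the topic rather than pages that merely mention one of the words.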

We shall also consider how much larger corpora – BNC-sized and beyond – can be developed from the web, and the issues of balance, filtering, and text “cleaning” that this presents. We have recently developed billion-plus word corpora for English, German and Italian and we shall describe the process, the issues raised, and the prospects that they open up.

We shall also explore how we can use corpora effectively, using the CQP query language and developing grammars for identifying the common subjects, objects and prepositions for a verb, the common adjectives and verbs for a noun, and so forth. We shall show how “word sketches” (one-page, corpus-based descriptions of a word’s grammatical and collocational behaviour) were developed and will give all participants the opportunity to develop their own.
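
To give a flavour of what a grammar for "common objects of a verb" does, here is a minimal Python analogue that scans part-of-speech-tagged tokens for the pattern verb, optional determiner, any number of adjectives, noun, and counts the nouns found. It is a sketch only: the tagged tokens are invented, the tag names and the function name are assumptions, and the workshop itself expresses such patterns in a corpus query language rather than in Python.

```python
from collections import Counter

# Invented, hand-tagged tokens (word, POS); a real corpus would be tagged automatically.
TAGGED = [
    ("she", "PRON"), ("kicked", "VERB"), ("the", "DET"), ("ball", "NOUN"),
    ("he", "PRON"), ("kicked", "VERB"), ("a", "DET"), ("heavy", "ADJ"), ("ball", "NOUN"),
    ("they", "PRON"), ("kicked", "VERB"), ("the", "DET"), ("door", "NOUN"),
    ("she", "PRON"), ("scored", "VERB"), ("a", "DET"), ("goal", "NOUN"),
]

def objects_of(tagged, verb):
    """Count nouns matching the pattern: VERB (DET)? (ADJ)* NOUN."""
    counts = Counter()
    for i, (word, pos) in enumerate(tagged):
        if pos != "VERB" or word != verb:
            continue
        j = i + 1
        # Skip one optional determiner.
        if j < len(tagged) and tagged[j][1] == "DET":
            j += 1
        # Skip any number of adjectives.
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        # Record the noun if the pattern is completed.
        if j < len(tagged) and tagged[j][1] == "NOUN":
            counts[tagged[j][0]] += 1
    return counts

print(objects_of(TAGGED, "kicked").most_common())
```

The sketch matches word forms rather than lemmas and uses raw counts; a fuller treatment would lemmatise the corpus and rank collocates with an association measure rather than by frequency alone.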