Both Nordic and international studies have indicated that subject-specific linguistic distinctions are rarely made explicit to students (Chandler, 1995; Dysthe et al., 2006). To offer students adequate support, we need more knowledge about the specific linguistic demands at different form levels. This calls for large-scale, subject-specific empirical language studies.
My work describes a quantitative procedure for characterizing the register of Swedish textbooks in the natural sciences, i.e. the subject fields of biology, chemistry and physics. The method is corpus-based and uses language technology tools to automatically produce what I call index lists, i.e. lists of salient features of specialized language. The idea of such lists originates from the project to develop an academic word list for Swedish (Ribeck et al., 2014). This method, which originally focused purely on lexical items, i.e. words, has been adapted to also extract variables at the phraseological and syntactic levels of language.
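The abstract does not say how salience is measured. Purely as an illustration, one common corpus-linguistic way to rank "salient" words is keyness: comparing each word's frequency in the specialized corpus with its frequency in a general reference corpus using the log-likelihood (G²) statistic. The sketch below is an assumption, not the procedure used in this work. It takes pre-tokenized input, and the function names (`log_likelihood`, `index_list`) and thresholds (`min_freq`, `top_n`) are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for a word occurring a times in the target corpus (size c)
    and b times in the reference corpus (size d)."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2


def index_list(target_tokens, reference_tokens, min_freq=2, top_n=10):
    """Hypothetical index list: words overused in the target corpus, ranked by G2."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        if a < min_freq:  # skip rare words (assumed threshold)
            continue
        b = reference.get(word, 0)
        # Keep only words relatively more frequent in the target corpus.
        if a / c <= b / d:
            continue
        scored.append((word, log_likelihood(a, b, c, d)))
    # Highest keyness first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


target = "cellen delar sig cellen innehåller dna och cellen".split()
reference = "hon sa att han och hon gick hem och åt".split()
print(index_list(target, reference))
```

With these toy sentences only "cellen" passes the frequency threshold and is overrepresented relative to the reference text. A real index list would also need lemmatization, dispersion criteria across subjects, and, for phrases and syntactic structures, a parsed corpus rather than plain tokens.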
In my presentation I will show examples of index lists of vocabulary, noun phrases and syntactic structures, extracted from a corpus of 5 million words drawn from Swedish secondary and upper secondary textbooks in the natural sciences.