Corpus Research in Theoretical and Applied Linguistics (理论语言学及应用语言学中的语料库研究)

3. Corpora and methodology

Following the classification of knowledge domains proposed by Becher and his associates (Becher, 1987, 1994; Becher & Trowler, 2001), four contrasting disciplines, one representing each knowledge domain, were selected. Becher's classification was chosen because the four identified knowledge domains “span the full range of disciplinary specialisations across the spectrum of higher education” (Groom, 2007: 17). The classification of knowledge domains and the representative discipline selected for each domain are shown schematically in Figure 1.

Figure 1 Schematic representation of the knowledge domains and the discipline representing each domain (adapted from Groom, 2007: 18)

Using the Elsevier database (the authors' university holds an institutional subscription to the database, which avoids copyright issues), we collected research articles from the four selected disciplines, written by English L1 users and published in internationally recognized journals between 2000 and 2014, to compile the corpora for the present study. Meta-information about the corpora is given in Table 1.

Table 1 Meta-information about the four corpora

The compiled corpora were then uploaded to Sketch Engine (Kilgarriff et al., 2004) for processing. As discussed in Section 2, the pattern V that was taken as the starting point for searching for and identifying semantic sequences. Using the query '[tag="VV.*"][tag="IN/that"]', we obtained quantitative information on the occurrences of this pattern in each discipline, including raw and normalized (per million) frequencies, as shown in Table 2.

Table 2 Quantitative information of V that in the four corpora
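The normalized frequencies reported alongside the raw counts follow the usual per-million scaling. The sketch below illustrates that arithmetic only; the function name `per_million` and the corpus size used in the example are assumptions for illustration and are not taken from Table 1. The raw count of 4,686 is the total number of V that instances reported for Linguistics in this section.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Scale a raw frequency to occurrences per million units of corpus text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing to limit floating-point rounding error.
    return raw_count * 1_000_000 / corpus_size


# ASSUMPTION: 1,500,000 is a made-up corpus size, not a figure from Table 1.
hypothetical_corpus_size = 1_500_000
linguistics_raw = 4686  # total V that instances in Linguistics (from the text)

print(f"{per_million(linguistics_raw, hypothetical_corpus_size):.2f} per million")
```

Normalizing in this way is what allows the four corpora, which need not be the same size, to be compared directly.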

Table 2 strongly suggests that language in use does vary across disciplines. Specifically, V that occurs more frequently in the social science disciplines (Linguistics and Education) than in the natural science disciplines (Physics and Mechanics). This is consistent with, for example, Charles' (2006a) finding that reporting clauses are considerably more frequent in social science disciplines than in natural science disciplines, and with Gray's (2015) observation that verb complement clauses are used more in the humanities and social sciences and less in the hard sciences.

Because of space constraints, we looked only at the 10 most frequent lexical items occurring in V that in each discipline. This choice can nevertheless be justified by the fact that instances of these items account for more than half of the total instances of the pattern in each corpus [e.g. the top 10 lexical items account for 2,641 of the 4,686 instances in Linguistics (approximately 56.36%) and for 2,071 of the 3,108 instances in Mechanics (approximately 66.63%)]. This in turn suggests that semantic sequences identified by analyzing instances of these items are important indicators of the epistemology of each discipline. These items and their quantitative information are given in Table 3.

Table 3 Top 10 lexical items and their frequency in each subcorpus
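The coverage figures quoted for the top 10 items can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text for Linguistics and Mechanics; the function name `coverage` is an illustrative choice, and no figures are assumed for Education or Physics.

```python
def coverage(top_n_instances: int, total_instances: int) -> float:
    """Percentage of all pattern instances covered by the top-N lexical items."""
    return 100 * top_n_instances / total_instances


# (instances of the top 10 items, total V that instances), as reported in the text.
reported = {
    "Linguistics": (2641, 4686),
    "Mechanics": (2071, 3108),
}

for discipline, (top10, total) in reported.items():
    print(f"{discipline}: {coverage(top10, total):.2f}%")
# Linguistics: 56.36%
# Mechanics: 66.63%
```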

It should be noted that it would not be practical to examine all the instances of these items in the four corpora; the data therefore had to be further limited to a manageable size. For the subsequent analyses, 30 instances of each item in each discipline were randomly sampled, ideally totalling 300 instances in each corpus. However, a few instances that did not instantiate the pattern V that were excluded; for example, NOTE in “Note that biological and adoptive parents were collapsed ...”, and SHOW in “Second, by showing that the stress-position variable (final-stress vs. penultimate-stress) has the same effects on alignment in Experiments 2 and 3.” Additionally, since this study does not aim to capture comprehensively the disciplinary cultures of the disciplines under examination, it does not matter that restricting the investigation to the pattern V that yields a non-exhaustive set of the semantic sequences that may be indicative of those cultures. The current investigation should nevertheless help to demonstrate how semantic sequences can be identified and to show that they can be used to investigate and characterize disciplinarity.
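The sampling step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the text does not say which tool performed the sampling, so the function `sample_instances`, the fixed seed, and the placeholder concordance lines are all assumptions.

```python
import random


def sample_instances(concordance_lines, k=30, seed=0):
    """Draw up to k concordance lines at random, without replacement.

    A fixed seed (an assumption, for reproducibility) makes the sample repeatable.
    If fewer than k lines are available, all of them are returned.
    """
    lines = list(concordance_lines)
    if len(lines) <= k:
        return lines
    rng = random.Random(seed)
    return rng.sample(lines, k)


# Hypothetical data: placeholders standing in for the hits of one lexical item.
hits = [f"concordance line {i}" for i in range(1, 201)]
sample = sample_instances(hits)
print(len(sample))
```

Sampling 30 lines for each of 10 items gives at most 300 instances per corpus; manually excluding lines that do not instantiate V that, as described above, lowers that total slightly.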