JALT Testing & Evaluation SIG Newsletter Vol. 2. No. 1. Oct. 1998. (p. 5 - 8)

Do Different C-tests Discriminate Proficiency Levels of EL2 learners? (cont'd.)

Results and discussion

Table 1  Basic descriptive statistics for non-returnees' (NR) scores on the C-test 1, C-test 2 and STEP tests
 
______________________________________________________________________
Test type               N    No. of Items     Mean     Reliability *
______________________________________________________________________

C-test 1                60      100            61      .67     .73
C-test 2                60      120            98      .70     .83
STEP                    60      160            109     .75     .85
______________________________________________________________________

    * Raw score reliabilities (KR 20) appear on the right and reliabilities that would 
      be observed if all the tests contained 100 items appear on the left.


Table 2  Basic descriptive statistics for returnees' (OS) scores on the C-test 1, C-test 2 and STEP tests

______________________________________________________________________
Test type               N    No. of items     Mean     Reliability *
______________________________________________________________________

C-test 1                30      100            74      .65     .76
C-test 2                30      120            109     .71     .89
STEP                    30      160            124     .87     .91
______________________________________________________________________
 
    * Raw score reliabilities (K-R 20) appear on the left and reliabilities that 
      would be observed if all the tests contained 100 items on the right.
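
As an illustration only, the KR-20 coefficient reported in the tables can be computed from dichotomously scored items as sketched below. The response matrix is hypothetical and is not data from this study, and the use of the population variance of total scores is an assumption, since conventions differ.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for rows of 0/1 item scores.

    responses: one list per examinee, one 0/1 entry per item.
    Assumption: population variance of the total scores is used.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)


# Hypothetical responses: 5 examinees, 4 items (not data from the study).
demo = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(demo), 2))
```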

An analysis of the NR group's means on all tests shows the highest mean on the STEP and the lowest on C-test 1, suggesting that the STEP was the easiest test and C-test 1 the most difficult. An ANOVA was conducted to test whether these differences were statistically significant; the result was F(2, 179) = 176.18, p < .001.
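
The F ratio in such an analysis is the between-groups mean square divided by the within-groups mean square. The sketch below is a minimal one-way ANOVA that treats the score sets as independent groups; that is an assumption, because the article does not state the design used, and the scores shown are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-groups sum of squares: squared deviations from each group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores on three tests (not data from the study).
f_ratio, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_ratio, df_b, df_w)
```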

[ p. 6 ]


A similar analysis was conducted on the mean scores obtained by the returnees on the two types of C-tests and the STEP. The results indicated the highest mean score for the STEP and the lowest for the first type of C-test, a pattern similar to that observed for the non-returnee group. These differences in scores were checked by an ANOVA and found to be highly significant: F(2, 75) = 56.94, p < .001. These data are summarized in Table 3 below.

Table 3  Results of ANOVA for the scores of all subjects on all tests

__________________________________________________________________________
Group            Source of variance         SS       df       MS         F
__________________________________________________________________________
Non-returnees    Between groups          3581.4       2     1790       233.3
                 Within group            6677.13     57       76.75
                 Total                  10258.53

Returnees        Between groups          3416.5       2    17082.3      88.303
                 Within group            1458.8      27      193.5
                 Total                   4875.3
__________________________________________________________________________
p < .001

A cursory glance at the tables above shows that the returnee group obtained consistently much higher mean scores on both the C-test built from short segments of different texts and the C-test using only one Narration passage. These differences suggest that both C-test types were much easier for the returnees than for the non-returnees. To further determine the extent to which C-tests of different types can discriminate levels of English proficiency among the Ss, t-tests were conducted between the two groups' scores on each test.

The results of the t-test analyses indicate that C-test 1, which used different short texts, was easier for the returnees than for the other group of Ss at a significant level: t = .86, df = 29, p = .00. In the same manner, the Narration type of C-test proved to be much easier for the returnees than for the non-returnees, and the difference was found to be highly significant: t = 3.21, df = 59, p = .005. The returnees outperformed the non-returnees on both C-test 1 and C-test 2. From these results, it can be concluded that the two C-test types used in this study can discriminate levels of English proficiency among Japanese university students.
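
A comparison of two independent groups of this kind can be sketched as follows. The article does not say which form of the t-test was used, so Welch's unequal-variance version is an assumption made here for illustration, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate df for two independent samples."""
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical scores for two groups (not data from the study).
t_value, df_value = welch_t([2, 4, 6, 8], [1, 2, 3, 4])
print(round(t_value, 3), round(df_value, 2))
```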

In addition, there is the question of which of the two C-test types is superior in terms of reliability and concurrent validity. To permit comparison among the reliability estimates of the different tests used in this study, 'corrected' reliabilities, that is, the reliabilities that would be observed if all the tests had contained 100 items, were calculated for all the C-tests and the STEP (Gordon, 1989; Chapelle, 1990). Higher reliability results were observed for the C-test using several segments than for the Narration type in both sample groups.
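
A length adjustment of this kind is commonly made with the Spearman-Brown prophecy formula. The article does not name the formula it used, so the sketch below rests on that assumption; the function name and the example values are hypothetical and are not taken from Tables 1 and 2.

```python
def spearman_brown(reliability, n_items, target_items=100):
    """Predicted reliability if a test were lengthened or shortened.

    Assumption: the added or removed items behave like the existing ones.
    """
    k = target_items / n_items  # ratio of new length to original length
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical example: a 50-item test with reliability .80,
# projected to 100 items.
print(round(spearman_brown(0.80, 50), 3))
```

A test that already has the target number of items is left unchanged by this formula, since the length ratio is then 1.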

[ p. 7 ]


Criterion related validity

For the purpose of determining how well C-tests relate to an outside criterion, the scores of the two groups of subjects on the two C-test types were correlated with their scores on the STEP. Moreover, since the raw score correlations are related to the reliabilities of tests with different numbers of items, correlations based on adjusted reliabilities and corrected for attenuation (Jafarpur, 1995) were likewise calculated. These are shown in Table 4 below.

Table 4  Correlations among the C-test types and STEP scores

_________________________________________________________________________
Group                   C-test1 (different texts)    C-test 2 (Narration)
                             and STEP                    and STEP
Returnees                      .58                         .29

NR                             .51                         .26
_________________________________________________________________________
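
The correction for attenuation mentioned above divides an observed correlation by the square root of the product of the two tests' reliabilities. The sketch below shows that standard formula with hypothetical values; it is not a reconstruction of how the figures in Table 4 were obtained.

```python
from math import sqrt


def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Estimated correlation between true scores on tests x and y."""
    return r_xy / sqrt(rel_x * rel_y)


# Hypothetical values: observed r = .40, reliabilities .64 and .81.
print(round(correct_for_attenuation(0.40, 0.64, 0.81), 3))
```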

The table shows moderate correlations, of at least .50 (Klein-Braley, 1984), between C-test 1 and STEP scores, and these are higher than the correlations between C-test 2 and STEP scores for both sample populations. The superiority of the correlations between C-test 1, which used different short text segments, and an outside criterion such as that used in this investigation does not support Mochizuki's claim that using a single Narration type of text gives the 'highest correlational results compared to other C-test types' (1994). More importantly, the moderate correlations between a C-test built from various texts and a single criterion suggest that it is possible for C-tests to tap different language abilities of ESL learners (Jafarpur, 1995). Finally, choosing texts carefully for similar interest and readability levels leads to the superiority of a C-test constructed from several short passages over a C-test using only one text.

Summary and conclusion

In the preceding discussion, the C-tests have been analyzed from various angles in relation to the following points: (1) the ability of the C-testing procedure to discriminate between levels of English proficiency of Japanese university students, (2) the superiority of one of the two C-test types over the other: one using several short segments from different texts, C-test 1, and the other using only one long narration text, C-test 2, and (3) the criterion-related validity of each of these C-test types.

The writer acknowledges that the number of samples and tests included in the study was small. It appears quite possible that this alone could account for the variability in the results of the statistical analysis. Notwithstanding, the results of this investigation indicate that C-tests have the ability to differentiate the ESL levels of students in Japan. Previously untried material, controlled for the difficulty level of its segments, as in the case of C-test 1 used in this study, has demonstrated satisfactory reliability estimates. Furthermore, a C-test constructed from different passages can show acceptable (Klein-Braley, 1984) validity against a reference criterion, even higher than that of a Narration type of C-test. Because of the far-reaching potential of C-tests in empirical research as well as in classroom testing, further research on their nature, application, and effectiveness in second language learning is needed.

[ p. 8 ]


References

Bormuth, J. R. (1967). Comparable cloze and multiple-choice comprehension test scores. Journal of Reading 10, 291-299.

Brown, J. D. (1983). A closer look at cloze: validity and reliability. In Oller, J. W. Jr. (Ed.), Issues in Language Testing. Rowley, MA: Newbury House, 237-50.

____ (1988). Tailored cloze: improved with classical item analysis and techniques. Language Testing. 5, 19-31.

____ (1993). What are the characteristics of natural cloze tests? Language Testing. 10, 93-116.

Carroll, J.B. (1987). Review of Klein-Braley and Raatz. C-tests in der praxis. Language Testing. 4, 99-106.

Chapelle, A. and Abraham, R. (1990). Cloze Method: what difference does it make? Language Testing. 7, 121-146.

Chapelle, C. (1994). Are C-tests valid measures for L2 vocabulary research? Second Language Research. 10, 157-187.

Cohen, A.D., Segal, M., and Weiss, R. (1984). The C-tests in Hebrew. Language Testing. 1, 221-225.

Darnell, D.K. (1970). Clozentropy: a procedure for testing English language proficiency of foreign students. Speech Monographs. 37, 36-46.

Dörnyei, Z. and Katona, L. (1992). Validation of C-tests among Hungarian EFL learners. Language Testing. 9, 187-206.

Harris, D. & Palmer, L. (n.d.) A Comprehensive English language test for learners of English (CELT). New York: McGraw-Hill.

Henning, J. (1987). A guide to language testing: development, evaluation, measurement. Cambridge, MA: Newbury House.

Ikeguchi, C. (Unpublished ms.) The four cloze types: to each its own. Tsukuba Women's University, Japan.

Jafarpur, A. (1995). Is C testing superior to Cloze? Language Testing. 12, 194-215.

Jonz, J. (1990). Another turn in the conversation: what does the cloze measure? TESOL Quarterly 24, 61-63.

Kimura, K. & Visgatis, B. (1996). Highschool English Textbooks and College Entrance Examinations. JALT Journal 18, 81-95.

Kimura, Y. (1995). Investigating the English competence of students returned from overseas. In Kitao, K. et al. Culture and communication. Kyoto: Yamaguchi Shoten.

Klare, G.R. (1984). Readability. In P.D. Pearson (Ed.), Handbook of Reading Research (pp. 681-738). New York: Longman.

Klein-Braley, C. (1985). A close-up on the C test: a study in the construct validation of authentic tests. Language Testing. 2, 76-104.

Klein-Braley, C. and Raatz, E. (1984). A Survey on the C test. Language Testing. 1, 134-146.

McBeath, N. (1990). C-tests - some words of caution. English Teaching Forum 28, 45-46.

Mochizuki, A. (1994). C-tests: four kinds of texts, their reliability and validity. JALT Journal 16, 41-54.

Negishi, M. (1987). The C-test: an integrative measure? IRLT Bulletin 1, 3-26.

Oller, J. W. Jr. (1972). Scoring methods and difficulty levels for cloze tests of proficiency in English as a second language. Modern Language Journal 56, 151-158.

Oller, J. W. Jr. (1983). Issues in Language Testing. Rowley, MA: Newbury House.

Raatz, U. (1985). Better theory for better tests? Language Testing. 2, 60-75.

Raatz, U. and Klein-Braley, C. (1981). The C-test - a modification of the cloze procedure. In Culhane, T., Klein-Braley, C. and Stevenson, D.K. (Eds.), Practice and problems in language testing. University of Essex. Paper 26. Colchester: University of Essex.

Taylor, W.L. (1953). Cloze procedure: a new tool for measuring readability. Journalism Quarterly. 30, 414-38.

Tschirner, E. (1996). Rethinking Beginning FL Instruction. Modern Language Journal. 80, 1-13.

A copy of the tests used in this study can be obtained from the author.


www.jalt.org/test/ike_2.htm


[ p. 9 ]