Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Burton, Richard F. – Assessment and Evaluation in Higher Education, 2005
Examiners seeking guidance on multiple-choice and true/false tests are likely to encounter various faulty or questionable ideas. Twelve of these are discussed in detail, having to do mainly with the effects on test reliability of test length, guessing and scoring method (i.e. number-right scoring or negative marking). Some misunderstandings could…
Descriptors: Guessing (Tests), Multiple Choice Tests, Objective Tests, Test Reliability
Wang, Wen-Chung; Chen, Hsueh-Chu – Educational and Psychological Measurement, 2004
As item response theory (IRT) becomes popular in educational and psychological testing, there is a need of reporting IRT-based effect size measures. In this study, we show how the standardized mean difference can be generalized into such a measure. A disattenuation procedure based on the IRT test reliability is proposed to correct the attenuation…
Descriptors: Test Reliability, Rating Scales, Sample Size, Error of Measurement
Freedman, Sarah Warshauer – 1991
Writing teachers and educators can add to information from large-scale testing and teachers can strengthen classroom assessment by creating a tight fit between large-scale testing and classroom assessment. Across the years, large-scale testing programs have struggled with a difficult problem: how to evaluate student writing reliably and…
Descriptors: Elementary Secondary Education, Foreign Countries, Informal Assessment, Portfolios (Background Materials)
Embretson, Susan E. – Measurement: Interdisciplinary Research and Perspectives, 2004
The last century was marked by dazzling changes in many areas, such as technology and communications. Predictions into the second century of testing are seemingly difficult in such a context. Yet, looking back to the turn of the last century, Kirkpatrick (1900), in his American Psychological Association presidential address, presented fundamental…
Descriptors: Ability, Testing, Futures (of Society), Psychometrics
Arizona Univ., Tucson. Center for Educational Evaluation and Measurement. – 1984
The Head Start Measures Project was a 3-year study to develop a set of measures designed specifically for Head Start children. The measures are based on a path-referenced approach to assessment, in which children's performance is described in terms of their position along paths of development. A path is defined as a sequence of skills within a…
Descriptors: Achievement Tests, Early Childhood Education, General Science, Language Acquisition