LaFlair, Geoffrey T.; Langenfeld, Thomas; Baig, Basim; Horie, André Kenji; Attali, Yigal; von Davier, Alina A. – Journal of Computer Assisted Learning, 2022
Background: Digital-first assessments leverage the affordances of technology in all elements of the assessment process, from design and development to score reporting and evaluation, to create test taker-centric assessments. Objectives: The goal of this paper is to describe the engineering, machine learning, and psychometric processes and…
Descriptors: Computer Assisted Testing, Affordances, Scoring, Engineering
Julie Sriken; Bradley T. Erford; Martin F. Sherman; Kristen Watson; Heather L. Smith – Measurement and Evaluation in Counseling and Development, 2024
Psychometric characteristics of CESD-R scores were explored on a sample of 966 undergraduate students. Internal consistency (α = 0.92), external convergent and discriminant validity, and response bias were adequate to excellent. Strong measurement invariance was evident for gender and race comparisons, and the unidimensional model fit the…
Descriptors: Symptoms (Individual Disorders), Depression (Psychology), Measures (Individuals), Undergraduate Students
VanDerHeyden, Amanda M.; Codding, Robin; Solomon, Benjamin G. – Remedial and Special Education, 2023
Computer-based curriculum-based measurement (CBM) is a relatively common practice, but surprisingly few studies have examined the reliability of computer-based CBM. This study sought to examine the reliability of CBM administered via paper/pencil versus the computer. Twenty-one of 25 students in two third-grade classes (N = 21) participated in two…
Descriptors: Curriculum Based Assessment, Computer Assisted Testing, Test Format, Grade 3
Jiayi Wang; Michael T. Kalkbrenner; Riley Schaner – Psychology in the Schools, 2025
Teaching is a stressful profession with a high turnover rate. Schools and related institutions need to take more action to support teachers and keep teacher stress at a manageable level. Continued research and practical efforts require measures that assess teachers' stress briefly and accurately. The Teacher Stress Scale is a recently…
Descriptors: Elementary School Teachers, Secondary School Teachers, Preschool Teachers, Stress Variables
Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025
Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…
Descriptors: Tests, Testing, Scores, Test Construction
Senel, Selma; Kutlu, Ömer – European Journal of Special Needs Education, 2018
This paper examines the listening comprehension skills of visually impaired students (VIS) using computerised adaptive testing (CAT) and reader-assisted paper-pencil testing (raPPT), along with student views about both formats. An explanatory mixed-methods design was used. The sample comprised 51 VIS in 7th and 8th grades. Nine of these students were…
Descriptors: Computer Assisted Testing, Adaptive Testing, Visual Impairments, Student Attitudes
Ying Xu; Xiaodong Li; Jin Chen – Language Testing, 2025
This article provides a detailed review of the Computer-based English Listening Speaking Test (CELST) used in Guangdong, China, as part of the National Matriculation English Test (NMET) to assess students' English proficiency. The CELST measures listening and speaking skills as outlined in the "English Curriculum for Senior Middle…
Descriptors: Computer Assisted Testing, English (Second Language), Language Tests, Listening Comprehension Tests
Metsämuuronen, Jari – International Journal of Educational Methodology, 2020
Pearson product-moment correlation coefficient between item g and test score X, known as item-test or item-total correlation ("Rit"), and item-rest correlation ("Rir") are two of the most used classical estimators for item discrimination power (IDP). Both "Rit" and "Rir" underestimate IDP caused by the…
Descriptors: Correlation, Test Items, Scores, Difficulty Level
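As a rough illustration of the two classical estimators named in the abstract above, the sketch below computes the item-total correlation ("Rit", item g against the full test score X) and the item-rest correlation ("Rir", item g against the score with item g removed). The response matrix and the function names are hypothetical and chosen only for this example. The sketch shows the classical estimators only, not any correction discussed in the article.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def item_total_and_rest(responses, g):
    """Return (Rit, Rir) for item index g.

    responses: list of rows, one per test taker, each a list of item scores.
    Rit correlates item g with the total score X.
    Rir correlates item g with the total score minus item g.
    """
    item = [row[g] for row in responses]
    total = [sum(row) for row in responses]
    rest = [t - i for t, i in zip(total, item)]
    return pearson(item, total), pearson(item, rest)


# Hypothetical data: 6 test takers, 4 dichotomously scored items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

rit, rir = item_total_and_rest(responses, 0)
print(f"Rit = {rit:.3f}, Rir = {rir:.3f}")
```

In this made-up data set Rit comes out larger than Rir, because the item contributes to the total score it is correlated with. Removing the item from the score is what distinguishes the item-rest estimator.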
Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019
Reliability is the consistency of a set of scores that are designed to measure the same thing. Reliability is a statistical property of scores that must be demonstrated rather than assumed.
Descriptors: Scores, Measurement, Test Reliability, Error Patterns
Lenz, A. Stephen; Ault, Haley; Balkin, Richard S.; Barrio Minton, Casey; Erford, Bradley T.; Hays, Danica G.; Kim, Bryan S. K.; Li, Chi – Measurement and Evaluation in Counseling and Development, 2022
In April 2021, The Association for Assessment and Research in Counseling Executive Council commissioned a time-referenced task group to revise the Responsibilities of Users of Standardized Tests (RUST) Statement (3rd edition) published by the Association for Assessment in Counseling (AAC) in 2003. The task group developed a work plan to implement…
Descriptors: Responsibility, Standardized Tests, Counselor Training, Ethics
Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning
Min, Shangchao; He, Lianzhen; Zhang, Jie – Language Teaching, 2020
This article reviews a selected sample of 70 empirical studies in journal articles and doctoral dissertations on language assessment in China between 2011 and 2018. Following a brief introduction to the history and current state of language assessment in China, the article presents a critical review of language assessment research on six themes…
Descriptors: Language Tests, Test Reliability, Test Validity, Journal Articles
Márió Tibor Nagy; Erzsébet Korom – Journal of Baltic Science Education, 2023
Nowadays, the assessment of student performance has become increasingly technology-based, a trend that can also be observed in the evaluation of scientific reasoning, with more and more of the formerly paper-based assessment tools moving into the digital space. The study aimed to examine the reliability and validity of the paper-based and…
Descriptors: Science Process Skills, Elementary School Students, Grade 4, Science Tests
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Hakelind, Camilla; Sundström, Anna E. – Psychology Learning and Teaching, 2022
Finding valid and reliable ways to assess complex clinical skills within psychology is a challenge. Recently, there have been some examples of applying Objective Structured Clinical Examinations (OSCEs) in psychology for making such assessments. The aim of this study was to examine students' and examiners' perceptions of a digital OSCE in…
Descriptors: Graduate Students, Masters Programs, Clinical Psychology, Student Evaluation