Publication Date
In 2025 | 1 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 18 |
Since 2006 (last 20 years) | 38 |
Descriptor
Scoring | 56 |
Statistical Analysis | 56 |
Test Validity | 32 |
Correlation | 27 |
Test Reliability | 20 |
Validity | 20 |
Scores | 15 |
Test Construction | 15 |
Comparative Analysis | 14 |
Factor Analysis | 11 |
Psychometrics | 11 |
More ▼ |
Source
Author
Liu, Ou Lydia | 2 |
Steedle, Jeffrey T. | 2 |
Zahner, Doris | 2 |
Abdellah, Antar Solhy | 1 |
Alci, Bülent | 1 |
Algina, James | 1 |
Anum Khushal | 1 |
Arnold, Rachel | 1 |
Attali, Yigal | 1 |
Bachman, Lyle F. | 1 |
Bae, Jungok | 1 |
More ▼ |
Publication Type
Education Level
Higher Education | 14 |
Elementary Secondary Education | 7 |
Postsecondary Education | 7 |
Secondary Education | 7 |
Elementary Education | 4 |
High Schools | 4 |
Middle Schools | 3 |
Early Childhood Education | 2 |
Grade 3 | 1 |
Grade 5 | 1 |
Grade 7 | 1 |
More ▼ |
Audience
Parents | 1 |
Practitioners | 1 |
Researchers | 1 |
Students | 1 |
Teachers | 1 |
Location
California | 2 |
China | 2 |
Japan | 2 |
Turkey | 2 |
United States | 2 |
District of Columbia | 1 |
Estonia | 1 |
Florida | 1 |
Hawaii | 1 |
Israel | 1 |
Italy | 1 |
More ▼ |
Laws, Policies, & Programs
Individuals with Disabilities… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Lyrica Lucas; Anum Khushal; Robert Mayes; Brian A. Couch; Joseph Dauer – International Journal of Science Education, 2025
Educational reform priorities such as emphasis on quantitative modelling (QM) have positioned undergraduate biology instructors as designers of QM experiences to engage students in authentic science practices that support the development of data-driven and evidence-based reasoning. Yet, little is known about how biology instructors adapt to the…
Descriptors: Undergraduate Students, College Science, Biology, Classroom Observation Techniques
Zeng, Songtian – ProQuest LLC, 2017
Over 30 states have adopted the Early Childhood Environmental Rating Scale-Revised (ECERS-R) as a component of their program quality assessment systems, but the use of ECERS-R on such a large scale has raised important questions about implementation. One of the most pressing question centers upon decisions users must make between two scoring…
Descriptors: Rating Scales, Scoring, Validity, Comparative Analysis
Elicited Imitation as a Measure of Second Language Proficiency: A Narrative Review and Meta-Analysis
Yan, Xun; Maeda, Yukiko; Lv, Jing; Ginther, April – Language Testing, 2016
Elicited imitation (EI) has been widely used to examine second language (L2) proficiency and development and was an especially popular method in the 1970s and early 1980s. However, as the field embraced more communicative approaches to both instruction and assessment, the use of EI diminished, and the construct-related validity of EI scores as a…
Descriptors: Second Language Learning, Language Proficiency, Meta Analysis, Effect Size
Kelleher, Leila K.; Beach, Tyson A. C.; Frost, David M.; Johnson, Andrew M.; Dickey, James P. – Measurement in Physical Education and Exercise Science, 2018
The scoring scheme for the functional movement screen implicitly assumes that the factor structure is consistent, stable, and congruent across different populations. To determine if this is the case, we compared principal components analyses of three samples: a healthy, general population (n = 100), a group of varsity athletes (n = 101), and a…
Descriptors: Factor Structure, Test Reliability, Screening Tests, Motion
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Zimmerman, Whitney Alicia; Kang, Hyun Bin; Kim, Kyung; Gao, Mengzhao; Johnson, Glenn; Clariana, Roy; Zhang, Fan – Journal of Statistics Education, 2018
Over two semesters short essay prompts were developed for use with the Graphical Interface for Knowledge Structure (GIKS), an automated essay scoring system. Participants were students in an undergraduate-level online introductory statistics course. The GIKS compares students' writing samples with an expert's to produce keyword occurrence and…
Descriptors: Undergraduate Students, Introductory Courses, Statistics, Computer Assisted Testing
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Han, Jing; Koenig, Kathleen; Cui, Lili; Fritchman, Joseph; Li, Dan; Sun, Wanyi; Fu, Zhao; Bao, Lei – Physical Review Physics Education Research, 2016
In a recent study, the 30-question Force Concept Inventory (FCI) was theoretically split into two 14-question "half-length" tests (HFCIs) covering the same set of concepts and producing mean scores that can be equated to those of the original FCI. The HFCIs require less administration time and reduce test-retest issues when different…
Descriptors: Physics, Scientific Concepts, Science Instruction, College Science
Demir, Ergul – Eurasian Journal of Educational Research, 2018
Purpose: The answer-copying tendency has the potential to detect suspicious answer patterns for prior distributions of statistical detection techniques. The aim of this study is to develop a valid and reliable measurement tool as a scale in order to observe the tendency of university students' copying of answers. Also, it is aimed to provide…
Descriptors: College Students, Cheating, Test Construction, Student Behavior
Gehsmann, Kristin; Spichtig, Alexandra; Tousley, Elias – Literacy Research: Theory, Method, and Practice, 2017
Assessments of developmental spelling, also called spelling inventories, are commonly used to understand students' orthographic knowledge (i.e., knowledge of how written words work) and to determine their stages of spelling and reading development. The information generated by these assessments is used to inform teachers' grouping practices and…
Descriptors: Spelling, Computer Assisted Testing, Grouping (Instructional Purposes), Teaching Methods
Oliveri, María Elena; von Davier, Alina A. – International Journal of Testing, 2016
In this study, we propose that the unique needs and characteristics of linguistic minorities should be considered throughout the test development process. Unlike most measurement invariance investigations in the assessment of linguistic minorities, which typically are conducted after test administration, we propose strategies that focus on the…
Descriptors: Psychometrics, Linguistics, Test Construction, Testing
Säre, Egle; Luik, Piret; Fisher, Robert – European Early Childhood Education Research Journal, 2016
The purpose of this study was to design an instrument for five- to six-year-old children to help measure their verbal reasoning skills and assess the validity and reliability of the resulting instrument. For this purpose, the researchers have created the Younger Children Verbal Reasoning Test (YCVR-test) and a control instrument, which have been…
Descriptors: Educational Researchers, Verbal Ability, Thinking Skills, Verbal Tests
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Mao, Liyang; Liu, Ou Lydia; Roohr, Katrina; Belur, Vinetha; Mulholland, Matthew; Lee, Hee-Sun; Pallant, Amy – Educational Assessment, 2018
Scientific argumentation is one of the core practices for teachers to implement in science classrooms. We developed a computer-based formative assessment to support students' construction and revision of scientific arguments. The assessment is built upon automated scoring of students' arguments and provides feedback to students and teachers.…
Descriptors: Computer Assisted Testing, Science Tests, Scoring, Automation
Conoyer, Sarah J.; Lembke, Erica S.; Hosp, John L.; Espin, Christine A.; Hosp, Michelle K.; Poch, Apryl L. – Reading & Writing Quarterly, 2017
The present study examined the technical adequacy of maze-selection tasks constructed in 2 different ways: typical versus novel. We selected distractors for each measure systematically based on rules related to the content of the passage and the part of speech of the correct choice. Participants included 262 middle school students who were…
Descriptors: Cloze Procedure, Multiple Choice Tests, Reading Tests, Reading Comprehension