ERIC - Search Results

Showing 1 to 15 of 17 results

A Note on the Use of Categorical Subscores

Peer reviewed

Kylie Gorney; Sandip Sinharay – Journal of Educational Measurement, 2025

Although there exists an extensive amount of research on subscores and their properties, limited research has been conducted on categorical subscores and their interpretations. In this paper, we focus on the claim of Feinberg and von Davier that categorical subscores are useful for remediation and instructional purposes. We investigate this claim…

Descriptors: Tests, Scores, Test Interpretation, Alternative Assessment

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Assessing for Developmental Language Disorder in the Context of African American English. EBP Briefs. Volume 16, Issue 2

Peer reviewed

Francois, Isabelle; Lapka, Stefanie; Bernstein Ratner, Nan; Mills, Monique T. – EBP Briefs (Evidence-based Practice Briefs), 2023

Clinical Question: For young AAE speakers (P), how useful is the Developmental Sentence Scoring (DSS) compared with Index of Productive Syntax (IPSyn) in identifying developmental language disorder (DLD) in the presence of African American English (AAE)? Method: Structured Review. Study Sources: PsycInfo®, Education Source, Education Resources…

Descriptors: Black Dialects, Language Impairments, Developmental Delays, Syntax

Comparison of DIF Methods for the Student Experience in the Research University Survey: A Validity and Methodological Study

Thapelo Ncube Whitfield – ProQuest LLC, 2021

Student Experience surveys are used to measure student attitudes towards their campus as well as to initiate conversations for institutional change. Validity evidence to support the interpretations of these surveys' results, however, is lacking. The first purpose of this study was to compare three Differential Item Functioning (DIF) methods on…

Descriptors: College Students, Student Surveys, Student Experience, Student Attitudes

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Variance Difference between Maximum Likelihood Estimation Method and Expected A Posteriori Estimation Method Viewed from Number of Test Items

Peer reviewed

Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S. – Educational Research and Reviews, 2016

The aim of this study is to determine variance difference between maximum likelihood and expected a posteriori estimation methods viewed from number of test items of aptitude test. The variance presents an accuracy generated by both maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…

Descriptors: Maximum Likelihood Statistics, Computation, Item Response Theory, Test Items

VET Program Completion Rates: An Evaluation of the Current Method. Occasional Paper

National Centre for Vocational Education Research (NCVER), 2016

This work asks one simple question: "how reliable is the method used by the National Centre for Vocational Education Research (NCVER) to estimate projected rates of VET program completion?" In other words, how well do early projections align with actual completion rates some years later? Completion rates are simple to calculate with a…

Descriptors: Vocational Education, Graduation Rate, Predictive Measurement, Predictive Validity

Approaches for Combining Multiple Measures of Teacher Performance: Reliability, Validity, and Implications for Evaluation Policy

Peer reviewed

Martínez, José Felipe; Schweig, Jonathan; Goldschmidt, Pete – Educational Evaluation and Policy Analysis, 2016

A key question facing teacher evaluation systems is how to combine multiple measures of complex constructs into composite indicators of performance. We use data from the Measures of Effective Teaching (MET) study to investigate the measurement properties of composite indicators obtained under various conjunctive, disjunctive (or complementary),…

Descriptors: Teacher Evaluation, Outcome Measures, Evaluation Methods, Educational Policy

On the Validity of Educational Evaluation and Its Construction

Peer reviewed

Huang, Xiaoping; Hu, Zhongfeng – Higher Education Studies, 2015

The main problem of the educational evaluation validity is that it just copies the conceptual framework system of validity from educational measurement to its own conceptual system. The validity conceptual system that fits the need of theory and practice of educational evaluation has not been established yet. According to the inherent attributive…

Descriptors: Test Validity, Educational Assessment, Evaluation Problems, Theory Practice Relationship

Combining Self-Assessments and Achievement Tests in Information Literacy Assessment: Empirical Results and Recommendations for Practice

Peer reviewed

Rosman, Tom; Mayer, Anne-Kathrin; Krampen, Günter – Assessment & Evaluation in Higher Education, 2015

This article examines the significance of information literacy self-assessments in higher education with a special focus on situational conditions increasing their explanatory power. First, it was hypothesised that self-assessments of information literacy correlate higher with factual information literacy if measured after the administration of…

Descriptors: Achievement Tests, Information Literacy, Educational Practices, Higher Education

The Effects of Baseline Estimation on the Reliability, Validity, and Precision of CBM-R Growth Estimates

Peer reviewed

Van Norman, Ethan R.; Christ, Theodore J.; Zopluoglu, Cengiz – School Psychology Quarterly, 2013

This study examined the effect of baseline estimation on the quality of trend estimates derived from Curriculum Based Measurement of Oral Reading (CBM-R) progress monitoring data. The authors used a linear mixed effects regression (LMER) model to simulate progress monitoring data for schedules ranging from 6-20 weeks for datasets with high and low…

Descriptors: Curriculum Based Assessment, Oral Reading, Reading Fluency, Regression (Statistics)

Evaluating the Effectiveness of a Response to Intervention Framework

Collins, Mary Daniel – ProQuest LLC, 2013

The researcher evaluated the RtI framework of kindergarten (n = 686) and first grade (n = 592) students in a rural school district by assessing the accuracy of the universal screening method, monitoring the rate of improvement (ROI) in the tier groups, comparing the number of special education referrals, and determining the success rate of the…

Descriptors: Response to Intervention, Program Effectiveness, Program Evaluation, Kindergarten

Universal Screening in Mathematics for the Primary Grades: Beginnings of a Research Base

Peer reviewed

Gersten, Russell M.; Clarke, Ben; Jordan, Nancy C.; Newman-Gonchar, Rebecca; Haymond, Kelly; Wilkins, Chuck – Grantee Submission, 2012

This article describes key findings from contemporary research on screening for early primary grade students in the area of mathematics. Existing studies were used to illustrate the constructs most worth measuring and the diverse strategies that researchers used to study potential measures. The authors discussed the strengths and weaknesses of…

Descriptors: Primary Education, Screening Tests, Predictive Validity, Correlation

English Language Assessment in the Colleges of Applied Sciences in Oman: Thematic Document Analysis

Peer reviewed

Al Hajri, Fatma – English Language Teaching, 2014

Proficiency in English language and how it is measured have become central issues in higher education research as the English language is increasingly used as a medium of instruction and a criterion for admission to education. This study evaluated the English language assessment in the Foundation Programme at the Colleges of Applied Sciences in…

Descriptors: Foreign Countries, College Programs, Thematic Approach, Language Proficiency

Utilizing Teacher Ratings of Student Literacy to Identify At-Risk Students: An Analysis of Data from the Early Childhood Longitudinal Study

Peer reviewed

Titley, Jonathan E.; D'Amato, Rik Carl; Koehler-Hak, Kathrine M. – Contemporary School Psychology, 2014

The identification of children at-risk for reading problems can be costly and time-consuming. Previous research has indicated that teachers are relatively accurate in assessing children's overall reading ability. This study investigated the accuracy of kindergarten and first grade teacher rating scales in predicting children's reading…

Descriptors: Literacy, Student Evaluation, Achievement Rating, At Risk Students
