ERIC - Search Results

Showing all 9 results

Linking Errors Introduced by Rapid Guessing Responses When Employing Multigroup Concurrent IRT Scaling

Jiayi Deng – ProQuest LLC, 2024

Test score comparability in international large-scale assessments (LSA) is of utmost importance in measuring the effectiveness of education systems and understanding the impact of education on economic growth. To effectively compare test scores on an international scale, score linking is widely used to convert raw scores from different linguistic…

Descriptors: Item Response Theory, Scoring Rubrics, Scoring, Error of Measurement

Comparing Measurement Reliability Estimation Techniques: Correlation Coefficient vs. Bland-Altman Plot

Peer reviewed

Tülin Otbiçer Acar – Measurement: Interdisciplinary Research and Perspectives, 2024

The aim of this study is to compare the results of correlation coefficient estimation of reliability with those obtained through the Bland-Altman plot technique. The scale was first divided into two halves using three different approaches. A linear and high-level relationship was found between the scale scores obtained from the halved forms.…

Descriptors: High School Students, Measurement Techniques, Psychometrics, Comparative Testing

Evidence-Based Evaluation of Student and Marker Performances in Assessment and Examination

Peer reviewed

Ole J. Kemi – Advances in Physiology Education, 2025

Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating not only students but also the wider learning and teaching efficiency of an academic institution.…

Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards

Signal-to-Noise Ratio in Estimating and Testing the Mediation Effect: Structural Equation Modeling versus Path Analysis with Weighted Composites

Peer reviewed

Ke-Hai Yuan; Zhiyong Zhang; Lijuan Wang – Grantee Submission, 2024

Mediation analysis plays an important role in understanding causal processes in social and behavioral sciences. While path analysis with composite scores has been criticized for yielding biased parameter estimates when variables contain measurement errors, recent literature has pointed out that the population values of parameters of latent-variable models…

Descriptors: Structural Equation Models, Path Analysis, Weighted Scores, Comparative Testing

Tests in Europe: Where We Are and Where We Should Go

Peer reviewed

Elosua, Paula; Iliescu, Dragos – International Journal of Testing, 2012

Psychometric practice does not always converge with the advances of psychometric theory. In order to investigate this gap, the authors focus on the 10 most used psychological tests in Europe, as identified by recent surveys. The article analyzes test manuals published in 6 different European countries for these 10 most used tests. A total of 32…

Descriptors: Psychological Testing, Personality Measures, Error of Measurement, Foreign Countries

Reliability Analysis for the Internationally Administered 2002 Series GED Tests. GED Testing Service® Research Studies, 2009-3

Setzer, J. Carl; He, Yi – GED Testing Service, 2009

Reliability Analysis for the Internationally Administered 2002 Series GED (General Educational Development) Tests. Reliability refers to the consistency, or stability, of test scores when the authors administer the measurement procedure repeatedly to groups of examinees (American Educational Research Association [AERA], American Psychological…

Descriptors: Educational Research, Error of Measurement, Scores, Test Reliability

A Comparison of WISC and WISC-R Scores of Sixth Grade Students: Implications for Validity.

Peer reviewed

Stokes, Elizabeth H.; And Others – Educational and Psychological Measurement, 1978

The Wechsler Intelligence Scale for Children, and the revised form of that measure, were administered to a sample of sixth grade pupils. Although the correlation between measures was high, scores on the revised form were significantly lower. (JKS)

Descriptors: Comparative Testing, Correlation, Error of Measurement, Grade 6

Essay versus Objective Achievement Testing in the Context of Large-Scale Assessment Programs.

Murchan, Damian P. – 1989

The reliability, content validity, and construct validity were compared for two test formats in a public examination used to assess a secondary school geography course. The 11-item geography portion of the Intermediate Certificate Examination (essay examination) was administered in June 1987 to 400 secondary school students in Ireland who also…

Descriptors: Achievement Tests, Comparative Testing, Construct Validity, Content Validity

An Empirical Study of the Properties of Two Estimates of Decision-Consistency Used with Two Types of Teacher-Constructed Classroom Tests.

Macpherson, Colin R.; Rowley, Glenn L. – 1986

Teacher-made mastery tests were administered in a classroom-sized sample to study their decision consistency. Decision-consistency of criterion-referenced tests is usually defined in terms of the proportion of examinees who are classified in the same way after two test administrations. Single-administration estimates of decision consistency were…

Descriptors: Classroom Research, Comparative Testing, Criterion Referenced Tests, Cutting Scores