ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	2
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	9

Descriptor

Comparative Analysis	29
Error of Measurement	29
Test Reliability	29
Mathematical Models	12
Statistical Analysis	12
Item Analysis	8
Test Items	7
Scores	6
Test Validity	6
Correlation	5
Test Interpretation	5
Criterion Referenced Tests	4
Item Response Theory	4
Monte Carlo Methods	4
Norm Referenced Tests	4
Raw Scores	4
Simulation	4
Statistical Significance	4
Test Construction	4
Analysis of Variance	3
Computer Assisted Testing	3
Foreign Countries	3
Goodness of Fit	3
Hypothesis Testing	3
Measurement Techniques	3
More ▼

Source

Journal of Experimental…	2
Psychometrika	2
Applied Measurement in…	1
Applied Psychological…	1
ETS Research Report Series	1
Education and Information…	1
Educational and Psychological…	1
International Journal of…	1
International Journal of…	1
Journal of Educational…	1
Language Testing	1
Measurement and Evaluation in…	1
Measurement in Physical…	1
National Center for Education…	1
Research Quarterly for…	1
More ▼

Publication Type

Reports - Research	22
Journal Articles	15
Speeches/Meeting Papers	4
Reports - Evaluative	3
Guides - Non-Classroom	1
Numerical/Quantitative Data	1
Reports - Descriptive	1
Tests/Questionnaires	1

Education Level

Kindergarten	1
Secondary Education	1

Audience

Location

Germany	2
New Zealand	1
South Carolina	1

Laws, Policies, & Programs

Assessments and Surveys

Comprehensive Tests of Basic…	1
Early Childhood Longitudinal…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 29 results Save | Export

Measuring Language Ability of Students with Compensatory Multidimensional CAT: A Post-Hoc Simulation Study

Peer reviewed

Direct link

Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022

The computerized adaptive tests (CAT) apply an adaptive process in which the items are tailored to individuals' ability scores. The multidimensional CAT (MCAT) designs differ in terms of different item selection, ability estimation, and termination methods being used. This study aims at investigating the performance of the MCAT designs used to…

Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency

When near Means Related: Evidence from Three Web Survey Experiments on Inter-Item Correlations in Grid Questions

Peer reviewed

Direct link

Silber, Henning; Roßmann, Joss; Gummer, Tobias – International Journal of Social Research Methodology, 2018

In this article, we present the results of three question design experiments on inter-item correlations, which tested a grid design against a single-item design. The first and second experiments examined the inter-item correlations of a set with five and seven items, respectively, and the third experiment examined the impact of the question design…

Descriptors: Foreign Countries, Online Surveys, Experiments, Correlation

Measuring the Development of General Language Skills in English as a Foreign Language--Longitudinal Invariance of the C-Test

Peer reviewed

Direct link

Schnoor, Birger; Hartig, Johannes; Klinger, Thorsten; Naumann, Alexander; Usanova, Irina – Language Testing, 2023

Research on assessing English as a foreign language (EFL) development has been growing recently. However, empirical evidence from longitudinal analyses based on substantial samples is still needed. In such settings, tests for measuring language development must meet high standards of test quality such as validity, reliability, and objectivity, as…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Longitudinal Studies

The Validity and Reliability of the Gymaware Linear Position Transducer for Measuring Counter-Movement Jump Performance in Female Athletes

Peer reviewed

Direct link

O'Donnell, Shannon; Tavares, Francisco; McMaster, Daniel; Chambers, Samuel; Driller, Matthew – Measurement in Physical Education and Exercise Science, 2018

The current study aimed to assess the validity and test-retest reliability of a linear position transducer when compared to a force plate through a counter-movement jump in female participants. Twenty-seven female recreational athletes (19 ± 2 years) performed three counter-movement jumps simultaneously using the linear position transducer and…

Descriptors: Test Validity, Test Reliability, Females, Athletes

Generalizability Theory and Classical Test Theory

Peer reviewed

Direct link

Brennan, Robert L. – Applied Measurement in Education, 2011

Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…

Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory

Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011): User's Manual for the ECLS-K:2011 Kindergarten-Fourth Grade Data File and Electronic Codebook, Public Version. NCES 2018-032

Peer reviewed
PDF on ERIC

Download full text

Tourangeau, Karen; Nord, Christine; Lê, Thanh; Wallner-Allen, Kathleen; Vaden-Kiernan, Nancy; Blaker, Lisa; Najarian, Michelle – National Center for Education Statistics, 2018

This manual provides guidance and documentation for users of the longitudinal kindergarten-fourth grade (K-4) data file of the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011). It mainly provides information specific to the fourth-grade round of data collection. The first chapter provides an overview of the…

Descriptors: Children, Longitudinal Studies, Surveys, Kindergarten

Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

Peer reviewed

Direct link

Yao, Lihua – Psychometrika, 2012

Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…

Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing

The Accuracy of Three Approximations for Test Reliability.

Download full text

Kleinke, David J. – 1976

Data from 200 college-level tests were used to compare three reliability approximations (two of Saupe and one of Cureton) to Kuder-Richardson Formula 20 (KR20). While the approximations correlated highly (about .9) with the reliability estimate, they tended to be underapproximations. The explanation lies in an apparent bias of Lord's approximation…

Descriptors: Comparative Analysis, Correlation, Error of Measurement, Statistical Analysis

Comparison of Multistage Tests with Computerized Adaptive and Paper-and-Pencil Tests. Research Report. ETS RR-07-04

Peer reviewed
PDF on ERIC

Download full text

Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007

Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…

Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models

Empirical Estimates of the Comparative Reliability of Matching Tests and Multiple-Choice Tests.

Peer reviewed

Zimmerman, Donald W.; And Others – Journal of Experimental Education, 1984

Three types of test were compared: a completion test, a matching test, and a multiple-choice test. The completion test was more reliable than the matching test, and the matching test was more reliable than the multiple-choice test. (Author/BW)

Descriptors: Comparative Analysis, Error of Measurement, Higher Education, Mathematical Models

Accuracy of Two Procedures for Estimating Reliability of Mastery Tests.

Peer reviewed

Huynh, Huynh; Saunders, Joseph C. – Journal of Educational Measurement, 1980

Single administration (beta-binomial) estimates for the raw agreement index p and the corrected-for-chance kappa index in mastery testing are compared with those based on two test administrations in terms of estimation bias and sampling variability. Bias is about 2.5 percent for p and 10 percent for kappa. (Author/RL)

Descriptors: Comparative Analysis, Error of Measurement, Mastery Tests, Mathematical Models

A Monte Carlo Comparison of Phi and Kappa as Measures of Criterion-Referenced Reliability.

Reid, Jerry B.; Roberts, Dennis M. – 1978

Comparisons of corresponding values of phi and kappa coefficients were made for 270 instances of data generated by a Monte Carlo technique to simulate a test-retest situation. Data were generated for distributions with the same mean but three different levels of standard deviation, standard error of measurement and cutting score. Ten samples of…

Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores

Signal/Noise Ratios for Domain-Referenced Tests

Peer reviewed

Brennan, Robert L.; Kane, Michael T. – Psychometrika, 1977

Using the assumption of randomly parallel tests and concepts from generalizability theory, three signal/noise ratios for domain-referenced tests are developed, discussed, and compared. The three ratios have the same noise but different signals depending upon the kind of decision to be made as a result of measurement. (Author/JKS)

Descriptors: Comparative Analysis, Criterion Referenced Tests, Error of Measurement, Mathematical Models

The Internal and External Optimality of Decisions Based on Tests.

Peer reviewed

Mellenbergh, Gideon J.; van der Linden, Wim J. – Applied Psychological Measurement, 1979

For six tests, coefficient delta as an index for internal optimality is computed. Internal optimality is defined as the magnitude of risk of the decision procedure with respect to the true score. Results are compared with an alternative index (coefficient kappa) for assessing the consistency of decisions. (Author/JKS)

Descriptors: Classification, Comparative Analysis, Decision Making, Error of Measurement

An Empirical Investigation of Lu's Method of Reliability Estimation.

Peer reviewed

Huck, Schuyler W.; And Others – Educational and Psychological Measurement, 1981

Believing that examinee-by-item interaction should be conceptualized as true score variability rather than as a result of errors of measurement, Lu proposed a modification of Hoyt's analysis of variance reliability procedure. Via a computer simulation study, it is shown that Lu's approach does not separate interaction from error. (Author/RL)

Descriptors: Analysis of Variance, Comparative Analysis, Computer Programs, Difficulty Level

Previous Page | Next Page »

Pages: 1 | 2

Bashaw, W. L.	2
Brennan, Robert L.	2
Haladyna, Tom	2
Rentz, R. Robert	2
Ackerman, Terry A.	1
Benson, Jeri	1
Blaker, Lisa	1
Bowes, Neal	1
Chambers, Samuel	1
Cummings, Oliver W.	1
Driller, Matthew	1
Dunivant, Noel	1
Evans, John A.	1
Foster, Jeff L.	1
Fox, Kenneth R.	1
Gelbal, Selahattin	1
Gummer, Tobias	1
Hartig, Johannes	1
Huck, Schuyler W.	1
Huynh, Huynh	1
Kane, Michael T.	1
Kleinke, David J.	1
Klinger, Thorsten	1
Koehly, Laura M.	1
More ▼