ERIC - Search Results

Showing all 12 results

Ethical Implications of Validity-vs.-Reliability Trade-Offs in Educational Research

Peer reviewed

Fendler, Lynn – Ethics and Education, 2016

In educational research that calls itself empirical, the relationship between validity and reliability is that of trade-off: the stronger the bases for validity, the weaker the bases for reliability (and vice versa). Validity and reliability are widely regarded as basic criteria for evaluating research; however, there are ethical implications of…

Descriptors: Educational Research, Ethics, Test Validity, Test Reliability

Violating Conventional Wisdom in Multiple Choice Test Construction

Peer reviewed

Taylor, Annette Kujawski – College Student Journal, 2005

This research examined 2 elements of multiple-choice test construction: balancing the key and the optimal number of options. In Experiment 1 the 3 conditions included a balanced key, overrepresentation of a and b responses, and overrepresentation of c and d responses. The results showed that error patterns were independent of the key, reflecting…

Descriptors: Comparative Analysis, Test Items, Multiple Choice Tests, Test Construction

Conceptualization of Issues in Construct and Content Validity. Studies in Measurement and Methodology, Work Unit No. 1: Conceptual and Design Problems in Competency-Based Measurements.

Linn, Robert – 1978

A series of studies on conceptual and design problems in competency-based measurements is explained. The concept of validity within the context of criterion-referenced measurement is reviewed. The authors believe validation should be viewed as a process rather than an end product. It is the process of marshalling evidence to support…

Descriptors: Criterion Referenced Tests, Item Analysis, Item Sampling, Test Bias

Four Bootstrap Confidence Intervals for the Binomial-Error Model.

Peer reviewed

Lin, Miao-Hsiang; Hsiung, Chao A. – Psychometrika, 1992

Four bootstrap methods are identified for constructing confidence intervals for the binomial-error model. The extent to which similar results are obtained and the theoretical foundation of each method and its relevance and ranges of modeling the true score uncertainty are discussed. (SLD)

Descriptors: Bayesian Statistics, Computer Simulation, Equations (Mathematics), Estimation (Mathematics)

Sequential Reliability Tests.

Peer reviewed

Eiting, Mindert H. – Applied Psychological Measurement, 1991

A method is proposed for sequential evaluation of reliability of psychometric instruments. Sample size is unfixed; a test statistic is computed after each person is sampled and a decision is made in each stage of the sampling process. Results from a series of Monte-Carlo experiments establish the method's efficiency. (SLD)

Descriptors: Computer Simulation, Equations (Mathematics), Estimation (Mathematics), Mathematical Models

An Empirical Examination of a Modified Matrix Sampling Procedure as an Evaluation Tool for Grades 7 to 12 in a Midwestern School District

Peer reviewed

Liang, Xin – Evaluation and Research in Education, 2003

Multiple matrix sampling is a data collection technique that ensures accuracy and efficiency in group performance. It has been widely used in large-scale curriculum evaluation since the 1980s. However, the design does not always fully embrace the dynamics of local evaluation demands. The purpose of this study is to introduce a modified matrix…

Descriptors: Curriculum Evaluation, Item Sampling, Matrices, Statistical Studies

Reliability Estimation for Single Dichotomous Items. Research Report 94-5.

Meijer, Rob R.; And Others – 1994

Three methods for the estimation of the reliability of single dichotomous items are discussed. All methods are based on the assumptions of nondecreasing and nonintersecting item response functions and the Mokken model of double monotonicity. Based on analytical and Monte Carlo studies, it is concluded that one method is superior to the other two…

Descriptors: Estimation (Mathematics), Foreign Countries, Item Response Theory, Monte Carlo Methods

Overview of the Most Difficult Technical Issues on the VNT.

Skaggs, Gary; Bourque, Mary Lyn – 1998

Political and legislative pressures have posed a number of measurement issues and challenges to the development of sound, valid voluntary national tests (VNTs). This paper focuses on what appear to be the most difficult technical issues related to the VNT proposed by President Clinton in 1997. Technical issues refer to psychometric issues, as…

Descriptors: Academic Achievement, Achievement Tests, Classification, Difficulty Level

Volatility in School Test Scores: Implications for Test-Based Accountability Systems

Peer reviewed

Kane, Thomas J.; Staiger, Douglas O. – Brookings Papers on Education Policy, 2002

By the spring of 2000, forty states had begun using student test scores to rate school performance. Twenty states have gone a step further and are attaching explicit monetary rewards or sanctions to a school's test performance. In this paper, the authors focus on accountability programs in which states measure the effectiveness of individual…

Descriptors: Elementary Schools, Accountability, Scores, Risk

Some Comments on the Correction for Guessing. A Further Analysis of Angoff and Schrader.

Albanese, Mark A. – 1985

This study reexamines results reported by Angoff and Schrader regarding formula directions and rights directions for standardized tests. In that study, it was concluded that the two scoring directions were essentially equivalent. In this study, methodological concerns are discussed and additional data analyses undertaken. Among various…

Descriptors: College Entrance Examinations, Data Interpretation, Fatigue (Biology), Guessing (Tests)

Sampling Variability of Performance Assessments. Report on the Status of Generalizability Performance: Generalizability and Transfer of Performance Assessments. Project 2.4: Design Theory and Psychometrics for Complex Performance Assessment in Science.

Shavelson, Richard J.; And Others – 1993

In this paper, performance assessments are cast within a sampling framework. A performance assessment score is viewed as a sample of student performance drawn from a complex universe defined by a combination of all possible tasks, occasions, raters, and measurement methods. Using generalizability theory, the authors present evidence bearing on the…

Descriptors: Academic Achievement, Educational Assessment, Error of Measurement, Evaluators

Testing Foreign Language Listening Comprehension.

de Jong, John H. A. L. – 1982

The development and validation of a test of listening comprehension for English as a second language at the Dutch National Institute for Educational Measurement (Cito) is described. The test uses two distinct item formats: true-false items and modified cloze items with two options. Both item formats were found to measure foreign language listening…

Descriptors: Cloze Procedure, English (Second Language), Evaluation Criteria, Foreign Countries