ERIC - Search Results

Showing all 12 results

Screening Test Items for Differential Item Functioning

Peer reviewed

Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 2014

A method for medical screening is adapted to differential item functioning (DIF). Its essential elements are explicit declarations of the level of DIF that is acceptable and of the loss function that quantifies the consequences of the two kinds of inappropriate classification of an item. Instead of a single level and a single function, sets of…

Descriptors: Test Items, Test Bias, Simulation, Hypothesis Testing

Educational Measurement Issues and Implications of High Stakes Decision Making in Final Examinations in Secondary Education in the Netherlands

Peer reviewed

van Rijn, P. W.; Beguin, A. A.; Verstralen, H. H. F. M. – Assessment in Education: Principles, Policy & Practice, 2012

While measurement precision is relatively easy to establish for single tests and assessments, it is much more difficult to determine for decision making with multiple tests on different subjects. This latter is the situation in the system of final examinations for secondary education in the Netherlands and is used as an example in this paper. This…

Descriptors: Secondary Education, Tests, Foreign Countries, Decision Making

Delphi Methodology: An Empirical Investigation

Peer reviewed

Barnette, J. Jackson; And Others – Educational Research Quarterly, 1978

The DELPHI procedure requires respondents to reply to several questionnaire iterations with subsequent rounds containing previous round feedback. This study investigated the methodology (response rates, effects of feedback) and offered evidence that large-scale DELPHI surveys are not as advantageous as has previously been indicated. Suggestions…

Descriptors: Feedback, Item Analysis, Measurement Techniques, Predictive Measurement

Vertical Equating Using the Rasch Model.

Peer reviewed

Loyd, Brenda H.; Hoover, H. D. – Journal of Educational Measurement, 1980

Three levels of a mathematics computation test were equated using the Rasch model. Sixth, seventh, and eighth graders were administered different levels of the test. Lack of consistency among equatings suggested that the Rasch model did not produce a satisfactory vertical equating of this computation test. (Author/RD)

Descriptors: Ability Grouping, Achievement Tests, Elementary Education, Equated Scores

Constructing Higher Level Multiple Choice Questions Covering Factual Content

Miller, Harry G.; Williams, Reed G. – Educational Technology, 1973

Descriptors: Content Analysis, Item Analysis, Measurement Techniques, Multiple Choice Tests

A Theoretical Study of the Measurement Effectiveness of Flexilevel Tests

Peer reviewed

Lord, Frederic M. – Educational and Psychological Measurement, 1971

A number of empirical studies are suggested to answer certain questions in connection with flexilevel tests. (MS)

Descriptors: Comparative Analysis, Difficulty Level, Guessing (Tests), Item Analysis

Latent Trait Theory in the Affective Domain--Applications of the Rasch Model.

Curry, Allen R.; Riegel, N. Blyth – 1978

The Rasch model of test theory is described in general terms, compared with latent trait theory, and shown to have interesting applications for the measurement of affective as well as cognitive traits. Three assumptions of the Rasch model are stated to support the conclusion that calibration of the items and tests is independent of the examinee…

Descriptors: Affective Measures, Goodness of Fit, Item Analysis, Latent Trait Theory

Criterion-Referenced Measurement: Its Main Applications, Problems and Findings.

van der Linden, Wim J. – Evaluation in Education: International Progress, 1982

Instructional programs organized according to modern educational technology are discussed within the purposes of criterion-referenced measurements used. The problems of criterion-referenced measurements include scoring and score interpretation, item and test analysis, and mastery testing. An overview of solutions and approaches to the problems and…

Descriptors: Criterion Referenced Tests, Educational Testing, Evaluation Methods, Item Analysis

Inter-Judge Reliability: Is Complete Agreement among Judges the Ideal?

Constable, Elizabeth; Andrich, David – 1984

In circumstances where judges are required to make ratings of performance, it is usually required to have two or more raters who are trained to agree on independent ratings of the same performance. It is suggested that such a requirement may produce the paradox of attenuation associated with item analysis, in which too high a correlation between…

Descriptors: Elementary Secondary Education, Evaluation Methods, Interrater Reliability, Interviews

Experimental Item Types to Measure Judgment.

Northrop, Lois C. – 1977

The Professional and Administrative Career Examination measures judgment ability using comprehension and logical order of events item types. Since the verbal component of comprehension items is high, and since these items are extremely difficult to write and document, a search was conducted for a simpler item type for assessing judgment. Three…

Descriptors: Abstract Reasoning, Administrators, Comprehension, Decision Making Skills

The Use and Effect of Caution Indices in Detecting Aberrant Patterns of Standard-Setting Recommendations.

Jaeger, Richard M.; Busch, John Christian – 1986

This study explores the use of the modified caution index (MCI) for identifying judges whose patterns of recommendations suggest that their judgments might be based on incomplete information, flawed reasoning, or inattention to their standard-setting tasks. It also examines the effect on test standards and passing rates when the test standards of…

Descriptors: Criterion Referenced Tests, Error of Measurement, Evaluation Methods, High Schools

Riding the Rasch Tiger. Part 2: Implications for District Testing Programs (Where the Rubber Meets the Road).

Ingebo, George S. – 1987

Greater knowledge about the practical application of Rasch technology can help in avoiding misapplications and confusions in testing programs. Equal interval curriculum-based scaling makes possible the following improvements in measuring basic skills achievement by enabling testing programs to: (1) individualize the difficulty level of…

Descriptors: Achievement Rating, Achievement Tests, Basic Skills, Difficulty Level
