Showing all 5 results
Peer reviewed
Bengs, Daniel; Kroehne, Ulf; Brefeld, Ulf – Journal of Educational Measurement, 2021
By tailoring test forms to the test-taker's proficiency, Computerized Adaptive Testing (CAT) enables substantial increases in testing efficiency over fixed forms testing. When used for formative assessment, the alignment of task difficulty with proficiency increases the chance that teachers can derive useful feedback from assessment data. The…
Descriptors: Computer Assisted Testing, Formative Evaluation, Group Testing, Program Effectiveness
Peer reviewed
Clauser, Brian E.; Mee, Janet; Baldwin, Su G.; Margolis, Melissa J.; Dillon, Gerard F. – Journal of Educational Measurement, 2009
Although the Angoff procedure is among the most widely used standard setting procedures for tests comprising multiple-choice items, research has shown that subject matter experts have considerable difficulty accurately making the required judgments in the absence of examinee performance data. Some authors have viewed the need to provide…
Descriptors: Standard Setting (Scoring), Program Effectiveness, Expertise, Health Personnel
Peer reviewed
Marco, Gary L.; And Others – Journal of Educational Measurement, 1976
Special emphasis is given to the kinds of control that can be exercised over initial status, including the use of proxy input data. A rationale for the classification scheme is developed, based on (1) six data types (three one-shot, one cross-sectional, and two longitudinal) and (2) two types of referencing: criterion referencing and norm referencing.…
Descriptors: Classification, Data Collection, Evaluation Methods, Methods
Peer reviewed
Schmidt, William H. – Journal of Educational Measurement, 1983
A conception of invalidity as bias is related to content validity for standardized achievement tests. A method of estimating content bias for each of three content domains (a priori, curricular, and instructional) based on the specification of a content taxonomy is also proposed. (Author/CM)
Descriptors: Achievement Tests, Content Analysis, Evaluation Methods, Instruction
Peer reviewed
Finch, Holmes; Habing, Brian – Journal of Educational Measurement, 2005
This study examines the performance of a new method for assessing and characterizing dimensionality in test data using the NOHARM model, comparing it with DETECT. Dimensionality assessment is carried out using two goodness-of-fit statistics that are compared to reference chi-square distributions. A Monte Carlo study is used with item parameters…
Descriptors: Program Effectiveness, Monte Carlo Methods, Item Response Theory, Comparative Analysis