ERIC - Search Results

Showing all 11 results

The Effect of Test Length and IRT Model on the Distribution and Stability of Three Appropriateness Indexes.

Peer reviewed

Noonan, Brian W.; And Others – Applied Psychological Measurement, 1992

Studied the extent to which three appropriateness indexes, Z₃, ECIZ4, and W, are well standardized in a Monte Carlo study. The ECIZ4 most closely approximated a normal distribution, and its skewness and kurtosis were more stable and less affected by test length and item response theory model than the others. (SLD)

Descriptors: Comparative Analysis, Item Response Theory, Mathematical Models, Maximum Likelihood Statistics

A Comparison of Equating Methods under the Graded Response Model.

Cohen, Allan S.; Kim, Seock-Ho – 1993

Equating tests from different calibrations under item response theory (IRT) requires calculation of the slope and intercept of the appropriate linear transformation. Two methods have been proposed recently for equating graded response items under IRT, a test characteristic curve method and a minimum chi-square method. These two methods are…

Descriptors: Chi Square, Comparative Analysis, Computer Simulation, Equated Scores

An Investigation of Hierarchical Bayes Procedures in Item Response Theory.

Kim, Seock-Ho; And Others – 1992

Hierarchical Bayes procedures were compared for estimating item and ability parameters in item response theory. Simulated data sets from the two-parameter logistic model were analyzed using three different hierarchical Bayes procedures: (1) the joint Bayesian with known hyperparameters (JB1); (2) the joint Bayesian with information hyperpriors…

Descriptors: Ability, Bayesian Statistics, Comparative Analysis, Equations (Mathematics)

Thin versus Thick Matching in the Mantel-Haenszel Procedure for Detecting DIF.

Peer reviewed

Donoghue, John R.; Allen, Nancy L. – Journal of Educational Statistics, 1993

Two ways of forming the matching variable for the Mantel-Haenszel differential item functioning (DIF) procedure were compared in a Monte Carlo study: using the total score itself (thin matching) and pooling total score levels (thick matching). Reasons thick matching is superior are discussed. (SLD)

Descriptors: Comparative Analysis, Computer Simulation, Equations (Mathematics), Graphs

Effects of Test Length and Advancement Score on Several Criterion-Referenced Test Reliability and Validity Indices. Laboratory of Psychometric and Evaluation Research Report No. 86.

Eignor, Daniel R.; Hambleton, Ronald K. – 1979

The purpose of the investigation was to obtain some relationships among (1) test lengths, (2) shape of domain-score distributions, (3) advancement scores, and (4) several criterion-referenced test score reliability and validity indices. The study was conducted using computer simulation methods. The values of variables under study were set to be…

Descriptors: Comparative Analysis, Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores

A Comparison of Reliability Estimates from Single and Double Administrations of Criterion-Referenced Tests.

Schaefer, Mary M.; Gross, Susan K. – 1983

Viewing the reliability for criterion-referenced tests as that of mastery classification decisions, three models for determining reliability were examined using two test administrations so that two estimates could be compared to a standard. A major purpose of the research was to determine how several reliability coefficients (coefficient kappa, an…

Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores

A Comparison of a Bayesian and a Maximum Likelihood Tailored Testing Procedure.

McKinley, Robert L.; Reckase, Mark D. – 1981

A study was conducted to compare tailored testing procedures based on a Bayesian ability estimation technique and on a maximum likelihood ability estimation technique. The Bayesian tailored testing procedure selected items so as to minimize the posterior variance of the ability estimate distribution, while the maximum likelihood tailored testing…

Descriptors: Academic Ability, Adaptive Testing, Bayesian Statistics, Comparative Analysis

Criterion-Referenced Testing: A Critical Analysis of Selected Models. Technical Paper 306. Final Report.

Steinheiser, Frederick H., Jr.; And Others – 1978

Alternative mathematical models for scoring and decision making with criterion-referenced tests are described, especially as they concern appropriate test length and methods of establishing statistically valid cutting scores. Several of these approaches are reviewed and compared on formal-analytic and empirical grounds: (1) Block's approach to…

Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Decision Making

A Comparison of Decision-Making Methods for Criterion-Referenced Tests.

Haladyna, Tom; Roid, Gale – 1980

The problems associated with misclassifying students when pass-fail decisions are based on test scores are discussed. One protection against misclassification is to set a confidence interval around the cutting score. Those whose scores fall above the interval are passed; those whose scores fall below the interval are failed; and those whose scores…

Descriptors: Bayesian Statistics, Classification, Comparative Analysis, Criterion Referenced Tests

A Comparison of the Fit of Empirical Data to Two Latent Trait Models. Report No. 92.

Hutten, Leah R. – 1979

The goodness of fit of raw test score data to two latent trait models, the Rasch model and the Birnbaum three-parameter logistic model, was compared. Data were taken from various achievement tests and the Scholastic Aptitude Test (Verbal). A minimum sample size of 1,000 was required, and the minimum test length was 40 items. Results indicated that…

Descriptors: Ability Identification, Achievement Tests, College Entrance Examinations, Comparative Analysis

An Investigation of the Validity of the Sequential Probability Ratio Test for Mastery Decisions in Criterion-Referenced Testing.

Frick, Theodore W. – 1986

The sequential probability ratio test (SPRT), developed by Abraham Wald, is one statistical model available for making mastery decisions during computer-based criterion-referenced tests. The predictive validity of the SPRT was empirically investigated with two different and relatively large item pools with heterogeneous item parameters. Graduate…

Descriptors: Achievement Tests, Adaptive Testing, Classification, Comparative Analysis