ERIC - Search Results

Showing 1 to 15 of 40 results

Non-Response Rates to Individual Items on the IDEA Student Ratings of Instruction Forms. IDEA Research Note #5

Li, Dan; Benton, Stephen L. – IDEA Center, Inc., 2017

In the study evaluated in this report, the authors asked what effect survey length has on student non-response rates to individual items on IDEA's "Diagnostic Feedback" (DF) and "Learning Essentials" (LE) forms. The approach was to analyze individual student ratings of classes contained in the 2015-2016 IDEA-CL database.…

Descriptors: Response Rates (Questionnaires), Student Surveys, Test Length, Test Items

Profile Analyses as Feedback by Evaluating the Balance in Exam Scores

Peer reviewed

Vaheoja, Monika; Verhelst, N. D.; Eggen, T.J.H.M. – European Journal of Science and Mathematics Education, 2019

In this article, the authors applied profile analysis to Maths exam data to demonstrate how different exam forms, differing in difficulty and length, can be reported and easily interpreted. The results were presented for different groups of participants and for different institutions in different Maths domains by evaluating the balance. Some…

Descriptors: Feedback (Response), Foreign Countries, Statistical Analysis, Scores

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Relating Unidimensional IRT Parameters to a Multidimensional Response Space: A Review of Two Alternative Projection IRT Models for Scoring Subscales

Peer reviewed

Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011

A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…

Descriptors: Test Length, Test Items, Alignment (Education), Models

Computerized Classification Testing under the Generalized Graded Unfolding Model

Peer reviewed

Wang, Wen-Chung; Liu, Chen-Wei – Educational and Psychological Measurement, 2011

The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…

Descriptors: Computer Assisted Testing, Adaptive Testing, Classification, Item Response Theory

Ongoing Issues in Test Fairness

Peer reviewed

Camilli, Gregory – Educational Research and Evaluation, 2013

In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…

Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format

On the Use of Nonparametric Item Characteristic Curve Estimation Techniques for Checking Parametric Model Fit

Peer reviewed

Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey – Educational and Psychological Measurement, 2009

The purpose of this study was to assess the model fit of a 2PL through comparison with the nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that the three nonparametric procedures implemented produced ICCs similar to those of the 2PL for items simulated to fit the 2PL. However, for misfitting items,…

Descriptors: Nonparametric Statistics, Item Response Theory, Test Items, Simulation

Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

Peer reviewed

Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008

This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…

Descriptors: Test Length, Test Content, Simulation, Computation

Comparing the Similarities and Differences of PISA 2003 and TIMSS. OECD Education Working Papers, No. 32

Wu, Margaret – OECD Publishing (NJ1), 2010

This paper makes an in-depth comparison of the PISA (OECD) and TIMSS (IEA) mathematics assessments conducted in 2003. First, a comparison of survey methodologies is presented, followed by an examination of the mathematics frameworks in the two studies. The methodologies and the frameworks in the two studies form the basis for providing…

Descriptors: Mathematics Achievement, Foreign Countries, Gender Differences, Comparative Analysis

The Hierarchy Consistency Index: Evaluating Person Fit for Cognitive Diagnostic Assessment

Peer reviewed

Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009

In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…

Descriptors: Test Length, Simulation, Correlation, Research Methodology

On the Consistency of Individual Classification Using Short Scales

Peer reviewed

Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007

Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…

Descriptors: Psychiatry, Patients, Error of Measurement, Test Length

Exploring the Relationship between Item Exposure Rate and Test Overlap Rate in Computerized Adaptive Testing.

Chen, Shu-Ying; Ankenmann, Robert D.; Spray, Judith A. – 1999

This paper presents a derivation of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CAT). This relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. Implications for practice as well as future research are also…

Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Test Items

An Investigation into the Possible Speededness of the Medical College Admission Test. MCAT Monograph 3.

Neustel, Sandra – 2001

As a continuing part of its validity studies, the Association of American Medical Colleges commissioned a study of the speededness of the Medical College Admission Test (MCAT). If speed is a hidden part of the test, it is a threat to its construct validity. As a general rule, the criterion used to indicate lack of speededness is that 80% of the…

Descriptors: College Applicants, College Entrance Examinations, Higher Education, Medical Education

An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

Ito, Kyoko; Sykes, Robert C. – 2000

This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…

Descriptors: Constructed Response, Elementary Education, Essay Tests, Test Construction

The Influence of Multidimensionality on the Graded Response Model.

Peer reviewed

De Ayala, R. J. – Applied Psychological Measurement, 1994

Previous work on the effects of dimensionality on parameter estimation for dichotomous models is extended to the graded response model. Datasets are generated that differ in the number of latent factors as well as their interdimensional association, number of test items, and sample size. (SLD)

Descriptors: Estimation (Mathematics), Item Response Theory, Maximum Likelihood Statistics, Sample Size
