Showing 1 to 15 of 25 results
Peer reviewed
Haladyna, Thomas M.; Rodriguez, Michael C.; Stevens, Craig – Applied Measurement in Education, 2019
Evidence is mounting in support of the guidance to make greater use of three-option multiple-choice items. From theoretical analyses, empirical results, and practical considerations, such items are of equal or higher quality than four- or five-option items, and more items can be administered to improve content coverage. This study looks at 58 tests,…
Descriptors: Multiple Choice Tests, Test Items, Testing Problems, Guessing (Tests)
Peer reviewed
Daniel Katz; Anne Corinne Huggins-Manley; Walter Leite – Applied Measurement in Education, 2022
According to the "Standards for Educational and Psychological Testing" (2014), one aspect of test fairness concerns examinees having comparable opportunities to learn prior to taking tests. Meanwhile, many researchers are developing platforms enhanced by artificial intelligence (AI) that can personalize curriculum to individual student…
Descriptors: High Stakes Tests, Test Bias, Testing Problems, Prior Learning
Peer reviewed
Abbakumov, Dmitry; Desmet, Piet; Van den Noortgate, Wim – Applied Measurement in Education, 2020
Formative assessments are an important component of massive open online courses (MOOCs), online courses with open access and unlimited student participation. Accurate conclusions on students' proficiency via formative assessments, however, face several challenges: (a) students are typically allowed to make several attempts; and (b) student performance might…
Descriptors: Item Response Theory, Formative Evaluation, Online Courses, Response Style (Tests)
Peer reviewed
Canivez, Gary L.; Youngstrom, Eric A. – Applied Measurement in Education, 2019
The Cattell-Horn-Carroll (CHC) taxonomy of cognitive abilities married John Horn and Raymond Cattell's Extended Gf-Gc theory with John Carroll's Three-Stratum Theory. While there are some similarities in arrangements or classifications of tasks (observed variables) within similar broad or narrow dimensions, other salient theoretical features and…
Descriptors: Taxonomy, Cognitive Ability, Intelligence, Cognitive Tests
Peer reviewed
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Peer reviewed
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects go unrecognized and unaccounted for, they lead to underestimates of the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
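The design-effect issue raised in the Phillips abstract can be made concrete with the textbook cluster-sampling formula. The sketch below is a generic illustration, not the analysis in the article; its cluster size, intraclass correlation, and sample size are hypothetical.

```python
# A minimal sketch of the standard cluster-sampling design effect,
# DEFF = 1 + (m - 1) * ICC, and the effective sample size it implies.
# All values below are hypothetical.

m = 25       # students sampled per school (cluster)
icc = 0.15   # intraclass correlation of scores within schools
n = 5000     # nominal sample size

deff = 1 + (m - 1) * icc       # 4.6
n_effective = n / deff         # about 1087 "effective" examinees
se_inflation = deff ** 0.5     # standard errors ~2.1 times larger than simple random sampling
print(deff, round(n_effective), round(se_inflation, 2))
```

Ignoring the design effect here would amount to treating 5,000 clustered observations as if they carried the information of 5,000 independent ones, which is the underestimation of sampling error the abstract describes.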
Peer reviewed
Sinharay, Sandip – Applied Measurement in Education, 2017
Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristic curves and found the "H^T" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…
Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis
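The statistics named in the Sinharay abstract (H^T, C, MCI, U3) are nonparametric person-fit indexes. The sketch below is not any of those indexes; it only counts Guttman errors, a basic ingredient many such indexes build on, using hypothetical data.

```python
# A minimal sketch: count Guttman errors for one examinee. A Guttman error
# is a pair of items where the easier item is answered wrong and the harder
# item is answered right. Item ordering and responses are hypothetical.

p_correct = [0.9, 0.7, 0.5, 0.3, 0.1]   # item easiness, easiest first
responses = [0, 1, 1, 0, 1]             # scored responses in the same order

guttman_errors = sum(
    1
    for i in range(len(responses))
    for j in range(i + 1, len(responses))
    if responses[i] == 0 and responses[j] == 1   # easier wrong, harder right
)
print(guttman_errors)  # 4 for this response pattern
```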
Peer reviewed
Guo, Hongwen; Rios, Joseph A.; Haberman, Shelby; Liu, Ou Lydia; Wang, Jing; Paek, Insu – Applied Measurement in Education, 2016
Unmotivated test takers who guess rapidly on items can negatively affect validity studies and evaluations of teacher and institution performance, making it critical to identify these test takers. The authors propose a new nonparametric method for finding response-time thresholds for flagging item responses that result from rapid-guessing…
Descriptors: Guessing (Tests), Reaction Time, Nonparametric Statistics, Models
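As a rough illustration of response-time flagging (not the nonparametric threshold-finding method Guo et al. propose), a fixed threshold can be applied to per-item response times. The threshold and data below are hypothetical.

```python
# A minimal sketch: flag item responses whose response time falls below a
# fixed threshold as possible rapid guesses. Threshold and data are hypothetical.

rt_seconds = {"item1": 4.2, "item2": 31.0, "item3": 2.8}  # one examinee's response times
THRESHOLD = 10.0  # hypothetical flagging threshold in seconds

flagged = {item: rt < THRESHOLD for item, rt in rt_seconds.items()}
print(flagged)  # {'item1': True, 'item2': False, 'item3': True}
```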
Peer reviewed
Mills, Craig N.; Stocking, Martha L. – Applied Measurement in Education, 1996
Issues that must be addressed in the large-scale application of computerized adaptive testing are explored, including considerations of test design, scoring, test administration, item and item bank development, and other aspects of test construction. Possible solutions and areas in which additional work is needed are identified. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Elementary Secondary Education, Higher Education
Peer reviewed
Green, Bert F. – Applied Measurement in Education, 1988
Emerging areas and critical problems related to computer-based testing are identified. Topics covered include adaptive testing; calibration; item selection; multidimensional items; uses of information processing theory; relation to cognitive psychology; and tests of short-term and spatial memory, perceptual speed and accuracy, and movement…
Descriptors: Cognitive Tests, Computer Assisted Testing, Content Validity, Information Processing
Peer reviewed
Holland, Paul W.; Wainer, Howard – Applied Measurement in Education, 1990
Two attempts to adjust state mean Scholastic Aptitude Test (SAT) scores for differential participation rates are examined. Both attempts are rejected, and five rules for performing adjustments are outlined to foster follow-up checks on untested assumptions. National Assessment of Educational Progress state data are determined to be more accurate.…
Descriptors: College Applicants, College Entrance Examinations, Estimation (Mathematics), Item Bias
Peer reviewed
Hanson, Bradley A. – Applied Measurement in Education, 1996
Determining whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored using three statistical tests based on loglinear models. Examples are presented of applying tests of distribution differences to decide whether equating is needed for alternative forms of a test. (SLD)
Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format
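The question Hanson studies can be illustrated with a simpler procedure than the loglinear tests in the article: a plain chi-square test of homogeneity on two forms' score distributions. The sketch below uses hypothetical counts and is meant only to show the form of the question, not the article's method.

```python
# A minimal sketch: chi-square test of whether two forms yield the same
# score-band distribution. Counts are hypothetical.
from scipy.stats import chi2_contingency

observed = [
    [120, 340, 140],   # form A counts in low / mid / high score bands
    [150, 330, 120],   # form B counts in the same bands
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, df={dof}, p={p:.3f}")
```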
Peer reviewed
Haladyna, Thomas A. – Applied Measurement in Education, 1992
Several multiple-choice item formats are examined in the current climate of test reform. The reform movement is discussed as it affects use of the following formats: (1) complex multiple-choice; (2) alternate choice; (3) true-false; (4) multiple true-false; and (5) the context dependent item set. (SLD)
Descriptors: Cognitive Psychology, Comparative Testing, Context Effect, Educational Change
Peer reviewed
Wollack, James A. – Applied Measurement in Education, 2006
Many of the currently available statistical indexes to detect answer copying lack sufficient power at small α levels or when the amount of copying is relatively small. Furthermore, there is no one index that is uniformly best. Depending on the type or amount of copying, certain indexes are better than others. The purpose of this article was…
Descriptors: Statistical Analysis, Item Analysis, Test Length, Sample Size
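As context for what copying indexes quantify, the sketch below computes only the raw ingredient many of them start from: the number of identical incorrect responses shared by a suspected copier and a suspected source. It is not any published index, and the response strings are hypothetical.

```python
# A minimal sketch: count identical incorrect responses between two examinees.
# Key and response strings are hypothetical.
key    = "ABCDA"
copier = "ABDDA"   # responses of suspected copier
source = "ABDDA"   # responses of suspected source

identical_incorrect = sum(
    1 for k, c, s in zip(key, copier, source) if c == s and c != k
)
print(identical_incorrect)  # 1 (item 3: both chose D, key is C)
```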
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the situation in which content or administrative considerations limit the way in which a test can be partitioned to estimate the internal consistency reliability of the total test score. Demonstrates that a single-valued estimate of the total score reliability is possible only if an assumption is made about the comparative size of the…
Descriptors: Error of Measurement, Reliability, Scores, Test Construction
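The partitioning issue in the Feldt abstract can be made concrete with the familiar two-part coefficient alpha, which carries exactly the kind of equal-contribution assumption the article examines. The sketch below is a generic illustration with hypothetical scores, not Feldt's derivation.

```python
# A minimal sketch of coefficient alpha from a two-part split. It assumes the
# parts contribute equally (essential tau-equivalence); scores are hypothetical.
import numpy as np

part1 = np.array([10, 12, 9, 14, 11, 13])   # part-1 scores for six examinees
part2 = np.array([11, 13, 8, 15, 10, 14])   # part-2 scores for the same examinees
total = part1 + part2

k = 2  # number of parts
alpha = k / (k - 1) * (1 - (part1.var(ddof=1) + part2.var(ddof=1)) / total.var(ddof=1))
print(round(float(alpha), 3))
```

When the parts cannot be assumed parallel or equally long, this single-valued estimate is no longer automatic, which is the situation the article addresses.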