ERIC - Search Results

Showing 1 to 15 of 66 results

Modeling the Intraindividual Relation of Ability and Speed within a Test

Peer reviewed

Augustin Mutak; Robert Krause; Esther Ulitzsch; Sören Much; Jochen Ranger; Steffi Pohl – Journal of Educational Measurement, 2024

Understanding the intraindividual relation between an individual's speed and ability in testing scenarios is essential to assure a fair assessment. Different approaches exist for estimating this relationship, that either rely on specific study designs or on specific assumptions. This paper aims to add to the toolbox of approaches for estimating…

Descriptors: Testing, Academic Ability, Time on Task, Correlation

Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior

Peer reviewed

Gregory M. Hurtz; Regi Mucino – Journal of Educational Measurement, 2024

The Lognormal Response Time (LNRT) model measures the speed of test-takers relative to the normative time demands of items on a test. The resulting speed parameters and model residuals are often analyzed for evidence of anomalous test-taking behavior associated with fast and poorly fitting response time patterns. Extending this model, we…

Descriptors: Student Reaction, Reaction Time, Response Style (Tests), Test Items

Estimating Classification Accuracy and Consistency Indices for Multiple Measures with the Simple Structure MIRT Model

Peer reviewed

Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023

Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the popular usages of multiple measures, there is little research on classification consistency and accuracy of multiple measures. Accordingly, this study introduces an…

Descriptors: Testing, Computation, Classification, Accuracy

Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing

Peer reviewed

Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022

While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…

Descriptors: Scoring, Testing, Test Items, Test Format

Using Multilabel Neural Network to Score High-Dimensional Assessments for Different Use Foci: An Example with College Major Preference Assessment

Peer reviewed

Shun-Fu Hu; Amery D. Wu; Jake Stone – Journal of Educational Measurement, 2025

Scoring high-dimensional assessments (e.g., > 15 traits) can be a challenging task. This paper introduces the multilabel neural network (MNN) as a scoring method for high-dimensional assessments. Additionally, it demonstrates how MNN can score the same test responses to maximize different performance metrics, such as accuracy, recall, or…

Descriptors: Tests, Testing, Scores, Test Construction

Score Comparability Issues with At-Home Testing and How to Address Them

Peer reviewed

Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022

As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…

Descriptors: Scores, Scoring, Comparative Analysis, Testing

Comparing the Accuracy of Student Growth Measures

Peer reviewed

Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020

Testing programs are often interested in using a student growth measure. This article presents analytic derivations of the accuracy of common student growth measures on both the raw scale of the test and the percentile rank scale in terms of the proportional reduction in mean squared error and the squared correlation between the estimator and…

Descriptors: Student Evaluation, Accuracy, Testing, Student Development

How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data

Peer reviewed

Sinharay, Sandip – Journal of Educational Measurement, 2017

Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…

Descriptors: Goodness of Fit, Testing, Test Items, Scores

Monitoring Items in Real Time to Enhance CAT Security

Peer reviewed

Zhang, Jinming; Li, Jie – Journal of Educational Measurement, 2016

An IRT-based sequential procedure is developed to monitor items for enhancing test security. The procedure uses a series of statistical hypothesis tests to examine whether the statistical characteristics of each item under inspection have changed significantly during CAT administration. This procedure is compared with a previously developed…

Descriptors: Computer Assisted Testing, Test Items, Difficulty Level, Item Response Theory

Aggregating Polytomous DIF Results over Multiple Test Administrations

Peer reviewed

Zwick, Rebecca; Ye, Lei; Isham, Steven – Journal of Educational Measurement, 2018

In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…

Descriptors: Test Bias, Testing, Test Items, Bayesian Statistics

Scoring Stability in a Large-Scale Assessment Program: A Longitudinal Analysis of Leniency/Severity Effects

Peer reviewed

Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019

Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…

Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation

An Item-Level Expected Classification Accuracy and Its Applications in Cognitive Diagnostic Assessment

Peer reviewed

Wang, Wenyi; Song, Lihong; Chen, Ping; Ding, Shuliang – Journal of Educational Measurement, 2019

Most of the existing classification accuracy indices of attribute patterns lose effectiveness when the response data is absent in diagnostic testing. To handle this issue, this article proposes new indices to predict the correct classification rate of a diagnostic test before administering the test under the deterministic noise input…

Descriptors: Cognitive Tests, Classification, Accuracy, Diagnostic Tests

A Short Note on the Relationship between Pass Rate and Multiple Attempts

Peer reviewed

Cheng, Ying; Liu, Cheng – Journal of Educational Measurement, 2016

For a certification, licensure, or placement exam, allowing examinees to take multiple attempts at the test could effectively change the pass rate. Change in the pass rate can occur without any change in the underlying latent trait, and can be an artifact of multiple attempts and imperfect reliability of the test. By deriving formulae to compute…

Descriptors: Testing, Computation, Change, Simulation

Classroom Assessment and Large-Scale Psychometrics: Shall the Twain Meet? (A Conversation with Margaret Heritage and Neal Kingston)

Peer reviewed

Heritage, Margaret; Kingston, Neal M. – Journal of Educational Measurement, 2019

Classroom assessment and large-scale assessment have, for the most part, existed in mutual isolation. Some experts have felt this is for the best and others have been concerned that the schism limits the potential contribution of both forms of assessment. Margaret Heritage has long been a champion of best practices in classroom assessment. Neal…

Descriptors: Measurement, Psychometrics, Context Effect, Classroom Environment

Multilevel Cognitive Diagnosis Models for Assessing Changes in Latent Attributes

Peer reviewed

Huang, Hung-Yu – Journal of Educational Measurement, 2017

Cognitive diagnosis models (CDMs) have been developed to evaluate the mastery status of individuals with respect to a set of defined attributes or skills that are measured through testing. When individuals are repeatedly administered a cognitive diagnosis test, a new class of multilevel CDMs is required to assess the changes in their attributes…

Descriptors: Testing, Cognitive Measurement, Test Items, Classification
