Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
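The abstract above describes how judge (or panel) variability contributes error to an Angoff cut score. A minimal sketch of that idea, under the simplifying assumption that the cut score is the mean of the judges' mean item judgments and that only judge sampling contributes error (the ratings below are hypothetical):

```python
from statistics import mean, stdev

def angoff_cut_score(ratings):
    """ratings[j][i]: judge j's estimated probability that a minimally
    competent examinee answers item i correctly (Angoff method)."""
    judge_means = [mean(r) for r in ratings]   # each judge's implied cut score
    cut = mean(judge_means)                    # panel cut score
    # Standard error attributable to sampling judges.
    se_judges = stdev(judge_means) / len(judge_means) ** 0.5
    return cut, se_judges

# Hypothetical ratings: 3 judges x 4 items.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.6, 0.4, 0.7],
    [0.7, 0.8, 0.6, 0.9],
]
cut, se = angoff_cut_score(ratings)
```

A fuller generalizability analysis would also partition item variance, which, as the abstract notes, plays a more complicated role.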
Raykov, Tenko; Marcoulides, George A. – Educational and Psychological Measurement, 2016
The frequently neglected and often misunderstood relationship between classical test theory and item response theory is discussed for the unidimensional case with binary measures and no guessing. It is pointed out that popular item response models can be directly obtained from classical test theory-based models by accounting for the discrete…
Descriptors: Test Theory, Item Response Theory, Models, Correlation
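One concrete point of contact between the two frameworks discussed above (not the article's own derivation) is that an IRT model implies a classical true score: the expected observed score is the sum of the item response probabilities. A sketch with the 2PL model and a hypothetical three-item test:

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: P(correct | ability theta),
    with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def true_score(theta, items):
    """CTT true score implied by the IRT model: the expected observed
    score, i.e. the test characteristic curve evaluated at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

# Hypothetical 3-item test: (a, b) pairs.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
t = true_score(0.0, items)  # expected score for an average examinee
```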
Williams, Matt N.; Gomez Grajales, Carlos Alberto; Kurkiewicz, Dason – Practical Assessment, Research & Evaluation, 2013
In 2002, an article entitled "Four assumptions of multiple regression that researchers should always test" by Osborne and Waters was published in "PARE." This article has gone on to be viewed more than 275,000 times (as of August 2013), and it is one of the first results displayed in a Google search for "regression…
Descriptors: Multiple Regression Analysis, Misconceptions, Reader Response, Predictor Variables
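A recurring misconception the article addresses is which quantities the regression assumptions apply to: the residuals, not the raw variables. A minimal sketch with hypothetical data, fitting simple OLS by hand and recovering the residuals one would then inspect:

```python
from statistics import mean

def ols(x, y):
    """Simple least-squares fit y = b0 + b1*x, returning residuals too."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, resid

# Hypothetical data; the distributional checks (normality,
# homoscedasticity) would be made on `resid`, not on x or y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, resid = ols(x, y)
```

By construction the residuals sum to zero and are uncorrelated with the predictor, which is why the assumption checks are graphical or distributional rather than mean-based.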
Guo, Hongwen; Sinharay, Sandip – Journal of Educational and Behavioral Statistics, 2011
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
Descriptors: Testing Programs, Measurement, Item Analysis, Error of Measurement
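The estimator at issue above can be sketched as Nadaraya-Watson kernel regression of item responses on total score; using the error-prone observed score as the regressor is exactly what biases it. A minimal version with a Gaussian kernel and hypothetical data:

```python
import math

def kernel_irc(scores, responses, x, h=2.0):
    """Nadaraya-Watson estimate of an item response curve:
    P(item correct | total score = x), Gaussian kernel, bandwidth h.
    The observed scores used as the regressor contain measurement
    error, which is the source of bias discussed in the article."""
    weights = [math.exp(-((s - x) / h) ** 2 / 2) for s in scores]
    return sum(w * r for w, r in zip(weights, responses)) / sum(weights)

# Hypothetical data: examinee total scores and 0/1 responses to one item.
scores    = [2, 4, 5, 6, 8, 9, 11, 12]
responses = [0, 0, 1, 0, 1, 1, 1, 1]
p_hat = kernel_irc(scores, responses, x=7.0)
```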
Wu, Pei-Chen – Journal of Psychoeducational Assessment, 2010
This study examined measurement invariance (i.e., configural invariance, metric invariance, scalar invariance) of the Chinese version of Beck Depression Inventory II (BDI-II-C) across college males and females and compared gender differences on depression at the latent factor mean level. Two samples composed of 402 male college students and 595…
Descriptors: College Students, Females, Negative Attitudes, Construct Validity
Hamilton, Patti; Johnson, Robert; Poudrier, Chelsey – Teaching in Higher Education, 2010
In this paper, we argue that, as indicators of the educational quality of graduate degree programs, student theses and dissertations are best used in specific contexts. High-quality theses and dissertations, that is, may be the result of factors such as verbal skills students already possessed at admission or of complex interactions between…
Descriptors: Educational Quality, Doctoral Dissertations, Theses, Change Strategies
McKenzie, Robert G. – Learning Disability Quarterly, 2009
The assessment procedures within Response to Intervention (RTI) models have begun to supplant the use of traditional, discrepancy-based frameworks for identifying students with specific learning disabilities (SLD). Many RTI proponents applaud this shift because of perceived shortcomings in utilizing discrepancy as an indicator of SLD. However,…
Descriptors: Intervention, Learning Disabilities, Error of Measurement, Psychometrics
Bandalos, Deborah L. – Structural Equation Modeling: A Multidisciplinary Journal, 2008
This study examined the efficacy of 4 different parceling methods for modeling categorical data with 2, 3, and 4 categories and with normal, moderately nonnormal, and severely nonnormal distributions. The parceling methods investigated were isolated parceling in which items were parceled with other items sharing the same source of variance, and…
Descriptors: Structural Equation Models, Computation, Goodness of Fit, Classification
Setzer, J. Carl; He, Yi – GED Testing Service, 2009
Reliability Analysis for the Internationally Administered 2002 Series GED (General Educational Development) Tests. Reliability refers to the consistency, or stability, of test scores when the measurement procedure is administered repeatedly to groups of examinees (American Educational Research Association [AERA], American Psychological…
Descriptors: Educational Research, Error of Measurement, Scores, Test Reliability
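The reliability analyses referenced above typically report an internal-consistency coefficient. One common choice (not necessarily the one used in this report) is coefficient alpha, sketched here on a hypothetical 3-item, 4-examinee dataset:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Coefficient alpha from a matrix item_scores[item][person]:
    k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(item_scores)
    item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-examinee totals
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Hypothetical 3-item test, 4 examinees (rows are items, 0/1 scoring).
items = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
alpha = cronbach_alpha(items)
```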
French, Brian F.; Maller, Susan J. – Educational and Psychological Measurement, 2007
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
Descriptors: Effect Size, Test Bias, Guidelines, Error of Measurement
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007
Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…
Descriptors: Psychiatry, Patients, Error of Measurement, Test Length
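The classification question above can be sketched with classical test theory: the standard error of measurement grows as reliability falls, so short (less reliable) tests leave more doubt about which side of a cutoff an examinee's true score lies on. A simplification of the article's framework, treating measurement error as normal:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement under classical test theory."""
    return sd * math.sqrt(1 - reliability)

def p_correct_classification(observed, cutoff, sd, reliability):
    """Rough probability that the examinee's true score falls on the
    same side of the cutoff as the observed score, assuming normally
    distributed error with standard deviation SEM (a simplification)."""
    z = abs(observed - cutoff) / sem(sd, reliability)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# Hypothetical short test: sd = 4 points, reliability .70, cutoff 20.
p = p_correct_classification(observed=23, cutoff=20, sd=4.0, reliability=0.70)
```

Raising reliability shrinks the SEM and raises the classification certainty, which is the trade-off the authors quantify for short tests.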
Linacre, John Michael – 1988
Simulations were performed to verify the accuracy with which the Mantel-Haenszel (MH) and Rasch PROX procedures recover simulated item bias. Several standard error estimators for the MH procedure were evaluated. Item bias is recovered satisfactorily by both techniques under all simulated conditions. The proposed MH standard error estimators have…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Analysis, Statistical Analysis
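The Mantel-Haenszel procedure evaluated above pools 2x2 (group x correct/incorrect) tables across ability strata into a common odds ratio. A minimal sketch with hypothetical tables (the standard error estimators compared in the paper are not reproduced here):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio over score strata.
    Each stratum is (A, B, C, D):
      A = reference correct, B = reference incorrect,
      C = focal correct,     D = focal incorrect."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical 2x2 tables for one item at three ability strata.
strata = [(30, 10, 25, 15), (20, 20, 15, 25), (10, 30, 5, 35)]
alpha_mh = mh_odds_ratio(strata)
# ETS delta metric; values near 0 indicate negligible DIF.
delta_mh = -2.35 * math.log(alpha_mh)
```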
Hartig, Johannes; Holzel, Britta; Moosbrugger, Helfried – Multivariate Behavioral Research, 2007
Numerous studies have shown increasing item reliabilities as an effect of the item position in personality scales. Traditionally, these context effects are analyzed based on item-total correlations. This approach neglects that trends in item reliabilities can be caused either by an increase in true score variance or by a decrease in error…
Descriptors: True Scores, Error of Measurement, Structural Equation Models, Simulation
Guhn, Martin; Gadermann, Anne; Zumbo, Bruno D. – Early Education and Development, 2007
The present study investigates whether the Early Development Instrument (Offord & Janus, 1999) measures school readiness similarly across different groups of children. We employ ordinal logistic regression to investigate differential item functioning, a method of examining measurement bias. For 40,000 children, our analysis compares groups…
Descriptors: School Readiness, Kindergarten, Child Development, Program Validation
Niemi, David; Wang, Jia; Wang, Haiwen; Vallone, Julia; Griffin, Noelle – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2007
There are usually many testing activities going on in a school, with different tests serving different purposes, so organization and planning are key to creating an efficient system for assessing the most important educational objectives. In the ideal case, an assessment system will be able to inform on student learning, instruction and…
Descriptors: School Administration, Educational Objectives, Administration, Public Schools