Showing 1 to 15 of 31 results
Peer reviewed
Gönülates, Emre – Educational and Psychological Measurement, 2019
This article introduces the Quality of Item Pool (QIP) Index, a novel approach to quantifying the adequacy of an item pool of a computerized adaptive test for a given set of test specifications and examinee population. This index ranges from 0 to 1, with values close to 1 indicating the item pool presents optimum items to examinees throughout the…
Descriptors: Item Banks, Adaptive Testing, Computer Assisted Testing, Error of Measurement
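The snippet does not give the QIP formula, so the sketch below is only a hypothetical illustration of the idea: average, over examinees, how close the information of each administered item comes to the best item the pool could have offered at that ability. The helper names (fisher_info_2pl, qip_like_index) and the 2PL assumption are invented for this example, not taken from the article.

```python
import numpy as np

def fisher_info_2pl(a, b, theta):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1.0 - p)

def qip_like_index(admin_a, admin_b, pool_a, pool_b, thetas):
    """Mean ratio of administered-item information to the best
    information available anywhere in the pool (0..1 scale)."""
    ratios = []
    for a, b, th in zip(admin_a, admin_b, thetas):
        realized = fisher_info_2pl(a, b, th)
        optimal = fisher_info_2pl(pool_a, pool_b, th).max()
        ratios.append(realized / optimal)
    return float(np.mean(ratios))

rng = np.random.default_rng(1)
pool_a = rng.uniform(0.8, 2.0, 200)   # pool discriminations
pool_b = rng.normal(0.0, 1.0, 200)    # pool difficulties
thetas = rng.normal(0.0, 1.0, 50)     # examinee abilities
admin_a = rng.choice(pool_a, 50)      # items actually administered
admin_b = rng.choice(pool_b, 50)
print(round(qip_like_index(admin_a, admin_b, pool_a, pool_b, thetas), 3))
```

Values near 1 would mean the pool let the CAT pick near-optimal items for everyone; values near 0 would flag a pool mismatched to the examinee population.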
Peer reviewed
Jansen, Markus T.; Schulze, Ralf – Educational and Psychological Measurement, 2024
Thurstonian forced-choice modeling is considered to be a powerful new tool to estimate item and person parameters while simultaneously testing the model fit. This assessment approach is associated with the aim of reducing faking and other response tendencies that plague traditional self-report trait assessments. As a result of major recent…
Descriptors: Factor Analysis, Models, Item Analysis, Evaluation Methods
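As background, the kernel of any Thurstonian forced-choice model is Thurstone's law of comparative judgment: a pairwise preference is the sign of a difference of latent utilities. The sketch below computes that textbook choice probability; it is not the authors' full factor-analytic model.

```python
import numpy as np
from scipy.stats import norm

def thurstone_choice_prob(mu_i, mu_j, var_i, var_j, cov_ij=0.0):
    """P(item i is preferred to item j): preference occurs when the
    latent utility difference t_i - t_j > 0, with t_i ~ N(mu_i, var_i)."""
    diff_sd = np.sqrt(var_i + var_j - 2.0 * cov_ij)
    return norm.cdf((mu_i - mu_j) / diff_sd)

# Item i has the higher latent utility, so it is preferred most of the time:
print(round(thurstone_choice_prob(1.0, 0.2, 1.0, 1.0), 3))   # ~0.714
```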
Peer reviewed
Marmolejo-Ramos, Fernando; Cousineau, Denis – Educational and Psychological Measurement, 2017
The number of articles expressing dissatisfaction with the null hypothesis statistical testing (NHST) framework has increased steadily over the years. Alternatives to NHST have been proposed, and the Bayesian approach seems to have achieved the greatest visibility. In this last part of the special issue, a few alternative…
Descriptors: Hypothesis Testing, Bayesian Statistics, Evaluation Methods, Statistical Inference
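One concrete Bayesian stand-in for a point-null test, though not necessarily the specific proposal discussed in the issue, is the BIC-based Bayes factor approximation. A minimal sketch for a normal mean, with the helper name bic_normal_mean invented here:

```python
import numpy as np

def bic_normal_mean(x, mu0=None):
    """BIC of a normal model for x; mu0 fixes the mean (null model),
    otherwise the mean is estimated (alternative model)."""
    n = len(x)
    if mu0 is None:
        mu, k = np.mean(x), 2      # estimate mean and variance
    else:
        mu, k = mu0, 1             # estimate variance only
    sigma2 = np.mean((x - mu) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return k * np.log(n) - 2 * loglik

rng = np.random.default_rng(7)
x = rng.normal(0.3, 1.0, 80)
# BF01 ~ exp((BIC_alt - BIC_null) / 2): evidence for the null over the alternative
bf01 = np.exp((bic_normal_mean(x) - bic_normal_mean(x, mu0=0.0)) / 2)
print(round(bf01, 3))   # < 1 here: the data favor a nonzero mean
```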
Peer reviewed
Grice, James W.; Yepez, Maria; Wilson, Nicole L.; Shoda, Yuichi – Educational and Psychological Measurement, 2017
An alternative to null hypothesis significance testing is presented and discussed. This approach, referred to as observation-oriented modeling, is centered on model building in an effort to explicate the structures and processes believed to generate a set of observations. In terms of analysis, this novel approach complements traditional methods…
Descriptors: Hypothesis Testing, Models, Observation, Statistical Inference
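Observation-oriented modeling results are usually summarized by a percent-correct-classification (PCC) index with a randomization test for its chance value. The sketch below is a heavily simplified stand-in, assuming the predicted pattern is a simple ordinal cut between two groups; it is not the OOM software's algorithm.

```python
import numpy as np

def pcc(group_a, group_b, cut):
    """Percent of observations matching the ordinal prediction
    'group A falls above the cut, group B falls at or below it'."""
    hits = np.sum(group_a > cut) + np.sum(group_b <= cut)
    return 100.0 * hits / (len(group_a) + len(group_b))

def randomization_p(group_a, group_b, cut, n_rand=5000, seed=0):
    """Chance value: how often shuffled labels match or beat the observed PCC."""
    rng = np.random.default_rng(seed)
    observed = pcc(group_a, group_b, cut)
    pooled = np.concatenate([group_a, group_b])
    n_a, count = len(group_a), 0
    for _ in range(n_rand):
        rng.shuffle(pooled)
        if pcc(pooled[:n_a], pooled[n_a:], cut) >= observed:
            count += 1
    return observed, count / n_rand

rng = np.random.default_rng(3)
a = rng.normal(1.0, 1.0, 30)   # group predicted to score high
b = rng.normal(0.0, 1.0, 30)   # group predicted to score low
print(randomization_p(a, b, cut=0.5))
```

Note the output is a count of conforming observations, not a parameter estimate with a p value, which is the contrast with NHST the abstract draws.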
Peer reviewed
Wilcox, Rand R.; Serang, Sarfaraz – Educational and Psychological Measurement, 2017
The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…
Descriptors: Hypothesis Testing, Bayesian Statistics, Computation, Effect Size
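A representative robust method from this literature, and one closely associated with Wilcox's work, is Yuen's test on 20% trimmed means. A self-contained sketch (the example data are made up):

```python
import numpy as np
from scipy import stats

def yuen_test(x, y, trim=0.2):
    """Yuen's test comparing trimmed means, a robust alternative to
    Student's t under heavy tails and outliers."""
    def winsorized_var(a, trim):
        g = int(np.floor(trim * len(a)))
        a = np.sort(a)
        a_w = np.clip(a, a[g], a[-g - 1])   # winsorize the tails
        return np.var(a_w, ddof=1)
    g_x = int(np.floor(trim * len(x)))
    g_y = int(np.floor(trim * len(y)))
    hx, hy = len(x) - 2 * g_x, len(y) - 2 * g_y
    dx = winsorized_var(x, trim) * (len(x) - 1) / (hx * (hx - 1))
    dy = winsorized_var(y, trim) * (len(y) - 1) / (hy * (hy - 1))
    t = (stats.trim_mean(x, trim) - stats.trim_mean(y, trim)) / np.sqrt(dx + dy)
    df = (dx + dy) ** 2 / (dx**2 / (hx - 1) + dy**2 / (hy - 1))
    return t, 2 * stats.t.sf(abs(t), df)

rng = np.random.default_rng(5)
x = rng.standard_t(3, 40) + 0.8   # heavy-tailed data with a shifted center
y = rng.standard_t(3, 40)
print(yuen_test(x, y))
```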
Peer reviewed
Raykov, Tenko; Marcoulides, George A.; Millsap, Roger E. – Educational and Psychological Measurement, 2013
A multiple testing method for examining factorial invariance for latent constructs evaluated by multiple indicators in distinct populations is outlined. The procedure is based on the false discovery rate concept and multiple individual restriction tests and resolves general limitations of a popular factorial invariance testing approach. The…
Descriptors: Testing, Statistical Analysis, Factor Analysis, Statistical Significance
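The false discovery rate concept the procedure is based on is typically operationalized with the Benjamini-Hochberg step-up rule. A minimal implementation, applied here to hypothetical p-values from individual invariance restriction tests:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k is the largest i with p_(i) <= (i/m) * q."""
    p = np.asarray(pvals)
    order = np.argsort(p)
    m = len(p)
    thresh = q * (np.arange(1, m + 1) / m)
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Hypothetical p-values, one per tested equality restriction:
pvals = [0.001, 0.008, 0.012, 0.041, 0.27, 0.61]
print(benjamini_hochberg(pvals))   # rejects the three smallest
```

Unlike a Bonferroni correction, the threshold grows with the rank of the p-value, which is what lets the multiple-restriction approach keep reasonable power.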
Peer reviewed
Huggins-Manley, Anne Corinne – Educational and Psychological Measurement, 2017
This study defines subpopulation item parameter drift (SIPD) as a change in item parameters over time that is dependent on subpopulations of examinees, and hypothesizes that the presence of SIPD in anchor items is associated with bias and/or lack of invariance in three psychometric outcomes. Results show that SIPD in anchor items is associated…
Descriptors: Psychometrics, Test Items, Item Response Theory, Hypothesis Testing
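A back-of-the-envelope way to see what SIPD means, not the paper's methodology: ordinary item parameter drift shifts an anchor item's difficulty for everyone, while SIPD is a subgroup-by-time interaction, i.e., a nonzero difference-in-differences of difficulty estimates.

```python
def differential_drift(b_est):
    """b_est[g][t]: difficulty estimate for subgroup g at time t.
    Uniform drift shifts both groups equally; SIPD shows up as a
    nonzero difference-in-differences."""
    drift_g0 = b_est[0][1] - b_est[0][0]
    drift_g1 = b_est[1][1] - b_est[1][0]
    return drift_g0 - drift_g1

# Anchor item that drifted +0.4 logits for subgroup 0 only:
print(differential_drift([[0.0, 0.4], [0.0, 0.0]]))   # 0.4 -> SIPD signal
# Anchor item that drifted +0.4 for everyone (ordinary IPD, no SIPD):
print(differential_drift([[0.0, 0.4], [0.0, 0.4]]))   # 0.0
```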
Peer reviewed
Choi, Seung W.; Grady, Matthew W.; Dodd, Barbara G. – Educational and Psychological Measurement, 2011
The goal of the current study was to introduce a new stopping rule for computerized adaptive testing (CAT). The predicted standard error reduction (PSER) stopping rule uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was…
Descriptors: Item Banks, Adaptive Testing, Computer Assisted Testing, Evaluation Methods
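The PSER rule proper works with the predictive posterior variance; the sketch below substitutes Fisher information to show the general shape of such a rule for 2PL items: stop when even the most informative remaining item would barely reduce the standard error. The function names and the 0.005 threshold are illustrative, not the article's values.

```python
import numpy as np

def info_2pl(a, b, theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

def should_stop(admin_a, admin_b, pool_a, pool_b, theta, min_reduction=0.005):
    """Stop when the best remaining item would shrink the ability
    standard error by less than `min_reduction`."""
    test_info = info_2pl(np.asarray(admin_a), np.asarray(admin_b), theta).sum()
    se_now = 1.0 / np.sqrt(test_info)
    best_next = info_2pl(np.asarray(pool_a), np.asarray(pool_b), theta).max()
    se_next = 1.0 / np.sqrt(test_info + best_next)
    return (se_now - se_next) < min_reduction

# After 30 moderately informative items, a weak remaining pool triggers a stop:
print(should_stop([1.2] * 30, [0.0] * 30, [0.4, 0.5], [2.0, -2.0], theta=0.0))
```

The appeal of this family of rules is exactly what the abstract suggests: it adapts test length to what the pool can still contribute, rather than to a fixed SE target alone.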
Peer reviewed
Lin, Pei-Ying; Lin, Yu-Cheng – Educational and Psychological Measurement, 2014
This exploratory study investigated potential sources of setting accommodation resulting in differential item functioning (DIF) on math and reading assessments for examinees with varied learning characteristics. The examinees were those who participated in large-scale assessments and were tested in either standardized or accommodated testing…
Descriptors: Test Bias, Multivariate Analysis, Testing Accommodations, Mathematics Tests
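The study's multivariate approach is not reproduced here, but the classic single-item DIF screen it builds on is the Mantel-Haenszel common odds ratio, computed over strata of examinees matched on total score:

```python
def mantel_haenszel_odds(strata):
    """Common odds ratio across score strata; each stratum is
    (ref_correct, ref_wrong, focal_correct, focal_wrong).
    Values far from 1.0 flag potential DIF."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

# Hypothetical counts per matched stratum (standardized vs. accommodated group):
strata = [(40, 10, 30, 20), (30, 20, 20, 30), (20, 30, 10, 40)]
print(round(mantel_haenszel_odds(strata), 2))   # > 1: item favors the reference group
```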
Peer reviewed
Jiao, Hong; Liu, Junhui; Haynie, Kathleen; Woo, Ada; Gorham, Jerry – Educational and Psychological Measurement, 2012
This study explored the impact of partial credit scoring of one type of innovative item (multiple-response items) in the pretest and operational settings of a computerized adaptive version of a large-scale licensure examination. The impacts of partial credit scoring on the estimation of the ability parameters and classification decisions in operational test…
Descriptors: Test Items, Computer Assisted Testing, Measures (Individuals), Scoring
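A minimal contrast between dichotomous and partial credit scoring of a multiple-response ("select all that apply") item, under one common partial-credit convention in which every option handled correctly earns credit; the examination's operational scoring rule may differ.

```python
def dichotomous_score(selected, keyed):
    """All-or-nothing: credit only for exactly the keyed set."""
    return 1.0 if set(selected) == set(keyed) else 0.0

def partial_credit_score(selected, keyed, n_options):
    """Fraction of options handled correctly: selecting a keyed option
    and leaving a distractor unselected both count as correct."""
    selected, keyed = set(selected), set(keyed)
    correct = len(selected & keyed) + (n_options - len(selected | keyed))
    return correct / n_options

# Examinee picks A and C on a four-option item keyed A, B:
print(dichotomous_score(["A", "C"], ["A", "B"]))         # 0.0
print(partial_credit_score(["A", "C"], ["A", "B"], 4))   # 0.5
```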
Peer reviewed
Chang, Shu-Ren; Plake, Barbara S.; Kramer, Gene A.; Lien, Shu-Mei – Educational and Psychological Measurement, 2011
This study examined the amount of time that different ability-level examinees spend on questions they answer correctly or incorrectly across different pretest item blocks presented on a fixed-length, time-restricted computerized adaptive test (CAT). Results indicate that different ability-level examinees require different amounts of time to…
Descriptors: Evidence, Test Items, Reaction Time, Adaptive Testing
Peer reviewed
Kim, Eun Sook; Yoon, Myeongsun; Lee, Taehun – Educational and Psychological Measurement, 2012
Multiple-indicators multiple-causes (MIMIC) modeling is often used to test a latent group mean difference while assuming the equivalence of factor loadings and intercepts over groups. However, this study demonstrated that MIMIC was insensitive to the presence of factor loading noninvariance, which implies that factor loading invariance should be…
Descriptors: Test Items, Simulation, Testing, Statistical Analysis
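A minimal numpy illustration, not the authors' simulation design, of why loading noninvariance matters here: with equal latent means but unequal loadings, a MIMIC-style group effect on an indicator picks up a spurious "mean difference."

```python
import numpy as np

rng = np.random.default_rng(11)
n = 5000
eta = rng.normal(0.5, 1.0, 2 * n)            # same latent mean in both groups
group = np.repeat([0, 1], n)
loadings = np.where(group == 0, 0.8, 0.5)    # noninvariant factor loading
y = loadings * eta + rng.normal(0.0, 0.6, 2 * n)

# A MIMIC-style check regresses the indicator on group membership and reads
# the coefficient as a latent mean difference -- here it is pure loading artifact:
diff = y[group == 1].mean() - y[group == 0].mean()
print(round(diff, 3))   # ~ (0.5 - 0.8) * 0.5 = -0.15, despite equal latent means
```

This is the study's practical point: loading invariance should be established before MIMIC latent mean comparisons are interpreted.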
Peer reviewed
Sass, Daniel A. – Educational and Psychological Measurement, 2010
Exploratory factor analysis (EFA) is commonly employed to evaluate the factor structure of measures with dichotomously scored items. Generally, only the estimated factor loadings are provided with no reference to significance tests, confidence intervals, and/or estimated factor loading standard errors. This simulation study assessed factor loading…
Descriptors: Intervals, Simulation, Factor Structure, Hypothesis Testing
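A common way to supply the uncertainty information EFA output usually omits is a nonparametric bootstrap of the loadings. The sketch below does this for a one-factor model with continuous indicators for simplicity; the article's dichotomous case would instead work from tetrachoric correlations.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def bootstrap_loading_se(X, n_boot=200, seed=0):
    """Bootstrap standard errors for one-factor loadings."""
    rng = np.random.default_rng(seed)
    loadings = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))                # resample rows
        lam = FactorAnalysis(n_components=1).fit(X[idx]).components_[0]
        loadings.append(lam * np.sign(lam[0]))               # fix sign indeterminacy
    return np.std(loadings, axis=0)

rng = np.random.default_rng(2)
f = rng.normal(size=(400, 1))
X = f @ np.array([[0.9, 0.7, 0.5, 0.3]]) + rng.normal(0, 0.5, (400, 4))
print(bootstrap_loading_se(X).round(3))
```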
Peer reviewed
Reckase, Mark D.; Xu, Jing-Ru – Educational and Psychological Measurement, 2015
How to compute and report subscores for a test that was originally designed for reporting scores on a unidimensional scale has been a topic of interest in recent years. In the research reported here, we describe an application of multidimensional item response theory to identify a subscore structure in a test designed for reporting results using a…
Descriptors: English, Language Skills, English Language Learners, Scores
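The compensatory multidimensional 2PL underlying this kind of subscore work models the success probability from a weighted sum of abilities; items loading mainly on one dimension are what make subscores recoverable. A minimal sketch with made-up item parameters:

```python
import numpy as np

def mirt_2pl_prob(a, d, theta):
    """Compensatory multidimensional 2PL: P(correct) depends on the
    weighted ability composite a . theta plus intercept d."""
    return 1.0 / (1.0 + np.exp(-(np.dot(a, theta) + d)))

# A hypothetical item loading mostly on the first (say, listening) dimension:
a = np.array([1.4, 0.3])
print(round(mirt_2pl_prob(a, d=-0.2, theta=np.array([0.5, -1.0])), 3))   # ~0.55
```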
Peer reviewed
Gnambs, Timo; Batinic, Bernad – Educational and Psychological Measurement, 2011
Computer-adaptive classification tests focus on classifying respondents in different proficiency groups (e.g., for pass/fail decisions). To date, adaptive classification testing has been dominated by research on dichotomous response formats and classifications in two groups. This article extends this line of research to polytomous classification…
Descriptors: Test Length, Computer Assisted Testing, Classification, Test Items
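Adaptive classification tests are typically driven by a sequential probability ratio test between abilities just below and above the cut score. The sketch below shows the dichotomous 2PL case that the article's polytomous work generalizes; the bounds, cut, and width values are illustrative.

```python
import numpy as np

def sprt_classify(responses, a, b, cut=0.0, width=0.3, alpha=0.05, beta=0.05):
    """SPRT for pass/fail CAT: accumulate the log likelihood ratio of
    theta = cut + width vs. cut - width; classify once a bound is crossed."""
    lo, hi = np.log(beta / (1 - alpha)), np.log((1 - beta) / alpha)
    log_lr = 0.0
    for u, ai, bi in zip(responses, a, b):
        p_hi = 1 / (1 + np.exp(-ai * (cut + width - bi)))
        p_lo = 1 / (1 + np.exp(-ai * (cut - width - bi)))
        log_lr += np.log(p_hi / p_lo) if u else np.log((1 - p_hi) / (1 - p_lo))
        if log_lr >= hi:
            return "pass"
        if log_lr <= lo:
            return "fail"
    return "undecided"   # item pool exhausted: fall back to a forced decision

rng = np.random.default_rng(9)
a = rng.uniform(0.8, 2.0, 40)
b = rng.normal(0.0, 1.0, 40)
theta_true = 0.8   # examinee well above the cut
u = rng.random(40) < 1 / (1 + np.exp(-a * (theta_true - b)))
print(sprt_classify(u, a, b))
```

A polytomous extension replaces the two response probabilities with category probabilities from a polytomous IRT model; the stopping logic is unchanged.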