ERIC - Search Results

Showing all 10 results

Improvement of Norm Score Quality via Regression-Based Continuous Norming

Peer reviewed

Lenhard, Wolfgang; Lenhard, Alexandra – Educational and Psychological Measurement, 2021

The interpretation of psychometric test results is usually based on norm scores. We compared semiparametric continuous norming (SPCN) with conventional norming methods by simulating results for test scales with different item numbers and difficulties via an item response theory approach. Subsequently, we modeled the norm scores based on random…

Descriptors: Test Norms, Scores, Regression (Statistics), Test Items

The Pseudo-Equivalent Groups Approach as an Alternative to Common-Item Equating. Research Report. ETS RR-18-02

Peer reviewed

Kim, Sooyeon; Lu, Ru – ETS Research Report Series, 2018

The purpose of this study was to evaluate the effectiveness of linking test scores by using test takers' background data to form pseudo-equivalent groups (PEG) of test takers. Using 4 operational test forms that each included 100 items and were taken by more than 30,000 test takers, we created 2 half-length research forms that had either 20…

Descriptors: Test Items, Item Banks, Difficulty Level, Comparative Analysis

Development of the BioCalculus Assessment (BCA)

Peer reviewed

Taylor, Robin T.; Bishop, Pamela R.; Lenhart, Suzanne; Gross, Louis J.; Sturner, Kelly – CBE - Life Sciences Education, 2020

We describe the development and initial validity assessment of the 20-item BioCalculus Assessment (BCA), with the objective of comparing undergraduate life science students' understanding of calculus concepts in different courses with alternative emphases (with and without focus on biological applications). The development process of the BCA…

Descriptors: Test Construction, Mathematics Tests, Calculus, Test Validity

An Investigation of the Impact of Misrouting under Two-Stage Multistage Testing: A Simulation Study. Research Report. ETS RR-14-01

Peer reviewed

Kim, Sooyeon; Moses, Tim – ETS Research Report Series, 2014

The purpose of this study was to investigate the potential impact of misrouting under a 2-stage multistage test (MST) design, which includes 1 routing and 3 second-stage modules. Simulations were used to create a situation in which a large group of examinees took each of the 3 possible MST paths (high, middle, and low). We compared differences in…

Descriptors: Comparative Analysis, Difficulty Level, Scores, Test Wiseness

A Qualitative Analysis of General Receptive Vocabulary of Adolescents with Down Syndrome

Peer reviewed

Facon, Bruno; Nuchadee, Marie-Laure; Bollengier, Therese – American Journal on Intellectual and Developmental Disabilities, 2012

This study aimed to discover whether general receptive vocabulary is qualitatively phenotypical in Down syndrome. Sixty-two participants with Down syndrome (M age = 16.74 years, SD = 3.28) were individually matched on general vocabulary raw total score with 62 participants with intellectual disability of undifferentiated etiology (M age = 16.20…

Descriptors: Down Syndrome, Adolescents, Etiology, Receptive Language

Population Invariance of Vertical Scaling Results

Powers, Sonya; Turhan, Ahmet; Binici, Salih – Pearson, 2012

The population sensitivity of vertical scaling results was evaluated for a state reading assessment spanning grades 3-10 and a state mathematics test spanning grades 3-8. Subpopulations considered included males and females. The 3-parameter logistic model was used to calibrate math and reading items and a common item design was used to construct…

Descriptors: Scaling, Equated Scores, Standardized Tests, Reading Tests

High Stakes Tests with Self-Selected Essay Questions: Addressing Issues of Fairness

Peer reviewed

Lamprianou, Iasonas – International Journal of Testing, 2008

This study investigates the effect of reporting the unadjusted raw scores in a high-stakes language exam when raters differ significantly in severity and self-selected questions differ significantly in difficulty. More sophisticated models, introducing meaningful facets and parameters, are successively used to investigate the characteristics of…

Descriptors: High Stakes Tests, Raw Scores, Item Response Theory, Language Tests

Examining an Alternative to Score Equating: A Randomly Equivalent Forms Approach. Research Report. ETS RR-08-14

Peer reviewed

Liao, Chi-Wen; Livingston, Samuel A. – ETS Research Report Series, 2008

Randomly equivalent forms (REF) of tests in listening and reading for nonnative speakers of English were created by stratified random assignment of items to forms, stratifying on item content and predicted difficulty. The study included 50 replications of the procedure for each test. Each replication generated 2 REFs. The equivalence of those 2…

Descriptors: Equated Scores, Item Analysis, Test Items, Difficulty Level

An Analytical Evaluation of Two Common-Odds Ratios as Population Indicators of DIF.

Pommerich, Mary; And Others – 1995

The Mantel-Haenszel (MH) statistic for identifying differential item functioning (DIF) commonly conditions on the observed test score as a surrogate for conditioning on latent ability. When the comparison group distributions are not completely overlapping (i.e., are incongruent), the observed score represents different levels of latent ability…

Descriptors: Ability, Comparative Analysis, Difficulty Level, Item Bias

A Comparison of Three Equating Procedures on the Certifying Examination for Primary Care Physician's Assistants.

Bell, Anita I. – 1979

An equating study was conducted on the Certifying Examination for Primary Care Physician's Assistants to compare the ability of current examinees with the standardization group and to determine if current test items are more difficult than previous items. Using 46 common items from the multiple choice section, the 1978 exam was equated to the 1976…

Descriptors: Comparative Analysis, Difficulty Level, Educational Trends, Equated Scores
