ERIC - Search Results

Showing 1 to 15 of 28 results

A Comparison of Latent Semantic Analysis and Latent Dirichlet Allocation in Educational Measurement

Peer reviewed

Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024

Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…

Descriptors: Semantics, Educational Assessment, Evaluators, Reliability

Using Rasch Analysis to Examine Raters' Expertise Turkish Teacher Candidates' Competency Levels in Writing Different Types of Test Items

Peer reviewed

Sayin, Ayfer; Sata, Mehmet – International Journal of Assessment Tools in Education, 2022

The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined within the scope of the study. 84 Turkish teacher candidates…

Descriptors: Foreign Countries, Item Response Theory, Evaluators, Expertise

Evaluating the Effectiveness of the Expectation-Maximization (EM) Algorithm for Bayesian Network Calibration

Tingir, Seyfullah – ProQuest LLC, 2019

Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…

Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability

Validating a Claim-Evidence-Science Idea-Reasoning (CESR) Framework for Use in NGSS Assessment Tasks

Peer reviewed

Hardcastle, Joseph M.; Herrmann Abell, Cari F.; DeBoer, George E. – Grantee Submission, 2021

We developed assessment tasks aligned to the Next Generation Science Standards (NGSS) that require students to use argumentation and explanation practices along with disciplinary core ideas and crosscutting concepts to make sense of energy-related phenomena. Scoring rubrics were created to evaluate students' ability to make accurate claims, cite…

Descriptors: Academic Standards, Energy, Scientific Concepts, Persuasive Discourse

Multivariate Generalizability Analysis of Automated Scoring for Short Answer Items of Social Studies in Large-Scale Assessment

Peer reviewed

Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017

With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…

Descriptors: Automation, Scoring, Social Studies, Test Items

Consistency of Angoff-Based Standard-Setting Judgments: Are Item Judgments and Passing Scores Replicable across Different Panels of Experts?

Peer reviewed

Tannenbaum, Richard J.; Kannan, Priya – Educational Assessment, 2015

Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments…

Descriptors: Standard Setting (Scoring), Reliability, Scores, Licensing Examinations (Professions)

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

On the Bias-Amplifying Effect of Near Instruments in Observational Studies

Peer reviewed

Steiner, Peter M.; Kim, Yongnam – Society for Research on Educational Effectiveness, 2014

In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…

Descriptors: Observation, Research Methodology, Test Bias, Regression (Statistics)

Maximum Likelihood Item Easiness Models for Test Theory without an Answer Key

Peer reviewed

France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015

Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…

Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory

Determining When Single Scoring for Constructed-Response Items Is as Effective as Double Scoring in Mixed-Format Licensure Tests

Peer reviewed

Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013

The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…

Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items

Analysis of Open-Ended Statistics Questions with Many Facet Rasch Model

Peer reviewed

Güler, Nese – Eurasian Journal of Educational Research, 2014

Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…

Descriptors: Item Response Theory, Statistics, Scoring, Reliability

Gains in Marking Reliability from Item-Level Marking: Is the Sum of the Parts Better than the Whole?

Peer reviewed

Wheadon, Christopher; Pinot de Moira, Anne – Educational Research and Evaluation, 2013

Marking of high-stakes examinations in England has traditionally been administered by schools and colleges sending their examination papers directly to examiners. As a consequence, the work of one candidate has, historically, been marked by one examiner, as has work of an entire centre. Previous studies have suggested that the marking of both…

Descriptors: Foreign Countries, Scoring, High Stakes Tests, Reliability

Technical Report of the Survey of Adult Skills (PIAAC)

OECD Publishing, 2013

The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…

Descriptors: International Assessment, Adults, Skills, Test Construction

Rater Language Background as a Source of Measurement Error in the Testing of English Language Learners

Peer reviewed

Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012

We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…

Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers

Measuring Implementation of Evidence-Based Programs Targeting Young Children at Risk for Emotional/Behavioral Disorders: Conceptual Issues and Recommendations

Peer reviewed
Sutherland, Kevin S.; McLeod, Bryce D.; Conroy, Maureen A.; Cox, Julia R. – Grantee Submission, 2013

Young children with and at risk for emotional/behavioral disorders (EBD) present challenges for early childhood teachers. Evidence-based programs designed to address these young children's behavior problems exist, but there are a number of barriers to implementing these programs in early childhood settings. Advancing the science of treatment…

Descriptors: Program Implementation, Emotional Disturbances, Behavior Disorders, At Risk Persons
