Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Sayin, Ayfer; Sata, Mehmet – International Journal of Assessment Tools in Education, 2022
The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined within the scope of the study. 84 Turkish teacher candidates…
Descriptors: Foreign Countries, Item Response Theory, Evaluators, Expertise
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
Validating a Claim-Evidence-Science Idea-Reasoning (CESR) Framework for Use in NGSS Assessment Tasks
Hardcastle, Joseph M.; Herrmann Abell, Cari F.; DeBoer, George E. – Grantee Submission, 2021
We developed assessment tasks aligned to the Next Generation Science Standards (NGSS) that require students to use argumentation and explanation practices along with disciplinary core ideas and crosscutting concepts to make sense of energy-related phenomena. Scoring rubrics were created to evaluate students' ability to make accurate claims, cite…
Descriptors: Academic Standards, Energy, Scientific Concepts, Persuasive Discourse
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Tannenbaum, Richard J.; Kannan, Priya – Educational Assessment, 2015
Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments…
Descriptors: Standard Setting (Scoring), Reliability, Scores, Licensing Examinations (Professions)
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Steiner, Peter M.; Kim, Yongnam – Society for Research on Educational Effectiveness, 2014
In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
Descriptors: Observation, Research Methodology, Test Bias, Regression (Statistics)
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
Güler, Nese – Eurasian Journal of Educational Research, 2014
Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
Descriptors: Item Response Theory, Statistics, Scoring, Reliability
Gains in Marking Reliability from Item-Level Marking: Is the Sum of the Parts Better than the Whole?
Wheadon, Christopher; Pinot de Moira, Anne – Educational Research and Evaluation, 2013
Marking of high-stakes examinations in England has traditionally been administered by schools and colleges sending their examination papers directly to examiners. As a consequence, the work of one candidate has, historically, been marked by one examiner, as has work of an entire centre. Previous studies have suggested that the marking of both…
Descriptors: Foreign Countries, Scoring, High Stakes Tests, Reliability
OECD Publishing, 2013
The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…
Descriptors: International Assessment, Adults, Skills, Test Construction
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Sutherland, Kevin S.; McLeod, Bryce D.; Conroy, Maureen A.; Cox, Julia R. – Grantee Submission, 2013
Young children with and at risk for emotional/behavioral disorders (EBD) present challenges for early childhood teachers. Evidence-based programs designed to address these young children's behavior problems exist, but there are a number of barriers to implementing these programs in early childhood settings. Advancing the science of treatment…
Descriptors: Program Implementation, Emotional Disturbances, Behavior Disorders, At Risk Persons