NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 28 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Jordan M. Wheeler; Allan S. Cohen; Shiyu Wang – Journal of Educational and Behavioral Statistics, 2024
Topic models are mathematical and statistical models used to analyze textual data. The objective of topic models is to gain information about the latent semantic space of a set of related textual data. The semantic space of a set of textual data contains the relationship between documents and words and how they are used. Topic models are becoming…
Descriptors: Semantics, Educational Assessment, Evaluators, Reliability
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Sayin, Ayfer; Sata, Mehmet – International Journal of Assessment Tools in Education, 2022
The aim of the present study was to examine Turkish teacher candidates' competency levels in writing different types of test items by utilizing Rasch analysis. In addition, the effect of the expertise of the raters scoring the items written by the teacher candidates was examined within the scope of the study. 84 Turkish teacher candidates…
Descriptors: Foreign Countries, Item Response Theory, Evaluators, Expertise
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Hardcastle, Joseph M.; Herrmann Abell, Cari F.; DeBoer, George E. – Grantee Submission, 2021
We developed assessment tasks aligned to the Next Generation Science Standards (NGSS) that require students to use argumentation and explanation practices along with disciplinary core ideas and crosscutting concepts to make sense of energy-related phenomena. Scoring rubrics were created to evaluate students' ability to make accurate claims, cite…
Descriptors: Academic Standards, Energy, Scientific Concepts, Persuasive Discourse
Peer reviewed Peer reviewed
Direct linkDirect link
Sung, Kyung Hee; Noh, Eun Hee; Chon, Kyong Hee – Asia Pacific Education Review, 2017
With increased use of constructed response items in large scale assessments, the cost of scoring has been a major consideration (Noh et al. in KICE Report RRE 2012-6, 2012; Wainer and Thissen in "Applied Measurement in Education" 6:103-118, 1993). In response to the scoring cost issues, various forms of automated system for scoring…
Descriptors: Automation, Scoring, Social Studies, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Tannenbaum, Richard J.; Kannan, Priya – Educational Assessment, 2015
Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments…
Descriptors: Standard Setting (Scoring), Reliability, Scores, Licensing Examinations (Professions)
Peer reviewed Peer reviewed
Direct linkDirect link
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Steiner, Peter M.; Kim, Yongnam – Society for Research on Educational Effectiveness, 2014
In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
Descriptors: Observation, Research Methodology, Test Bias, Regression (Statistics)
Peer reviewed Peer reviewed
Direct linkDirect link
France, Stephen L.; Batchelder, William H. – Educational and Psychological Measurement, 2015
Cultural consensus theory (CCT) is a data aggregation technique with many applications in the social and behavioral sciences. We describe the intuition and theory behind a set of CCT models for continuous type data using maximum likelihood inference methodology. We describe how bias parameters can be incorporated into these models. We introduce…
Descriptors: Maximum Likelihood Statistics, Test Items, Difficulty Level, Test Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Sooyeon; Moses, Tim – International Journal of Testing, 2013
The major purpose of this study is to assess the conditions under which single scoring for constructed-response (CR) items is as effective as double scoring in the licensure testing context. We used both empirical datasets of five mixed-format licensure tests collected in actual operational settings and simulated datasets that allowed for the…
Descriptors: Scoring, Test Format, Licensing Examinations (Professions), Test Items
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Güler, Nese – Eurasian Journal of Educational Research, 2014
Problem Statement: The most significant disadvantage of open-ended items that allow the valid measurement of upper level cognitive behaviours, such as synthesis and evaluation, is scoring. The difficulty associated with objectively scoring the answers to the items contributes to the reduction of the reliability of the scores. Moreover, other…
Descriptors: Item Response Theory, Statistics, Scoring, Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Wheadon, Christopher; Pinot de Moira, Anne – Educational Research and Evaluation, 2013
Marking of high-stakes examinations in England has traditionally been administered by schools and colleges sending their examination papers directly to examiners. As a consequence, the work of one candidate has, historically, been marked by one examiner, as has work of an entire centre. Previous studies have suggested that the marking of both…
Descriptors: Foreign Countries, Scoring, High Stakes Tests, Reliability
OECD Publishing, 2013
The Programme for the International Assessment of Adult Competencies (PIAAC) has been planned as an ongoing program of assessment. The first cycle of the assessment has involved two "rounds." The first round, which is covered by this report, took place over the period of January 2008-October 2013. The main features of the first cycle of…
Descriptors: International Assessment, Adults, Skills, Test Construction
Peer reviewed Peer reviewed
Direct linkDirect link
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Sutherland, Kevin S.; McLeod, Bryce D.; Conroy, Maureen A.; Cox, Julia R. – Grantee Submission, 2013
Young children with and at risk for emotional/behavioral disorders (EBD) present challenges for early childhood teachers. Evidence-based programs designed to address these young children's behavior problems exist, but there are a number of barriers to implementing these programs in early childhood settings. Advancing the science of treatment…
Descriptors: Program Implementation, Emotional Disturbances, Behavior Disorders, At Risk Persons
Previous Page | Next Page »
Pages: 1  |  2