Publication Date
In 2025 | 1 |
Since 2024 | 5 |
Since 2021 (last 5 years) | 12 |
Since 2016 (last 10 years) | 37 |
Since 2006 (last 20 years) | 109 |
Descriptor
Evaluation Methods | 128 |
Probability | 128 |
Models | 41 |
Statistical Analysis | 31 |
Comparative Analysis | 21 |
Item Response Theory | 17 |
Scores | 16 |
Foreign Countries | 15 |
Research Methodology | 14 |
Simulation | 14 |
Correlation | 13 |
Author
Beretvas, S. Natasha | 2 |
Liu, Yan | 2 |
Steiner, Peter M. | 2 |
Tipton, Elizabeth | 2 |
Zumbo, Bruno D. | 2 |
Acredolo, Curt | 1 |
Akbari, Alireza | 1 |
Alexander D. Latham | 1 |
Allen, Jeff | 1 |
Alves, Cecilia | 1 |
Anderson, Kaitlin | 1 |
Publication Type
Reports - Research | 128 |
Journal Articles | 107 |
Speeches/Meeting Papers | 7 |
Information Analyses | 3 |
Books | 1 |
Non-Print Media | 1 |
Audience
Researchers | 2 |
Practitioners | 1 |
Location
Germany | 3 |
United Kingdom (England) | 3 |
Canada | 2 |
Illinois | 2 |
Italy | 2 |
United Kingdom (Scotland) | 2 |
United Kingdom (Wales) | 2 |
California (Los Angeles) | 1 |
Ecuador | 1 |
Georgia | 1 |
Iceland | 1 |
Laws, Policies, & Programs
No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
ACT Assessment | 1 |
Georgia Criterion Referenced… | 1 |
Raven Progressive Matrices | 1 |
Trends in International… | 1 |
Wendy Chan – Asia Pacific Education Review, 2024
As evidence from evaluation and experimental studies continues to influence decision-making and policymaking, applied researchers and practitioners require tools to derive valid and credible inferences. Over the past several decades, research in causal inference has progressed with the development and application of propensity scores. Since their…
Descriptors: Probability, Scores, Causal Models, Statistical Inference
Roderick J. Little; James R. Carpenter; Katherine J. Lee – Sociological Methods & Research, 2024
Missing data are a pervasive problem in data analysis. Three common methods for addressing the problem are (a) complete-case analysis, where only units that are complete on the variables in an analysis are included; (b) weighting, where the complete cases are weighted by the inverse of an estimate of the probability of being complete; and (c)…
Descriptors: Foreign Countries, Probability, Robustness (Statistics), Responses
Marchant, Nicolás; Quillien, Tadeg; Chaigneau, Sergio E. – Cognitive Science, 2023
The causal view of categories assumes that categories are represented by features and their causal relations. To study the effect of causal knowledge on categorization, researchers have used Bayesian causal models. Within that framework, categorization may be viewed as dependent on a likelihood computation (i.e., the likelihood of an exemplar with…
Descriptors: Classification, Bayesian Statistics, Causal Models, Evaluation Methods
Trina Johnson Kilty; Kevin T. Kilty; Andrea C. Burrows Borowczak; Mike Borowczak – Problems of Education in the 21st Century, 2024
A computer science camp for pre-collegiate students was operated during the summers of 2022 and 2023. The camp's effect on attitudes was assessed quantitatively using a survey instrument. However, enrollment at the summer camp was small, so the well-known Pearson's chi-squared test was not applied to measure the significance of the results.…
Descriptors: Summer Programs, Camps, Computer Science Education, 21st Century Skills
Zhipeng Hou; Elizabeth Tipton – Research Synthesis Methods, 2024
Literature screening is the process of identifying all relevant records from a pool of candidate paper records in systematic reviews, meta-analyses, and other research synthesis tasks. This process is time-consuming, expensive, and prone to human error. Screening prioritization methods attempt to help reviewers identify the most relevant records while…
Descriptors: Meta Analysis, Research Reports, Identification, Evaluation Methods
Baldwin, Peter; Margolis, Melissa J.; Clauser, Brian E.; Mee, Janet; Winward, Marcia – Educational Measurement: Issues and Practice, 2020
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the…
Descriptors: Standard Setting (Scoring), Probability, Cutting Scores, Evaluation Methods
Kolarec, Biserka; Nincevic, Marina – International Society for Technology, Education, and Science, 2022
The object of the research is a statistics exam that contains problem tasks. One examiner evaluated the exam repeatedly using two exam evaluation methods. The goal was to compare the objectivity of the two methods. We call one of the two exam evaluation methods the serial evaluation method. The serial evaluation method assumes evaluation of all exam…
Descriptors: Statistics Education, Mathematics Tests, Evaluation Methods, Test Construction
Fu, Qiang; Guo, Xin; Land, Kenneth C. – Sociological Methods & Research, 2020
Count responses with grouping and right censoring have long been used in surveys to study a variety of behaviors, statuses, and attitudes. Yet decisions about grouping or right-censoring count responses still rely on arbitrary choices made by researchers. We develop a new method for evaluating grouping and right-censoring decisions of count responses…
Descriptors: Surveys, Artificial Intelligence, Evaluation Methods, Probability
Schonberg, Christina – Online Submission, 2023
IXL is an end-to-end teaching and learning solution that engages learners in grades Pre-K through 12 with a comprehensive curriculum and a first-of-its-kind assessment suite. A core component of IXL's assessment suite is the IXL Diagnostic, an interim assessment designed by a team of educators and mathematicians that uses Item Response Theory…
Descriptors: Academic Achievement, Achievement Tests, Computer Uses in Education, Elementary School Students
Hung, Su-Pin; Huang, Hung-Yu – Journal of Educational and Behavioral Statistics, 2022
To address response style or bias in rating scales, forced-choice items are often used to ask respondents to rank their attitudes or preferences among a limited set of options. The rating scales used by raters to render judgments on ratees' performance also contribute to rater bias or errors; consequently, forced-choice items have recently…
Descriptors: Evaluation Methods, Rating Scales, Item Analysis, Preferences
Käser, Tanja; Schwartz, Daniel L. – International Journal of Artificial Intelligence in Education, 2020
Modeling and predicting student learning in computer-based environments often relies solely on sequences of accuracy data. Previous research suggests that it matters not only what we learn but also how we learn. The detection and analysis of learning behavior become especially important when dealing with open-ended exploration environments,…
Descriptors: Inquiry, Learning Strategies, Outcomes of Education, Academic Achievement
Chen, Michelle Y.; Liu, Yan; Zumbo, Bruno D. – Educational and Psychological Measurement, 2020
This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data, namely, continuous task scores and the lack of a reliable internal variable as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First,…
Descriptors: Probability, Scores, Evaluation Methods, Test Items
Smith, Trevor I.; Bendjilali, Nasrine – Physical Review Physics Education Research, 2022
Several recent studies have employed item response theory (IRT) to rank incorrect responses to commonly used research-based multiple-choice assessments. These studies use Bock's nominal response model (NRM) for applying IRT to categorical (nondichotomous) data, but the response rankings only utilize half of the parameters estimated by the model.…
Descriptors: Item Response Theory, Test Items, Multiple Choice Tests, Science Tests
Polyzou, Agoritsa; Nikolakopoulos, Athanasios N.; Karypis, George – International Educational Data Mining Society, 2019
Course selection is a crucial and challenging problem that students have to face while navigating through an undergraduate degree program. The decisions they make shape their future in ways that they cannot conceive in advance. Available departmental sample degree plans are not personalized for each student, and personal discussion time with an…
Descriptors: Markov Processes, Course Selection (Students), Undergraduate Students, Decision Making
Carly Oddleifson; Stephen Kilgus; David A. Klingbeil; Alexander D. Latham; Jessica S. Kim; Ishan N. Vengurlekar – Grantee Submission, 2025
The purpose of this study was to conduct a conceptual replication of Pendergast et al.'s (2018) study that examined the diagnostic accuracy of a nomogram procedure, also known as a naive Bayesian approach. The specific naive Bayesian approach combined academic and social-emotional and behavioral (SEB) screening data to predict student performance…
Descriptors: Bayesian Statistics, Accuracy, Social Emotional Learning, Diagnostic Tests