ERIC - Search Results

Publication Date

In 2025	0
Since 2024	1
Since 2021 (last 5 years)	4
Since 2016 (last 10 years)	5
Since 2006 (last 20 years)	12

Descriptor

Correlation	14
Error of Measurement	14
Scoring	14
Interrater Reliability	5
Scores	5
Computer Assisted Testing	4
Goodness of Fit	4
Psychometrics	4
Test Validity	4
Accuracy	3
Diagnostic Tests	3
Foreign Countries	3
Language Tests	3
Reliability	3
Test Reliability	3
Writing Evaluation	3
At Risk Students	2
Children	2
Comparative Analysis	2
English (Second Language)	2
Essay Tests	2
Evaluation Methods	2
Evaluators	2
Generalizability Theory	2
Item Response Theory	2

Source

Educational and Psychological…	2
Advances in Health Sciences…	1
CALICO Journal	1
Developmental Psychology	1
ETS Research Report Series	1
Educational Assessment	1
Grantee Submission	1
Journal of Educational…	1
Language, Speech, and Hearing…	1
National Education Policy…	1
Psychological Assessment	1

Publication Type

Journal Articles	11
Reports - Research	10
Reports - Evaluative	3
Speeches/Meeting Papers	2
Opinion Papers	1
Tests/Questionnaires	1

Education Level

Secondary Education	2
Early Childhood Education	1
Elementary Education	1
Elementary Secondary Education	1
Grade 2	1
Grade 3	1
Higher Education	1
Postsecondary Education	1
Primary Education	1

Audience

Researchers

Location

United States	2
China	1
China (Shanghai)	1
Germany	1
United Kingdom	1

Assessments and Surveys

Program for International…	1
Test of English as a Foreign…	1
Test of Standard Written…	1
Trends in International…	1

Showing all 14 results

Resolving and Re-Scoring Constructed Response Items in Mixed-Format Assessments: An Exploration of Three Approaches

Peer reviewed

Stefanie A. Wind; Yangmeng Xu – Educational Assessment, 2024

We explored three approaches to resolving or re-scoring constructed-response items in mixed-format assessments: rater agreement, person fit, and targeted double scoring (TDS). We used a simulation study to consider how the three approaches impact the psychometric properties of student achievement estimates, with an emphasis on person fit. We found…

Descriptors: Interrater Reliability, Error of Measurement, Evaluation Methods, Examiners

A Polytomous Scoring Approach to Handle Not-Reached Items in Low-Stakes Assessments

Peer reviewed

Gorgun, Guher; Bulut, Okan – Educational and Psychological Measurement, 2021

In low-stakes assessments, some students may not reach the end of the test and leave some items unanswered due to various reasons (e.g., lack of test-taking motivation, poor time management, and test speededness). Not-reached items are often treated as incorrect or not-administered in the scoring process. However, when the proportion of…

Descriptors: Scoring, Test Items, Response Style (Tests), Mathematics Tests

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Grantee Submission, 2022

Purpose: Our aim was to evaluate the psychometric properties of the online administered format of the Test of Narrative Language--Second Edition (TNL-2; Gillam & Pearson, 2017), given the importance of assessing children's narrative ability and considerable absence of psychometric studies of spoken language assessments administered online.…

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Online Administration of the Test of Narrative Language--Second Edition: Psychometrics and Considerations for Remote Assessment

Peer reviewed

Beula M. Magimairaj; Philip Capin; Sandra L. Gillam; Sharon Vaughn; Greg Roberts; Anna-Maria Fall; Ronald B. Gillam – Language, Speech, and Hearing Services in Schools, 2022

Descriptors: Computer Assisted Testing, Language Tests, Story Telling, Language Impairments

Investigating the Application of Automated Writing Evaluation to Chinese Undergraduate English Majors: A Case Study of "WriteToLearn"

Peer reviewed

Liu, Sha; Kunnan, Antony John – CALICO Journal, 2016

This study investigated the application of "WriteToLearn" on Chinese undergraduate English majors' essays in terms of its scoring ability and the accuracy of its error feedback. Participants were 163 second-year English majors from a university located in Sichuan province who wrote 326 essays from two writing prompts. Each paper was…

Descriptors: Foreign Countries, Undergraduate Students, English (Second Language), Second Language Learning

Choosing among Tucker or Chained Linear Equating in Two Testing Situations: Rater Comparability Scoring and Randomly Equivalent Groups with an Anchor

Peer reviewed

Puhan, Gautam – Journal of Educational Measurement, 2012

Tucker and chained linear equatings were evaluated in two testing scenarios. In Scenario 1, referred to as rater comparability scoring and equating, the anchor-to-total correlation is often very high for the new form but moderate for the reference form. This may adversely affect the results of Tucker equating, especially if the new and reference…

Descriptors: Testing, Scoring, Equated Scores, Statistical Analysis

Effect of Observation Mode on Measures of Secondary Mathematics Teaching

Peer reviewed

Casabianca, Jodi M.; McCaffrey, Daniel F.; Gitomer, Drew H.; Bell, Courtney A.; Hamre, Bridget K.; Pianta, Robert C. – Educational and Psychological Measurement, 2013

Classroom observation of teachers is a significant part of educational measurement; measurements of teacher practice are being used in teacher evaluation systems across the country. This research investigated whether observations made live in the classroom and from video recording of the same lessons yielded similar inferences about teaching.…

Descriptors: Secondary School Mathematics, Mathematics Instruction, Classroom Observation Techniques, Algebra

Optimization of Answer Keys for Script Concordance Testing: Should We Exclude Deviant Panelists, Deviant Responses, or Neither?

Peer reviewed

Gagnon, Robert; Lubarsky, Stuart; Lambert, Carole; Charlin, Bernard – Advances in Health Sciences Education, 2011

The Script Concordance Test (SCT) uses a panel-based, aggregate scoring method that aims to capture the variability of responses of experienced practitioners to particular clinical situations. The use of this type of scoring method is a key determinant of the tool's discriminatory power, but deviant answers could potentially diminish the…

Descriptors: Expertise, Oncology, Scoring, Error of Measurement

Propensity Scoring and the Relationship between Sexual Media and Adolescent Sexual Behavior: Comment on Steinberg and Monahan (2011)

Peer reviewed

Collins, Rebecca L.; Martino, Steven C.; Elliott, Marc N. – Developmental Psychology, 2011

Longitudinal research has demonstrated a link between exposure to sexual content in media and subsequent changes in adolescent sexual behavior, including initiation of intercourse and various noncoital sexual activities. Based on a reanalysis of one of the data sets involved, Steinberg and Monahan (2011) have challenged these findings. However,…

Descriptors: Sexuality, Mass Media Effects, Adolescents, Evaluation Methods

International Test Score Comparisons and Educational Policy: A Review of the Critiques

Peer reviewed

Carnoy, Martin – National Education Policy Center, 2015

Stanford education professor Martin Carnoy examines four main critiques of how international test results are used in policymaking. Of particular interest are critiques of the policy analyses published by the Program for International Student Assessment (PISA). Using average PISA scores as a comparative measure of student achievement is misleading…

Descriptors: Criticism, Reputation, Test Validity, Error of Measurement

The Perceived Stress Reactivity Scale: Measurement Invariance, Stability, and Validity in Three Countries

Peer reviewed

Schlotz, Wolff; Yim, Ilona S.; Zoccola, Peggy M.; Jansen, Lars; Schulz, Peter – Psychological Assessment, 2011

There is accumulating evidence that individual differences in stress reactivity contribute to the risk for stress-related disease. However, the assessment of stress reactivity remains challenging, and there is a relative lack of questionnaires reliably assessing this construct. We here present the Perceived Stress Reactivity Scale (PSRS), a…

Descriptors: Stress Variables, Self Efficacy, Factor Structure, Infants

The Test of Standard Written English: A Revalidation with Writing Samples and Implications of Placement Decisions.

Suddick, David E.; And Others – 1985

The Test of Standard Written English (TSWE) is a 50-item multiple choice instrument designed to assess the ability of college students to use English. In this study, based upon a sample of 45 students, the TSWE was revalidated with writing samples. The coefficient of 0.54 was most impressive given that the TSWE scores were restricted to those…

Descriptors: Correlation, Error of Measurement, Essay Tests, Higher Education

Essay Reliability: Form and Meaning.

Shale, Doug – 1986

This study is an attempt at a cohesive characterization of the concept of essay reliability. As such, it takes as a basic premise that previous and current practices in reporting reliability estimates for essay tests have certain shortcomings. The study provides an analysis of these shortcomings--partly to encourage a fuller understanding of the…

Descriptors: Analysis of Variance, Correlation, Error of Measurement, Essay Tests

Investigating the Utility of Analytic Scoring for the TOEFL Academic Speaking Test (TAST). TOEFL iBT Research Report. TOEFL iBT-01. ETS RR-06-07

Peer reviewed

Xi, Xiaoming; Mollaun, Pam – ETS Research Report Series, 2006

This study explores the utility of analytic scoring for the TOEFL® Academic Speaking Test (TAST) in providing useful and reliable diagnostic information in three aspects of candidates' performance: delivery, language use, and topic development. G studies were used to investigate the dependability of the analytic scores, the distinctness of the…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Oral Language

Author

Anna-Maria Fall	2
Beula M. Magimairaj	2
Greg Roberts	2
Philip Capin	2
Ronald B. Gillam	2
Sandra L. Gillam	2
Sharon Vaughn	2
Bell, Courtney A.	1
Bulut, Okan	1
Carnoy, Martin	1
Casabianca, Jodi M.	1
Charlin, Bernard	1
Collins, Rebecca L.	1
Elliott, Marc N.	1
Gagnon, Robert	1
Gitomer, Drew H.	1
Gorgun, Guher	1
Hamre, Bridget K.	1
Jansen, Lars	1
Kunnan, Antony John	1
Lambert, Carole	1
Liu, Sha	1
Lubarsky, Stuart	1
Martino, Steven C.	1