ERIC - Search Results

Showing all 11 results

Measurement Properties of a Standardized Elicited Imitation Test: An Integrative Data Analysis

Peer reviewed

Isbell, Daniel R.; Son, Young-A – Studies in Second Language Acquisition, 2022

Elicited Imitation Tests (EITs) are commonly used in second language acquisition (SLA)/bilingualism research contexts to assess the general oral proficiency of study participants. While previous studies have provided valuable EIT construct-related validity evidence, some key gaps remain. This study uses an integrative data analysis to further…

Descriptors: Bilingualism, Imitation, Language Tests, Second Language Learning

Low Inter-Rater Reliability of a High Stakes Performance Assessment of Teacher Candidates

Peer reviewed

Lyness, Scott A.; Peterson, Kent; Yates, Kenneth – Education Sciences, 2021

The Performance Assessment for California Teachers (PACT) is a high stakes summative assessment that was designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was 0.17 (poor strength of…

Descriptors: High Stakes Tests, Performance Based Assessment, Teacher Effectiveness, Academic Language

Regression Effects in Angoff Ratings: Examples from Credentialing Exams

Peer reviewed

Wyse, Adam E. – Applied Measurement in Education, 2018

This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…

Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)

The Role of Lexical Properties and Cohesive Devices in Text Integration and Their Effect on Human Ratings of Speaking Proficiency

Peer reviewed

Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014

There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…

Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)

Severity of Grading across Time Periods.

Lunz, Mary E.; Stahl, John A. – 1990

Three examinations administered to medical students were analyzed to determine differences among severities of judges' assessments and among grading periods. The examinations included essay, clinical, and oral forms of the tests. Twelve judges graded the three essays for 32 examinees during a 4-day grading session, which was divided into eight…

Descriptors: Clinical Diagnosis, Comparative Testing, Difficulty Level, Essay Tests

Defining Minimal Competence.

Peer reviewed

Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991

An approach is presented to the definition of minimal competence for judges to use in standard setting. Panelists in standard setting must receive training to ensure that differences in rating result from differences in perceptions of item difficulty, not from differences of opinion about the definition of minimal competence. (SLD)

Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level

Training Judges to Generate Standard-Setting Data.

Peer reviewed

Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991

Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)

Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability

Effects of Item Context on Intrajudge Consistency of Expert Judgments via the Nedelsky Standard Setting Method.

Peer reviewed

Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989

The impact of overall test length and difficulty on expert judgments of item performance made with the Nedelsky method was studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of length or difficulty of the test.…

Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education

Variation among Examiners and Protocols on Oral Examinations.

Lunz, Mary E.; And Others – 1989

A method for understanding and controlling the multiple facets of an oral examination (OE) or other judge-intermediated examination is presented and illustrated. This study focused on determining the extent to which the facets model (FM) analysis constructs meaningful variables for each facet of an OE involving protocols, examiners, and…

Descriptors: Computer Software, Difficulty Level, Evaluators, Examiners

The Relationship between Modified Angoff Knowledge Estimation Judgments and Item Difficulty Values for Seven NTE Specialty Area Tests.

Wheeler, Patricia – 1991

The appropriateness of the Angoff method (W. H. Angoff, 1971) for setting standards on tests was studied. Evaluators (judges) from California school districts and teacher training institutions reviewed 15 NTE (National Teacher Examinations) Program Specialty Area Tests published by the Educational Testing Service for their appropriateness in…

Descriptors: Art Education, Biology, Difficulty Level, Elementary Secondary Education

Selection of Judges for Standard Setting: What Kinds? How Many?

Jaeger, Richard M. – 1989

Criteria for the selection of judges (evaluators) for setting item-based standards involved in tests for which cutting scores must be established are investigated. Focus is on cases in which test standards are based on specialists' judgments concerning the difficulty of test items in tests used to determine who will be awarded a diploma, admitted…

Descriptors: College Entrance Examinations, Cutting Scores, Difficulty Level, Estimation (Mathematics)