NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Location
California2
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing all 11 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Isbell, Daniel R.; Son, Young-A – Studies in Second Language Acquisition, 2022
Elicited Imitation Tests (EITs) are commonly used in second language acquisition (SLA)/bilingualism research contexts to assess the general oral proficiency of study participants. While previous studies have provided valuable EIT construct-related validity evidence, some key gaps remain. This study uses an integrative data analysis to further…
Descriptors: Bilingualism, Imitation, Language Tests, Second Language Learning
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Lyness, Scott A.; Peterson, Kent; Yates, Kenneth – Education Sciences, 2021
The Performance Assessment for California Teachers (PACT) is a high stakes summative assessment that was designed to measure pre-service teacher readiness. We examined the inter-rater reliability (IRR) of trained PACT evaluators who rated 19 candidates. As measured by Cohen's weighted kappa, the overall IRR estimate was 0.17 (poor strength of…
Descriptors: High Stakes Tests, Performance Based Assessment, Teacher Effectiveness, Academic Language
Peer reviewed Peer reviewed
Direct linkDirect link
Wyse, Adam E. – Applied Measurement in Education, 2018
This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
Peer reviewed Peer reviewed
Direct linkDirect link
Crossley, Scott; Clevinger, Amanda; Kim, YouJin – Language Assessment Quarterly, 2014
There has been a growing interest in the use of integrated tasks in the field of second language testing to enhance the authenticity of language tests. However, the role of text integration in test takers' performance has not been widely investigated. The purpose of the current study is to examine the effects of text-based relational (i.e.,…
Descriptors: Language Proficiency, Connected Discourse, Language Tests, English (Second Language)
Lunz, Mary E.; Stahl, John A. – 1990
Three examinations administered to medical students were analyzed to determine differences among severities of judges' assessments and among grading periods. The examinations included essay, clinical, and oral forms of the tests. Twelve judges graded the three essays for 32 examinees during a 4-day grading session, which was divided into eight…
Descriptors: Clinical Diagnosis, Comparative Testing, Difficulty Level, Essay Tests
Peer reviewed Peer reviewed
Mills, Craig N.; And Others – Educational Measurement: Issues and Practice, 1991
An approach is presented to the definition of minimal competence for judges to use in standard setting. Panelists in standard setting must receive training to ensure that differences in rating result from differences in perceptions of item difficulty, not in differences of opinion about the definition of minimal competence. (SLD)
Descriptors: Cutting Scores, Decision Making, Definitions, Difficulty Level
Peer reviewed Peer reviewed
Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991
Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)
Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability
Peer reviewed Peer reviewed
Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989
The impact of overall test length and difficulty on the expert judgments of item performance by the Nedelsky method were studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of length or difficulty of the test.…
Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education
Lunz, Mary E.; And Others – 1989
A method for understanding and controlling the multiple facets of an oral examination (OE) or other judge-intermediated examination is presented and illustrated. This study focused on determining the extent to which the facets model (FM) analysis constructs meaningful variables for each facet of an OE involving protocols, examiners, and…
Descriptors: Computer Software, Difficulty Level, Evaluators, Examiners
Wheeler, Patricia – 1991
The appropriateness of the Angoff method (W. H. Angoff, 1971) for setting standards on tests was studied. Evaluators (judges) from California school districts and teacher training institutions reviewed 15 NTE (National Teacher Examinations) Program Specialty Area Tests published by the Educational Testing Service for their appropriateness in…
Descriptors: Art Education, Biology, Difficulty Level, Elementary Secondary Education
Jaeger, Richard M. – 1989
Criteria for the selection of judges (evaluators) for setting item-based standards involved in tests for which cutting scores must be established are investigated. Focus is on cases in which test standards are based on specialists' judgments concerning the difficulty of test items in tests used to determine who will be awarded a diploma, admitted…
Descriptors: College Entrance Examinations, Cutting Scores, Difficulty Level, Estimation (Mathematics)