Showing 1 to 15 of 29 results
Peer reviewed
Direct link
Ozdemir, Burhanettin; Gelbal, Selahattin – Education and Information Technologies, 2022
Computerized adaptive tests (CAT) apply an adaptive process in which items are tailored to individuals' ability scores. Multidimensional CAT (MCAT) designs differ in the item selection, ability estimation, and termination methods they use. This study aims to investigate the performance of the MCAT designs used to…
Descriptors: Scores, Computer Assisted Testing, Test Items, Language Proficiency
Peer reviewed
Direct link
Silber, Henning; Roßmann, Joss; Gummer, Tobias – International Journal of Social Research Methodology, 2018
In this article, we present the results of three question design experiments on inter-item correlations, which tested a grid design against a single-item design. The first and second experiments examined the inter-item correlations of a set with five and seven items, respectively, and the third experiment examined the impact of the question design…
Descriptors: Foreign Countries, Online Surveys, Experiments, Correlation
Peer reviewed
Direct link
Schnoor, Birger; Hartig, Johannes; Klinger, Thorsten; Naumann, Alexander; Usanova, Irina – Language Testing, 2023
Research on assessing English as a foreign language (EFL) development has been growing recently. However, empirical evidence from longitudinal analyses based on substantial samples is still needed. In such settings, tests for measuring language development must meet high standards of test quality such as validity, reliability, and objectivity, as…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Longitudinal Studies
Peer reviewed
Direct link
O'Donnell, Shannon; Tavares, Francisco; McMaster, Daniel; Chambers, Samuel; Driller, Matthew – Measurement in Physical Education and Exercise Science, 2018
The current study aimed to assess the validity and test-retest reliability of a linear position transducer when compared to a force plate through a counter-movement jump in female participants. Twenty-seven female recreational athletes (19 ± 2 years) performed three counter-movement jumps simultaneously using the linear position transducer and…
Descriptors: Test Validity, Test Reliability, Females, Athletes
Peer reviewed
Direct link
Brennan, Robert L. – Applied Measurement in Education, 2011
Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory
Peer reviewed
PDF on ERIC Download full text
Tourangeau, Karen; Nord, Christine; Lê, Thanh; Wallner-Allen, Kathleen; Vaden-Kiernan, Nancy; Blaker, Lisa; Najarian, Michelle – National Center for Education Statistics, 2018
This manual provides guidance and documentation for users of the longitudinal kindergarten-fourth grade (K-4) data file of the Early Childhood Longitudinal Study, Kindergarten Class of 2010-11 (ECLS-K:2011). It mainly provides information specific to the fourth-grade round of data collection. The first chapter provides an overview of the…
Descriptors: Children, Longitudinal Studies, Surveys, Kindergarten
Peer reviewed
Direct link
Yao, Lihua – Psychometrika, 2012
Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…
Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing
Kleinke, David J. – 1976
Data from 200 college-level tests were used to compare three reliability approximations (two of Saupe and one of Cureton) to Kuder-Richardson Formula 20 (KR20). While the approximations correlated highly (about .9) with the reliability estimate, they tended to be underapproximations. The explanation lies in an apparent bias of Lord's approximation…
Descriptors: Comparative Analysis, Correlation, Error of Measurement, Statistical Analysis
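(For context on the baseline in the entry above: Kuder-Richardson Formula 20 is the internal-consistency coefficient for dichotomously scored items. The sketch below is a minimal illustration of that standard formula only; the Saupe and Cureton approximations studied here are not reproduced, and the score matrix is invented purely for illustration.)

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """Kuder-Richardson Formula 20 for a persons-by-items matrix of 0/1 scores.

    KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / var(total score)),
    where p_i is the proportion answering item i correctly and q_i = 1 - p_i.
    """
    n_items = scores.shape[1]
    p = scores.mean(axis=0)                      # item difficulties (proportion correct)
    q = 1.0 - p
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / total_var)

# Illustrative data only: 6 examinees answering 4 dichotomous items.
X = np.array([[1, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 0, 0, 0]])
print(round(kr20(X), 3))
```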
Peer reviewed
PDF on ERIC Download full text
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
Peer reviewed
Zimmerman, Donald W.; And Others – Journal of Experimental Education, 1984
Three types of test were compared: a completion test, a matching test, and a multiple-choice test. The completion test was more reliable than the matching test, and the matching test was more reliable than the multiple-choice test. (Author/BW)
Descriptors: Comparative Analysis, Error of Measurement, Higher Education, Mathematical Models
Peer reviewed
Huynh, Huynh; Saunders, Joseph C. – Journal of Educational Measurement, 1980
Single administration (beta-binomial) estimates for the raw agreement index p and the corrected-for-chance kappa index in mastery testing are compared with those based on two test administrations in terms of estimation bias and sampling variability. Bias is about 2.5 percent for p and 10 percent for kappa. (Author/RL)
Descriptors: Comparative Analysis, Error of Measurement, Mastery Tests, Mathematical Models
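(For context on the agreement indices in the entry above: with two administrations, the raw agreement index p is the proportion of examinees given the same mastery classification both times, and kappa corrects p for chance agreement. A minimal sketch under an assumed cut score follows; the single-administration beta-binomial estimators studied here are not reproduced, and the scores and cut score are invented for illustration.)

```python
import numpy as np

def agreement_indices(scores1, scores2, cut):
    """Raw agreement p and Cohen's kappa for mastery classifications
    from two test administrations, given a common cut score."""
    m1 = np.asarray(scores1) >= cut          # mastery decision, administration 1
    m2 = np.asarray(scores2) >= cut          # mastery decision, administration 2
    p_raw = np.mean(m1 == m2)                # proportion classified consistently
    # Chance agreement from the marginal mastery rates of the two administrations.
    p_chance = m1.mean() * m2.mean() + (1 - m1.mean()) * (1 - m2.mean())
    kappa = (p_raw - p_chance) / (1 - p_chance)
    return p_raw, kappa

# Illustrative data only: raw scores for 8 examinees on two administrations,
# with a hypothetical mastery cut score of 12 out of 20.
admin1 = [15, 9, 14, 11, 18, 7, 13, 12]
admin2 = [16, 10, 11, 12, 17, 8, 14, 13]
print(agreement_indices(admin1, admin2, cut=12))
```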
Reid, Jerry B.; Roberts, Dennis M. – 1978
Comparisons of corresponding values of phi and kappa coefficients were made for 270 instances of data generated by a Monte Carlo technique to simulate a test-retest situation. Data were generated for distributions with the same mean but three different levels of standard deviation, standard error of measurement, and cutting score. Ten samples of…
Descriptors: Comparative Analysis, Correlation, Criterion Referenced Tests, Cutting Scores
Peer reviewed
Brennan, Robert L.; Kane, Michael T. – Psychometrika, 1977
Using the assumption of randomly parallel tests and concepts from generalizability theory, three signal/noise ratios for domain-referenced tests are developed, discussed, and compared. The three ratios have the same noise but different signals depending upon the kind of decision to be made as a result of measurement. (Author/JKS)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Error of Measurement, Mathematical Models
Peer reviewed
Mellenbergh, Gideon J.; van der Linden, Wim J. – Applied Psychological Measurement, 1979
For six tests, coefficient delta as an index for internal optimality is computed. Internal optimality is defined as the magnitude of risk of the decision procedure with respect to the true score. Results are compared with an alternative index (coefficient kappa) for assessing the consistency of decisions. (Author/JKS)
Descriptors: Classification, Comparative Analysis, Decision Making, Error of Measurement
Peer reviewed
Huck, Schuyler W.; And Others – Educational and Psychological Measurement, 1981
Believing that examinee-by-item interaction should be conceptualized as true score variability rather than as a result of errors of measurement, Lu proposed a modification of Hoyt's analysis of variance reliability procedure. Via a computer simulation study, it is shown that Lu's approach does not separate interaction from error. (Author/RL)
Descriptors: Analysis of Variance, Comparative Analysis, Computer Programs, Difficulty Level
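(For context on the procedure Lu modified: Hoyt's analysis-of-variance reliability treats persons and items as the two factors of a two-way layout and estimates reliability as (MS_persons - MS_residual)/MS_persons, which is algebraically equivalent to coefficient alpha. The sketch below shows only that standard computation; Lu's modification and the simulation reported in this study are not reproduced, and the score matrix is invented for illustration.)

```python
import numpy as np

def hoyt_reliability(scores: np.ndarray) -> float:
    """Hoyt's ANOVA reliability for a persons-by-items score matrix:
    r = (MS_persons - MS_residual) / MS_persons, where the residual
    (person-by-item interaction plus error) is treated as measurement error."""
    n, k = scores.shape
    grand = scores.mean()
    ss_persons = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_items = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return (ms_persons - ms_resid) / ms_persons

# Illustrative data only: 5 examinees scored on 4 items.
X = np.array([[4, 3, 4, 5],
              [2, 2, 3, 2],
              [5, 4, 4, 5],
              [3, 3, 2, 3],
              [1, 2, 2, 1]], dtype=float)
print(round(hoyt_reliability(X), 3))
```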