Publication Type
Journal Articles (76)
Reports - Evaluative (76)
Reports - Research (2)
Speeches/Meeting Papers (2)
Collected Works - General (1)
Information Analyses (1)
Audience
Practitioners (1)
Showing 1 to 15 of 76 results
Peer reviewed
Matayoshi, Jeffrey; Cosyn, Eric; Uzun, Hasan – International Journal of Artificial Intelligence in Education, 2021
Many recent studies have looked at the viability of applying recurrent neural networks (RNNs) to educational data. In most cases, this is done by comparing their performance to existing models in the artificial intelligence in education (AIED) and educational data mining (EDM) fields. While there is increasing evidence that, in many situations,…
Descriptors: Artificial Intelligence, Data Analysis, Student Evaluation, Adaptive Testing
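The setup this abstract describes can be pictured with a minimal, self-contained sketch: a vanilla recurrent network that consumes a student's sequence of (item, correctness) pairs and outputs per-item probabilities of a correct next response. This is a generic knowledge-tracing-style illustration with made-up dimensions and random weights, not any of the models evaluated in the article.

```python
import numpy as np

# Minimal vanilla-RNN forward pass over a student's response sequence.
# All dimensions and weights are illustrative stand-ins.

rng = np.random.default_rng(0)

n_items, hidden = 5, 8
W_in = rng.normal(scale=0.1, size=(hidden, 2 * n_items))  # input -> hidden
W_h = rng.normal(scale=0.1, size=(hidden, hidden))        # hidden -> hidden
W_out = rng.normal(scale=0.1, size=(n_items, hidden))     # hidden -> per-item logit

def one_hot(item, correct):
    """Encode an (item, correctness) pair as a single 2*n_items vector."""
    x = np.zeros(2 * n_items)
    x[item + n_items * int(correct)] = 1.0
    return x

def predict_next(responses):
    """responses: list of (item_index, correct) pairs; returns P(correct) per item."""
    h = np.zeros(hidden)
    for item, correct in responses:
        h = np.tanh(W_in @ one_hot(item, correct) + W_h @ h)
    return 1.0 / (1.0 + np.exp(-W_out @ h))  # sigmoid over items

print(predict_next([(0, True), (3, False), (1, True)]))
```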
Peer reviewed
Isbell, Daniel R.; Kremmel, Benjamin – Language Testing, 2020
Administration of high-stakes language proficiency tests has been disrupted in many parts of the world as a result of the 2019 novel coronavirus pandemic. Institutions that rely on test scores have been forced to adapt, and in many cases this means using scores from a different test, or a new online version of an existing test, that can be taken…
Descriptors: Language Tests, High Stakes Tests, Language Proficiency, Second Language Learning
Peer reviewed
Norris, John; Drackert, Anastasia – Language Testing, 2018
The Test of German as a Foreign Language (TestDaF) plays a critical role as a standardized test of German language proficiency. Developed and administered by the Society for Academic Study Preparation and Test Development (g.a.s.t.), TestDaF was launched in 2001 and has experienced persistent annual growth, with more than 44,000 test takers in…
Descriptors: German, Second Language Learning, Language Tests, Language Proficiency
Peer reviewed
PDF on ERIC
Vaheoja, Monika; Verhelst, N. D.; Eggen, T.J.H.M. – European Journal of Science and Mathematics Education, 2019
In this article, the authors applied profile analysis to Maths exam data to demonstrate how different exam forms, differing in difficulty and length, can be reported and easily interpreted. The results were presented for different groups of participants and for different institutions in different Maths domains by evaluating the balance. Some…
Descriptors: Feedback (Response), Foreign Countries, Statistical Analysis, Scores
Peer reviewed
Isbell, Dan; Winke, Paula – Language Testing, 2019
The American Council on the Teaching of Foreign Languages (ACTFL) oral proficiency interview -- computer (OPIc) testing system represents an ambitious effort in language assessment: Assessing oral proficiency in over a dozen languages, on the same scale, from virtually anywhere at any time. Especially for users in contexts where multiple foreign…
Descriptors: Oral Language, Language Tests, Language Proficiency, Second Language Learning
Peer reviewed
Runco, Mark A.; Walczyk, Jeffrey John; Acar, Selcuk; Cowger, Ernest L.; Simundson, Melissa; Tripp, Sunny – Journal of Creative Behavior, 2014
This article describes an empirical refinement of the "Runco Ideational Behavior Scale" (RIBS). The RIBS seems to be associated with divergent thinking, and the potential for creative thinking, but it was possible that its validity could be improved. With this in mind, three new scales were developed and the unique benefit (or…
Descriptors: Behavior Rating Scales, Creative Thinking, Test Validity, Psychometrics
Peer reviewed
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
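As context for the abstract above, the sketch below estimates classification accuracy and consistency by Monte Carlo under the kind of parametric assumptions the authors contrast with their approach: normally distributed abilities and normally distributed score error around a single cut. The cut score and error standard deviation are arbitrary, and the article's nonparametric estimator is not reproduced here.

```python
import numpy as np

# Monte Carlo illustration of classification accuracy (CA) and
# classification consistency (CC) at a single cut score.

rng = np.random.default_rng(1)
n, cut, se = 100_000, 0.5, 0.4

theta = rng.normal(0.0, 1.0, n)          # true abilities
est1 = theta + rng.normal(0.0, se, n)    # observed/estimated score, form 1
est2 = theta + rng.normal(0.0, se, n)    # independent parallel replication

true_pass = theta >= cut
ca = np.mean((est1 >= cut) == true_pass)        # accuracy: agreement with truth
cc = np.mean((est1 >= cut) == (est2 >= cut))    # consistency: agreement with a retest

print(f"classification accuracy ~ {ca:.3f}, consistency ~ {cc:.3f}")
```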
Peer reviewed
Doskey, Elena M.; Lagunas, Brenda; SooHoo, Michelle; Lomax, Amanda; Bullick, Stephanie – Journal of Psychoeducational Assessment, 2013
The Speed DIAL-4 was developed from the Developmental Indicators for the Assessment of Learning, Fourth Edition (DIAL-4), a screening designed to identify children between the ages of 2 years, 6 months through 5 years, 11 months "who are in need of intervention or diagnostic assessment in the following areas: motor, concepts, language,…
Descriptors: Screening Tests, Young Children, Test Length, Scoring
Peer reviewed
Stucky, Brian D.; Thissen, David; Edelen, Maria Orlando – Applied Psychological Measurement, 2013
Test developers often need to create unidimensional scales from multidimensional data. For item analysis, "marginal trace lines" capture the relation with the general dimension while accounting for nuisance dimensions and may prove to be a useful technique for creating short-form tests. This article describes the computations needed to obtain…
Descriptors: Test Construction, Test Length, Item Analysis, Item Response Theory
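A rough illustration of the marginal trace line idea mentioned above: take a two-dimensional (general plus nuisance) 2PL item and integrate the nuisance dimension out against a standard normal density, leaving a response curve that depends on the general dimension only. The item parameters and quadrature grid are invented for illustration and do not reproduce the article's computations or its short-form procedure.

```python
import numpy as np

# Marginal trace line: the general-dimension response curve of a
# bifactor-style 2PL item, averaged over the nuisance dimension.

a_gen, a_spec, intercept = 1.4, 0.8, -0.3   # illustrative bifactor item parameters

def p_correct(theta_g, theta_s):
    """Two-dimensional 2PL response probability."""
    return 1.0 / (1.0 + np.exp(-(a_gen * theta_g + a_spec * theta_s + intercept)))

def marginal_trace(theta_g, n_nodes=61):
    """Integrate the specific dimension out against a standard normal density."""
    nodes = np.linspace(-4, 4, n_nodes)
    dens = np.exp(-0.5 * nodes**2) / np.sqrt(2 * np.pi)
    dens /= dens.sum()                       # normalize the quadrature weights
    return np.sum(p_correct(theta_g, nodes) * dens)

for t in (-1.0, 0.0, 1.0):
    print(f"theta_g = {t:+.1f}: marginal P(correct) = {marginal_trace(t):.3f}")
```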
Peer reviewed
Goodman, Joshua T.; Willse, John T.; Allen, Nancy L.; Klaric, John S. – Educational and Psychological Measurement, 2011
The Mantel-Haenszel procedure is a popular technique for determining items that may exhibit differential item functioning (DIF). Numerous studies have focused on the strengths and weaknesses of this procedure, but few have focused on the performance of the Mantel-Haenszel method when structurally missing data are present as a result of test booklet…
Descriptors: Test Bias, Identification, Tests, Test Length
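For readers unfamiliar with the procedure, the sketch below computes the Mantel-Haenszel common odds ratio (and the ETS D-DIF transformation) for a single studied item, matching examinees on their score on the remaining items. The simulated data and DIF shift are made up, and the booklet-design missing-data conditions studied in the article are not modeled.

```python
import numpy as np

# Mantel-Haenszel DIF statistic for one studied item on simulated Rasch data.

rng = np.random.default_rng(2)

def simulate(n, shift=0.0):
    """Rasch-type responses to 21 items; `shift` adds DIF to item 0 only."""
    theta = rng.normal(0.0, 1.0, n)[:, None]
    b = np.linspace(-1.5, 1.5, 21)[None, :]
    b = b + np.r_[shift, np.zeros(20)]            # studied item gets the shift
    return (rng.random((n, 21)) < 1 / (1 + np.exp(-(theta - b)))).astype(int)

ref, foc = simulate(2000, 0.0), simulate(2000, 0.5)   # focal group disadvantaged

def mantel_haenszel(ref, foc, item=0):
    match_r, match_f = ref.sum(1) - ref[:, item], foc.sum(1) - foc[:, item]
    num = den = 0.0
    for s in np.union1d(match_r, match_f):        # stratify on the matching score
        r, f = ref[match_r == s, item], foc[match_f == s, item]
        n_k = len(r) + len(f)
        if len(r) == 0 or len(f) == 0:
            continue                               # stratum contributes nothing
        num += r.sum() * (len(f) - f.sum()) / n_k  # A_k * D_k / N_k
        den += (len(r) - r.sum()) * f.sum() / n_k  # B_k * C_k / N_k
    return num / den                               # common odds ratio alpha_MH

alpha = mantel_haenszel(ref, foc)
print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {-2.35 * np.log(alpha):.2f}")
```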
Peer reviewed
Yao, Lihua – Applied Psychological Measurement, 2013
Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
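A unidimensional simplification of the stopping-rule idea in the abstract above: administer the most informative remaining 2PL item, update a grid-based ability estimate, and stop once the standard error falls below a target or a maximum test length is reached. Exposure control, content constraints (the Priority Index), and the multidimensional machinery compared in the article are deliberately omitted; all item parameters are invented.

```python
import numpy as np

# Simple CAT with maximum-information selection and a standard-error stopping rule.

rng = np.random.default_rng(3)
a = rng.uniform(0.8, 2.0, 200)          # discriminations of a made-up item pool
b = rng.normal(0.0, 1.0, 200)           # difficulties

def p(theta, i):                        # 2PL response probability
    return 1 / (1 + np.exp(-a[i] * (theta - b[i])))

def info(theta, i):                     # 2PL Fisher information
    pr = p(theta, i)
    return a[i] ** 2 * pr * (1 - pr)

def run_cat(true_theta, se_target=0.3, max_len=40):
    theta, administered, responses = 0.0, [], []
    while True:
        remaining = [i for i in range(len(a)) if i not in administered]
        item = max(remaining, key=lambda i: info(theta, i))   # max information
        administered.append(item)
        responses.append(rng.random() < p(true_theta, item))  # simulated answer
        # crude grid MLE of theta given the responses so far
        grid = np.linspace(-4, 4, 161)
        loglik = sum(np.log(np.where(u, p(grid, i), 1 - p(grid, i)))
                     for i, u in zip(administered, responses))
        theta = grid[np.argmax(loglik)]
        se = 1 / np.sqrt(sum(info(theta, i) for i in administered))
        if se <= se_target or len(administered) >= max_len:
            return theta, len(administered), se

print(run_cat(true_theta=1.0))
```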
Peer reviewed
de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011
Lack of sufficient reliability is the primary impediment for generating and reporting subtest scores. Several current methods of subscore estimation do so either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article presents a systematic comparison of four…
Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis
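To make the idea of borrowing information from the overall test concrete, the toy simulation below compares predicting a true subscore from its own observed subscore versus from the subscore plus the observed total score. It is only a schematic stand-in for the four estimation methods compared in the article; the correlation and error values are invented.

```python
import numpy as np

# Toy comparison: raw subscore vs. a total-score-augmented least-squares estimate.

rng = np.random.default_rng(4)
n = 50_000

true_sub = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], n)  # correlated subtest abilities
obs = true_sub + rng.normal(0.0, 0.7, (n, 2))    # noisy observed subscores
total = obs.sum(axis=1)                           # observed total score

# Predict true subscore 1 from (a) its own observed subscore, (b) subscore + total.
X_a = np.column_stack([np.ones(n), obs[:, 0]])
X_b = np.column_stack([np.ones(n), obs[:, 0], total])
for label, X in (("subscore only", X_a), ("subscore + total", X_b)):
    beta, *_ = np.linalg.lstsq(X, true_sub[:, 0], rcond=None)
    pred = X @ beta
    print(f"{label:>17}: correlation with true subscore = "
          f"{np.corrcoef(pred, true_sub[:, 0])[0, 1]:.3f}")
```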
Peer reviewed
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Peer reviewed
Eggen, Theo J. H. M. – Educational Research and Evaluation, 2011
If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…
Descriptors: Test Length, Adaptive Testing, Classification, Item Analysis
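The sketch below runs a bare-bones SPRT classification in the spirit of the Wald-based computerized classification tests the abstract describes: accumulate the log likelihood ratio between two ability values bracketing the cut and stop when a Wald boundary is crossed or a maximum length is hit. The item pool, indifference zone, and error rates are illustrative, and no adaptive item selection is used.

```python
import numpy as np

# Minimal SPRT pass/fail classification on simulated Rasch items.

rng = np.random.default_rng(5)
b = rng.normal(0.0, 1.0, 100)            # Rasch difficulties of a made-up pool
theta0, theta1 = -0.3, 0.3               # indifference-zone bounds around the cut
alpha, beta = 0.05, 0.05                 # nominal error rates
upper, lower = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))

def rasch_p(theta, b_i):
    return 1 / (1 + np.exp(-(theta - b_i)))

def sprt_classify(true_theta, max_len=60):
    log_lr = 0.0
    for k in range(max_len):
        b_i = b[k]
        u = rng.random() < rasch_p(true_theta, b_i)      # simulated response
        p0, p1 = rasch_p(theta0, b_i), rasch_p(theta1, b_i)
        log_lr += np.log(p1 / p0) if u else np.log((1 - p1) / (1 - p0))
        if log_lr >= upper:
            return "pass", k + 1
        if log_lr <= lower:
            return "fail", k + 1
    return ("pass" if log_lr > 0 else "fail"), max_len    # forced decision

print(sprt_classify(true_theta=0.8))
print(sprt_classify(true_theta=-0.8))
```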
Peer reviewed
Chou, Yeh-Tai; Wang, Wen-Chung – Educational and Psychological Measurement, 2010
Dimensionality is an important assumption in item response theory (IRT). Principal component analysis on standardized residuals has been used to check dimensionality, especially under the family of Rasch models. It has been suggested that a first eigenvalue greater than 1.5 signifies a violation of unidimensionality when there…
Descriptors: Test Length, Sample Size, Correlation, Item Response Theory
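The eigenvalue rule mentioned above can be demonstrated on simulated unidimensional Rasch data: compute standardized residuals, take the principal components of their inter-item correlation matrix, and inspect the first eigenvalue against the 1.5 benchmark. For brevity the generating item and person parameters stand in for estimated ones; the article's test-length and sample-size conditions are not reproduced.

```python
import numpy as np

# PCA on standardized Rasch residuals as a rough dimensionality check.

rng = np.random.default_rng(6)
n_persons, n_items = 2000, 20
theta = rng.normal(0.0, 1.0, (n_persons, 1))
b = np.linspace(-2.0, 2.0, n_items)[None, :]

P = 1 / (1 + np.exp(-(theta - b)))                  # Rasch model probabilities
X = (rng.random((n_persons, n_items)) < P).astype(float)

Z = (X - P) / np.sqrt(P * (1 - P))                  # standardized residuals
eigvals = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
print(f"first eigenvalue = {eigvals[0]:.2f}  (values well above 1.5 would "
      "suggest a violation of unidimensionality)")
```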