ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	1

Descriptor

Test Construction	14
Test Reliability	14
Test Validity	8
Higher Education	7
Rating Scales	4
Test Format	4
Test Items	4
Computer Assisted Testing	3
Adaptive Testing	2
Elementary Secondary Education	2
Evaluation Methods	2
Foreign Countries	2
Intelligence Tests	2
Item Analysis	2
Item Response Theory	2
Measurement	2
Multiple Choice Tests	2
Psychometrics	2
Response Style (Tests)	2
Scores	2
Semantic Differential	2
Statistical Analysis	2
Academic Ability	1
Analysis of Variance	1
Behavior Rating Scales	1

Source

Applied Psychological…

Author

Barnes, Janet L.	1
Bejar, Isaac I.	1
Budescu, David V.	1
Burisch, Matthias	1
Cudeck, Robert	1
Goh, David S.	1
Hambleton, Ronald K., Ed.	1
Hsu, Louis M.	1
Kaiser, Henry F.	1
Landy, Frank J.	1
Luecht, Richard M.	1
Mann, Irene T.	1
Rounds, James B., Jr.	1
Schmeck, Ronald Ray	1
Serlin, Ronald C.	1
Wang, Wen-Chung	1
Yocom, Peter	1

Publication Type

Journal Articles	10
Reports - Research	6
Reports - Evaluative	4
Collected Works - Serials	1
Tests/Questionnaires	1

Location

West Germany

Assessments and Surveys

Hidden Figures Test	1
Minnesota Importance…	1
Stanford Binet Intelligence…	1

Showing all 14 results

A Critique of Raju and Oshima's Prophecy Formulas for Assessing the Reliability of Item Response Theory-Based Ability Estimates

Peer reviewed

Wang, Wen-Chung – Applied Psychological Measurement, 2008

Raju and Oshima (2005) proposed two prophecy formulas based on item response theory in order to predict the reliability of ability estimates for a test after change in its length. The first prophecy formula is equivalent to the classical Spearman-Brown prophecy formula. The second prophecy formula is misleading because of an underlying false…

Descriptors: Test Reliability, Item Response Theory, Computation, Evaluation Methods

Comparability of Multiple Rank Order and Paired Comparison Methods.

Peer reviewed

Rounds, James B., Jr.; And Others – Applied Psychological Measurement, 1978

Two studies compared multiple rank order and paired comparison methods in terms of psychometric characteristics and user reactions. Individual and group item responses, preference counts, and Thurstone normal transform scale values obtained by the multiple rank order method were found to be similar to those obtained by paired comparisons.…

Descriptors: Higher Education, Measurement, Rating Scales, Response Style (Tests)

Contributions to the Method of Paired Comparisons.

Peer reviewed

Kaiser, Henry F.; Serlin, Ronald C. – Applied Psychological Measurement, 1978

A least-squares solution for the method of paired comparisons is given. The approach provokes a theorem regarding the amount of data necessary and sufficient for a solution to be obtained. A measure of the internal consistency of the least-squares fit is developed. (Author/CTM)

Descriptors: Higher Education, Least Squares Statistics, Mathematical Models, Measurement

Implied Orders Tailored Testing: Simulation with the Stanford-Binet.

Peer reviewed

Cudeck, Robert; And Others – Applied Psychological Measurement, 1980

Tailored testing by Cliff's method of implied orders was simulated through the use of responses gathered during conventional administration of the Stanford-Binet Intelligence Scale. Tailoring eliminated approximately half the responses with only modest decreases in score reliability. (Author/BW)

Descriptors: Adaptive Testing, Computer Assisted Testing, Elementary Secondary Education, Intelligence Tests

Construction Strategies for Multiscale Personality Inventories

Peer reviewed

Burisch, Matthias – Applied Psychological Measurement, 1978

Sets of inventory scales were constructed from a common item pool, using variants of what are here called the Inductive, Deductive, and External strategies. Peer ratings for 21 traits served as criteria. Very little variation in validity was attributable to construction strategies. (Author/CTM)

Descriptors: Deduction, Foreign Countries, Higher Education, Induction

Multidimensional Computerized Adaptive Testing in a Certification or Licensure Context.

Peer reviewed

Luecht, Richard M. – Applied Psychological Measurement, 1996

The example of a medical licensure test is used to demonstrate situations in which complex, integrated content must be balanced at the total test level for validity reasons, but items assigned to reportable subscore categories may be used under a multidimensional item response theory adaptive paradigm to improve subscore reliability. (SLD)

Descriptors: Adaptive Testing, Certification, Computer Assisted Testing, Licensing Examinations (Professions)

Contributions to Criterion-Referenced Testing Technology.

Peer reviewed

Hambleton, Ronald K., Ed. – Applied Psychological Measurement, 1980

This special issue covers recent technical developments in the field of criterion-referenced testing. An introduction, six papers, and two commentaries dealing with test development, test score uses, and evaluation of scores review relevant literature, offer new models and/or results, and suggest directions for additional research. (SLD)

Descriptors: Criterion Referenced Tests, Mastery Tests, Measurement Techniques, Standard Setting (Scoring)

Development of a Self-Report Inventory for Assessing Individual Differences in Learning Processes

Peer reviewed

Schmeck, Ronald Ray; And Others – Applied Psychological Measurement, 1977

Five studies are presented describing the development of a self-report inventory for measuring individual differences in learning processes. Factor analysis of items yielded four scales: Synthesis-Analysis, Study Methods, Fact Retention, and Elaborative Processing. There were no sex differences, and the scales demonstrated acceptable reliabilities…

Descriptors: Factor Analysis, Higher Education, Learning Processes, Retention (Psychology)

Ordering Power of Separate versus Grouped True-False Tests: Interaction of Type of Test with Knowledge Levels of Examinees.

Peer reviewed

Hsu, Louis M. – Applied Psychological Measurement, 1979

A comparison of the relative ordering power of separate and grouped-items true-false tests indicated that neither type of test was uniformly superior to the other across all levels of knowledge of examinees. Grouped-item tests were found superior for examinees with low levels of knowledge. (Author/CTM)

Descriptors: Academic Ability, Knowledge Level, Multiple Choice Tests, Scores

Scaling Behavioral Anchors.

Peer reviewed

Barnes, Janet L.; Landy, Frank J. – Applied Psychological Measurement, 1979

Although behaviorally anchored rating scales have both intuitive and empirical appeal, they have not always yielded superior results in contrast with graphic rating scales. Results indicate that the choice of an anchoring procedure will depend on the nature of the actual rating process. (Author/JKS)

Descriptors: Behavior Rating Scales, Comparative Testing, Higher Education, Rating Scales

Empirical versus Random Item Selection in the Design of Intelligence Test Short Forms--The WISC-R Example.

Peer reviewed

Goh, David S. – Applied Psychological Measurement, 1979

The advantages of using psychometric theory to design short forms of intelligence tests are demonstrated by comparing such usage to a systematic random procedure that has previously been used. The Wechsler Intelligence Scale for Children Revised (WISC-R) Short Form is presented as an example. (JKS)

Descriptors: Elementary Secondary Education, Intelligence Tests, Item Analysis, Psychometrics

An Examination of Methodological Issues Relevant to the Use and Interpretation of the Semantic Differential.

Peer reviewed

Mann, Irene T.; And Others – Applied Psychological Measurement, 1979

Several methodological problems (particularly the assumed bipolarity of scales, instructions regarding use of the midpoint, and concept-scale interaction) which may contribute to a lack of precision in the semantic differential technique were investigated. Results generally supported the use of the semantic differential. (Author/JKS)

Descriptors: Analysis of Variance, Computer Assisted Testing, Higher Education, Rating Scales

A Generative Approach to the Modeling of Isomorphic Hidden-Figure Items.

Peer reviewed

Bejar, Isaac I.; Yocom, Peter – Applied Psychological Measurement, 1991

An approach to test modeling is illustrated that encompasses both response consistency and response difficulty. This generative approach makes validation an ongoing process. An analysis of hidden figure items with 60 high school students supports the feasibility of the method. (SLD)

Descriptors: Construct Validity, Difficulty Level, Evaluation Methods, High School Students

On the Feasibility of Multiple Matching Tests--Variations on a Theme by Gulliksen.

Peer reviewed

Budescu, David V. – Applied Psychological Measurement, 1988

A multiple matching test--a 24-item Hebrew vocabulary test--was examined, in which distractors from several items are pooled into one list at the test's end. Construction of such tests was feasible. Reliability, validity, and reduction of random guessing were satisfactory when applied to data from 717 applicants to Israeli universities. (SLD)

Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)