The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory

Date

2017-02-01

Author

Sahin, Alper
Anil, Duygu

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

This study investigates the effects of sample size and test length on item-parameter estimation in test development using three unidimensional dichotomous item response theory (IRT) models. For this purpose, a real language test comprising 50 items was administered to 6,288 students. Data from this test were used to construct data sets of three test lengths (10, 20, and 30 items) and nine sample sizes (150, 250, 350, 500, 750, 1,000, 2,000, 3,000, and 5,000 examinees). These data sets were then used to create research conditions in which test length, sample size, and IRT model were manipulated to investigate the accuracy of item-parameter estimation. The results suggest that the combination of sample size and test length, rather than either variable alone, is what matters, and that samples of 150, 250, 350, 500, and 750 examinees can be used to estimate item parameters accurately in the three unidimensional dichotomous IRT models, depending on the test length and the model employed.
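The crossed design described in the abstract (three test lengths, nine sample sizes, three IRT models) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how items or examinees were selected for each data set, so the random draw below is hypothetical, the model labels are placeholders because the three models are not named here, and the response matrix is synthetic rather than the real test data.

```python
import itertools
import random

# Values taken from the abstract.
TEST_LENGTHS = (10, 20, 30)
SAMPLE_SIZES = (150, 250, 350, 500, 750, 1000, 2000, 3000, 5000)
# Assumption: the abstract only says "three unidimensional dichotomous models";
# these labels are placeholders, not the authors' model names.
MODELS = ("model_1", "model_2", "model_3")


def design_conditions():
    """Return every (test_length, sample_size, model) combination."""
    return list(itertools.product(TEST_LENGTHS, SAMPLE_SIZES, MODELS))


def make_fake_responses(rng, n_examinees=6288, n_items=50):
    """Build a synthetic 0/1 response matrix (stand-in for the real data)."""
    return [[rng.randint(0, 1) for _ in range(n_items)]
            for _ in range(n_examinees)]


def draw_subset(responses, test_length, sample_size, rng):
    """Randomly pick examinees (rows) and items (columns).

    Assumption: simple random sampling without replacement; the source
    does not describe the actual selection procedure.
    """
    rows = rng.sample(range(len(responses)), sample_size)
    cols = sorted(rng.sample(range(len(responses[0])), test_length))
    return [[responses[r][c] for c in cols] for r in rows]


rng = random.Random(2017)
full_data = make_fake_responses(rng)
conditions = design_conditions()
example = draw_subset(full_data, test_length=10, sample_size=150, rng=rng)
print(len(conditions), len(example), len(example[0]))
```

Each tuple in `conditions` corresponds to one research condition; in an actual replication, every drawn subset would be passed to an IRT calibration program, which this sketch deliberately leaves out.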

Subject Keywords

Small samples in IRT, Item parameter estimation, Item response theory, Language testing, Short tests

URI

https://hdl.handle.net/11511/65131

Journal

EDUCATIONAL SCIENCES-THEORY & PRACTICE

DOI

https://doi.org/10.12738/estp.2017.1.0270

Collections

Education and Humanities, Article


Citation Formats

A. Sahin and D. ANIL, “The Effects of Test Length and Sample Size on Item Parameters in Item Response Theory,” EDUCATIONAL SCIENCES-THEORY & PRACTICE, pp. 321–335, 2017, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/65131.