Comparison of the Psychometric Properties of Essay and Multiple-Choice Questions in Sixth-Grade Primary-School Mathematics and Science, Based on Classical Test Theory and Item Response Theory

Article Type: Research Article

Authors

Department of Psychology, Faculty of Economic and Social Sciences, Bu-Ali Sina University

DOI: 10.22034/trj.2025.142154.2067

Abstract

This descriptive study analyzed the psychometric properties of essay and multiple-choice questions in mathematics and science for sixth-grade primary-school students using Classical Test Theory and Item Response Theory. The statistical sample comprised 388 sixth-grade girls and boys in the city of Hamadan, selected through random cluster sampling. Data were collected with two teacher-made tests in science and mathematics, each containing both multiple-choice and essay questions, and the tests were analyzed under both Item Response Theory and Classical Test Theory. Under Classical Test Theory, the discrimination and difficulty coefficients were examined for the essay questions, and the discrimination and recognition coefficients for the multiple-choice questions. Under Item Response Theory, the discrimination, threshold, and guessing parameters were analyzed in detail for the essay questions, and the discrimination slope, threshold, and guessing parameters for the multiple-choice questions. The e-IRT add-in was used for the analysis: the parameters of the multiple-choice questions were estimated with the three-parameter model, and the essay questions were analyzed with the Graded Response Model. The results showed that in both mathematics and science the essay questions performed better than the multiple-choice questions, with a mean discrimination coefficient of 0.208 in science and 0.55 in mathematics and a mean difficulty coefficient of 2.591 in science and 2.342 in mathematics, and were more effective in assessing students' abilities.


Article Title [English]

Comparison of the Psychometric Properties of Essay and Multiple-Choice Questions in Math and Science for Sixth-Grade Students Based on Classical Test Theory and Item Response Theory

Authors [English]

  • Afshin Afzali
  • Abolghasem Yaghoobi
  • Mohamad Aref Pilehvarpour
  • Kazhal Azizi
  • Darya Qanei
Department of Psychology, Faculty of Economic and Social Sciences, Bu-Ali Sina University
Abstract [English]

This study analyzes the psychometric properties of essay and multiple-choice questions in math and science for sixth-grade students using Classical Test Theory (CTT) and Item Response Theory (IRT). These are the two prominent frameworks for analyzing test questions: in CTT the unit of analysis is the test as a whole, while in IRT it is each individual item. CTT has been a foundational measurement theory for decades. It is a simple linear model stating that the observed score on a test is the sum of the true score and measurement error, so the model has three components: the observed score, the true score, and the error score. This relationship among the true score, observed score, and measurement error is what gives CTT its ability to explain the factors affecting test scores. CTT rests on three assumptions: first, the correlation between error scores and true scores is zero; second, errors have a mean of zero; and third, error scores on parallel tests are uncorrelated. For decades, CTT has served as a model for assessing the reliability and validity of measurement tools. In item analysis, CTT focuses on two main properties: item difficulty and item discrimination. Item difficulty is the proportion of examinees who answer the question correctly; the more difficult the question, the lower that proportion, and the primary index for measuring it is the difficulty index. Item discrimination, on the other hand, is the ability of an item to differentiate between high-performing and low-performing examinees. IRT, by contrast, is based on the assumption that each participant's responses can be predicted from a latent ability, denoted by θ (theta).
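As a minimal illustration of the two CTT indices just described, the sketch below computes the difficulty index (proportion correct) and an upper-lower discrimination index from a small response matrix. The data and the simple top-half/bottom-half split are illustrative assumptions, not the study's data or its exact procedure.

```python
# Hypothetical scored responses (1 = correct, 0 = incorrect)
# for 8 students on 4 items; values are illustrative only.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

def difficulty(item_scores):
    """CTT difficulty index p: proportion of examinees answering correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination(responses, item):
    """Upper-lower discrimination index D: p in the high-total group minus
    p in the low-total group (top and bottom halves here, for simplicity)."""
    totals = [sum(row) for row in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    half = len(responses) // 2
    low = [responses[i][item] for i in order[:half]]
    high = [responses[i][item] for i in order[-half:]]
    return difficulty(high) - difficulty(low)

for j in range(4):
    col = [row[j] for row in responses]
    print(f"item {j + 1}: p = {difficulty(col):.2f}, D = {discrimination(responses, j):.2f}")
```

An item answered correctly by everyone has p = 1.0 and cannot discriminate; a positive D means the item favors the stronger group, as intended.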
In Item Response Theory (IRT), three parameters are commonly defined for each item:

  • Discrimination parameter (a): indicates how well an item differentiates between more and less able students; the higher the value, the better the item discriminates.
  • Difficulty parameter (b): indicates how hard or easy an item is; when b is high, only students with high ability are likely to answer the item correctly.
  • Guessing parameter (c): the probability that a student answers the item correctly by guessing; it is typically used for dichotomously scored items such as multiple-choice questions.

The sample consisted of 388 sixth-grade students from Hamadan, selected using random cluster sampling. Given the study's objective of evaluating the performance of multiple-choice and essay questions in science and math, a survey approach was used with analyses based on CTT and IRT. The study population included all sixth-grade students in Hamadan during the 2023-2024 academic year, and the sample size of 388 was determined using the Morgan table. The sample was drawn randomly from six schools (three girls' schools and three boys' schools). To collect data, two teacher-made tests for science and math, each containing both multiple-choice and essay questions, were used. To establish validity, each test was reviewed by four teachers (each with at least six years of teaching experience) and then piloted; after incorporating the experts' feedback, the tests were finalized and used for data collection. A graded (partial-credit) method was used for scoring the essay questions (Saif, 2016). The e-IRT software was used for the analysis: parameters for multiple-choice questions were estimated with the three-parameter model, while essay questions were analyzed with the Graded Response Model. Results indicated that essay questions performed better than multiple-choice questions in both science and math.
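A compact sketch of how the a, b, and c parameters described above combine in the three-parameter logistic (3PL) model used for the multiple-choice items. The parameter values are illustrative assumptions, not estimates from the study.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL model: probability of a correct response at ability theta,
    given discrimination a, difficulty b, and guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: four-option multiple-choice, so c is set near 1/4.
for theta in (-2.0, 0.0, 2.0):
    print(round(p_3pl(theta, a=1.2, b=0.5, c=0.25), 3))
```

At theta = b the probability equals c + (1 - c) / 2, and as ability decreases the curve flattens toward the guessing floor c rather than toward zero, which is exactly why the c parameter matters for multiple-choice items.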
Specifically, for essay questions, the average discrimination index in science was 0.208 and in math 0.55, while the average difficulty index in science was 2.591 and in math 2.342, reflecting better discrimination and difficulty for essay questions. IRT analysis of the essay questions likewise showed that all four math questions had a discrimination parameter above 1.35, while the discrimination values of the science questions were 0.087, 1.090, 0.844, 1.419, and 0.533, respectively. Furthermore, the threshold parameter increased at each successive step in both science and math, indicating better discrimination and threshold behavior for the essay questions. The better performance of essay questions relative to multiple-choice questions can be attributed to several factors, examined below. Essay questions allow students to engage deeply with a subject and demonstrate critical and analytical thinking; this question type is particularly suited to assessing skills that require analysis, evaluation, and synthesis of information (Anderson, 2001). Unlike multiple-choice questions, which restrict the student to choosing one option, essay questions let students express their ideas in detail and creatively (Biggs, 2011). Essay questions make it possible to assess higher levels of learning, such as analysis, synthesis, and evaluation, which is harder to achieve with other assessment methods, and students can shape their responses around their experiences, prior knowledge, and personal views, a feature that is especially useful for complex or multifaceted topics (Moon, 2006). Despite these advantages, essay questions are used less frequently for several reasons. Scoring them requires a significant amount of time, and human error in evaluation can lead to scoring differences between raters (Brown, 2013).
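The step-wise thresholds reported above for the essay items come from Samejima's Graded Response Model, in which ordered thresholds yield one probability per score category. The sketch below shows this mechanics with illustrative parameter values, not the study's estimates.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded Response Model: probability of each score category at ability
    theta, given discrimination a and ordered thresholds b_1 < ... < b_K.
    P*(X >= k) is one logistic curve per threshold; category probabilities
    are differences of adjacent curves."""
    star = [1.0]  # P(X >= 0) is always 1
    star += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    star.append(0.0)  # P(X >= K + 1) is always 0
    return [star[k] - star[k + 1] for k in range(len(thresholds) + 1)]

# Illustrative 0-3 point essay item with increasing ("positive") thresholds
probs = grm_category_probs(theta=0.0, a=1.4, thresholds=[-1.0, 0.2, 1.1])
print([round(p, 3) for p in probs])  # four category probabilities summing to 1
```

When the thresholds increase at each step, as the results describe, every category receives a strictly positive probability, which is the behavior the threshold analysis checks for.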
Essay questions may also be less reliable than multiple-choice questions, because responses can be influenced by non-academic factors such as writing skill, student fatigue, or limited time (Gipps, 1994). They typically cover only one or two topics and cannot fully sample the entire educational content, so they may present an incomplete picture of a student's knowledge and abilities (Race, 2014). Despite its valuable findings, this research has limitations that should be considered when interpreting and applying the results. First, the questions used (essay and multiple-choice) may not have fully reflected all aspects of the students' abilities, since each question type emphasizes different aspects of ability. Second, although CTT and IRT, the main analysis methods, provided useful information, they may not capture every complex aspect of the questions' psychometric characteristics. Third, the research covered only math and science, so its results may not generalize to other subjects. Finally, factors such as test conditions and student stress may have affected the results and were not fully controlled.

Keywords [English]

  • Psychometric properties
  • Classical Test Theory
  • Item Response Theory
  • Discrimination index
  • Threshold index