Comparison of the Angoff-Based IRT Method and the Bookmark Method for Standard Setting of the MSRT Language Test

Document Type : Research Paper

Authors

1 PhD student in Assessment and Measurement, Department of Counseling and Psychology, Allameh Tabatabai University

2 MA in Assessment and Measurement, Allameh Tabatabai University

3 Department of Measurement, Allameh Tabatabai University

https://doi.org/10.34785/J012.2019.698

Abstract

The main purpose of this study was to compare the Angoff-based IRT method and the Bookmark method for setting the standard (cut score) of the MSRT language test. For this purpose, one sample of MSRT test questions (the September 1397 administration) was randomly selected, and the responses of 596 examinees were obtained from the Ministry of Science. The MSRT test has 100 questions, of which 30 are grammar, 30 are listening comprehension, and 40 are reading comprehension. Two expert panels consisting of 15 TOEFL teaching experts were formed, and cut scores were then set with the IRT-based Angoff method and the Bookmark method for the three sections (grammar, listening, reading comprehension) over three rounds of evaluation. The findings showed that the cut score obtained by the IRT-based Angoff method was 53.66 and by the Bookmark method 54.27; both cut scores were higher than the one set by the Ministry of Science. The findings also showed that under the Ministry of Science's traditional criterion, a cut score of 50, 73.3% of the examinees were rejected and 26.7% were accepted, whereas under the IRT-based Angoff cut score the rejection rate was 78.5% and the acceptance rate 23.4%, and under the Bookmark cut score the acceptance rate was 21.5%. These findings imply that the designers of the Ministry of Science language test need to revise the way the standard (passing) score of this test is determined.
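To make the two procedures concrete, the sketch below shows, under a 2PL IRT model with hypothetical item parameters, judge ratings, response probability, and bookmark placement (none of these values are taken from the MSRT study), how an IRT-based Angoff cut score and a Bookmark cut score can be derived: the averaged Angoff ratings are summed and mapped through the test characteristic curve to an ability cut point, while the Bookmark cut point is the ability at which the bookmarked item in the ordered item booklet is answered correctly with the chosen response probability.

```python
# A minimal sketch (not the authors' code) of IRT-based Angoff and Bookmark cut scores.
# All item parameters, ratings, RP, and the bookmark placement below are hypothetical.

import numpy as np
from scipy.optimize import brentq

# Hypothetical 2PL parameters for a short 10-item section
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9, 1.3, 1.1, 0.7, 1.4, 1.0])   # discrimination
b = np.array([-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.6, 0.9, 1.2, 1.8])  # difficulty

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def tcc(theta):
    """Test characteristic curve: expected raw score at ability theta."""
    return p_correct(theta, a, b).sum()

# ---- IRT-based Angoff -----------------------------------------------------
# Judges rate the probability that a minimally competent examinee answers each
# item correctly; the ratings (averaged over judges) are summed to an expected
# raw score, and the TCC is inverted to find the corresponding ability cut.
angoff_ratings = np.array([0.85, 0.80, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45, 0.40, 0.30])
expected_raw = angoff_ratings.sum()
theta_angoff = brentq(lambda t: tcc(t) - expected_raw, -4, 4)   # invert the TCC
cut_angoff = tcc(theta_angoff)                                  # raw-score cut

# ---- Bookmark --------------------------------------------------------------
RP = 0.67  # response probability used to order the item booklet
# Ability at which each item is answered correctly with probability RP (2PL closed form)
theta_rp = b + np.log(RP / (1 - RP)) / a
order = np.argsort(theta_rp)                    # ordered item booklet (easy -> hard)
bookmark_position = 6                           # hypothetical judge placement (1-based)
theta_bookmark = theta_rp[order][bookmark_position - 1]  # RP location of bookmarked item
cut_bookmark = tcc(theta_bookmark)              # translate the ability cut to a raw score

print(f"Angoff-IRT cut: theta = {theta_angoff:.2f}, raw score = {cut_angoff:.2f}")
print(f"Bookmark cut:   theta = {theta_bookmark:.2f}, raw score = {cut_bookmark:.2f}")
```

Either ability cut point can then be converted to the test's reporting scale before pass/fail rates are computed and compared with an existing criterion such as the Ministry of Science's score of 50.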

Keywords


References
ACT Inc. (2005d). Developing achievement levels on the 2005 National Assessment of Educational Progress in grade 12 mathematics: Pilot study report to COSDAM. Iowa City, IA: Author.
ACT Inc. (2007a). Developing achievement levels on the 2006 National Assessment of Educational Progress in grade 12 economics: Process report. Iowa City, IA: Author.
Andrich, D. (1978a). A rating formulation for ordered response categories. Psychometrika, 43(4), 561–573.
Angoff, W.H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.; pp. 508-600).Washington, DC: American Council on Education.
Bechger, T. M., Kuijper, H., & Maris, G. (2009). Standard setting in relation to the common European framework of reference for languages: The case of the state examination of Dutch as a second language. Language Assessment Quarterly, 6, 126–150.
Behuniak, P., Archambault, F. X., & Gale, R. K. (1982). Angoff and Nedelsky standard setting procedures: Implications for the validity of proficiency score interpretation. Educational and Psychological Measurement, 42(1), 247–255. doi:10.1177/0013164482421031.
Berk, R. A. (1996). Standard setting: The next generation (where few psychometricians have gone before!). Applied Measurement in Education, 9(3), 215–225.
Brennan, R. L., & Lockwood, R. E. (1980). A comparison of the Nedelsky and Angoff cutting score procedures using generalizability theory. Applied Psychological Measurement, 4(2), 219–240. doi:10.1177/014662168000400209.
Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2002). A comparison of Angoff and Bookmark standard setting methods. Journal of Educational Measurement, 39(3), 253–263. doi:10.1111/j.1745-3984.2002.tb01177.x.
Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.
Clauser, B. E., Harik, P., Margolis, M. J., McManus, I., Mollon, J., Chis, L., & Williams, S. (2008). An empirical examination of the impact of group discussion and examinee performance information on judgments made in the Angoff standard-setting procedure. Applied Measurement in Education, 22(1), 1–21.
Clauser, B. E., Mee, J., & Margolis, M. J. (2011, April). The effect of data format on integration of performance data into Angoff judgments. Paper presented at the meeting of the National Council on Measurement in Education, New Orleans, LA.
Clauser, B.E., Harik, P., Margolis, M.J., McManus, I.C., Mollon, J., Chis, L., & Williams, S. (2009). Empirical evidence for the evaluation of performance standards estimated using the Angoff procedure. Applied Measurement in Education, 22, 1-21.
Clauser, B.E., Mee, J., Baldwin, S.G., Margolis, M.J., & Dillon, G.F. (2009). Judges’ use of examinee performance data in an Angoff standard-setting exercise for a medical licensing examination: An experimental study. Journal of Educational Measurement, 46(4), 390-407.
Clauser, B. E., Swanson, D. B., & Harik, P. (2002). A multivariate generalizability analysis of the impact of training and examinee performance information on judgments made in an Angoff-style standard-setting procedure. Journal of Educational Measurement, 39, 269–290.
Clauser, J. C. (2013). Examination of the application of item response theory to the Angoff standard setting procedure. Unpublished doctoral dissertation, University of Massachusetts Amherst.
Cohen, A. S., Kane, M. T., & Crooks, T. J. (1999). A generalized examinee-centered method for setting standards on achievement tests. Applied Measurement in Education, 12(4), 343–366.
Dawber, T., & Lewis, D. M. (2002). The cognitive experience of bookmark standard setting participants. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA. Retrieved from http://www2.education.ualberta.ca/educ/psych/crame/files/standard_setting.pdf
Ferdous, A. A., & Plake, B. S. (2005). Understanding the factors that influence decisions of panelists in a standard-setting study. Applied Measurement in Education, 18(3), 257–267.
Ferdous, A. A., & Plake, B. S. (2008). Item response theory-based approaches for computing minimum passing scores from an Angoff-based standard-setting study. Educational and Psychological Measurement, 68(5), 778–796.
Giraud, G., Impara, J. C., & Plake, B. S. (2005). Teachers’ conceptions of the target examinee in Angoff standard setting. Applied Measurement in Education, 18(3), 223–232.
Green, D. R., Trimble, C. S., & Lewis, D. M. (2003). Interpreting the results of three different standard-setting procedures. Educational Measurement: Issues and Practice, 22(1), 22–32. doi:10.1111/j.1745-3992.2003.tb00113.x.
Halpin, G., Sigmon, G., & Halpin, G. (1983). Minimum competency standards set by three divergent groups of raters using three judgmental procedures: Implications for validity. Educational and Psychological Measurement, 43(1), 185–196. doi:10.1177/001316448304300126.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433–470). Westport, CT: American Council on Education/Praeger.
Hambleton, R. K., Jaeger, R. M., Plake, B. S., & Mills, C. (2000). Setting performance standards on complex educational assessments. Applied Psychological Measurement, 24(4), 355-366.
Harsch, C., & Rupp, A. (2011). Designing and scaling level-specific CEFR writing tasks. Language Assessment Quarterly, 8, 1–33.
Hein, S. F., & Skaggs, G. E. (2009). A qualitative investigation of panelists’ experiences of standard setting using two variations of the bookmark method. Applied Measurement in Education, 22(3), 207–228.
Hsieh, M. (2013). An application of Multifaceted Rasch measurement in the Yes/No Angoff standard setting procedure. Language Testing, 30(4), 112–132.
Hurtz, G. M., & Auerbach, M. A. (2003). A meta-analysis of the effects of modifications to the Angoff method on cutoff scores and judgment consensus. Educational and Psychological Measurement, 63(4), 584–601.
Impara, J. C., & Plake, B. S. (1997). Standard setting: An alternative approach. Journal of Educational Measurement, 34(4), 353–366.
Jaeger, R. M. (1989). Certification of student competence. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 485–514). New York: Macmillan.
Kane, M. (1994). Validating the performance standards associated with passing scores. Review of Educational Research, 64(3), 425–461.
Kane, M. (1994a, October). Examinee-centered vs. task-centered standard setting. Paper presented at the Joint Conference on Standard Setting for Large-Scale Assessments, Washington, DC.
Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting standards. In G. Cizek (Ed.), Standard setting: Concepts, methods, and perspectives (pp. 53–88). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Lewis, D. M., Mitzel, H. C., & Green, D. R. (1996, June). Standard setting: A bookmark approach. In D. R. Green (Chair), IRT-based standard-setting procedures using behavioral anchoring. Symposium conducted at the Council of Chief State School Officers National Conference on Large Scale Assessment, Phoenix, AZ.
Livingston, S. A., & Zieky, M. J. (1989). A comparative study of standard-setting methods. Applied Measurement in Education, 2(2), 121–141.
Alimirzaie, M., Moghadam Zadeh, A., Minaei, A., Ezanloo, B., & Salehi, K. (2019). Sources of the differential item functioning and its application in education. Journal of Research in Teaching, 7(1).
McGinty, D. (2005). Illuminating the “black box” of standard setting: An exploratory qualitative study. Applied Measurement in Education, 18(3), 269–287.
Mitzel, H. C., Lewis, D. M., Patz, R. J., and Green, D. R. (2001). The bookmark procedure: Psychological perspectives. In G. J. Cizek (Ed.), Setting performance standards (pp. 249-281). Mahwah, NJ: Lawrence Erlbaum.
National Research Council. (1999). Setting reasonable and useful performance standards. In J. W. Pellegrino, L. R. Jones, & K. J. Mitchell (Eds.), Grading the nation’s report card: Evaluating NAEP and transforming the assessment of educational progress (pp. 162–184). Washington, DC: National Academy Press.
Yousofi, N., Ebadi, S., & Saedi Dovaise, M. (2017). Investigating the L2 motivation of the undergraduate students from the perspective of the “L2 Motivational Self System”. Journal of Research in Teaching, 5(3).
O’Neill, T. R., Buckendahl, C. W., Plake, B. S., & Taylor, L. (2007). Recommending a nursing specific passing standard for the IELTS examination. Language Assessment Quarterly, 4, 295–317.
Pant, H. A., Rupp, A. A., Tiffin-Richards, S. P., & Köller, O. (2009). Validity issues in standard-setting studies. Studies in Educational Evaluation, 35(2–3), 95–101.
Pellegrino, J. W., Jones, L. R., & Mitchell, K. J. (1999). Grading the nation’s report card: Evaluating NAEP and transforming the assessment of educational progress. Washington, DC: National Academy Press.
Shepard, L. A., Glaser, R., Linn, R. L., & Bohrnstedt, G. (1993). Setting performance standards for student achievement. A report of the National Academy of Education panel on the evaluation of the NAEP trial state assessment: An evaluation of the 1992 achievement levels. Stanford, CA: Stanford University, National Academy of Education.
Shepard, L. A. (1995). Implications for standard setting of the National Academy of Education evaluation of the NAEP achievement levels.
Skorupski, W. P., & Hambleton, R. K. (2005). What are panelists thinking when they participate in standard-setting studies? Applied Measurement in Education, 18(3), 233–256.
Stone, G. E. (2001). Objective standard setting (or truth in advertising). Journal of Applied Measurement, 2, 187–201.
Stone, G. E., Beltyukova, S., & Fox, C. M. (2008). Objective standard setting for judge-mediated examinations. International Journal of Testing, 8, 180–196. doi:10.1080/15305050802007083
Tannenbaum, R. J., & Wylie, E. C. (2005). Mapping English language proficiency test scores onto the Common European Framework. ETS Research Report Series.
Wang, N. (2003). Use of the Rasch IRT model in standard setting: An item-mapping method. Journal of Educational Measurement, 40(3), 231–253. doi:10.1111/j.1745-3984.2003.tb01106.x.