© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Diversity is a hot issue discussed in the world of education. With its diversity, Indonesia has great potential to study such things as the geographical diversity equalization factors that affect the quality of education. Implementation of national examinations (NE) as a benchmark and standards from primary to secondary education have a different condition for each location, such as Daerah Istimewa Yogyakarta (DIY) representing the West and Nusa Tenggara Timur (NTT) representing the western region of Indonesia. The focus of this research is to find out how much difference the ability of junior high school students in Indonesia in terms of geographical location. This study uses five DIF detection methods for mathematics NE 2013/2014 school year to analyze students' different abilities. The analysis results show that, in general, the NE questions for the 2013/2014 academic year benefit the focal group / NTT on Algebra, Geometry, and Statistics/Probability material, although with lower ability compared to group reference /DIY. With the analysis carried out, policymakers can take corrective steps to focus more on fixing problems, facilities, and resources teacher power so that problem inequality from aspect geographical there is no future again in Indonesia.
Differential item function, national examinations, bias geographic
Indonesia is a country that is rich in human resources, natural resources, ethnic groups, regional languages, islands, and many others. In addition, Indonesia is also a country intensively making efforts to equalize development in various sectors such as infrastructure development and various quality improvement efforts such as education to compete with other countries. Indonesian students actively participate in assessment activities to find out Indonesia's position at the international level [1]. Such as the Program for International Students Assessment (PISA) and Trends in International Mathematics and Science Study (TIMSS) to determine the ability of Indonesian students nationally. NAs are carried out periodically, evaluated, and standardized every year.
Various countries use individual data to determine the progress of learning outcomes for each student [2, 3]. One way that is done by several countries is to focus on the results of NE such as Japan, Bangladesh, Uruguay, and several other countries [4]. Of course, each country has a different standard and focus [5, 6]. In Indonesia, a national exam evaluates student learning outcomes implemented since 2005, focusing on language, math, and social science scores [7].
One of the subjects that focus on the NE in Indonesia is mathematics; Indonesia actively participates in activities and the national exam. However, assessments such as PISA TIMSS show that the scores obtained by Indonesia are consistently below the average [8]. According to PISA data, Indonesia only got a math score of 386 out of an average of 490 in 2015 [9]. TIMSS data shows that Indonesia got a score of 386 for an average 500 in 2011 [10]. This figure is still far different from neighboring countries such as Malaysia, Vietnam, and Thailand. This fact is a question mark for us because the quality of Indonesian education, especially the results of mathematics exams, is very far from the target in terms of international scale.
Indonesia has many potentials exciting to study education in various conditions, such as geography, ethnicity, and religion. With situations that are unique and diverse. The position also affects education in Indonesia, one of the gaps in the quality of education in geography terms [11]. The location between western Indonesia is close to the center of government compared with the eastern part of Indonesia [3]. So the NE into the pros and cons by various circles in Indonesia, mainly when used as the sole criterion NE graduation for students and the determination of the quality of education a particular area so that impact a variety of fraud [12]. It could be due to one of them because there is inequality in education quality mapping of geo. It can be seen from the test results of the subjects tested, especially the NE in 2013/2014. it is shown that the difference value of mathematics courses is striking between the provinces of Daerah Istimewa Yogyakarta (DIY)=6.25 and Nusa Tenggara Timur (NTT)=5.02. This information is based on data from the education assessment center in 2014.
According to Muttaqin [13], the Indonesian government has made various efforts, such as establishing a 12year compulsory education policy, raising the standard for national exam scores, and decentralizing education. This effort aims to achieve the target of universally equitable education, but the goal of increasing access and quality of education, while reducing inequality, is still far from being expected. Some forms of inequality that occur in Indonesia include:
First, inequality in the group of students from rich and poor economic groups in the age range of 13 to 15 years, 96 percent of students from rich families complete their education in 7 years, while only 80 percent for students from poor households [13]. Second, in terms of access, there are still more than half a million children aged 7 to 15 years who have never set foot in school in their entire life, and a total of more than 1.7 million children drop out of school before completing the nineyear compulsory education [13]. Third, in terms of competence, 55 percent of Indonesian students scored below the average according to the Program for International Student Assessment (PISA), 43 percent averaged, and only 2 percent scored above the average [14]. Fourth, in terms of quality, national examination scores of students in private schools are still lower than their counterparts in public schools, and there is also large variation within and between Islamic private schools.
Based on this explanation, it is clear that the inequality that occurs generally comes from the quality and competence of students, but data from the education assessment center in 2014 shows that several regions, especially those from the western provinces of Indonesia, have better scores than the eastern regions of Indonesia. This is something that is interesting to study considering that there has been no special study that discusses the geographical factor of the education gap.
Since there is an educational gap in Indonesia, one of them is the student competency factor which is generally taken on average based on the UN scores. In fact, some western regions, such as DIY, which have good access to education, have an average math exam score above the national average, while eastern regions like NTT have an average math exam score below the national average the education assessment center in 2014. The mathematics exam given consists of four materials, namely arithmetic operations, geometry, algebra, and statistics and probability. The researcher wants to investigate what items are the source of bias between the western and eastern regions of Indonesia with several DIF detection methods.
The gap results of NE in western and eastern regions of Indonesia can occur because of bias in items about the NA. Discrimination occurs when an imbalanced item is an item about favorable score a particular group [6]. The gap results of the NE because of the inequality score between regions can be caused by test items and, therefore, not test items [1517]. Inequality is caused by the test item that's called biased items. Hambleton et al. [18] express understanding that item bias of individuals who have the same abilities of different groups do not have the same probability of replying with the correct item [18], Items that could potentially bias then analyzed with logic and why these items are relatively more difficult for one group [19]. When an item is somewhat more difficult for a group, and these difficulties are irrelevant to construct tests, that item is biased [20]. Some experts are replacing the term bias items with names Differential Item Function (DIF) [17, 20, 21]. We can express biased items mathematically in terms of probability equations.
Inequality score strived reduced, or at least a significant detected and direction [22, 23]. The detection results can be used for correction or repair items in the past about dating. Thus the necessary measurement and analysis more in some material mathematical possibility of bias when answered by the students [24]. Terms in measuring the test device need to be valid and reliable to obtain the measurement results following what is calculated. To determine the quality of a measuring instrument psychometric test should be conducted on the devices [17, 25]. Experts have established psychometric criteria for a psychological measuring instrument to be declared a good measuring tool [15]. They can provide information that is not misleading.
Invariant identification of interesting measurements to be discussed today. Consistent or measure bias an item known as the Differential Item Functioning (DIF) [26] with a variety of methods that can be performed to identify the functioning of the different items to two or more distinct groups in a test/exam, e.g., methods Item Response Theory (IRT) and a nonIRT [27].
Conceptually, the DIF is said to appear on an item problem. Suppose participants have the same capability to construct measured by the test but from a different group. In that case, they have a different probability of answering the question correctly [28]. For example, measure the same construct only one ability or unidimensional and diverse groups such as men and women. Next [17] argued that an item shows DIF if the test taker has the same ability in a different group and does not have the same probability of responding well.
Various studies on DIF focus on gender bias, a second language, age, and culture [16, 29, 30], but little has been discussed about the geographical conditions. This layout lies within the region with the central government. A regional grouping in Indonesia was divided into three groups Western, Central, and East. This problem becomes a probability for researchers to look at the gap NE by regional groups.
Generally, there are two types of DIF [15] namely 1) uniform DIF and nonuniform DIF. Uniform DIF appears if the advantage of one group over another occurs at every level of ability, and 2) nonuniform DIF appears if the benefit of one group over another does not happen in every capacity. Suppose it is associated with interaction in the statistical analysis of variance. In that case, uniform DIF occurs if there is no interaction between the ability level of participants and group membership, and nonuniform DIF occurs if there is an interaction between the test participant's ability level and group membership [14]. To determine whether a DIF indicates an item or not, a DIF index is needed, which is an index that shows as strong as a DIF indication is in the item. In the context of item response theory, whether or not DIF occurs on a question item lies in the item response function (Item Response). Part) for the item in the group in question. The response function's curve is called the item response curve or characteristic curve. Research detects the existence of DIF on test items, with divided population into two groups, namely the focal group and the reference group. A focal group is a group that is investigated whether there are items that contain DIF in that group. A reference group is a comparison group.
Based on the conditions in the field, which show that the score inequality causes the gap in the quality of education to be one of the most exciting probabilities to study, namely the existence of a bias in the results or DIF NE items based on regional groupings. Furthermore, it is hoped that some items of NE questions that are biased between the western region will be represented from the NE results data for SMP DIY students and the eastern part represented by NTT. Of course, to obtain more accurate results, a DIF detection method will be carried out using the IRT and nonIRT methods.
The gap in the realm of education is a challenge in itself experienced by Indonesia, despite various efforts made, there are still some challenges in its application. In this study, the researchers attempted to analyze the gap in the math UN scores between the western region of Indonesia represented by DIY and the eastern region represented by NTT. The National Examination Response data for the 2013/2014 academic year by students from each region were analyzed for each item using the DIF detection method, both using the IRT and classical/nonIRT approaches. The results of the analysis then produce items that contain DIF, provide information on the effect/magnitude of DIF contained in each item, and provide information on the group that benefits from each item. The items that contain the DIF are analyzed further so that a complete picture is obtained of what materials need to be considered for the Mathematics National Examination questions and what steps need to be taken by policy makers. More complete results can be seen in the results and discussion in this paper.
A fair/good rating item, at least avoids three things, first, item impact, second, item differential function (DIF) and third, item bias [31]. Item impact is evidence that occurs when test takers from different groups have different opportunities to correctly answer an item, which is caused by differences in the actual ability of the two groups to measure the item. DIF appears when test takers from different groups show different opportunities to answer an item correctly, but their abilities match, or they have the same ability. Meanwhile, item bias arises when participants from one group are less likely to answer an item correctly than another group because a number of characteristics of the item or test situation are not relevant to the purpose of the test [32].
DIF arises when items are substantially more difficult for one group than for another, after all differences in the subject matter tested have been accounted for [32]. The DIF analysis is based on the principle of comparing the performance of a focus group (e.g. women) with items with a reference group (e.g. men), controlling for the knowledge being tested. DIF not only means that an item is more difficult for one group than for another but also if participants in one group tend to know more of the test subject than the other group, they will perform better on all test items. Therefore, once DIF is identified on an item, it can be associated with the emergence of item bias or item impact [3335].
Several articles have attempted to identify several sources of item bias that cause DIF. Several previous studies related to DIF compared group performance based on ease of access [13], school status (private and public) [13, 36], and gender [3234].
Generally, previous research only looked at the source of inequality only from differences in gender and school status, while many factors such as location/geography bias as a result of the uneven distribution of educational facilities and infrastructure between the western region which is close to the State Capital and the eastern part of Indonesia. This is what distinguishes the research study from previous research. The method used is adapted to the common DIF detection method, namely the IRT and Classical/nonIRT [15, 35, 36].
3.1 Research desig
This research is an explorative, descriptive study to determine how much a difference is in the ability of junior high school students in Indonesia in terms of geographical location. It refers to detecting differential item functioning (DIF) functional loadings with classical and modern methods (IRT) on the NA Junior High School test Mathematics in the 2013/2014 academic year.
The problem in this research is that the gap results of NE in the western and eastern regions of Indonesia can occur because of bias in the item about NA. Discrimination occurs when there is an imbalance item is an item about favorable score a particular group. The gap results of the NE because of the inequality score between regions can be caused by test items and therefore, not test items.
Based on field conditions that indicate lameness score cause gaps in the quality of education, one probability of bias or DIF result itembased NE regional grouping. Furthermore, the researcher will represent some of the Items Expected to be visible to The NE bias between the western region on the data results of the NE DIY junior high school students and the eastern part represented by NTT. Of course, to obtain more accurate results, the DIF analysis detection method uses IRT and nonIRT. From the analysis, results containing items will then be identified as indicators of DIF within the NE that has the inequality between western and eastern Indonesia.
3.2 Sample and data collection
The sample in this study was students from junior high schools in Yogyakarta and NTT who took the Mathematics National Examination for the 2013/2014 academic year. The sample selection was based on the representation of the western region of Indonesia, represented by the province of DIY, while the eastern region of Indonesia was represented by NTT. Another consideration is the average score of the Mathematics National Examination in DIY is more than the national average, while the average national exam score in NTT is below the national average. These two considerations are the reasons for choosing the two provinces as samples in this study. The data used in this study were obtained from responses from samples working on multiple choice objective mathematical problem items consisting of 40 dichotomous items. In this case, the NTT Group is the Focal group, while DIY is the reference group. In this study, the technique used in collecting data is the documentation technique, by collecting students' responses to the mathematics NE test in DIY and NTT. One thousand students each were selected based on the ability level for each region, which is 30% the lower group, 40% the moderate group, and 30% the upper group.
3.3 Data analysis
According to Cutright [27], methods for detecting DIF can be divided by the number of groups, approaches, and types of DIF itself. The DIF method consists of two groups; the first method is the classical/nonIRT approach. This method includes the mantlehaenzel method, standardization, SIBTEST, and logistic regression. This method uses the row score of each respondent and the second method uses the IRT approach, namely the LRT, Lord, and Raju methods. Any technique can be used without purification/purification. The following groupings DIF detection method we present in the Table 1:
Table 1. DIF detection Method using the R
Approach 
DIF shape 
Number of Groups 

2 
>2 

NonIRT 
Uniform 
MantelHaenszel * Standardization * SUBSET Logistic regression * 
pairwise comparisons Generalized MantelHaenszel* 
NonIRT 
Nonuniform 
Logistic regression * BreslowDay * NU.MH NU.SUBSET 
pairwise comparisons 
ART 
Uniform 
LRT * Lord * Raju * 
pairwise comparisons Generalized Lord * 
ART 
Nonuniform 
LRT * Lord * Raju * 
pairwise comparisons Generalized Lord * 
* Means it can be applied to the package "DifR."
This procedure begins by dividing the data into several groups with a range of specific abilities (ability groups). Then the researchers create a table with two rows and two columns for each group's ability. Focal group (f) is a group harmed by the presence of DIF. In contrast, the reference group is the group that became the basis of comparison for assessing the presence of DIF. If there are M group's abilities, the researchers will compile the Table 1 above as M. The method of detecting DIF on this occasion uses three methods for nonIRT. Classical methods include the MantelHeanzel method, Standardization Method, and Logistic Regression Methods [28, 37]. Modern methods for utilizing the IRT approach are the method of Lord and Raju [28].
DIF detection methods were performed using the R program using the difR package [27, 28]. On this occasion, the data used dichotomy result NE Mathematics by comparing two groups of respondents, a group of students from NTT as a focal group and a group of students of Yogyakarta as the reference group. Researchers took 1000 samples from each group. In this Article, we analyze the 39 items because item 21 did not meet the analysis criteria (could not be answered correctly by the sample used) so it was excluded from the data analysis. The results of DIF analysis are continued by doing analysis UN items that contain DIF. Early analysis is to examine the DIF signification of each item in general and determine the category of difficulty based on the criteria presented in Table 2.
Table 2. Item difficulty category
Category 
Score Interval 
Very Difficult 
x>1.0 
Difficult 
0.5<x≤1 
Medium 
0.5≤x≤0.5 
Easy 
1.0≤x<0.5 
Very Easy 
x<1.0 
Furthermore, DIF analysis is carried out specifically using five selected methods. The results of analysis that produce statistical values based on each DIF method are classified into three levels, namely A, B, and C. This level shows the effect / magnitude of the DIF loaded by each item. Table 3 describes the statistical value limits of the method used.
Table 3. DIF Level interpretation based on five methods
Level 
A 
B 
C 

Method 
MH 
ΔMH<1 
1 ≤ΔMH<1.5 
ΔMH≥ 1.5 
STD 
ΔSTD<1 
1 ≤ΔSTD<1.5 
ΔSTD≥ 1.5 

LOG 
ΔR^{2 }< 0.35 
0.35 ≤ ΔR^{2 }< 0.70 
ΔR^{2 }≥ 0.2 

Lord 
ΔLORD<1 
1 ≤ΔLORD<1.5 
ΔLORD≥ 1.5 

Raju 
ΔRAJU<1 
1 ≤ ΔRAJU <1.5 
 ΔRAJU ≥ 1.5 

DIF effect 
Low 
Medium 
High 
The item about the NE in the study is grouped into four subject matters: arithmetic operation, Algebra, Geometry, and Statistics / Probability. The analysis next is the categorization and count of the number of items that are experiencing DIF are based on the following criteria (Table 4):
Table 4. Categorization of items containing DIF by five methods
No 
Category 
Criteria 
1 
No problem 
Item does not load DIF for all methods 
2 
Less Problem 
There are 12 methods that declare items to load DIF 
3 
Troubled Enough 
Three methods declare an item to load a DIF 
4 
Troubled 
Four methods declare an item to load a DIF 
5 
Very Troubled 
All methods declare items contain DIF 
3.4 Step analysis using the R. program
(1) Prepare data in extension *. CSV on Microsoft excel or can also be in the form of a file extension *. sav in the SPSS program on a specific file.
(2) Open the Program Application and settings directory by clicking FILE → Change dir as Figure 1. Find the folder that contains the folder that contains the files to be analyzed, then click OK.
Figure 1. Change directory
(3) Choose a Package that supports analysis difR, namely mirt, ltm, lme4, deltaPlotR; by clicking Packages → load package, a menu like a Figure 2 will appear. Then select the package to use and click Ok.
Figure 2. Selectable package download
(4) The next step is to import data by writing the following syntax: >data<read.csv(file="DATA UN.csv ", header=T) #(data csv)>data<read.spss(file="DATA UN.sav", to.data.frame =TRUE)#(SPSS data).
(5) Displays data by writing the following syntax: >data. Then the display will appear as shown in Figure 3:
Figure 3. Data analysis
(6) DIF analysis. The first method used is an analysis using the Mantel Heanzel method. The syntax used is as follows: MH<difMH (data, group="Group", focal.name=2, purify=TRUE, nrIter=20, save.output=TRUE, output=c("MHresults", "default")).
(7) Output analysis can be viewed by writing the following syntax in the R Console.
>MH
(8) R console output is as follows: >MH
Detection of Differential Item Functioning using Coat Haenszel method with continuity correction and with item purification.
Results based on asymptotic inference
Convergence reached after five iterations
Matching variable: test score
No set of anchor items was provided
No pvalue adjustment for multiple comparisons
Coat Haenszel Chi square statistics:
stats. P value
it1 1.1462 0.2843
it2 13.7604 0.0002 ***
it3 1.6933 0.1932
it4 44.6885 0.0000 ***
it5 21.8019 0.0000 ***
it6 162.9540 0.0000 ***
it7 0.4404 0.5069
it8 12.7616 0.0004 ***
it9 0.0451 0.8318
it10 25.2172 0.0000 ***
it11 44.4657 0.0000 ***
it12 40.8461 0.0000 ***
it13 207.1702 0.0000 ***
it14 113.9957 0.0000 ***
it15 18.4669 0.0000 ***
it16 70.7143 0.0000 ***
it17 45.0143 0.0000 ***
it18 21.0977 0.0000 ***
it19 1.1558 0.2823
it20 0.0002 0.9886
it21 12.7633 0.0004 ***
it22 37.5032 0.0000 ***
it23 0.7144 0.3980
it24 6.2192 0.0126 *
it25 6.4163 0.0113 *
it26 3.1205 0.0773.
it27 35.7052 0.0000 ***
it28 53.8681 0.0000 ***
it29 0.1717 0.6786
it30 9.6330 0.0019 **
it31 87.9321 0.0000 ***
it32 20.9872 0.0000 ***
it33 0.3241 0.5691
it34 181.3080 0.0000 ***
it35 23.7732 0.0000 ***
it36 63.0233 0.0000 ***
it37 0.0006 0.9803
it38 0.0000 0.9952
it39 10.7247 0.0011 **
Significant. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Detection threshold: 3.8415 (significance level: 0.05)
Items detected as DIF items:
it2
it4
it5
it6
it8
it10
it11
it12
it13
it14
it15
it16
it17
it18
it21
it22
it24
it25
it27
it28
it30
it31
it32
it34
it35
it36
it39
Effect size (ETS Delta scale):
Effect size code:
'A': negligible effect
'B': moderate effect
'C': large effect
alphaMH deltaMH
it1 0.8700 0.3271 A
it2 0.6586 0.9813 A
it3 0.8486 0.3859 A
it4 0.4546 1.8524 C
it5 0.5767 1.2936 B
it6 0.2004 3.7775 C
it7 0.9176 0.2020 A
it8 0.6290 1.0896 B
it9 1.0307 0.0711 A
it10 0.5474 1.4160 B
it11 0.4532 1.8600 C
it12 0.4558 1.8465 C
it13 0.1541 4.3941 C
it14 0.2633 3.1357 C
it15 0.5882 1.2471 B
it16 0.4045 2.1271 C
it17 0.4482 1.8860 C
it18 0.5965 1.2140 B
it19 0.8793 0.3023 A
it20 1.0085 0.0200 A
it21 0.6523 1.0042 B
it22 1.9418 1.5595 C
it23 1.1029 0.2302 A
it24 0.7253 0.7548 A
it25 1.3416 0.6907 A
it26 1.2263 0.4795 A
it27 2.0019 1.6311 C
it28 0.4218 2.0288 C
it29 1.0563 0.1288 A
it30 1.4417 0.8596 A
it31 3.2969 2.8035 C
it32 1.6514 1.1788 B
it33 1.0710 0.1612 A
it34 0.12012 3.7683 C
it35 1.7805 1.3556 B
it36 0.4083 2.1048 C
it37 1.0036 0.0084 A
it38 0.9945 0.0130 A
it39 1.4475 0.8692 A
Effect size codes: 0 'A' 1.0 'B' 1.5 'C'
(for absolute values of 'deltaMH')
The first paragraph shows the DIF detection method used, using the Mantel Heanzel method, and purification carried out for up to 5 iterations. Its output is the statistical value chisquare, pvalue, and the significance level of the item containing the DIF. The second output shows the items containing the DIF. Moreover lastly, it shows the estimated value of alphaMH, deltaMH, and the effect category of DIF for each item.
(9) Visually, items containing DIF using the MantelHeanzel method can be observed using the following syntax: Plot (MH). So that the output appears as in Figure 4:
Figure 4. Coat– Haenszel statistics and detection threshold
The same way can be done to specify DIF and Graph values for other methods with the following syntax:
# DIF detection using standardization method
STD<difStd (data, group="Group", focal.name=2, purify =TRUE, nrIter=20, save.output=TRUE, output=c("Stdresults", "default"))
# Graphics devices
plot (STD)
#DIF detection using logistics regression method
LOG< difLogistic (data, group = "Group", focal.name = 2, purify = TRUE, nrIter = 20, save.output = TRUE, output = c(" Stdresults "," default "))
# Graphics devices
plot (LOG)
# DIF detection using Lord's chisquared method
LORD< difLord (data, group = "Group", focal.name = 2, model = "1PL", purify = TRUE,engine = "lme4", save.output = TRUE, output = c(" LordResults "," defaults "))
# Graphics devices
plot (LORD)
# DIF detection using Raju's area method.
RAJU1< difRaju (data, group = "Group", focal.name = 2, model = "1PL", purify = TRUE,signed = TRUEsave.output = TRUE, output = c(" RAJUresults "," default "))
# Graphics devices
plot (RAJU)
Briefly, the steps of DIF analysis using Program R are presented in the flow chart in Figure 5.
Figure 5. DIF analysis using Program R
The analysis results show the characteristics of the NE items used to detect DIF. The results of the DIF analysis using five different methods are presented in table y, a description of the abilities of NE participants, the relationship between the abilities of NE participants and the distribution of item difficulty, and finally shows the distribution of items containing DIF in NE material.
Table 5 shows that 26 significant items contain DIF, meaning that only 33.3% of items are free from the DIF. There is one very Easy item, nine easy items, 24 medium items, and five difficult items from the difficulty level. Judging from the composition of the level of difficulty of the UN, the NE test kit is quite good because 60% of the items have a level of difficulty in the medium category. In contrast, the other 40% are evenly distributed for other categories.
Table 6 shows the estimated DIF value and DIF load effect level for each item from five different methods that have been analyzed with the R program. In addition to the estimated DIF value, groups that benefit from items containing DIF are also presented based on the DIF value of each method. If the DIF value is positive, the focal group benefits from this item, while the DIF value is negative, the reference group benefits. Especially for the Logistics Regression method, the determination of the group that benefits can be observed by looking at the graph of each item. If the reference group's graph is above focal, as in Figure 6, vice versa, if chart group reference is at on group focal like Figure 7, it means group reference more benefited. If the graph intersects, the group with more wide area _ big more benefited. Full results about the chart of each item for Logistic method regression see the attachment.
Besides the item's characteristics, the researcher also describes ability focal and reference groups. Table 7 shows description ability group focal and reference covers average score, grade maximum, and minimum and deviation standards for each group. The presented score consists of a row or Mark raw of each student and scores on the logit scale.
Table 5. The results of DIF significance and item difficulty
Item Code 
Static Items 
Difficulty Items 

Chisq 
pvalue 
Significance of DIF Load 
Difficulty Index 
Level 

it1 
1498,791 
1.0000 
Significant 
0.7933 
Easy 
it2 
1817,417 
0.9980 
Significant 
0.4638 
Currently 
it3 
1464,414 
1.0000 
Significant 
0.7509 
Easy 
it4 
1542,721 
1.0000 
Significant 
0.4613 
Currently 
it5 
1662,142 
1.0000 
Significant 
0.5773 
Easy 
it6 
1505,309 
1.0000 
Significant 
0.5679 
Easy 
it7 
1536,619 
1.0000 
Significant 
0.5097 
Easy 
it8 
1383,174 
1.0000 
Significant 
0.1254 
Currently 
it9 
2240,014 
0.0000 
Not significant 
0.0197 
Currently 
it10 
1642004 
1.0000 
Significant 
0.3837 
Currently 
it11 
1496,025 
1.0000 
Significant 
0.5256 
Easy 
it12 
1506.807 
1.0000 
Significant 
0.2928 
Currently 
it13 
2135,392 
0.0170 
Not significant 
0.8324 
Difficult 
it14 
1558,552 
1.0000 
Significant 
0.4065 
Currently 
it15 
1525.508 
1.0000 
Significant 
0.3360 
Currently 
it16 
2374,379 
0.0000 
Not significant 
0.2671 
Currently 
it17 
2462,191 
0.0000 
Not significant 
0.5896 
Difficult 
it18 
2155,693 
0.0080 
Not significant 
0.3845 
Currently 
it19 
1925.017 
0.8800 
Significant 
0.2671 
Currently 
it20 
1753,388 
1.0000 
Significant 
0.1614 
Currently 
Item Code 
Static Items 
Difficulty Items 

Chisq 
pvalue 
Significance of DIF Load 
Difficulty Index 
Level 

it21 
2968,048 
0.0000 
Not significant 
0.9288 
Difficult 
it22 
2064,715 
0.1490 
Significant 
0.1093 
Currently 
it23 
2951,127 
0.0000 
Not significant 
0.5251 
Difficult 
it24 
2572.092 
0.0000 
Not significant 
0.8432 
Difficult 
it25 
2228,474 
0.0000 
Not significant 
0.3280 
Currently 
it26 
1650.607 
1.0000 
Significant 
0.0574 
Currently 
it27 
1739,634 
1.0000 
Significant 
0.5077 
Easy 
it28 
2213.45 
0.0010 
Not significant 
0.2345 
Currently 
it29 
1643,148 
1.0000 
Significant 
0.2202 
Currently 
it30 
1630,804 
1.0000 
Significant 
0.0446 
Currently 
it31 
2099,081 
0.0580 
Significant 
1.0948 
Very easy 
it32 
1840,876 
0.9950 
Significant 
0.0332 
Currently 
it33 
1906,943 
0.9290 
Significant 
0.0446 
Currently 
it34 
2869.114 
0.0000 
Not significant 
0.2439 
Currently 
it35 
1796,461 
1.0000 
Significant 
0.5354 
Easy 
it36 
2361,682 
0.0000 
Not significant 
0.1585 
Currently 
it37 
1940,669 
0.8210 
Significant 
0.2834 
Currently 
it38 
2521.012 
0.0000 
Not significant 
0.6308 
Easy 
it39 
1899,488 
0.9440 
Significant 
0.2065 
Currently 
Table 6. MH, STD, Lord, and Raju Hasil results
Items 
MH 
STD 
LOG 
LORD 
RAJU 
DIF detect. results 
Adv. Group 
Material 

MH 
Lv 
Std 
Lv 
R ^{2} 
Lv 
Lord _ 
Lv 
Raju 
Lv 

it1 
0.33 
 
0.19 
 
0.00 
 
0.66 
A 
0.66 
A 
2/5 
Ref. 
Arit. 
it2 
0.98 
A 
0.59 
 
0.00 
A 
0.34 
 
0.34 
 
2/5 
Ref. 
Arit. 
it3 
0.39 
 
0.02 
 
0.00 
 
0.59 
A 
0.59 
A 
2/5 
Ref. 
Arit. 
it4 
1.85 
C 
1.14 
B 
0.01 
A 
0.69 
 
0.69 
 
3/5 
foc. 
Arit. 
it5 
1.29 
B 
0.65 
 
0.00 
A 
0.33 
 
0.33 
 
2/5 
foc. 
Arit. 
it6 
3.78 
C 
2.55 
C 
0.04 
B 
2.23 
C 
2.23 
C 
5/5 
foc. 
Arit. 
it7 
0.20 
 
0.26 
 
0.00 
A 
0.86 
A 
0.86 
A 
3/5 
Ref. 
Arit. 
it8 
1.09 
B 
0.37 
 
0.00 
 
0.48 
A 
0.48 
A 
3/5 
Ref. 
Arit. 
it9 
0.07 
 
0.41 
 
0.02 
A 
0.54 
A 
0.54 
A 
3/5 
Ref. 
Arit. 
it10 
1.42 
B 
0.78 
 
0.00 
A 
0.28 
 
0.28 
 
2/5 
foc. 
Arit. 
it11 
1.86 
C 
1.16 
B 
0.01 
A 
0.73 
 
0.73 
 
3/5 
foc. 
Alg. 
it12 
1.85 
C 
1.08 
B 
0.01 
A 
0.40 
 
0.40 
 
3/5 
foc. 
Alg. 
it13 
4.39 
C 
3.90 
C 
0.07 
B 
3.86 
C 
3.86 
C 
5/5 
foc. 
Alg. 
it14 
3.14 
C 
1.98 
C 
0.02 
A 
1.69 
C 
1.69 
C 
5/5 
foc. 
Alg. 
it15 
1.25 
B 
0.58 
 
0.00 
A 
0.10 
 
0.10 
 
2/5 
foc. 
Alg. 
it16 
2.13 
C 
1.42 
B 
0.03 
A 
2.10 
C 
2.10 
C 
5/5 
foc. 
Alg. 
it17 
1.89 
C 
1.25 
B 
0.01 
A 
1.61 
C 
1.61 
C 
5/5 
foc. 
Alg. 
it18 
1.21 
B 
0.90 
 
0.00 
A 
0.92 
A 
0.92 
A 
4/5 
foc. 
Alg. 
it19 
0.30 
 
0.11 
 
0.01 
A 
0.56 
A 
0.56 
A 
3/5 
Ref. 
Alg. 
it20 
0.02 
 
0.29 
 
0.01 
A 
1.04 
B 
1.04 
B 
3/5 
Ref. 
Alg 
it21 
1.00 
B 
0.57 
 
0.01 
A 
1.85 
C 
1.85 
C 
4/5 
foc. 
Geo 
it22 
1.56 
C 
1.55 
C 
0.04 
B 
2.12 
C 
2.12 
C 
5/5 
Ref. 
Geo 
it23 
0.23 
 
0.70 
 
0.00 
 
0.38 
 
0.38 
 
0/5 
 
Geo 
it24 
0.75 
A 
0.46 
 
0.01 
A 
0.50 
 
0.50 
 
2/5 
foc. 
Geo 
it25 
0.69 
A 
0.89 
 
0.02 
A 
1.04 
B 
1.04 
B 
4/5 
Ref. 
Geo 
it26 
0.48 
 
0.60 
 
0.02 
A 
1.42 
B 
1.42 
B 
3/5 
Ref. 
Geo 
it27 
1.63 
C 
1.52 
C 
0.03 
A 
2.32 
C 
2.32 
C 
5/5 
Ref. 
Geo 
it28 
2.03 
C 
1.43 
B 
0.01 
A 
1.41 
B 
1.41 
B 
5/5 
foc. 
Geo 
it29 
0.13 
 
0.30 
 
0.01 
A 
1.12 
B 
1.12 
B 
3/5 
Ref. 
Geo 
it30 
0.86 
A 
1.03 
B 
0.02 
A 
1.80 
C 
1.80 
C 
5/5 
Ref. 
Geo 
it31 
2.80 
C 
2.69 
C 
0.05 
B 
3.27 
C 
3.27 
C 
5/5 
Ref. 
Geo 
it32 
1.18 
B 
1.14 
B 
0.03 
A 
1.97 
C 
1.97 
C 
5/5 
Ref. 
Geo 
it33 
0.16 
 
0.28 
 
0.01 
A 
0.92 
A 
0.92 
A 
3/5 
Ref. 
Geo 
it34 
3.77 
C 
3.26 
C 
0.08 
C 
3.85 
C 
3.85 
C 
5/5 
foc. 
Geo 
it35 
1.36 
B 
1.48 
B 
0.01 
A 
2.00 
C 
2.00 
C 
5/5 
Ref. 
stats. 
it36 
2.10 
C 
1.45 
B 
0.02 
A 
1.94 
C 
1.94 
C 
5/5 
foc. 
stats. 
it37 
0.01 
 
0.52 
 
0.00 
A 
0.71 
A 
0.71 
A 
3/5 
Ref. 
stats. 
it38 
0.01 
 
0.37 
 
0.00 
 
0.12 
 
0.12 
 
0/5 
 
stats. 
it39 
0.87 
A 
0.97 
A 
0.01 
A 
1.47 
B 
1.47 
B 
5/5 
Ref. 
stats. 
Figure 6. Graph of probability and ability of focal group and Reference group Logistics Regression Method item 27
Figure 7. Graph of probability and ability of focal group and Reference group Logistics Regression Method item 13
Table 7. Description of the ability of the reference and focal group students
Descriptive 
Row Score 
Logit Scale 

Reference 
Focal 
Reference 
Focal 

n 
1000 
1000 
1000 
1000 
average 
23.43 
17.01 
0.42 
0.40 
min 
4.00 
3.00 
2.13 
2.42 
max 
37.00 
37.00 
2.56 
2.57 
SD 
9.67 
6.94 
1.20 
0.75 
Connection Item characteristics and abilities of students from each group are presented in Figure 8 and Figure 9. Figure 8 shows group references have abilities ranging from 2.13 to 2.56 temporal logit scale index. The difficulty of the items is in the range of the interval 2.00 to 2.00. with index difficulty, set up NE test as it should measure with good ability students in groups reference. Temporary that, if compared to Figure 9, shows ability student is in the interval 2.00 to 2.00 while index Item difficulty is in the range of 1.00 to 1.00. index the difficulty of the item given to group reference it turns out no by significant capable describe with good ability group this, thing this because of existence significant difference _ Among ability student with the device the NE test is given. Some groups with ability tall and capable low no could reveal with effective device NE test.
Figure 8. Personitem map reference group
Figure 9. Focal group personitem map
Connection Among ability students and item difficulty index has implications for the advantaged group with device test this. Table 8 describes that generally, items that contain DIF are good group references. Although average ability _ from group reference is more suitable than a focal group, level success students in the focal group are more suitable than group reference to some items with the same ability level. This condition causes a unique case and can be studied more in using various other analyses in the future front.
Figure 10. Number of Items containing DIF and nonDIF based on DIF detection method with R. program
Table 8. DIF function level results and Groups who benefit from the R. program
DIF Level 
CTT 
1PL IRT 

MH 
STD 
LOG 
LORD 
RAJU 

Ref. 
foc. 
Ref. 
foc. 
Ref. 
foc. 
Ref. 
foc. 
Ref. 
foc. 

A 
3 
2 
1 
0 
15 
14 
8 
1 
8 
1 
B 
2 
6 
3 
7 
2 
2 
5 
1 
5 
1 
C 
3 
11 
3 
4 
0 
1 
6 
8 
6 
8 
Total 
8 
19 
7 
11 
17 
17 
19 
10 
19 
10 
Based on Figure 10, it can be seen that by using the nonIRT method/classical method, it was identified that the MH model detected 27 items containing DIF, the Standard model detected 18 items containing DIF. The logistic method identified 34 items containing DIF. Meanwhile, when viewed from the comparison of the three models above, it can be seen that there were 18 items detected containing DIF for the three nonIRT models used. Based on the data obtained, it can be seen that by using the IRT/Modern method with the 1PL model, it was identified that the Lord and Raju model detects DIF for all items, namely 29 items. Raju detected 34 items containing DIF. In the 3PL model, the estimation with Lord and Raju could not be detected because it failed to run the R program. Other information shows that the estimation method using the nonIRT approach has relatively different estimates between one method and another, while the IRT approach is all estimation methods used. Used using the 1PL model gives the same results for all items. Meanwhile, if we look at the items that contain DIF for all the methods used, there are 15 items, namely items 6, 13, 14, 16, 17, 23, 28,29, 31,32,33, 35, 36, 37, and 40. Of the 39 items, two items do not contain DIF for all methods used, namely items 24 and 39.
Besides the Item characteristics and abilities group, the important necessary _ shown is the material loaded on the NE test. The researcher has grouped the distribution of 2013/2014 NE material based on the material that can be used seen in Table 6. Based on Figure 11, NE questions are divided over four materials generals with the distribution of items that are not proportional for each material show that Theory geometry has a proportion compared to Theory other.
After done distribution material, the next step is to categorize items containing DIF with five methods based on the distribution of the material shown in Table 9.
Figure 11. Percentage and number of items on NE materials
Table 9 shows that the geometry material, which has a large proportion in the NA, has the highest number of items containing DIF, namely 12 items that are sufficient to be very problematic. Algebra material containing 9 item questions containing DIF in moderate to very problematic category. Statistics / The probability that even though the number of items is small. Almost all items have problems with DIF, namely 4 out of 5 contain DIF, while for the arithmetic operation material is balanced between items that contain DIF and not, namely five items that have problems while the other 5 are relatively less problematic.
Table 9. NE Material Analysis containing DIF
Material 
No Problem 
Less Problem 
Troubled Enough 
Troubled 
Very Troubled 
Arithmetic operations 
0 
5 
4 
0 
1 
Algebra 
0 
1 
4 
1 
4 
Geometry 
1 
1 
3 
2 
7 
Statistics/Probability 
1 
0 
1 
0 
3 
Amount 
2 
7 
12 
3 
15 
Based on the results obtained, it can be seen that there is a very significant bias in the 2013/2014 NE item items. More than half of the NE questions contain DIF, which indicates that the NE questions tend to benefit the focal group [15, 38], in this case, for groups of students in NTT (representing the eastern region). Furthermore, disadvantaged students in DIY (representing the western region of Indonesia). This condition indicates that the NE questions contain a relative bias significant by location within [11]. Racial and ethnic group differences may reflect actual differences rather than measurement bias [38] know that ethnicity may influence learning culture and learning opportunities [38, 39].
Several things cause differences in the ability of DIY and NTT students to solve the NE questions in terms of this location, such as differences in educational facilities and facilities and infrastructure of the two regions [5, 40]. This situation can be supported by the fact that NTT is one of the target areas for the Bachelor of Education program in the Outermost, Frontier, and Disadvantaged Regions (SM3T) to improve the quality of education in that area. In addition, several studies have shown that differences in student backgrounds such as language [41], socioeconomic, and geography have contributed to students' abilities [5]. Geographical differences in Indonesia are generally the things that affect the primary language used. It may result in differences in students' abilities to understand some of the instructions in the NE questions given [41].
Low ability focal group can be caused by facility factors and several things that might cause a lot of NTT students not to solve the NE questions [3, 5]. Another factor is the still weak ability of students in several learning materials, namely Algebra [42, 43], Geometry [44, 45], and Statistics/probability [46, 47] material. Of course, this needs to be a focus for the education office, schools, teachers, and all stakeholders [48, 49] to focus and concentrate on these materials so that students at NTT can solve problems related to these materials in the future [50]. The material with a reasonably high DIF in this analysis is the same as the TIMSS results in 2011, where Indonesia has a belowexpected value for the three materials. Indonesia only has an increase in the material for arithmetic operations [10]. Another study stated that the primary thing that affects students' ability to solve math problems is the content ability [51, 52]. it is necessary to carry out several policies to improve the content abilities of NTT students, such as improving the quality of learning to improve students' knowledge of mathematical content.
The exciting thing in the study is that with existing limitations, the general focal group /NTT has more opportunities than group /DIY reference for items containing DIF at the same capability level. This situation shows that with good guidance and educational support facilities, the ability focal group /NTT has potency significantly could increase in the future [5]. This condition naturally needs good cooperation Among the government center and region to support progress in several areas in eastern Indonesia.
Some of the limitations in this study can be used by further researchers, such as the sample used only using two provinces, namely DIY and NTT. In the future, analysis can be carried out for several areas involving more than two groups. In addition, the analysis carried out by researchers is still shallow to reveal the items and materials that contain DIF. In addition to geographical factors, many exciting things can be done to understand the bias of the NE items, such as differences in the primary language, learning opportunities, socioeconomics, other factors that are more interesting to study [3, 21, 38].
Geographical location bias is interesting to study in an archipelagic country like Indonesia, with complexities and educational problems. Mathematics NE as a benchmark for the quality of Education in Indonesia Generally profitable focal group /NTT. However, although reality, students' ability in groups is lower than group reference /DIY. This condition may be a consequence of several factors such as facility problems, condition socioeconomics, and knowledge of mathematical content (on Algebra, Geometry, and Statistics/ Probability materials). With the analysis, policymakers can take corrective steps to improve facilities and procure competent teachers. So that inferior math content problems in the focal group could be overcome for avoiding inequality from geographic aspect group in Indonesia.
This study looks at the other side of the gap in the quality of education, especially education in Indonesia, which is rich in pluralism and several problems. One of the problems that the researchers focus on is the geographical gap which is the impact of uneven development, especially educational facilities and infrastructure. Several previous studies have highlighted DIF based on gender, school status, and ease of access. Generally, researchers focus on gender and school status gaps, while only a few have looked at location/geographical aspects. With this research, it is hoped that it can provide a new picture of other aspects that have not been considered in the DIF study. This research can be used as a reference regarding the education gap for regions that have multiple characteristics, both from ethnic groups, cultures and islands such as Indonesia.
In general, the DIF analysis is considered the first step, the statistical step, to decide whether the item is biased towards a particular group. The emergence of DIF must first be seen as an impact, namely the real difference in ability of the two groups. This is important because if an item is detected as DIF, it does not necessarily mean the item is biased. In this case, it is important to consider whether the reason for the difference in group scores on the item is relevant or not, which depends on the object or purpose of the measurement. The first case, DIF is caused by the actual difference, and the second case is caused by bias
The researcher sees that DIF is a necessary condition for item bias to occur, but it is not a sufficient condition. Item impact and item bias both differ in group situations based on relevant characteristics or irrelevant test characteristics. However, if DIF occurs, then this event is not sufficient to prove the occurrence of a bias item; but furthermore, the item should be analyzed (e.g. by content analysis, field evaluation) to assess the presence of item bias in it. Thus, DIF refers to the way items function differently for individuals or groups of test takers of the same ability.
Item bias, a challenge to the validity of the test, causes systematic errors that can give incorrect interpretations of the conclusions made for certain group members. In other words, when an item unfairly favors one group over another, then item bias exists. The question is biased because the item itself contains certain sources of difficulty other than the construct being tested, and this difficulty factor is detrimental to the performance of the test taker. However, differences in performance or ability to answer items do not automatically prove item bias.
[1] Sulisworo, D. (2016). The contribution of the education system quality to improve the nation’s competitiveness of Indonesia. Journal of Education and Learning (EduLearn), 10(2): 127–138. https://doi.org/10.11591/edulearn.v10i2.3468
[2] Voss, R., Richards, T. (2016). Challenges related to teaching mathematics using social justice pedagogies: A secondary school experience. Journal of Education and Practice, 7(17): 68–73.
[3] Retnawati, H., Wutsqo, D.U., Listyani, E., Yp, K.R. (2014). The identifi cation of the diffi culties in solving mathematical problems of junior high school mathematics teachers in Nusa Tenggara Timur and Maluku Utara. Journal of Education, 7(1): 1–13.
[4] UNESCO. (2017). Global Education Monitoring Report 2017/18 (Second). United Nations Educational, Scientific and Cultural Organization. https://doi.org/10.1017/CBO9781107415324.004
[5] Özerk, K., Kerchner, C.T. (2014). Diversity and educational challenges in Oslo and Los Angeles  A metropolitan perspective nr 2. International Electronic Journal of Elementary Education, 6(3): 441–462.
[6] Edelen, M.O., Thissen, D., Teresi, J.A., Kleinman, M., OcepekWelikson, K. (2006). Identification of differential item functioning using item response theory and the likelihoodbased model comparison approach: Application to the MiniMental State Examination. Medical Care, 44(11 SUPPL. 3): 134–142. https://doi.org/10.1097/01.mlr.0000245251.83359.8c
[7] Orlando Edelen, M.O., Thissen, D., Teresi, J.A., Kleinman, M., OcepekWelikson, K. (2006). Identification of differential item functioning using item response theory and the likelihoodbased model comparison approach. Application to the MiniMental State Examination. Medical Care, 44(11 Suppl 3): S13442. https://doi.org/10.1097/01.mlr.0000245251.83359.8c\r0000565020061100100019 [pii]
[8] Ferdhiana, R., Amri, K., Abidin, T.F. (2019). Clustering of districts in indonesia using the 2015 high school social sciences national examination results. ICICOS 2019  3rd International Conference on Informatics and Computational Sciences: Accelerating Informatics and Computational Research for Smarter Society in The Era of Industry 4.0, Proceedings, 3–6. https://doi.org/10.1109/ICICoS48119.2019.8982524
[9] Fahrurrazi, F., Rohiat, R., Somantri, M., Turdja’I, T. (2021). Biplot analysis for marking the quality of schools which compatible with national examination score. Journal of Physics: Conference Series, 1731(1). https://doi.org/10.1088/17426596/1731/1/012031
[10] OECD. (2016). PISA 2015: Results in focus. Pisa 2015, 67, 16. https://doi.org/10.1787/9789264266490en
[11] IEA. (2012). Timss 2011 International Results in Science. In New directions for youth development (Vol. 2012, Issue 136). https://doi.org/10.1002/yd.20038
[12] Irawan, C. (2015). The national examination and the quality of education mapping. 2(1): 97105.
[13] Muttaqin, T. (2018). Determinants of unequal access to and quality of education in Indonesia. Journal Perencanaan Pembangunan: The Indonesian Journal of Development Planning, 2(1): 123. https://doi.org/10.36574/jpp.v2i1.27
[14] BPS. (2021). Statistik Pendidikan 2021 (R. Sinang, R. Sulistyowati, R. Putrianti, G. Anggraeni, & F. W. R. Dewi (eds.)). Badan Pusat Statistik.
[15] OECD. (2012). PISA 2012 Results in Focus: What 15yearolds know and what they can do with what they know. https://www.oecd.org/pisa/keyfindings/pisa2012resultsoverview.pdf
[16] Retnawati, H. (2017). The Dif identification in constructed response items using partial credit model. International Journal of Assessment Tools in Education, January, 7389. https://doi.org/10.21449/ijate.347956
[17] Kottorp, A., Malinowsky, C., LarssonLund, M., Nygård, L. (2018). Gender and diagnostic impact on everyday technology use: A differential item functioning (DIF) analysis of the Everyday Technology Use Questionnaire (ETUQ). Disability and Rehabilitation, 17. https://doi.org/10.1080/09638288.2018.1472816
[18] Hambleton, R.K., Swaminathan, H., Rogers, D.J. (1991). Fundamentals of Item Response Theory (F. Diane S (ed.)). SAGE Publications, Inc.
[19] Simbolon, K., Supriyati, Y., Naga, D.S. (2019). Sensitivity of differential item functioning detection methods on national mathematics examination in north Sumatera Province, Indonesia. International Journal of Engineering and Advanced Technology, 8(5): 15381549. https://doi.org/10.35940/ijeat.E1226.0585C19
[20] Sudaryono, S. (2012). Sensitivity of differential item functioning (Dif) Sensitivitas Metode Pendeteksian Differential Item Functioning (Dif). Jurnal Evaluasi Pendidikan/Journal of Educational Evaluation, 3(1): 8283. https://doi.org/https://doi.org/10.21009/JEP.031.07
[21] French, B.F., Finch, W.H., Antonio, J., Vazquez, V. (2016). Differential Item Functioning on mathematics items using multilevel SIBTEST. Psychological Test and Assessment Modeling, 58(3): 471483.
[22] Jones, R.N. (2019). Differential item functioning and its relevance to epidemiology. Current Epidemiology Reports, 6(2): 174–183. https://doi.org/10.1007/s40471019001945
[23] Meyer, J.P. (2021). Differential Item Functioning. Applied Measurement with JMetrik, 89101. https://doi.org/10.4324/978020311519013
[24] Reinius, M., Rao, D., Manhart, L. E., Wiklander, M., Svedhem, V., Pryor, J., Mayer, R., Gaddist, B., Kumar, S., Mohanraj, R., Jeyaseelan, L., Wettergren, L., Eriksson, L.E. (2018). Differential item functioning for items in Berger’s HIV Stigma Scale: An analysis of cohorts from the Indian, Swedish, and US contexts. Quality of Life Research, 27(6): 16471659. https://doi.org/10.1007/s1113601818414
[25] Alsmadi, Y., Alsmadi, A. (2009). Detecting differential person functioning in emotional intelligence. Journal of Instructional Psychology, December, 2014: 284289.
[26] DeMars, C.E., Jurich, D.P. (2015). The interaction of ability differences and guessing when modeling differential item functioning with the rasch model: Conventional and tailored calibration. Educational and Psychological Measurement, 75(4): 610633. https://doi.org/10.1177/0013164414554082
[27] Cutright, P. (1967). Inequality: A crossnational analysis. American Sociological Review, 32(4): 562578.
[28] Fossett, M., South, S. J. (1983). The measurement of intergroup income inequality: A conceptual review. Social Forces, 61(3): 855871.
[29] Hidayati, V.R., Subanji, S., Sisworo, S. (2020). Students’ Mathematical Connection Error in Solving PISA Circle Problem. JIPM (Jurnal Ilmiah Pendidikan Matematika)/JIPM (Scientific Journal of Mathematics Education), 8(2): 76. https://doi.org/10.25273/jipm.v8i2.5588
[30] Riyati, I., Suparman, S. (2019). Design student worksheets based on problemlearning to enhance mathematical communication. Asian Journal of Assessment in Teaching and Learning, 9(2): 917. https://doi.org/10.37134/ajatel.vol9.no2.2.2019
[31] Bauer, D.J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychol Methods., 22(3): 507526. https://doi.org/10.1037/met0000077.A
[32] Bauer, D.J. (The U. of N. C. (1997). A more general model for testing measurement invariance and differential item functioning. Psychological Methods, 2(4): 403–435. https://doi.org/10.1037/a0030641
[33] Magis, D., Béland, S., Tuerlinckx, F., de Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3): 847862. https://doi.org/10.3758/BRM.42.3.847
[34] Cervantes, V. H. (2017). DFIT: An R Package for Raju’s Differential Functioning of Items and Tests Framework. Journal of Statistical Software, 76(5). https://doi.org/10.18637/jss.v076.i05
[35] Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF). Ottawa: National Defense Headquarters, January 1999, 157. http://www.researchgate.net/publication/236596822_A_handbook_on_the_theory_and_methods_of_differential_item_functioning_(DIF)_Logistic_regression_modeling_as_a_unitary_framework_for_binary_and_Likerttype_(ordinal)_item_scores/file/60b7d51830c07e4cbc.pdf
[36] Rahayu, W., Sinaga, O., Oktaviani, M., Zakiah, R. (2018). Analysis of mathematical ability of high school students based on item identification of national examination set. Advances in Social Science, Education and Humanities Research, 178(ICoIE 2018): 412–416. https://doi.org/10.2991/icoie18.2019.88
[37] Susongko, P., Arfiani, Y., Kusuma, M. (2021). Determination of gender differential item functioning in tegalstudents’ scientific literacy skills with integrated science (Slisis) test using rasch model. Jurnal Pendidikan IPA Indonesia, 10(2): 270281. https://doi.org/10.15294/jpii.v10i2.26775
[38] Çepni, Z., Kelecioğlu, H. (2021). Detecting differential item functioning using SIBTEST, MH, LR and IRT methods. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 12(3): 267285. https://doi.org/10.21031/epod.988879
[39] Chen, K., Li, W., Wang, S. (2020). An easy‑to‑implement hierarchical standardization for variable selection under strong heredity constraint. Journal of Statistical Theory and Practice, 14(3): 132. https://doi.org/10.1007/s4251902000102x
[40] Rodriguez, H.P., Crane, P.K. (2011). Examining multiple sources of differential item functioning on the clinician & group CAHPS ® survey. Health Services Research, 46(6 PART 1): 17781802. https://doi.org/10.1111/j.14756773.2011.01299.x
[41] Özerk, K. (2013). The norwegian educational system, the linguistic diversity in the country and the education of different minority groups. International Electronic Journal of Elementary Education, 6(1): 4359.
[42] Sinharay, S., Dorans, N.J., Liang, L. (2009). First language of examinees and its relationship to differential item functioning. In Educational Testing Service (Issue March, pp. 165). ETS, Princeton, New Jersey.
[43] Powers, R.A., Craviotto, C., Grassl, R.M. (2010). Impact of proof validation on proof writing in abstract algebra. International Journal of Mathematical Education in Science and Technology, 41(4): 501514. https://doi.org/10.1080/00207390903564603
[44] Winarti, D.W. (2018). Developing spatial reasoning activities within geometry learning. The 6th South East Asia Design Research International Conference.
[45] Arnis, F.M., Syahputra, E., Surya, E. (2019). Analysis of trajectory thinking of middle school students to complete the problem of spatial ability with realistic mathematical education learning. Journal of Education and Practice, 10(20): 103109. https://doi.org/10.7176/JEP
[46] T, A.Y., Ningsih, K. (2019). Character education strengthening of students through the mathematical disposition strategy on statistics elementary. Journal of Education, Teaching, and Learning, 4(1): 15.
[47] Widakdo, W.A. (2017). Mathematical Representation Ability by Using Project Based Learning on the Topic of Statistics Mathematical Representation Ability by Using Project Based Learning on the Topic of Statistics. International Conference on Mathematics and Science Education (ICMScE).
[48] Miller, B.M. (2008). Problembased conversations: Using preservice teachers’ problems as a mechanism for their professional development. Teacher Education Quarterly, 34(5): 7798.
[49] Moyle, K. (2009). National conversations: listening to students’ views of learning with technologies. ACSA 2009 National Biennial Conference.
[50] Shanahan, E.M., Dallacqua, A.K. (2018). Action in teacher education moving beyond “Agreeable” texts and “Boring” tasks: pairing young adult literature and critical literacy in teacher education moving beyond “Agreeable” texts and “Boring” tasks: Pairing. Action in Teacher Education, 40(1): 3857. https://doi.org/10.1080/01626620.2018.1424659
[51] Cohen, A.S., Gregg, N., Deng, M. (2005). The Role of extended time and item content on a highstakes mathematics test. Learning Disabilities Research and Practice, 20(4): 225233. https://doi.org/10.1111/j.15405826.2005.00138.x
[52] Shivaprasad, S., Sadanandam, M. (2021). Optimized features extraction from spectral and temporal features for identifying the Telugu dialects by using GMM and HMM. Ingénierie des Systèmes d’Information, 26(3): 275283. https://doi.org/10.18280/isi.260304