Validity and Reliability Study for the Turkish Version of Number Sense Screener for 60-71 Months Old Children

This study aimed to examine the validity and reliability of the Turkish version of the Number Sense Screener (NSS) for 60-71-month-old children. The study used the general survey model. Its population consisted of children aged 60-71 months who were attending public and private kindergartens under the Ministry of National Education in the central district of Aydin/Turkey during the spring term of the 2017-2018 academic year. The sample consisted of 658 children, selected through systematic sampling, who were attending 12 preschools pre-determined according to socioeconomic status (low-middle-high). Validity and reliability analyses were carried out on the data from the NSS administration. The analyses showed that the item difficulty values of the assessment tool ranged between 0.08 and 0.96, the item discrimination indices ranged between 0.17 and 0.53, and the item infit and outfit values fell within the range of 0.50-1.50. The reliability coefficients calculated for the entire assessment tool with Guttman's lambda-2, the Alpha coefficient, the Feldt-Gilmer coefficient, the Feldt-Brennan coefficient and Raju's beta coefficient varied between .826 and .837. The DIF analysis indicated that the items of the assessment tool were not biased with respect to gender. According to these results, the Turkish version of the NSS is a valid and reliable assessment tool for assessing the number sense of children aged 60-71 months.


Introduction
Because many studies have shown that the mathematical concepts and skills acquired during the preschool period are the strongest predictor of later achievement in mathematics, both national and international education programs now give greater importance to the acquisition of these concepts and skills (Mazzocco & Thompson, 2005; Jordan, Kaplan, Olah & Locuniak, 2006; Duncan et al., 2007; Lopez, Gallimore, Garnier & Reese, 2007; Romano, Babchishin, Pagani & Kohen, 2010).
The most important mathematical concept emphasized in the preschool period is that of number (Charlesworth & Lind, 2010; Onkol, 2012). The concept of number is a cornerstone of mathematics programs (NAEYC, 2010) and a prerequisite for teaching advanced mathematics skills (Charlesworth, 2012; Nguyen et al., 2016). The National Council of Teachers of Mathematics-NCTM (2000) emphasizes the importance of developing number skills in early childhood mathematics programs and highlights that the largest share of time in preschool education should be devoted to number skills (as cited in Yılmaz, 2012). Number knowledge in the preschool period includes many interrelated skills. These skills are often referred to as "number sense" in the international literature (Gersten, Jordan, & Flojo, 2005; Jordan, Glutting & Dyson, 2012; Cekirdekci, Sengul & Dogan, 2016).
Number sense has become one of the focal points of mathematics education at the international level in the last two decades. The literature, however, offers many different definitions of number sense (Gersten et al., 2005; Sengul & Gulbagcı Dede, 2013). Berch (2005) suggests that number sense is a sense of the meanings of numbers. The National Council of Teachers of Mathematics-NCTM (1989) defines number sense as the ability to understand the meaning of numbers, describe different relationships among numbers, identify the relative size of numbers, use referents for measuring objects and events, and think flexibly with numbers (as cited in Lago & DiPerna, 2010). Baroody and Wilkins (1999) describe number sense as a concrete understanding of numerical relationships. In the light of these definitions, the term generally refers to one's understanding of numbers, the ways of representing them, and the relationships among numbers (Reys et al., 1999; Gersten et al., 2005). The definitions of number sense emphasize two common points. The first is numbers and the relationships among numbers, in other words, operations. The second is flexibility in using numbers and operations (Sengul & Gulbagcı Dede, 2013). In this respect, the specific skills defined as part of number sense include verbal and object counting, quantity comparison, number identification and basic calculation (Howell & Kemp, 2009; Lago & DiPerna, 2010). Jordan et al. (2006) suggest that the basic components of number sense are counting, number knowledge, number transformation, estimation and number patterns.
Many studies identify number sense as a predictor of later mathematics achievement. For example, Mazzocco & Thompson (2005) found that mathematics learning difficulties in later years stem from a lack of number sense in the preschool period. Jordan, Kaplan, Locuniak & Ramineni (2007) found that number sense in the preschool period is quite effective in predicting mathematics achievement in primary school. Additionally, the difficulties with number sense that children face during the preschool period are very likely to grow and persist in later stages of education (Aunio, Hautamäki & Van Luit, 2005). Number sense is therefore one of the most crucial concepts to be developed in early mathematics and a key predictor of subsequent mathematical achievement in both the short (Aunio & Niemivirta, 2010) and the long term. Number sense is also one of the content standards for mathematics education set by the National Council of Teachers of Mathematics-NCTM (2000). According to these standards, young children should develop a foundational number sense by the end of second grade (NCTM, 2000).
Number sense develops gradually from infancy and matures with the experience and knowledge that children acquire in the preschool period through interaction with their informal or formal environment (Reys & Yang, 1998; Ginsburg, Lee & Boyd, 2008; Jordan et al., 2012a). Children reach different levels of number sense competency depending on factors related to the family environment into which they are born (the socioeconomic status of the family, the parents' level of education and so forth), such as the diversity of mathematical concepts used at home, attitudes towards mathematics, playing number-related games at home and using household chores for educational purposes (Howse, Lange, Farran & Boyles, 2003; Penner & Paret, 2008; Ivrendi, 2011; Ramani, Siegler, & Hitti, 2012; Laski & Siegler, 2014; Gulec & İvrendi, 2017). Garon-Carrier et al. (2018) also suggest that children acquire different levels of number sense skills in their home setting. Pittalis, Pitta-Pantazi and Christou (2016), on the other hand, emphasize that number sense is highly personal. Because of these individual differences and the impact of number sense on the primary school period, it is highly important to evaluate the number sense of preschool children and to develop early intervention programs, especially for children in the risk group (those with low number sense) (Aunio et al., 2005; Jordan et al., 2006).
Because of the importance of number sense, there is a need for valid and reliable tools for assessing preschool children (Gersten & Chard, 1999). Lago and DiPerna (2010) note that a variety of tests have been developed to measure number sense, yet emphasize that very few of these assessment tools have been examined for reliability and validity. The Number Sense Screener (NSS) is one of the assessment tools used in many different studies to measure the number sense of preschool children, and its validity and reliability have been demonstrated in longitudinal studies (Jordan et al., 2006; Lago & DiPerna, 2010; Jordan et al., 2012a; Dyson, Jordan, Beliakoff & Hassinger-Das, 2015; Starr, DeWind & Brannon, 2017). The Number Sense Screener is an important psychoeducational assessment instrument because it can be used to monitor number development, predict later mathematics achievement, identify children at risk and prepare effective intervention programs accordingly (Jordan et al., 2012a).
In Turkey, in parallel with the greater importance given to the concept of number in the international literature in recent years, the body of research on the number skills of preschool children has grown, and various tools for assessing these skills have been developed or adapted into Turkish (e.g., Onkol, 2012; Olkun, Fidan & Babacan Ozer, 2013; Pekince & Daglıoglu, 2017; Yilmaz & Inal Kiziltepe, 2017). Another tool for evaluating the number sense of preschool children is the "Assessing Number Sense Instrument" developed by Wakefield and Ivrendi (2008) (as cited in Ivrendi, 2011). From this point of view, this study aims to examine the validity and reliability of the Number Sense Screener (NSS), which is frequently used in the international literature to evaluate the number sense of preschool children, for children aged 60-71 months. The validity and reliability study of the Turkish version of the NSS is considered crucial because it provides a valid and reliable standard with which to determine the number sense levels of preschool children in this age group, identify children with lower number sense, prepare early intervention programs, measure the success of applied training programs and determine the factors affecting number sense.

Research Model
The study follows the general survey model, since it aims to determine the current situation by analyzing the data obtained. In a general survey, a judgment about a population consisting of a large number of elements is reached by surveying either the entire population or a group or sample taken from it (Karasar, 2010).

Participants
The population of the study consisted of children aged 60-71 months attending public and private kindergartens under the Ministry of National Education in the Efeler district center of Aydın province in the spring term of the 2017-2018 academic year. The sample consisted of 658 children attending 12 preschool education institutions in Aydın, Turkey, which were selected according to socioeconomic level (low-middle-high income). The mean age of the children was 6.8 and the standard deviation was 5.7. Of the children, 342 were girls (52.0%) and 316 were boys (48.0%). The study was carried out in preschool education institutions that volunteered to participate, in accordance with the legal permits obtained from the Provincial Directorate of National Education. Moreover, permission was obtained from the parents of the children included in the sample through a form developed by the researcher.
The children were selected through the systematic sampling method. Because participants were not selected according to any specific characteristic they might possess, the validity of the research was enhanced. Systematic sampling is a method in which a starting unit is chosen and every subsequent x-th unit is then selected for the sample (Cıngı, 1994). To this end, a list of the classes in which 60-71-month-old children were enrolled was obtained from each school included in the sample. With "two" selected as the initial unit, the assessment tool was administered to every second child on the list (the second, fourth, sixth, eighth, and so on). When a child selected through systematic sampling had no parental consent form, the assessment tool was administered to the next child on the list.
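The selection rule described above can be illustrated with a minimal Python sketch. The function name, the `has_consent` callback and the handling of a substitute child who would otherwise be selected twice are assumptions made for illustration; the study reports only the rule itself, not an implementation.

```python
def systematic_sample(class_list, has_consent, start=2, step=2):
    """Pick every `step`-th child, beginning with list position `start` (1-based).

    If a selected child has no parental consent form, the next child on the
    list is taken instead. Skipping a child who was already selected is an
    assumption of this sketch, not a rule stated in the study.
    """
    selected = []
    n = len(class_list)
    for position in range(start, n + 1, step):
        idx = position - 1  # convert the 1-based list position to a 0-based index
        # Move down the list until a child with consent, not yet chosen, is found.
        while idx < n and (not has_consent(class_list[idx])
                           or class_list[idx] in selected):
            idx += 1
        if idx < n:
            selected.append(class_list[idx])
    return selected


# Hypothetical class list of eight children; child "c4" has no consent form.
children = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
sample = systematic_sample(children, lambda child: child != "c4")
print(sample)
```

In this hypothetical list, the fourth child is replaced by the fifth, while the second, sixth and eighth children are selected as planned.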

Data Collection Tool
The data were collected through the Number Sense Screener (NSS), which consists of six subtests.

Number Sense Screener (NSS):
The assessment tool, developed to assess the early numerical competencies of children in preschool and in the first grade of primary school (e.g., Jordan et al., 2006; Jordan et al., 2007), is a short form of the research-based assessment tool named "Number Sense Brief" (33 items). The assessment tool consists of six subtests and takes between 15 and 20 minutes to administer. In this study, the validity and reliability analyses were performed on the NSS form consisting of 29 items (Jordan et al., 2012a). Information on the content and number of items of the Number Sense Screener's subtests is given below.
Counting Skills: This subtest consists of three items. It includes items on counting principles (one-to-one correspondence, cardinality, and stable order) and rhythmic counting (counting rhythmically up to a predetermined number) (Jordan et al., 2007; Jordan et al., 2012a).

Number Recognition: In the number recognition subtest, which consists of four items, children are asked to name the numbers shown to them (such as 13, 37) (Jordan et al., 2009; Jordan et al., 2012a).
Number Comparisons: The number comparison subtest, adapted from Griffin (2002), consists of seven items. In this part of the test, children are asked which number comes after a given number (e.g., 4) or which number comes two numbers after it. Given two numbers (e.g., 4 and 3), children are also expected to tell which number is bigger or smaller. Furthermore, children are shown a series of numbers, each placed at a corner of an equilateral triangle (e.g., 7, 3, and 6), and are asked to identify which number is closer (e.g., 6) to the number at the top corner of the triangle (Jordan et al., 2012a; Dyson, Jordan & Glutting, 2013).
Nonverbal Calculation: This subtest contains a total of four items: three additions and one subtraction. The nonverbal calculation task was adapted from Levine et al. (1992). The items in this section are presented to the children using a white mat, a box (the cover of the box has an opening cut into the side through which dots can be pushed into the box) and 10 black dots of the same size. For example, for an addition item, three dots are set out where the child can see them and the examiner says "See? Here are three dots". After the child has observed the dots, they are put into the box and the box cover is put back on. Following that, another black dot is placed on the mat, the examiner says "Here is another black dot" and asks the child to watch attentively, and the black dot is slid through the opening in the lid into the box. After that, the pages related to the item in the assessment tool are opened and the child is asked to point to the option showing the total number of black dots inside the box. Because this is a nonverbal task, the child is not penalized if he or she points to the correct rectangle with dots but says the wrong word (Jordan et al., 2012a; Jordan et al., 2012b).
Story Problems: The story problems subtest consists of a total of five items: three additions and two subtractions. In this part, the children are told that, to find the right answers, they may use their fingers, the number list (given with the assessment tool) or paper and pencil. The addition problems are phrased simply, following this basic format: "Susan has m pennies. Jim gives her n more pennies. How many pennies does Susan have now?" Similarly, the subtraction problems are phrased, "Susan has m pennies. Jim takes away n of her pennies. How many pennies does Susan have now?" (Jordan, Glutting, Ramineni & Watkins, 2010; Jordan et al., 2012a).
Number Combinations: The number combinations subtest consists of six items, four additions and two subtractions, which are asked verbally as follows: "How much is m and n?" and "How much is n take away m?" In this section, as in the story problems subtest, children are told that, to find the right answer, they may use their fingers, the number list or paper and pencil (Locuniak & Jordan, 2008; Jordan et al., 2012a).
The items in the subtests are scored as correct, incorrect or no response: one point is given for each correct answer, and zero points for incorrect answers and for no response. The score for each of the six subtests is the number of correct answers the child gives on that subtest. The total number sense score is the sum of the scores of all subtests (Jordan et al., 2007; Jordan et al., 2012a). Accordingly, the maximum score that can be achieved on this assessment tool is 29.
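The scoring rule above can be sketched in a few lines of Python. The subtest item counts are those reported in the text; the dictionary keys, the response labels and the function name are hypothetical and chosen only for this illustration.

```python
# Number of items per subtest, as reported in the text (3 + 4 + 7 + 4 + 5 + 6 = 29).
SUBTEST_ITEMS = {
    "counting_skills": 3,
    "number_recognition": 4,
    "number_comparisons": 7,
    "nonverbal_calculation": 4,
    "story_problems": 5,
    "number_combinations": 6,
}
MAX_SCORE = sum(SUBTEST_ITEMS.values())


def score_nss(responses):
    """Return (subtest_scores, total_score) for one child.

    `responses` maps each subtest name to a list of labels, one per item:
    "correct", "incorrect" or "no_response". Only "correct" earns a point.
    """
    subtest_scores = {}
    for subtest, n_items in SUBTEST_ITEMS.items():
        answers = responses[subtest]
        if len(answers) != n_items:
            raise ValueError(f"{subtest}: expected {n_items} answers, got {len(answers)}")
        subtest_scores[subtest] = sum(1 for answer in answers if answer == "correct")
    return subtest_scores, sum(subtest_scores.values())


# Hypothetical child who answers every item correctly.
all_correct = {name: ["correct"] * n for name, n in SUBTEST_ITEMS.items()}
print(score_nss(all_correct)[1] == MAX_SCORE)
```

Treating "incorrect" and "no_response" identically mirrors the rule that both receive zero points.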
The Turkish version of the NSS was administered individually to the children in the study group by the researcher or by examiners trained in the assessment tool, in a quiet setting provided by the preschool institutions outside the children's usual classrooms, in compliance with the rules and directives in the application booklet. Six examiners, selected from among the undergraduate and graduate students of the Preschool Education Department, were informed in detail about the assessment tool and how to administer it. Afterwards, each examiner was observed while administering the assessment tool to a child (60-71 months old) who was not included in the study group, and the examiners were given the necessary feedback based on the notes taken during the observation. Furthermore, the examiners were divided into pairs. One examiner in each pair administered the assessment tool to a different 60-71-month-old child who was not included in the study group, and then calculated the child's score in accordance with the record form. The other examiner in the pair calculated the child's score by entering the child's answers on his/her own record form. Following that, it was checked whether the two examiners in each pair had scored the tasks in the same way.

Adaptation of NSS into Turkish
The adaptation of the NSS into Turkish began with the translation of the assessment tool and its original English instruction booklet into Turkish. First, the forms (the instruction booklet and the assessment tool) were translated into Turkish by three language experts and then translated back into English using the back-translation technique. The forms were reviewed by a language expert with a good command of both Turkish and English, who concluded that they agreed with one another in terms of expression and cohesion. The forms translated into Turkish were then reviewed by a Turkish language expert, the necessary changes were made and the forms were finalized.
Expert opinion was used to determine how adequately, in qualitative and quantitative terms, the items of the assessment tool assess and cover the trait that is intended to be assessed (content validity) (McGartland et al., 2003). To this end, the forms were sent to seven field experts (in preschool education, primary classroom education and mathematics education), who were asked to evaluate the instructions and test items of the NSS in terms of their wording, their suitability for Turkish culture, their suitability for evaluating the number sense of children aged 60-71 months, and the fit of the items to the subtests in which they are included.
In line with the consensus of the experts, the items they found appropriate were included in the Turkish version of the assessment tool as they were, and the items for which corrections were proposed were modified accordingly. In this respect, the written form of the number four was changed (4 instead of 4) to the form that is used more frequently in Turkish. To gain an idea of how well children understood the assessment tool and of the approximate duration of administration, the Turkish version of the NSS was administered to a small group (n = 10), and it was observed that the children understood the items.

Data Analysis
The content validity of the Number Sense Screener was examined through the Lawshe technique in line with expert opinions, and content validity ratios (CVR) and content validity indices (CVI) were calculated. For the reliability and validity analyses of the Number Sense Screener, Rasch analysis was carried out with the jMetrik 4.111 software, which is based on the Conditional Maximum Likelihood Estimation (CMLE) and Normal Approximation (PROX) methods.
Table 1 shows the acceptable ranges, proposed by Linacre (2002), for the mean-square and standardized goodness-of-fit statistics used in evaluating the results obtained from the software.
WMS and UMS
  Above 1.5: Unproductive for the assessment process, yet not degrading either.
  0.5 to 1.5: Highly suitable for assessment.
  Below 0.5: Less productive for the assessment process, yet not that ineffective. Can produce high reliability and discrimination values.
Std. WMS and Std. UMS
  3 and above: Data do not fit the model. A larger sample may be needed.
  2.0 to 2.9: Data cannot be predicted clearly.
  -1.9 to 1.9: Data have a plausible predictability.
  -2.0 and below: Data are highly predictable. Other dimensions may limit the answer patterns.
According to Table 1, the ideal range for WMS and UMS values is between 0.50 and 1.50, and the ideal range for standardized WMS and UMS values is between -1.90 and 1.90. The columns WMS and Std. WMS show the weighted mean square (infit) statistic and its standardized form, respectively. The columns UMS and Std. UMS show the unweighted mean square (outfit) statistic and its standardized form (Guzeller, Eser & Aksu, 2018).
The reliability coefficients for the subtests and for the entire assessment tool were calculated using Guttman's lambda-2, the Alpha coefficient, the Feldt-Gilmer coefficient, the Feldt-Brennan coefficient and Raju's beta coefficient. Moreover, differential item functioning (DIF) analysis was carried out to determine whether the items of the assessment tool are biased with respect to gender. The Mantel-Haenszel contingency table approach was used to identify items displaying DIF.
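Two of the internal-consistency coefficients named above, the Alpha coefficient and Guttman's lambda-2, can be computed directly from the item covariance matrix. The following is a minimal pure-Python sketch of the standard textbook formulas, not a reproduction of the jMetrik routines used in the study; the function names and the toy data are assumptions made for illustration.

```python
from math import sqrt


def covariance_matrix(scores):
    """Sample covariance matrix of the items.

    `scores` is a list of rows, one per child, each row a list of 0/1 item scores.
    """
    items = list(zip(*scores))
    n = len(scores)
    means = [sum(item) / n for item in items]

    def cov(i, j):
        return sum((a - means[i]) * (b - means[j])
                   for a, b in zip(items[i], items[j])) / (n - 1)

    k = len(items)
    return [[cov(i, j) for j in range(k)] for i in range(k)]


def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    c = covariance_matrix(scores)
    k = len(c)
    total_var = sum(sum(row) for row in c)  # variance of the total score
    item_var = sum(c[i][i] for i in range(k))
    return k / (k - 1) * (1 - item_var / total_var)


def guttman_lambda2(scores):
    """lambda-2 = (sum of off-diagonal covariances
                   + sqrt(k/(k-1) * sum of their squares)) / total variance."""
    c = covariance_matrix(scores)
    k = len(c)
    total_var = sum(sum(row) for row in c)
    off = [c[i][j] for i in range(k) for j in range(k) if i != j]
    return (sum(off) + sqrt(k / (k - 1) * sum(v * v for v in off))) / total_var


# Hypothetical responses of six children to three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(cronbach_alpha(toy), guttman_lambda2(toy))
```

Lambda-2 is never smaller than alpha for the same data, which is one reason the two are often reported side by side.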

Results
As part of the validity and reliability study of the Number Sense Screener (NSS) for 60-71-month-old children, this section presents the content validity indices; the difficulty, discrimination and standard deviation values of the items; the reliability analyses for the items and for the entire assessment tool; the item statistics obtained through Rasch analysis; and the differential item functioning (DIF) results used to determine whether the items are biased with respect to gender.
Table 2 presents the content validity indices for the subtests and for the NSS as a whole. The technique developed by Lawshe requires at least 5 and at most 40 expert opinions (as cited in Yurdugul, 2005). Seven experts were consulted in this study.
Content validity ratios were obtained by pooling the experts' opinions on each item. Veneziano and Hooper (1997) suggest that the minimum content validity ratio for seven experts should be 0.99. The CVI is calculated as the mean of the CVRs of the items that are significant at the level of p < 0.05 and are to be included in the final form (as cited in Yurdugul, 2005).
According to the Lawshe technique, the CVI value for each subtest of the assessment tool was 1.00. Accordingly, it was decided that every item should remain in the measure, and the test was accepted as having content validity.
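The Lawshe computation can be illustrated with a short sketch. It assumes the usual form of the content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of experts rating an item as appropriate and N is the total number of experts; the function names and the example ratings are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR = (n_e - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half


def content_validity_index(cvr_values, critical_value):
    """Mean CVR of the items whose CVR reaches the critical value.

    Returns None when no item is retained.
    """
    retained = [cvr for cvr in cvr_values if cvr >= critical_value]
    if not retained:
        return None
    return sum(retained) / len(retained)


# With seven experts, the minimum acceptable CVR reported in the text is 0.99,
# so an item is retained only when all seven experts approve it.
cvr_all_agree = content_validity_ratio(7, 7)  # 1.0
cvr_one_disagrees = content_validity_ratio(6, 7)  # about 0.71, below 0.99
print(content_validity_index([cvr_all_agree] * 3, critical_value=0.99))
```

A CVI of 1.00 for every subtest, as reported here, therefore implies that all seven experts approved each retained item.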
Descriptive statistics of difficulty, discrimination and standard deviation values of NSS items are shown in Table 3.
Table 3 shows that the difficulty values of the NSS items vary between 0.08 and 0.96 and that the discrimination indices vary between 0.17 and 0.53. According to this result, all the items in the assessment tool except the 7th were at acceptable levels; the 7th item had a low discrimination value and was highly difficult. Table 4 shows that the reliability coefficients calculated with the five different methods are above .80, the value considered critical for assessment tools. At the same time, the 7th item, which has a lower difficulty value than the other items and was found difficult by the children, had a considerably high reliability coefficient. According to this result, the NSS items meet the assumption of reliability in terms of internal consistency.
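The two classical item statistics discussed above can be sketched as follows. Item difficulty is taken as the proportion of children answering the item correctly. For discrimination, the sketch uses the correlation between the item and the total score on the remaining items (a corrected item-total correlation); this choice is an assumption made for illustration, since the text does not state which discrimination index the software reports.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_statistics(scores):
    """Return a list of (difficulty, discrimination) tuples, one per item.

    `scores` is a list of rows, one per child, each row a list of 0/1 item scores.
    """
    stats = []
    for j, item in enumerate(zip(*scores)):
        difficulty = sum(item) / len(item)  # proportion correct
        rest = [sum(row) - row[j] for row in scores]  # total score without item j
        stats.append((difficulty, pearson(item, rest)))
    return stats


# Hypothetical responses of six children to three items.
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
for difficulty, discrimination in item_statistics(toy):
    print(round(difficulty, 2), round(discrimination, 2))
```

Under this convention a value such as 0.08 marks an item that very few children answered correctly, which is how the 7th item is described.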
The reliability coefficients for the assessment tool as a whole, together with their 95% confidence intervals and standard errors, are shown in Table 5. Table 5 shows that the reliability coefficients determined with the different methods for the total of 29 items vary between .826 and .837. According to this result, the results obtained from the assessment tool as a whole are also considered to be reliable.
In terms of validity and reliability, it is not sufficient to examine only the item difficulty and discrimination indices when deciding which items should be included in an assessment tool. For this reason, goodness-of-fit indices, along with transformed discrimination indices, were calculated through Rasch analysis, which is a one-parameter logistic model. Table 6 shows the item statistics obtained through Rasch analysis with the following settings: maximum number of iterations = 150, convergence criterion = 0.005 and extreme score adjustment = 0.3. The Unweighted Mean Square (UMS) and Weighted Mean Square (WMS) fit statistics are the goodness-of-fit statistics for the items. Of these, WMS is the infit measure, while UMS is the outfit measure. As can be seen in the table, the assessment tool consisting of 29 items is suitable for the assessment process according to the WMS and UMS values (Linacre, 2002). As the infit and outfit values of the items were in the range of 0.50-1.50, the goodness-of-fit statistics of the items were found to be quite satisfactory. According to the standardized infit and outfit values, the 9th, 17th, 26th and 29th items of the assessment tool did not show model-data fit.
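The infit (WMS) and outfit (UMS) statistics referred to above can be illustrated with a minimal sketch under the dichotomous Rasch model, in which the probability of a correct answer is exp(θ - b) / (1 + exp(θ - b)) for ability θ and item difficulty b. The sketch takes the ability and difficulty estimates as given and does not reproduce the estimation procedure of the software; the function names and example values are assumptions.

```python
from math import exp


def rasch_probability(theta, b):
    """Probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one item.

    responses: 0/1 answers to the item, one per child
    thetas:    ability estimates for the same children (assumed already estimated)
    b:         difficulty estimate of the item (assumed already estimated)
    """
    squared_residuals = 0.0
    information = 0.0
    standardized_squares = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        variance = p * (1.0 - p)
        squared_residuals += (x - p) ** 2
        information += variance
        standardized_squares += (x - p) ** 2 / variance
    infit = squared_residuals / information  # weighted mean square (WMS)
    outfit = standardized_squares / len(responses)  # unweighted mean square (UMS)
    return infit, outfit


def within_ideal_range(mean_square, low=0.50, high=1.50):
    """True when the mean square lies in the 0.50-1.50 range cited in the text."""
    return low <= mean_square <= high


# Hypothetical item with difficulty 0 answered by four children.
infit, outfit = item_fit([1, 0, 1, 0], [0.5, -0.5, 1.0, -1.0], b=0.0)
print(within_ideal_range(infit), within_ideal_range(outfit))
```

Because outfit averages the standardized residuals without weighting, it reacts more strongly than infit to unexpected answers from children whose ability is far from the item difficulty.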
DIF analysis was carried out to find out whether the NSS items were biased with respect to gender. The results obtained from the common odds ratio with the Mantel-Haenszel method are shown in Table 7. Table 7 shows that all items except the 10th have a negligible level of differential item functioning. The Item Characteristic Curve of the 10th item, which was found to show an intermediate (B-) level of DIF, is shown in Figure 1. Figure 1 shows that girls, who were specified as the reference group, scored higher than boys when the total score was in the range of 0-20. However, boys scored higher when the total score was in the range of 20-25, and the pattern changes again above 25. This result supports the conclusion that the DIF in the 10th item was not severe. Since an item has to show level C DIF to be considered biased, it was concluded that none of the 29 items was biased in a way that would put either group at an advantage or disadvantage (Koyuncu, Aksu & Kelecioglu, 2018).
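The Mantel-Haenszel common odds ratio behind these results can be sketched as follows. Children are grouped into strata by total score, and a 2x2 table (group by correct/incorrect) is formed for the studied item in each stratum. The conversion to the delta scale (-2.35 times the natural logarithm of the odds ratio) and the A/B/C cut-offs of 1.0 and 1.5 follow the commonly cited ETS convention. The sketch classifies by magnitude only and omits the significance tests that the full classification also uses, so it is a simplification and not the procedure implemented in the software.

```python
from math import log


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across score strata.

    Each stratum is a tuple
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def ets_delta(odds_ratio):
    """Odds ratio on the ETS delta scale; negative values favor the reference group."""
    return -2.35 * log(odds_ratio)


def magnitude_class(delta):
    """Magnitude-only A/B/C label (the significance tests are omitted in this sketch)."""
    size = abs(delta)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Hypothetical item: both groups have the same odds of success in every stratum.
balanced = [(10, 10, 10, 10), (20, 5, 20, 5)]
print(magnitude_class(ets_delta(mantel_haenszel_odds_ratio(balanced))))
```

An odds ratio of 1 corresponds to a delta of 0, that is, to an item that functions identically for both groups once total score is taken into account.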

Discussion and Conclusions
The validity and reliability of the Number Sense Screener (NSS), which is frequently used abroad for the preschool period, were examined in order to introduce a valid and reliable assessment tool into the literature and to assess the number sense of 60-71-month-old Turkish children. As part of the validity study, the content validity of the assessment tool was evaluated through the Lawshe technique based on the opinions of seven field experts. According to the Lawshe technique, the CVI value for each subtest of the NSS was 1.00. Accordingly, it was decided that every item should remain in the assessment tool, and the assessment tool was accepted as having content validity.
The results of the analyses show that the difficulty values of the NSS items vary between 0.08 and 0.96 and the discrimination indices vary between 0.17 and 0.53. According to this result, all the items in the assessment tool except the 7th were at acceptable levels; the 7th item had a low discrimination value and was a difficult question for children aged 60-71 months. However, the reliability coefficient of the 7th item varied between .824 and .835 and, in the light of expert opinion, the item was not removed from the assessment tool. In this item, which is part of the number recognition subtest, children are asked to name the number "124". According to the Ministry of Education's Preschool Education Program, which is currently implemented in Turkey and was last updated in 2013, work with numbers should first cover the numbers from 1 to 20. In other words, the education program for preschool institutions requires work with only one- and two-digit numbers. Therefore, the children in the sample are thought to have had difficulty naming the number "124" because it is a three-digit number.
For the items in the assessment tool, the item reliability coefficients calculated using Guttman's lambda-2, the Alpha coefficient, the Feldt-Gilmer coefficient, the Feldt-Brennan coefficient and Raju's beta coefficient were above .80, which is accepted as the critical threshold. In addition, the reliability coefficients calculated using the same methods for the entire assessment tool varied between .826 and .837. These high reliability coefficients indicate that the assessment tool is reliable. The original reliability and validity study of the Number Sense Screener was conducted with the same group of children at different points in time. The assessment tool was administered to the children in the sample at three different times: the fall term of kindergarten (mean age = 5.8 and SD = 3.7), the spring term of kindergarten (mean age = 6.2 and SD = 3.7) and the fall term of first grade (mean age = 6.8 and SD = 3.7). The analyses showed that the reliability coefficient for the entire assessment tool was .82 for the fall term of kindergarten, .86 for the spring term of kindergarten and .87 for the fall term of first grade, with an average of .85 (Jordan et al., 2012a). In the reliability and validity study conducted by Jordan et al. (2006), the Cronbach's alpha (α) coefficient calculated for the entire assessment tool was also above the critical value of .80. These results are similar to the reliability coefficients obtained for the Turkish version.
The infit and outfit values of the Turkish version of the NSS were in the range of 0.50-1.50. The literature states that the ideal range for infit and outfit values is between 0.50 and 1.50 (Guzeller et al., 2018). In this respect, the goodness-of-fit statistics of the assessment tool can be said to be in the ideal range. Similarly, the Rasch analysis of the original assessment tool, carried out with the Winsteps software, showed that the infit and outfit values were in the range of 0.50-1.50 except for the first and third items (item 1 outfit = 0.43, item 3 outfit = 0.42) (Jordan et al., 2012a).
In the DIF analysis conducted to determine whether the NSS items were biased with respect to gender, only the 10th item was found to show an intermediate (B-) level of DIF. For an item to be considered biased towards a particular group, it must show at least level C DIF (Koyuncu et al., 2018). Therefore, it can be concluded that the 29 items in the Turkish version of the assessment tool are not biased in a way that would put either gender at an advantage or disadvantage. The items in the original form of the NSS were also examined for gender bias with the Mantel-Haenszel method, and it was decided that the assessment tool was not biased with respect to gender, since only a negligible level of differential item functioning was found in one item (Jordan et al., 2012a).
The validity and reliability analysis carried out for 72-84-month-old children likewise indicated that the reliability coefficients for the entire assessment tool range between .876 and .884 and that the infit and outfit values are in the range considered ideal in the literature (0.50-1.50) (Uyanık-Aktulun, 2019).
In the light of the findings of the study, it was concluded that the Turkish version of the NSS is a valid and reliable assessment tool for assessing the number sense of children aged 60-71 months. One of the strengths of this study is that it evaluates the validity and reliability of the data obtained from the NSS using different methods. On the other hand, the study has certain limitations. Its findings are limited to 658 children aged 60-71 months who were attending preschool institutions in the province of Aydın, Turkey. In this respect, the validity and reliability of the assessment tool can be re-tested in new studies with a larger study group. Moreover, cross-cultural studies can be carried out to compare and evaluate the number sense of preschool children using different language versions of the NSS. Additionally, examining the relationship between the Turkish version of the NSS and other number-skill scales adapted into Turkish for 60-71-month-old children is considered crucial for introducing multiple assessment tools to the field to evaluate the number sense and number skills of children in this age group.

Figure 1. Item Characteristic Curve for item 10

Table 1. Interpretation of parameter-level mean-square fit statistics

Table 2. Content validity indices for the total of NSS and its subtests

Table 3. Descriptive statistics for NSS items
Reliability values for NSS items obtained through different methods are given in Table 4.

Table 4. Reliability values for NSS items

Table 5. Reliability Analysis Results for NSS

Table 6. Rasch Analysis Results for NSS Items