Adaptation of Teachers’ Self-Efficacy Towards Teaching Thinking Skills Scale Into English

Teaching thinking skills is at the core of the curriculum in many cultures. Although these curricula have points in common, their results differ from one another. The fact that teaching-thinking curricula applied in different countries yield different results points to the importance of teachers' individual differences. Determining teachers' level of self-efficacy contributes to determining the level of success of a curriculum, and developing a global scale benefits both researchers and practitioners in teaching thinking. The aim of this study is to adapt the Teachers' Self-efficacy towards Teaching Thinking Skills Scale (TSTS), developed in Turkish, into English. The scale consists of 20 items, each rated on a 5-point Likert scale, and has three factors: Design, Practice and Academic Competence. First, linguistic equivalence was analyzed: both versions of the scale were administered to 28 candidate teachers of English at a 20-day interval. Second, Confirmatory Factor Analysis was applied to data collected from 144 teachers whose native language is English, and good fit indices were obtained. The Cronbach's Alpha coefficient is .94. Construct validity (convergent and discriminant validity), examined through the correlations between sub-dimensions and the average variance extracted values, was at a sufficient level. The scale items were found to be discriminating. The results showed that the English version of the scale is statistically valid and reliable.


Introduction
'Even in our century, students don't know how to think' (McGrane & Sternberg, 1992: 339). Despite many curriculum development efforts aimed at raising a new generation of thinkers, we are not yet at the desired level. If many countries have applied similar programs, why are the results different? Other variables in teaching thinking, such as learners' and teachers' individual differences, should therefore be reconsidered. One of them is teachers' self-efficacy, which affects nearly all of teachers' attitudes in the classroom (Bandura & Schunk, 1981; Zimmerman & Kitsantas, 2005). Assessing teachers' level of self-efficacy thus becomes an important part of understanding the causes of these different results.

Literature Review
In the last decade, a new tendency called thinking skills has emerged in curriculum and instruction (Dilekli & Tezci, 2015; Zohar, 2013). However, the definition and scope of this term were not clearly established until 2001 (Costa, 2001; McGuinness, 1999; McGregor, 2007; Nispet, 1990; Wilks, 2005; Hashim, 2004; Tebbs, 2000; Alnesyan, 2012). Sixty scientists defined the scope of the term as problem solving, critical thinking, creative thinking and decision making (Costa, 2001). With this definition, the most difficult part of the problem was solved. The second step was to develop or shape new curricula for teaching thinking, and three approaches were applied in education for this purpose. The first was subject- or content-free programs for teaching thinking (e.g., Instrumental Enrichment by Feuerstein, Teaching Thinking Program by Sommerset, Top Ten Tactics by Lake and Needham, CoRT by De Bono) (Dilekli, 2015). The content-free approach was criticized for neglecting the real value of knowledge and disciplines. An alternative, content-based approach then emerged (e.g., Cognitive Acceleration Program, Thinking through Series by Adey and Shayer; Philosophy for Children Program by Lipman) (McGregor, 2007). However, this approach was also criticized because of problems of transfer: learners found it very difficult to transfer thinking skills from one discipline to another and could not do so with the desired level of competency (Winch, 2010). The last approach, launched by Swarts and Parks (2004:9), is called the infused approach. In this approach, the components of thinking skills are defined as key competencies in all teaching curricula, and contents or subjects are used as tools for teaching thinking. At this point, the educational activities carried out in the classroom become one of the most important parts of teaching thinking (Dilekli & Tezci, 2015).
The term self-efficacy is defined as people's beliefs about their capabilities to produce designated levels of performance that exercise influence over events that affect their lives (Bandura, 1994:72). Self-efficacy affects our feelings, ways of thinking and motivation (Zimmerman, Bandura & Martinez-Pons, 1992), and it makes learners and teachers more enthusiastic about their work. As teaching thinking is a long process, teachers and learners should be patient to reach the desired level. Teachers with low self-efficacy give up teaching thinking when they face a problem (Onosko, 1991; Hashim, 2004; Alnesyan, 2012). Similarly, Ashton and Webb (1986) indicated that teachers with low self-efficacy were easily discouraged when teaching subjects that require a long time and effort. In contrast, teachers with high self-efficacy were more successful in teaching thinking skills, and their students' self-efficacy increased significantly (Hampton, 1996). Measuring teachers' self-efficacy allows us to understand their attitudes and behaviors (Kaya, 2008), and assessing it helps us design better curricula for teaching thinking. However, assessing self-efficacy is not easy, so it is difficult to speak of a global instrument for assessing it. Nevertheless, there are many culture-specific self-efficacy scales for different disciplines or subject matters.

Participants
In this study, the aim was to analyze the linguistic validity and the reliability of the English version of the TSTS scale. For this purpose, the translation of the scale was checked by two native English language experts who know Turkish and two native Turkish language experts who know English. These experts were academic staff with Ph.D. degrees in Turkish and English language teaching departments. Furthermore, to check the linguistic equivalence of the translated version, in the first step the scale was administered face to face to 28 candidate teachers of English at a 20-day interval; 10 of these candidate teachers were male and 18 were female. In the second step, data for the validity and reliability analyses of the TSTS scale were collected from 144 volunteer teachers whose native language is English. Of these, 112 (77.8%) were female and 32 (22.2%) were male. Participants' professional seniority was as follows: 8 (5.6%) had 0-5 years of experience, 22 (15.3%) had 6-10 years, 24 (16.7%) had 11-15 years, 27 (18.8%) had 16-20 years and 63 (43.8%) had 21 years or more. The data were collected online by e-mailing a link to the survey, and teachers who received the link were free to share it with other volunteer teachers.
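The gender and seniority percentages above are simple proportions of the 144-teacher sample. The short Python sketch below recomputes them from the reported counts as a consistency check; the group labels are our own shorthand, not wording from the survey.

```python
# Recompute the percentage breakdown reported for the 144-teacher sample.
# Counts come from the text; percentages are rounded to one decimal place.

SAMPLE_SIZE = 144

gender = {"female": 112, "male": 32}
seniority_years = {"0-5": 8, "6-10": 22, "11-15": 24, "16-20": 27, "21+": 63}


def percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Return each group's share of the total as a percentage (1 decimal place)."""
    if sum(counts.values()) != total:
        raise ValueError("group counts do not add up to the sample size")
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


if __name__ == "__main__":
    print(percentages(gender, SAMPLE_SIZE))
    print(percentages(seniority_years, SAMPLE_SIZE))
```

Both breakdowns sum to 144, and the rounded shares agree with the figures given in the text.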

Data Collection Tool
The Turkish version of the TSTS scale consists of 20 items rated on a 5-point Likert scale (5 = Strongly Agree, 4 = Agree, 3 = Undecided, 2 = Disagree, 1 = Strongly Disagree). Exploratory Factor Analysis (EFA) showed that the scale developed by Dilekli and Tezci (2015) has three factors. The first factor, Academic Competence (Competence), explains 27.878% of the variance and consists of 8 items; the second factor, Practice, explains 22.637% of the variance and consists of 8 items; and the last factor, Design, explains 12.259% of the variance and consists of 4 items. The total explained variance is 62.774%. The Cronbach's Alpha coefficients were .89 for Academic Competence (Competence), .93 for Practice, .74 for Design and .95 for the overall scale.
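For readers who want to reproduce coefficients like those above, the following is a minimal Python sketch of Cronbach's alpha, α = k/(k − 1) · (1 − Σσ²ᵢ / σ²ₜ), computed from a respondents × items matrix of Likert scores. It assumes complete data (no missing responses); the sample matrix is hypothetical, not the study's data, and the sketch is not the software used by the authors.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (complete data assumed)."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) and of the total score (row sums).
    item_variances = [variance(column) for column in zip(*responses)]
    total_scores = [sum(row) for row in responses]
    return k / (k - 1) * (1 - sum(item_variances) / variance(total_scores))


if __name__ == "__main__":
    # HYPOTHETICAL 5-point Likert responses: 5 respondents x 3 items.
    sample = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 3]]
    print(round(cronbach_alpha(sample), 3))  # prints 0.913
```

Applying the function to the columns of one dimension gives that dimension's alpha; applying it to all 20 columns gives the overall value.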
For linguistic equivalence, correlational analysis and a paired-samples t-test were applied to the data collected from 28 candidate teachers of English. Confirmatory factor analysis (CFA) was then applied to the data collected via e-mail from 144 teachers whose native language is English. CFA is a statistical technique used to verify the factor structure of a set of observed variables, that is, whether their hypothesized underlying latent variables exist (Büyüköztürk, Şekercioğlu & Çokluk, 2014). CFA was used to test which variables are related to which factors, the level of the relationships among the factors, and whether the data fit the hypothesized measurement model (Bentler & Bonett, 1980; Tabachnick & Fidell, 2007). In other words, CFA was used to test whether the relationships between the observed variables and the theoretical structure defined by Exploratory Factor Analysis (EFA) are meaningful.
Different indices can be used to assess the fit of the factor structure. First, the Chi-Square value and its p value were checked. Then the AGFI (Adjusted Goodness of Fit Index) and GFI (Goodness of Fit Index) values were analyzed; a value of 1 indicates perfect fit for both. Furthermore, the RMSEA (Root Mean Square Error of Approximation), RMR (Root Mean Square Residual) and SRMR (Standardized Root Mean Square Residual) values were calculated. The PGFI (Parsimony Goodness-of-Fit Index) is another index, showing the parsimony of the model. The CFI (Comparative Fit Index) is used for small samples. The NFI (Normed Fit Index) and NNFI (Non-Normed Fit Index) let researchers compare and analyze the results without requiring the chi-square distribution (Bollen, 1989; Hoyle, 1995; Jöreskog & Sörbom, 1996; Tabachnick & Fidell, 2007). To assess convergent validity, the Average Variance Extracted (AVE) and Composite Reliability (CR) indices were checked. AVE values should be higher than .50, but if CR is higher than .60, an AVE value higher than .40 is acceptable (Fornell & Larcker, 1981). To examine the internal consistency of the adapted scale, Cronbach's Alpha was calculated. Item discrimination indices were determined with the upper and lower 27% group technique: independent-samples t tests were run between the upper and lower groups formed in this way, both for each item and for each dimension (Practice, Design and Competence).
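The AVE and CR criteria above can be written out directly from standardized factor loadings. The sketch below uses the usual definitions (AVE as the mean squared loading; CR as the squared sum of loadings over that quantity plus the summed error variances 1 − λ²) and encodes the acceptance rule attributed to Fornell and Larcker (1981) in the text. The example loadings are hypothetical, not the study's estimates.

```python
def ave(loadings: list[float]) -> float:
    """Average Variance Extracted: mean of the squared standardized loadings."""
    return sum(lam * lam for lam in loadings) / len(loadings)


def composite_reliability(loadings: list[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error_variance = sum(1 - lam * lam for lam in loadings)  # standardized error per item
    return total * total / (total * total + error_variance)


def convergent_validity_ok(loadings: list[float]) -> bool:
    """Rule from the text: AVE > .50, or AVE > .40 when CR > .60."""
    a, cr = ave(loadings), composite_reliability(loadings)
    return a > 0.50 or (a > 0.40 and cr > 0.60)


if __name__ == "__main__":
    example = [0.70, 0.70, 0.70, 0.70]  # HYPOTHETICAL standardized loadings
    print(round(ave(example), 2), round(composite_reliability(example), 2),
          convergent_validity_ok(example))  # prints 0.49 0.79 True
```

The example shows the second branch of the rule: an AVE just under .50 is still accepted because CR exceeds .60.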

Translation Process and Linguistic Equivalence
First, the scale was translated by the researchers. Then three experts who teach English and whose mother tongue is Turkish checked the translation; no correction was made at this step. Next, three experts who teach English and whose mother tongue is English checked the translation; again, no correction was made. After the English version was formed, both versions of the scale were administered to 28 candidate teachers of English at a 20-day interval. The t-test and correlational analysis results are given in Table 1. The analysis showed positive and significant correlations between the Turkish and English versions of the scale items. The lowest correlation (r = .574, p < .05) was for item 7 and the highest (r = .862, p < .05) for item 13, so the two versions showed moderate to high correlations. According to the t-test results, there was no significant difference between the scores on the two versions of the items (p > .05).

Results of CFA Analysis
CFA was applied to determine whether the three-factor structure of the English version is the same as that of the Turkish version. According to the CFA results, χ²/df = 305.88/167 = 1.83 and RMSEA = .076. These values are within acceptable limits. However, some other fit indices were found to be low (GFI = .88; AGFI = .78). After three modifications proposed on the basis of the error variances (between items 19 and 20, items 5 and 9, and items 13 and 14), better fit indices were obtained. The analysis results are given in Table 2.
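The normed chi-square and RMSEA reported above can be recomputed from the chi-square statistic, its degrees of freedom and the sample size. The sketch below uses the common point-estimate formula RMSEA = √(max(χ² − df, 0) / (df · (N − 1))); some programs use N instead of N − 1, and with these inputs both versions round to .076.

```python
from math import sqrt


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    chi2, df, n = 305.88, 167, 144  # values reported for the unmodified model
    print(round(chi2 / df, 2))           # normed chi-square, prints 1.83
    print(round(rmsea(chi2, df, n), 3))  # prints 0.076
```

When χ² does not exceed df, the numerator is clipped at zero and RMSEA is reported as 0.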

Convergent and Discriminant Validity
The Average Variance Extracted (AVE) values and Composite Reliability (CR) scores are shown in Table 3. All AVE values were higher than .50, which provides evidence for the convergent validity of the scale. All CR values were expected to be higher than .70 (Gouveia & Soares, 2015). Factor loadings and AVE values above .50 indicate the convergent validity of the scale (Fornell & Larcker, 1981; Peterson, 2000). The lowest factor loading was .61 for item 1 and the highest was .86 for item 16; the rest fell between these values. As seen in Table 3, the AVE values were .56 for the Design dimension, .63 for the Practice dimension and .67 for the Competence dimension.
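The discriminant-validity comparison behind Table 3 (square roots of the AVEs on the diagonal) can be sketched as a Fornell-Larcker check. The AVE values below are the ones reported for the English version; the inter-factor correlations are hypothetical placeholders, because Table 3's correlations are not reproduced in the text. A pair of factors passes when the absolute value of their correlation is below the square root of each factor's AVE.

```python
from math import sqrt

# AVE values reported for the English version of TSTS.
AVE = {"Design": 0.56, "Practice": 0.63, "Competence": 0.67}

# HYPOTHETICAL inter-factor correlations, for illustration only.
HYPOTHETICAL_CORR = {
    ("Design", "Practice"): 0.60,
    ("Design", "Competence"): 0.55,
    ("Practice", "Competence"): 0.70,
}


def discriminant_violations(ave: dict[str, float],
                            corr: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """Return factor pairs whose |r| is not below sqrt(AVE) of both factors."""
    violations = []
    for (a, b), r in corr.items():
        if abs(r) >= min(sqrt(ave[a]), sqrt(ave[b])):
            violations.append((a, b))
    return violations


if __name__ == "__main__":
    for factor, value in AVE.items():
        print(f"{factor}: sqrt(AVE) = {sqrt(value):.3f}")
    print(discriminant_violations(AVE, HYPOTHETICAL_CORR))
```

With the reported AVEs the diagonal values are about .75, .79 and .82, so any inter-factor correlation below .75 would satisfy the criterion for every pair.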

Reliability and Discrimination Indices
Cronbach's Alpha and Omega reliability coefficients were calculated to assess reliability and internal consistency. In addition, corrected item-total correlations and item discrimination indices were calculated. The results are shown in Table 4. These values indicated that all items were moderately correlated with the total scores of their respective dimensions. The t values comparing the top 27% and the bottom 27% of the participants were significant, ranging from 5.669 to 19.411, which supports the discriminatory power of the scale items.

Discussion
The aim of this study was to adapt the TSTS scale, developed in Turkish, into English. First, the scale was translated into English, and the translated scale was administered to 28 candidate teachers of English at a 20-day interval to analyze its comprehensibility. Paired-samples t test and correlation analyses were applied to the data from this administration. Moderate to high correlations were found between the Turkish and English versions of the scale, and the paired-samples t test found no significant difference between the two versions. As a result, the linguistic equivalence analysis showed that the English version of the scale is understandable and clear and conveys meanings similar to those of the Turkish version.
In the second step of the adaptation process, the translated scale was administered to 144 teachers whose native language is English. The CFA of these 144 participants' data showed acceptable values for RMSEA, RMR and SRMR, whereas the GFI and AGFI values were low. Three modifications proposed on the basis of the error variances were then made, after which the GFI and AGFI values reached acceptable levels. Specifically, RMSEA decreased from .076 to .060, RMR from .66 to .65 and SRMR from .58 to .57, while GFI and AGFI were .90 and .85, within acceptable levels. Consequently, the English version of the TSTS scale has the same factor structure as its Turkish version. This finding is similar to that of Dilekli and Tezci (2015), who developed the scale, although Dilekli and Tezci (2015) did not make modifications based on error variances.
Convergent and discriminant validity analyses were also conducted. All factor loadings were higher than .50 (ranging from .61 to .86) and the AVE of each dimension was higher than .50; these values show that the scale has convergent validity (Fornell & Larcker, 1981; Peterson, 2000). The reliability coefficient found in this study is higher than that in the study by Dilekli and Tezci (2015). There are many teachers' self-efficacy scales specific to particular disciplines (e.g., Enochs & Riggs, 1990; Tschannen-Moran, Woolfolk Hoy, & Hoy, 1998), but there are very few scales for assessing teachers' self-efficacy for teaching thinking skills. One of these, developed by Tebbs (2000), has four factors and was adapted into Turkish by Kaya (2008). However, Kaya found that the adapted version had a different construct from the original version of the scale. This shows that linguistic and cultural differences play an important role in the adaptation process. Stes, De Maeyer and Van Petegem (2010) indicated that translation and cultural differences are effective factors in the adaptation process. Similarly, Meyer and Eley (2006) drew attention to the fact that no scale should be used without being adapted. Schwarzer, Bäßler, Kwiatek, Schröder and Zhang (1997) studied the adaptation of a scale originally in German into Spanish and Chinese culture. Although they found significant relationships between means and gender, they did not obtain satisfactory results in the reliability and validity analyses. However, Luszczynska, Scholz and Schwarzer (2005) found in their meta-analysis across different cultures that self-efficacy is a global construct. The TSTS scale can be used to determine the self-efficacy level of teachers of any discipline, as its items are based on the infused approach to teaching thinking skills. Moreover, it is a self-report survey, which makes data collection easier. Because the TSTS items were prepared according to the principles of the infused approach, the scale also makes it possible to compare the self-efficacy levels of teachers of different disciplines.
Onosko (1991) found that teachers' self-efficacy is one of the main problems in teaching thinking skills. Low self-efficacy leads teachers to select wrong or inappropriate teaching techniques in class (Hoy & Spero, 2005). Andersen, Dragsted, Evans and Sorensen (2004) and Pearson and Moomaw (2005) found that teachers with low self-efficacy are not ready to take risks or try new techniques in the classroom. From this perspective, providing education on teaching thinking during the preservice period in faculties of education would increase teachers' self-efficacy. Klassen, Tze, Betts and Gordon (2011) concluded that, because self-efficacy scales are based on self-report measures, there may be inconsistencies between scale findings and teachers' practice in the classroom. In fact, disagreements about self-efficacy scales arise because most scales are specific to particular disciplines or because teachers' personal characteristics affect the measurement process. For this reason, self-efficacy measurements relate more to teachers' attitudes towards practice than to their actual practice. The scale adapted in this study is based on the infusion approach principles described by McGuinness (1997) and Swarts and Parks (1994) as pedagogical principles of thinking skills. Bandura (1997) indicated that self-efficacy scales should focus on beliefs about current ability. However, some studies developing self-efficacy scales measure either abilities (Ho & Hau, 2004) or the capacity to apply specific abilities or skills. For this reason, translating the items is a very important step in the adaptation of a scale, as it may produce different semantic results. Therefore, to ensure the structural equivalence of the scale both linguistically and culturally, the translation should be as accurate as possible (Orakcı, 2018). Klassen et al. (2009) adapted the TSES scale developed by Tschannen-Moran and Woolfolk Hoy (2001). During the adaptation process, the researchers collected data from six different countries and obtained similar validity and reliability results. They also found relationships between other constructs and the self-efficacy of teachers from different countries. The values of different cultures, the importance attached to those values, and beliefs and cultural attitudes could affect educational values and practices. However, similar results between adapted and original versions of a scale support the idea that self-efficacy is a global construct (Luszczynska et al., 2005). Similarly, Scholz, Doña, Sud and Schwarzer (2002), whose data were collected from 25 different countries, reached similar results about teachers' self-efficacy.

Limitations
This study did not analyze the criterion validity of the scale for determining perceived self-efficacy in teaching critical thinking skills, nor the scale's relationships with other constructs (e.g., self-confidence, attitudes).
Research examining the relationships between the scale and other constructs would make a significant contribution to the global value of the scale.
After the modifications proposed on the basis of the error variances were made, the RMSEA value decreased from .076 to .060 and the NFI increased from .95 to .96. These indices show excellent fit. The AGFI, which reflects the simplicity of the model, increased from .82 to .85. The path coefficients from the observed variables to the latent variables were significant. The standardized path diagram is given in Figure 1.

Figure 1. Parameters of Standardized CFA

As seen in Figure 1, in the Design dimension the highest path coefficient was .82 (t = 11.38, p < .05) for item 2 and the lowest was .61 (t = 7.53, p < .05) for item 1. In the Practice dimension, the highest path coefficient was .82 (t = 11.83 and t = 11.73, p < .05) for items 5 and 9, and the lowest was .70 (t = 9.51, p < .05) for item 12. In the Competence dimension, the highest path coefficient was .86 (t = 12.79, p < .05) for item 16 and the lowest was .76 (t = 10.56, p < .05) for item 20.

Table 1. Correlation and t Test Analysis Results of Turkish and English Versions of the Scale

Table 2. Goodness-of-fit indices for CFA according to the three-factor model

Table 3. AVE, CR Values and Correlations among Dimensions
Note: Square roots of the average variances extracted are shown on the diagonal.

Table 4. Reliability and Item Total Correlations
Cronbach's alpha values, providing evidence for the reliability of the scale, were .85 for Design, .94 for Practice and .95 for Competence. The Omega reliability indices of all sub-factors are the same as the Cronbach's Alpha values. These values indicate that participants responded consistently to the scale items. Corrected item-total correlations ranged from .49 to .63 for Design, .58 to .75 for Practice and .60 to .71 for Competence.
According to Fornell and Larcker (1981) and Peterson (2000), an AVE value higher than .50 shows the convergent validity of a scale. For discriminant validity, the square roots of the AVE values were compared with the correlations between the sub-dimensions of the scale. The results indicated that the English version of the scale has discriminant validity. The Omega reliability and Cronbach's Alpha values were high, so the English version of the scale can be said to have internal consistency. The upper and lower 27% technique was applied to obtain item discrimination values, and the results showed that the items were discriminating.