Presented By:
Rabia Umer
Noor Fatima
 Factor Analysis
 A procedure used to reduce a large number of
questions (items) into a few variables (factors)
according to their relevance.
 Used to determine how many dimensions a construct
has, e.g. Organizational Support and Supervisory
Support
 An interdependence technique
Definition
 Provides a tool for analyzing the structure of
interrelationships (correlations) among variables by
defining sets of variables that are highly correlated,
known as factors.
 Factors are assumed to represent dimensions within
the data.
Factor analysis is commonly used in:
Data reduction
Scale development
The evaluation of the psychometric
quality of a measure, and
The assessment of the dimensionality
of a set of variables.
Types
 Exploratory
 When the dimensions/factors are theoretically unknown
 Exploratory Factor Analysis (EFA) is a statistical
approach for determining the correlations among the
variables in a dataset. This type of analysis provides a
factor structure (a grouping of variables based on strong
correlations).
 Confirmatory
 When the researcher has preconceived ideas about the
actual structure of the data, based on theoretical support
or prior research.
 The researcher may wish to test hypotheses about issues
such as which variables should be grouped together on a
factor.
Example
 A retail firm identified 80 characteristics of retail stores
and their services that consumers mentioned as
affecting their patronage choice among stores.
 The retailer wants to find the broader dimensions on
which to base a survey.
 Factor analysis can be used here.
Factor analysis decision process
 Objectives of factor analysis
 Designing a factor analysis
 Assumptions in factor analysis
 Deriving factors and assessing overall fit
 Interpreting the factors
 Validation of factor analysis
 Additional use of factor analysis research
Objectives
 Data summarization
 Definition of structure
 Data reduction
 The purpose is to retain the nature and character of the
original variables but reduce their number to simplify the
subsequent multivariate analysis
 Both types of factor analysis use a correlation matrix as
input data.
 With R-type analysis we use the traditional correlation
matrix (correlations among variables).
 In Q-type factor analysis the resulting factor matrix
identifies similar individuals.
Difference between Q analysis and
cluster analysis
 Q-type factor analysis is based on the intercorrelations
between respondents, while cluster analysis forms
groupings based on a distance-based similarity measure
between respondents' scores on the variables being
analyzed.
 Variable selection and measurement issues
 Variables should be metric
 If some are non-metric, use dummy variables to represent
the categories of the non-metric variables
 If all variables are non-metric, use Boolean factor analysis
Assumptions
 Basic assumption: some underlying structure does exist
in the set of selected variables (ensure that observed
patterns are conceptually valid).
 The sample is homogeneous with respect to the underlying
factor structure.
 Departures from normality, homoscedasticity, and
linearity matter only to the extent that they diminish the
observed correlations
 Some degree of multicollinearity is desirable
 The researcher must ensure that the data matrix has sufficient
correlations to justify the application of factor analysis (not
all equal or low correlations).
 Correlation among variables can be analyzed via partial
correlations (the correlation that remains when the
effect of the other variables is taken into account). High
partial correlations mean factor analysis is
inappropriate. A rule of thumb is to consider correlations
above 0.7 as high.
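The partial correlations described above can be obtained from the inverse of the correlation matrix. A minimal NumPy sketch (the function name is illustrative, not from any particular package):

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation between each pair of variables, controlling
    for all the other variables, from the inverse correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Standardize the inverse: partial_ij = -inv_ij / sqrt(inv_ii * inv_jj)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    np.fill_diagonal(partial, 1.0)
    return partial
```

For truly independent variables the off-diagonal partial correlations should all be near zero, which (by the rule of thumb above) would support applying factor analysis only if the plain correlations are meaningful.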
 Another method of determining the appropriateness of
factor analysis is the Bartlett test of sphericity, which
tests for statistically significant correlation among at least
some of the variables in the correlation matrix.
 The Bartlett test should be significant (i.e. p less than 0.05);
this means that the variables are correlated highly
enough to provide a reasonable basis for factor
analysis. It indicates that the correlation matrix is
significantly different from an identity matrix, in which
the correlations between variables are all zero.
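The Bartlett test can be sketched as follows, assuming NumPy and SciPy are available; the statistic is the standard chi-square approximation based on the log-determinant of the correlation matrix:

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(data):
    """Bartlett's test of sphericity: H0 is that the correlation
    matrix is an identity matrix (nothing to factor)."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Chi-square statistic from the log-determinant of the correlation matrix
    chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    dof = p * (p - 1) / 2
    p_value = stats.chi2.sf(chi_square, dof)
    return chi_square, p_value
```

A p-value below 0.05 rejects the identity-matrix hypothesis, supporting the use of factor analysis.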
 Another measure is the measure of sampling adequacy
(MSA). This index ranges from 0 to 1, reaching 1 when each
variable is perfectly predicted without error by the other
variables. It must exceed 0.5 both overall and for each
individual variable.
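The overall MSA (the Kaiser-Meyer-Olkin statistic) compares squared correlations with squared partial correlations. A minimal NumPy sketch (function name illustrative):

```python
import numpy as np

def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy:
    sum of squared correlations divided by the sum of squared
    correlations plus squared partial correlations."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    # Use only the off-diagonal elements
    np.fill_diagonal(corr, 0.0)
    np.fill_diagonal(partial, 0.0)
    r2, q2 = (corr ** 2).sum(), (partial ** 2).sum()
    return r2 / (r2 + q2)
```

Values above 0.5 are acceptable; when variables share strong common variance, the partial correlations shrink and the KMO rises toward 1.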
Stage 4: Deriving factors and
assessing overall fit
 The method of extraction of factors is decided here
 Common factor analysis
 Component factor analysis
 The number of factors selected to represent the underlying
structure in the data
 Factor extraction method
 The decision depends on the objective of the factor analysis
and the concept of partitioning the variance of a variable
 Variance is a value that represents the total amount of
dispersion of values about their mean.
 When a variable is correlated with another, it shares
variance with it, and the amount of sharing is the squared
correlation, e.g. two variables with a .5 correlation
have .25 shared variance
 The total variance of a variable can be divided into 3 types
of variance
 Common variance: variance in a variable that is shared
with all other variables in the analysis. A variable's
communality is an estimate of its shared variance
 Specific variance
 Variance associated with only a specific variable. This variance can't be
explained by the correlations
 Error variance
 Unexplained variance
 Common factor analysis considers only the common or shared variance
 Component analysis considers the full variance. It is more appropriate when
data reduction is a primary concern, and when prior research shows that
specific and error variance represent a relatively small proportion of total
variance.
 Common factor analysis is mostly used when the primary objective is to
identify the latent dimensions and the researcher has little knowledge about
the amount of specific and error variance.
 In most applications both common and component analysis arrive at
essentially identical results if the number of variables exceeds 30, or
communalities exceed .6 for most variables.
Criteria for the number of factors to
extract
 An exact quantitative basis for deciding the number of
factors to extract has not been developed. Different
stopping criteria are as follows:
 Latent root criterion
 With component analysis each variable contributes a
value of 1 to the total of the eigenvalues. Thus only the
factors having latent roots (eigenvalues) greater than 1 are
considered significant.
 This method is suitable when the number of variables is
between 20 and 50.
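The latent root criterion can be sketched directly from the eigenvalues of the correlation matrix, assuming NumPy (function name illustrative):

```python
import numpy as np

def latent_root_count(data):
    """Latent root (Kaiser) criterion: count the eigenvalues of the
    correlation matrix that exceed 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]  # descending
    return int((eigenvalues > 1).sum()), eigenvalues
```

For data generated from two independent underlying dimensions, exactly two eigenvalues should exceed 1.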
 A priori criterion
 The researcher already knows how many factors to extract, and
instructs the computer to stop the analysis when the specified number
of factors has been extracted.
 Percentage of variance criterion
 Approach based on achieving a specified cumulative percentage
of total variance extracted by successive factors.
 In the natural sciences, factor analysis is typically not stopped until the
extracted factors account for 95% of the variance
 In the social sciences the criterion can be 60% of the variance
 Scree test criterion
 The proportion of unique variance is substantially higher in the later
factors. The scree test is used to identify the optimum number
of factors that can be extracted before the amount of unique variance
begins to dominate the common variance structure.
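The percentage-of-variance criterion reduces to a cumulative sum over the sorted eigenvalues. A minimal sketch (function name illustrative):

```python
import numpy as np

def factors_for_variance(eigenvalues, threshold=0.60):
    """Percentage-of-variance criterion: smallest number of factors
    whose cumulative eigenvalues reach the given share of total
    variance (e.g. 0.60 in the social sciences, 0.95 in the natural
    sciences)."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(vals) / vals.sum()
    # Index of the first factor at which the threshold is reached
    return int(np.searchsorted(cumulative, threshold) + 1)
```

For example, with eigenvalues 3, 2, 1, 0.5, 0.5 (total 7), two factors explain about 71% of the variance, so the 60% criterion stops at two while the 95% criterion keeps all five.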
 The scree test is derived by plotting the latent roots against
the number of factors in their order of extraction. The
shape of the resulting curve is used as the criterion for the
cutoff point.
 The point at which the curve first begins to straighten
out is considered to indicate the maximum number of
factors to be extracted.
 As a general rule, the scree test results in at least 1 and
sometimes 2 or 3 more factors being extracted than
does the latent root criterion.
Stage 5: Interpreting the factors
 The three-step process of factor interpretation includes:
 Estimate the factor matrix
 An initial unrotated factor matrix is computed containing
the factor loadings for each variable.
 Factor loadings are the correlations of each variable with a
factor
 They indicate the degree of correspondence between the
variable and the factor
 A higher loading indicates that the variable is representative
of the factor
 Unrotated loadings achieve the objective of data reduction only.
 Factor rotation
 Unrotated factors often don't provide the information needed
for an adequate interpretation of the data. Thus we need
a rotational method to achieve simpler factor solutions.
 Factor rotation improves the interpretation of the data by
reducing ambiguities
 Rotation means that the reference axes of the factors are turned
about the origin until some other position has been reached.
 An unrotated factor solution extracts factors in order of the
variance they extract (i.e. first the factor that accounts for the
largest variance, and so on)
 The ultimate effect of rotation is to redistribute the variance from
earlier factors to later ones
 The two methods of factor rotation are
 Orthogonal factor rotation
 Axes are maintained at 90 degrees.
 Most widely used, as almost all software includes it
 More suitable when the research goal is data reduction
 Oblique factor rotation
 Axes are rotated but don't retain the 90-degree angle between
the reference axes.
 Oblique rotation is more flexible
 Best suited to the goal of obtaining several theoretically meaningful
factors
In the factor matrix, columns represent factors, with each row giving a
variable's loadings across the factors
 Major orthogonal factor rotation approaches include:
 Quartimax
 The goal is to simplify the rows of the factor matrix, i.e. it focuses on
rotating the initial factors so that a variable loads high on one factor
and as low as possible on the other factors.
 Varimax
 The goal is to simplify the columns of the factor matrix. It maximizes
the sum of variances of the loadings of the factor matrix
 With this, some high loadings (close to +1 or -1) are likely, as are some
loadings near zero.
 Equimax
 A compromise between quartimax and varimax. It has not gained wide
acceptance
 Oblique rotation
 SPSS provides OBLIMIN for oblique factor rotation.
 Selecting a rotation method
 No specific rule exists
 Most programs default to varimax
 Judging the significance of factor loadings
 Ensuring practical significance
 Make a preliminary examination of the factor matrix in terms of
the factor loadings.
 Since a factor loading is the correlation of a variable and a factor,
the squared loading is the amount of the variable's total variance
accounted for by the factor.
 A .50 loading denotes that 25% of the variance is accounted for by
the factor.
 The loading must exceed .70 for the factor to account for 50% of
the variance
 Loadings of .5 are considered practically significant and .7 is
indicative of a well-defined structure
 Assessing statistical significance
 The concept of statistical power is used to determine which factor
loadings are significant for various sample sizes.
 In a sample of 100, factor loadings of .55 and above are
significant
 In a sample of 50, a factor loading of .75 is significant
 For a sample of 350, a factor loading of .3 is significant
Interpreting the factor matrix
 It's a 5-step process
 Step 1: Evaluate the factor matrix of loadings
 The factor loading matrix contains the factor loading of each variable
on each factor.
 If oblique rotation is used, it provides 2 matrices
 Factor pattern matrix: loadings that show the unique contribution of
each variable to a factor
 Factor structure matrix: simple correlations between variables and
factors, but the loadings contain both the unique variance and the
correlations among factors
 Step 2: Identify the significant loadings for each variable
 Interpretation starts with the first variable on the first factor, moving
horizontally. When the highest loading is identified, underline it and
then move to the 2nd variable
 Cross loadings
 Try a different rotation method first to remove the cross loading
 Or delete the variable
 Step 3: Assess the communalities of the variables
 Once significant loadings are identified, look for variables
that are not adequately accounted for by the factor solution
 Identify variables lacking at least 1 significant loading
 Examine each variable's communality, i.e. the amount of
variance accounted for by the factor solution for each
variable
 Identify all variables with communalities less than .5 as
not having sufficient explanation
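Communalities follow directly from the loading matrix: each variable's communality is the sum of its squared loadings across the retained factors. A minimal sketch with illustrative numbers:

```python
import numpy as np

def communalities(loadings):
    """Communality of each variable: row-wise sum of squared factor
    loadings, i.e. the share of the variable's variance that the
    retained factors account for."""
    return (np.asarray(loadings, dtype=float) ** 2).sum(axis=1)

# Hypothetical two-factor loadings for two variables
loads = np.array([[0.80, 0.10],   # h2 = 0.64 + 0.01 = 0.65
                  [0.30, 0.40]])  # h2 = 0.09 + 0.16 = 0.25
h2 = communalities(loads)
low = h2 < 0.5  # flag variables lacking sufficient explanation
```

The second variable would be flagged for possible respecification under the .5 rule above.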
 Step 4: Respecify the factor model if needed
 Done when the researcher finds that a variable has no significant
loadings, or that even with a significant loading a variable's
communality is low. The following remedies are considered:
 Ignore the problematic variables and interpret the solution as is, if
the objective is solely data reduction.
 Evaluate each of those variables for possible deletion.
 Employ an alternative rotation method.
 Decrease/increase the number of factors retained
 Modify the type of factor model used (common versus
component)
 Step 5: Label the factors
 The researcher assigns a name or label to each factor that
accurately reflects the variables loading on that factor.
 Stage 6: Validation of factor analysis
 Assessing the degree of generalizability of the results to the
population and the potential influence of individual cases on the
overall result
 i. Use of a confirmatory perspective
 The most direct method of validating the results
 Requires separate software, such as LISREL
 ii. Assessing factor structure stability
 Factor stability is dependent on the sample size and on the
number of cases per variable.
 The researcher may split the sample into two subsets and
estimate the factor model for each subset.
 Comparison of the 2 resulting factor matrices will provide an
assessment of the robustness of the solution across the samples
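The split-sample check can be sketched as follows, assuming NumPy; for simplicity this sketch extracts principal-component loadings (eigenvector times the square root of its eigenvalue) rather than a full common-factor solution:

```python
import numpy as np

def split_half_loadings(data, n_factors, seed=0):
    """Split-sample stability check: estimate component loadings
    separately on two random halves of the sample and return both
    loading matrices for comparison."""
    rng = np.random.default_rng(seed)
    halves = np.array_split(rng.permutation(len(data)), 2)
    results = []
    for half in halves:
        corr = np.corrcoef(data[half], rowvar=False)
        vals, vecs = np.linalg.eigh(corr)
        order = np.argsort(vals)[::-1][:n_factors]
        # Component loadings = eigenvector * sqrt(eigenvalue)
        results.append(vecs[:, order] * np.sqrt(vals[order]))
    return results
```

If the structure is stable, the two loading matrices agree closely (up to sign flips of whole columns, which carry no substantive meaning).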
Running the program
(Slides 41 to 46 contained software screenshots; no recoverable text.)