Overcoming Heywood cases in factor analysis: a maximum penalized likelihood approach

Philipp Sterzinger

Department of Statistics, University of Warwick

philipp.sterzinger@warwick.ac.uk
warwick.ac.uk/fac/sci/statistics/staff/research_students/sterzinger   psterzinger


IMPS 2024
Prague

16 July 2024

Joint with




Irini Moustaki

Ioannis Kosmidis

Outline

Normal-linear factor model

Normal-linear factor model

Data

\(x_1,x_2,\ldots,x_n \in \Re^p\)

Model

\[X_i = \mu + \Lambda Y_i + E_i\]

Unobservables

Latent factors: \(Y_i \sim \mathrm{N}(0_q, I_q)\)

Unique errors: \(E_i \sim \mathrm{N}(0_p,\Psi)\)

Parameters

Conditional mean: \(\mu \in \Re^p\)

Factor loadings: \(\Lambda \in \Re^{p \times q}\)

Specific variances \(\Psi\): \(p \times p\) diagonal, postive definite matrix

Normal-linear factor model

Log-likelihood

\(X_i \sim \mathrm{N}(\mu, \Sigma)\), for \(\Sigma = \Lambda \Lambda^\top + \Psi\) so that

\[ \ell(\Sigma; S) = - \frac{n}{2} \left \{ \mathrm{det}(\Sigma) + \mathrm{tr}\left(\Sigma^{-1}S \right) \right\} + C \,, \] where \(S = 1/n \sum_{i = 1}^n (x_i - \bar{x})(x_i - \bar{x})^\top\), \(\bar{x} = 1/n \sum_{i = 1}^n x_i\)

Maximum likelihood (ML) estimator

\(\hat{\Lambda}, \hat{\Psi} = \arg \max \,\ell(\Lambda \Lambda^\top + \Psi; S)\)

Motivating Example

Job application data

Data from Hirose et al. (2011) which scores job applicants (n = 48) according to a range of criteria (p = 15)


Form Appearance Ability Likeability Selfconf. Lucidity Honesty Salesmanship Experience Drive Ambition Grasp Potential Keenness Suitability
6 7 2 5 8 7 8 8 3 8 9 7 5 7 10
9 10 5 8 10 9 9 10 5 9 9 8 8 8 10
7 8 3 6 9 8 9 7 4 9 9 8 6 8 10
5 6 8 5 6 5 9 2 8 4 5 8 7 6 5
6 8 8 8 4 5 9 2 8 5 5 8 8 7 7
7 7 7 6 8 7 10 5 9 6 5 8 6 6 6
9 9 8 8 8 8 8 8 10 8 10 8 9 8 10
9 9 9 8 9 9 8 8 10 9 10 9 9 9 10
9 9 7 8 8 8 8 5 9 8 9 8 8 8 10
4 7 10 2 10 10 7 10 3 10 10 10 9 3 10
4 7 10 0 10 8 3 9 5 9 10 8 10 2 5
4 7 10 4 10 10 7 8 2 8 8 10 10 3 7
6 9 8 10 5 4 9 4 4 4 5 4 7 6 8
8 9 8 9 6 3 8 2 5 2 6 6 7 5 6
4 8 8 7 5 4 10 2 7 5 3 6 6 4 6
6 9 3 7 8 9 8 9 8 8 7 6 8 6 10
8 7 7 7 9 5 8 6 6 7 8 6 6 7 8
6 8 8 4 8 8 6 4 3 3 6 7 2 6 4
6 7 8 4 7 8 5 4 4 2 6 8 3 5 4
4 8 7 8 8 9 10 5 2 6 7 9 8 8 9
3 8 6 8 8 8 10 5 3 6 7 8 8 5 8
9 8 7 8 9 10 10 10 3 10 8 10 8 10 8
7 10 7 9 9 9 10 10 3 9 9 10 9 10 8
9 8 7 10 8 10 10 10 2 9 7 9 9 10 8
6 9 7 7 4 5 9 3 2 4 4 4 4 5 4
7 8 7 8 5 4 8 2 3 4 5 6 5 5 6
2 10 7 9 8 9 10 5 3 5 6 7 6 4 5
6 3 5 3 5 3 5 0 0 3 3 0 0 5 0
4 3 4 3 3 0 0 0 0 4 4 0 0 5 0
4 6 5 6 9 4 10 3 1 3 3 2 2 7 3
5 5 4 7 8 4 10 3 2 5 5 3 4 8 3
3 3 5 7 7 9 10 3 2 5 3 7 5 5 2
2 3 5 7 7 9 10 3 2 2 3 6 4 5 2
3 4 6 4 3 3 8 1 1 3 3 3 2 5 2
6 7 4 3 3 0 9 0 1 0 2 3 1 5 3
9 8 5 5 6 6 8 2 2 2 4 5 6 6 3
4 9 6 4 10 8 8 9 1 3 9 7 5 3 2
4 9 6 6 9 9 7 9 1 2 10 8 5 5 2
10 6 9 10 9 10 10 10 10 10 8 10 10 10 10
10 6 9 10 9 10 10 10 10 10 10 10 10 10 10
10 7 8 0 2 1 2 0 10 2 0 3 0 0 10
10 3 8 0 1 1 0 0 10 0 0 0 0 0 10
3 4 9 8 2 4 5 3 6 2 1 3 3 3 8
7 7 7 6 9 8 8 6 8 8 10 8 8 6 5
9 6 10 9 7 7 10 2 1 5 5 7 8 4 5
9 8 10 10 7 9 110 3 1 5 7 9 9 4 4
0 7 10 3 5 0 10 0 0 2 2 0 0 0 0
0 6 10 1 5 0 10 0 0 2 2 0 0 0 0


Want to use exploratory factor analysis to find common factors underlying the data

Job application data

Fitting a normal-linear factor model with 4 factors in R using the psych package yields

fa_fit <- fa(job_data, nfactors = 4)
 
Warning: simpleWarning in fac(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate, : An ultra-Heywood case was detected. Examine the results carefully


Estimated specific variances

fa_fit$uniquenesses
 
FormAppearanceAbilityLikeabilitySelfconf.
0.509321362 0.706583340 0.576785093 0.297299960 0.133265973
LucidityHonestySalesmanshipExperienceDrive
0.197023150 0.640299618 0.096908170 0.250289966 0.248683074
AmbitionGraspPotentialKeennessSuitability
0.157873233 0.137698609 0.095711717 -0.004485347 0.207264486


The ML estimate does not exist (Heywood (1931) case)

Maximum softly-penalized likelihood

Maximum penalized likelihood

Add penalty function \(P(\Psi, \Lambda)\) that diverges to \(-\infty\) whenever \(\Psi \to 0\), or \(\Psi,\Lambda \to \infty\)

Penalized log-likelihood

\[\ell^*(\Lambda, \Psi; S) = \ell(\Lambda \Lambda^\top + \Psi; S) + P(\Lambda, \Psi)\]

Maximum penalized likelihood (MPL) estimator

\[\bar{\Lambda}, \bar{\Psi} = \arg \max \, \ell^*(\Lambda, \Psi; S) \tag{1}\]

Existence

If \(P(\cdot, \cdot)\) is continuous and bounded from above, then the MPL estimates of Equation 1 exist

Soft penalisation

Distort MPL estimates away from ML estimates through penalisation

Scale penalty by sequence \(c_n\) to make \(c_n P(\cdot, \cdot)\) asymtpotically negligible

Maximum softly-penalized likelihood (MSPL) estimator

\[\bar{\Lambda}, \bar{\Psi} = \arg \max \, \ell(\Lambda \Lambda^\top + \Psi; S) + c_n P(\Lambda, \Psi)\]

Soft penalisation

Choose \(c_n\) to preserve ML asymptotics whilst guaranteeing existence of estimates

Consistency

\(c_n = o_p(n)\)

\[(\bar{\Lambda}, \bar{\Psi}) \overset{p}{\to}( \Lambda, \Psi) \]

Asymptotic normality

\(c_n = o_p(n^{1/2})\)

\[\sqrt{n} \left( \begin{bmatrix} \bar{\Lambda} \\ \bar{\Psi} \end{bmatrix} - \begin{bmatrix} \hat{\Lambda} \\ \hat{\Psi} \end{bmatrix} \right) = o_p(1) \]

Candidate penalties

Huber loss

\[ P(\Psi) =- \sum_{j =1}^p g(\Psi_{ii}), \quad g(x) = \begin{cases} \frac{1}{2} \log(x)^2 & \textrm{if } |\log(x)| < 1, \\ |\log(x)| - \frac{1}{2} & \textrm{else} \end{cases} \]

Akaike (1987)

\[ P(\Lambda, \Psi) = -\frac{\rho n}{2} \mathrm{tr}\left(\Psi^{-1/2} \Lambda \Lambda^\top \Psi^{-1/2}\right) \]

Hirose et al. (2011)

\[ P(\Psi) = -\frac{\rho n}{2} \mathrm{tr}\left(\Psi^{-1/2} S \Psi^{-1/2}\right) \]

Simulation

Simulation setup

Based on Cooperman & Waller (2022)

  • \(q = 3\)
  • \(n \in \{50, 100, 400\}\)
  • item-to-factor ratio \(3:1, 5:1, 8:1\)
  • Two loading matrix designs

Loading Matrix A

Items Loading 1 Loading 2 Loading 3
1 0.80 0 0
2 0.65 0 0
3 0.45 0 0
4 0 0.80 0
5 0 0.65 0
6 0 0.45 0
7 0 0 0.80
8 0 0 0.65
9 0 0 0.45

Loading Matrix B

Items Loading 1 Loading 2 Loading 3
1 0.80 0 0
2 0.80 0 0
3 0.80 0 0
4 0 0.80 0
5 0 0.80 0
6 0 0.80 0
7 0 0 0.30
8 0 0 0.30
9 0 0 0.30

Huber loss penalty with \(c_n = n^{-1/2}\) on \(\Psi\) (Huber[\(\Psi\)]) and \(\Psi,\Lambda \Lambda^\top\) (Huber[\(\Psi, \Lambda \Lambda'\)])

Akaike (1987), Hirose et al. (2011) penalties with \(\rho \in \{1, n^{-3/2} \}\)

% Used

Bias

Probabiliy of underestimation

Root mean-squared error

Model selection

Take away

Maximum softly-penalized likelihood framework is a robust alternative to ML estimation in factor analysis with desirable theoretical properties and strong finite-sample performance

  • Penalisation guarantees existence of estimates

  • Soft scaling preserves ML asymptotics

  • Strong finite-sample & model selection performance

  • Simple Huber loss penalty favourable to penalties of Hirose et al. (2011) and Akaike (1987)

Contact

Philipp Sterzinger

philipp.sterzinger@warwick.ac.uk
warwick.ac.uk/fac/sci/statistics/staff/research_students/sterzinger
psterzinger

References

Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.
Cooperman, A. W., & Waller, N. G. (2022). Heywood you go away! Examinig causes, effects, and treatments for Heywood cases in exploratory factor analysis. Psychological Methods, 27, 156–176.
Heywood, H. (1931). On finite sequences of real numbers. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 134(824), 486–501.
Hirose, K., Kawano, S., Konishi, S., & Ichikawa, M. (2011). Bayesian information criterion and selection of the number of factors in factor analysis models. Journal of Data Science, 9(2), 243–259.