
How (Un)Common Are Personality Trait-by-Trait Interactions?

A post by Colin E. Vize, Brinkley M. Sharpe, Joshua D. Miller, Donald R. Lynam, and Christopher J. Soto

We know many things about personality traits. Neuroticism predicts multiple forms of psychopathology. Conscientiousness is related to academic achievement. Agreeableness and Conscientiousness are negatively related to antisocial behaviour and aggression. But these are main effects: simple correlations of personality traits with outcome variables. Many personality theorists have been interested in moving beyond such simple effects to ask how traits might interact with one another to produce outcomes. Does one's level of impulse control (i.e., Conscientiousness) influence the relation between Agreeableness and antisocial behaviour? Perhaps Agreeableness is especially important for predicting antisocial behaviour when impulse control is poor. Tests of such interactive effects abound.

Though these types of effects may be theoretically interesting, there are substantial methodological difficulties in reliably detecting trait-by-trait interactions. These difficulties include very small interaction effect sizes[1], the increased measurement error inherent in the product terms used to test for interactions[2], and the suboptimal multivariate distributions found in most personality research[3], to name a few. Together, these factors reduce our ability to detect significant interactions. Such concerns have been well documented in the literature, dating back almost 40 years (e.g., Busemeyer & Jones, 1983). We (Vize et al., 2022) have previously shown that, under realistic conditions, sample sizes of 1,300 or greater are required to achieve adequate power to detect interaction effects between personality traits related to psychopathy. However, there is limited work researchers can point to regarding the overall frequency and replicability of trait-by-trait interactions in personality research.

In our recently accepted registered report at the EJP, my colleagues and I sought to examine the base rates of replicable trait-by-trait interactions using data from the Life Outcomes of Personality Replication project (LOOPR; Soto, 2019). We examined 75 life outcomes (e.g., life satisfaction, violent behaviour, occupational interests) included in the LOOPR dataset and tested all possible two-way interactions between the Big Five domains for each outcome. To do this, we randomly split the LOOPR data into two partitions (n ≥ 1,350 per outcome) and examined how many interactions showed evidence of a strong replication effect (i.e., the interaction was significant in both partitions, similar in magnitude across partitions, and of the same sign). This made for a grand total of 750 two-way interaction tests (75 outcomes × the 10 possible two-way interactions among the five Big Five domains) in each of the two partitions. Statistical power was 80% or higher to detect our smallest effect size of interest (a change in R² of .01) across the 75 outcomes. We also examined how much including demographic covariates in our models affected the results, and whether the results held across different types of regression models (e.g., Tobit and ordinal regression). Across the 750 tests, only 40 trait-by-trait interactions (5.33% of the original 750 tests) showed evidence of strong replicability based on our preregistered criteria. Though the results were in line with our preregistered hypothesis that robust trait-by-trait interactions would be rare, this is a sobering finding.
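
To make the analytic setup more concrete, the sketch below (written in Python with statsmodels, and not the code used in the paper) shows what a single one of these 750 tests might look like: fit a main-effects model and an interaction model in each random half of the data, record the interaction coefficient and the change in R², and apply a replication check. The simulated data, variable names, and replication thresholds are all illustrative placeholders rather than the LOOPR variables or our preregistered criteria.

```python
# A minimal sketch (not the authors' code) of a split-half test of one
# trait-by-trait interaction, using simulated stand-ins for the LOOPR data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2700  # two partitions of ~1,350 each

# Simulated, standardized Big Five scores and one life outcome.
data = pd.DataFrame(rng.standard_normal((n, 5)),
                    columns=["A", "C", "E", "N", "O"])
data["outcome"] = (0.3 * data["A"] - 0.2 * data["N"]
                   + 0.1 * data["A"] * data["C"]   # a small "true" interaction
                   + rng.standard_normal(n))

def test_interaction(df, t1="A", t2="C"):
    """Fit main-effects and interaction models; return interaction stats."""
    main = smf.ols("outcome ~ A + C + E + N + O", data=df).fit()
    inter = smf.ols(f"outcome ~ A + C + E + N + O + {t1}:{t2}", data=df).fit()
    term = f"{t1}:{t2}"
    return {"b": inter.params[term],
            "p": inter.pvalues[term],
            "delta_r2": inter.rsquared - main.rsquared}

# Random split into two partitions, mirroring the split-half design.
idx = rng.permutation(n)
half1, half2 = data.iloc[idx[: n // 2]], data.iloc[idx[n // 2:]]
res1, res2 = test_interaction(half1), test_interaction(half2)

# Illustrative "strong replication" check: significant in both partitions,
# same sign, and similar magnitude (placeholder criteria, not the
# preregistered ones from the paper).
replicated = (res1["p"] < .05 and res2["p"] < .05
              and np.sign(res1["b"]) == np.sign(res2["b"])
              and abs(res1["b"] - res2["b"]) < .10)
print(res1, res2, "replicated:", replicated)
```

In the actual study, this kind of test was repeated for every outcome and every pair of Big Five domains, with additional covariates, alternative model types, and corrections for the family-wise error rate layered on top.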

There are three primary takeaways from these results. First, replicable trait-by-trait interactions are rare: approximately 5% of our interaction tests showed evidence of replicability. Importantly, had we not included any covariates, model robustness checks, tests of replicability, or correction for the family-wise error rate, a total of 132 interaction effects would have counted as significant in at least one of the partitions. This highlights how easily a false-positive trait-by-trait interaction could make its way into the published literature without critical safeguards in place. It also suggests that published trait-by-trait interactions should be taken with a grain of salt until the replicability of these effects has been carefully evaluated. Second, trait-by-trait interactions are consistently small. The figure in our EJP paper shows that the majority of trait-by-trait interactions add almost no utility beyond simple main effects. Of the interactions that did show evidence of replicability, the increment in variance explained in the outcomes was modest (a median ΔR² of .01), and even the largest effect was still small (ΔR² of .02).

Third, the results clearly suggest that researchers should be sceptical of trait-by-trait interactions identified using multiple regression, even when power to detect small effects is high. Given the sample sizes required for adequate power, researchers should also be more careful about searching for trait-by-trait interactions in the first place.

Taken together, we believe it is worth considering whether testing for trait-by-trait interactions should be continued. The effect sizes are orders of magnitude smaller than the simple main effects of personality traits. The sample sizes required are far larger than those typically employed in personality research. To our minds, the typical theoretical justifications for many tests of trait-by-trait interactions are not compelling; most statements about interactions are as consistent with two additive main effects as with an interaction (e.g., individuals who are low in both Agreeableness and Conscientiousness will be especially antisocial). Nonetheless, there may be research contexts where a strong theoretical rationale can be offered for testing a trait-by-trait interaction. In such cases, we recommend taking important steps to ensure that interaction tests are as informative as possible. Researchers should make use of simulations to conduct informed power analyses for trait-by-trait interactions. By "informed," we mean analyses that use feasible estimates of effect size; interaction effect sizes are expected to be very small, on the order of a 1% increment in the variance accounted for. Once an interaction is detected, researchers should move towards more informative visualizations of interaction patterns that can highlight relations in the data that are not apparent from tables or verbal descriptions of results (see McCabe et al., 2018 for examples). Researchers should also provide a strong rationale for why interaction effects should provide incremental utility beyond the additive effects of personality traits. Finally, researchers should preregister their hypotheses about which variables should interact to predict which outcomes, as well as the exact nature of those interactions. Ultimately, we hope that the results of our study can help inform future expectations about tests of trait-by-trait interactions and motivate more rigorous tests of these rare personality effects.
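
As one example of what an "informed" power analysis might look like, the sketch below (again in Python, with purely illustrative values) simulates data in which a product term adds roughly 1% of explained variance and estimates power by counting how often the interaction is significant across repeated samples. A real analysis should, of course, build in the measurement error, predictor correlations, and variable distributions expected in the actual study.

```python
# A rough sketch of a simulation-based power analysis for a trait-by-trait
# interaction, assuming the product term adds about 1% of explained variance.
# All values below are illustrative assumptions, not recommendations.
import numpy as np
import statsmodels.api as sm

def simulate_power(n, n_sims=1000, b_main=0.3, delta_r2=0.01, alpha=0.05, seed=1):
    rng = np.random.default_rng(seed)
    # For standardized, uncorrelated predictors, the squared interaction
    # coefficient is approximately the increment in variance explained.
    b_int = np.sqrt(delta_r2)
    hits = 0
    for _ in range(n_sims):
        x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
        y = b_main * x1 + b_main * x2 + b_int * x1 * x2 + rng.standard_normal(n)
        X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
        fit = sm.OLS(y, X).fit()
        hits += fit.pvalues[3] < alpha  # p-value of the product term
    return hits / n_sims

for n in (250, 500, 1000, 2000):
    print(n, simulate_power(n))
```

Running this kind of simulation across candidate sample sizes makes it easy to see how quickly power collapses for effects of this magnitude in samples typical of personality research.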


[1] In nearly all empirical investigations of interaction effects, the effect sizes are orders of magnitude smaller than main effects. For example, in their review of 30 years of management research, Aguinis and colleagues (2005) found that the median effect size for interaction terms across 261 studies was a Cohen's f² of .002.
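
For readers more used to thinking in terms of increments in R², Cohen's f² for an incremental effect can be written as below; when the full model's R² is modest, f² is close to ΔR² itself, so a median f² of .002 corresponds to an increment of roughly 0.2% of the variance.

$$
f^2 = \frac{R^2_{\text{full}} - R^2_{\text{reduced}}}{1 - R^2_{\text{full}}} \approx \Delta R^2 \quad \text{when } R^2_{\text{full}} \text{ is small.}
$$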

[2] Though factors like the correlation between the component variables can affect the reliability of the product term, the reliability of the product term is generally close to the product of the component reliabilities. So even if a researcher has two reliable variables (e.g., internal consistency values of .80), the product term will have a suboptimal reliability of about .64.
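
As a rough illustration of where the .64 figure comes from: for standardized, zero-mean components, a commonly cited approximation (see Busemeyer & Jones, 1983) gives the reliability of the product term as

$$
\rho_{xy} = \frac{\rho_{xx}\,\rho_{yy} + r_{xy}^{2}}{1 + r_{xy}^{2}},
$$

which reduces to the product of the component reliabilities (.80 × .80 = .64) when the components are uncorrelated, and rises somewhat as the correlation between them increases.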

[3] In their seminal article, McClelland & Judd (1993) highlight that while experimentalists can use optimal research designs that increase the ability to detect interactions, non-experimental field studies must contend with a lack of "extreme" scores, which reduces the ability to detect interactions.

References

Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect size and power in assessing moderating effects of categorical variables using multiple regression: A 30-year review. Journal of Applied Psychology, 90, 94–107.

Busemeyer, J. R., & Jones, L. E. (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93(3), 549–562. https://doi.org/10.1037//0033-2909.93.3.549

McCabe, C. J., Kim, D. S., & King, K. M. (2018). Improving present practices in the visual display of interactions. Advances in Methods and Practices in Psychological Science, 1(2), 147–165.

McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114(2), 376–390. https://doi.org/10.1037/0033-2909.114.2.376

Soto, C. J. (2019). How replicable are links between personality traits and consequential life outcomes? The Life Outcomes of Personality Replication Project. Psychological Science, 30, 711–727.

Vize, C. E., Baranger, D. A. A., Finsaas, M. C., Goldstein, B. L., Olino, T. M., & Lynam, D. R. (2022). Moderation effects in personality disorder research. Personality Disorders: Theory, Research, and Treatment. Advance online publication. https://doi.org/10.1037/per0000582



