Reporting Regression Analyses in Research Manuscripts

Introduction

In research, regression analysis is arguably one of the most widely used statistical techniques. Whether it is used to examine relationships between variables, to make predictions, or to test a hypothesis, proper reporting is essential for transparency and replicability. Drawing on the articles found on the Good Publishing Practices for Educational Research website, I will summarise key points on how to report regression analyses when writing research manuscripts. The focus will be on best practices, expectations, and variations by field.

1. Define the Purpose of the Analysis

The first way to enhance transparency is to state clearly why the regression analysis was conducted. The reader should be able to tell from the statement of the problem whether the aim is to explore relationships among variables, to predict an outcome, or to test a hypothesis. Lee (2022) notes that a clearly articulated purpose tells the reader what type of analysis the authors conducted, for example: “To examine the relationship between BMI and systolic blood pressure.”

2. Identify and Summarise Variables

The next step is to identify the dependent (outcome) and independent (predictor) variables in the analysis. In medical and clinical research, summarising the attributes of each variable is important for reproducibility. Each variable should be described with summary statistics as applicable, including means, standard deviations (SD), ranges, and medians (Boe et al., 2024). Collins et al. (2024) likewise stress the need for care when reporting this information, so that the reader can understand the properties of each variable and interpret the analysis of those variables.
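
The summary statistics listed above can be produced with very little code. The following is a minimal sketch using only the Python standard library; the function name `summarise` and the BMI values are hypothetical and serve purely as illustration.

```python
import statistics

def summarise(values):
    """Return the descriptive statistics usually reported for a continuous variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical illustrative data, not taken from any study.
bmi = [22.0, 24.5, 27.0, 30.5, 26.0]
summary = summarise(bmi)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In a manuscript these values would normally go into a descriptive table, one row per variable, with the unit of measurement stated.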

3. Check and Report Model Assumptions

Before the results of a regression analysis can be considered valid, several assumptions must be satisfied, and authors should report whether these assumptions were checked. Lee (2022) treats them as action items for linear regression models: checking linearity using the residuals, checking homoscedasticity of the residuals, and checking that the residuals are normally distributed. Meeting these assumptions gives evidence that the model fits the data adequately and that the findings are trustworthy (Nieminen, 2022). Any bias that is present must also be addressed, such as contamination bias in linear regression models (Goldsmith-Pinkham et al., 2024), which affects the interpretation of the regression coefficients.
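
As a rough illustration of what checking the residuals can involve, the sketch below fits a simple regression by least squares and computes two informal screening numbers: the correlation between fitted values and absolute residuals (a hint of non-constant variance when far from 0) and the skewness of the residuals (a hint of non-normality when far from 0). These are assumptions chosen for illustration, not formal tests and not the procedure of any cited author; the function names and data are hypothetical. In practice, residual plots and the diagnostics of a statistical package would be used.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input has no variation."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    saa = sum((v - ma) ** 2 for v in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def residual_checks(x, y):
    """Fit y = b0 + b1 * x by least squares and return informal residual screens."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Least-squares residuals average to zero, so moments are taken about 0.
    m2 = sum(r ** 2 for r in resid) / len(resid)
    m3 = sum(r ** 3 for r in resid) / len(resid)
    return {
        "residuals": resid,
        "spread_trend": pearson(fitted, [abs(r) for r in resid]),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }

# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
checks = residual_checks(x, y)
print(f"spread trend: {checks['spread_trend']:.3f}")
print(f"residual skewness: {checks['skewness']:.3f}")
```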

4. Handle Outliers and Missing Data

Accurate reporting of regression results requires describing how data issues, such as outliers and missing values, were handled. For example, Zech et al. (2022) stated that it is important to document how outliers were identified and whether they were excluded or treated with winsorisation, as appropriate. The handling of missing data should also be reported, including whether incomplete cases were excluded or an imputation procedure was used (Whittaker & Schumacker, 2022). Documenting these decisions makes the analysis more robust and helps readers trust the results.
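
To make the two treatments concrete, here is a minimal sketch of winsorisation and of listwise deletion of incomplete cases. It assumes a symmetric winsorisation that replaces the k lowest and k highest values, with k = int(proportion * n), and it represents missing values as `None`; the function names and data are hypothetical. Whatever rule is used, the proportion winsorised and the number of cases dropped are the figures a reader would expect to see reported.

```python
def winsorise(values, proportion=0.1):
    """Replace the k lowest and k highest values with the nearest retained value."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    n = len(values)
    k = int(proportion * n)   # number of values altered in each tail
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]

def drop_incomplete(rows):
    """Listwise deletion: keep rows without None and count how many were dropped."""
    kept = [row for row in rows if all(v is not None for v in row)]
    return kept, len(rows) - len(kept)

# Hypothetical illustrative data.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
trimmed = winsorise(scores, proportion=0.1)
print(trimmed)  # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]

records = [(1.0, 2.0), (None, 3.0), (4.0, None), (5.0, 6.0)]
complete, n_dropped = drop_incomplete(records)
print(f"{len(complete)} complete cases, {n_dropped} excluded")
```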

5. Provide the Regression Equation

Providing the regression equation adds transparency and allows the analysis to be replicated. The equation states how the dependent variable relates to the independent variables. For simple regression it contains a single predictor; more generally, the model could be stated as:

Y = β₀ + β₁X₁ + β₂X₂ + … + ε

When reporting the full equation, include all predictors and their coefficients, especially in more complicated regression models (Whittaker & Schumacker, 2022).
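
One way to avoid transcription errors is to generate the reported equation directly from the estimated coefficients. The sketch below does this for an arbitrary number of predictors. The function name `format_equation` is hypothetical, and the coefficient values are invented for illustration, not estimates from any dataset.

```python
def format_equation(outcome, intercept, coefficients, digits=2):
    """Render a fitted linear model as a readable equation string."""
    terms = [f"{intercept:.{digits}f}"]
    for name, b in coefficients.items():
        sign = "-" if b < 0 else "+"
        terms.append(f"{sign} {abs(b):.{digits}f}*{name}")
    return f"{outcome} = " + " ".join(terms)

# Hypothetical coefficients, for illustration only.
equation = format_equation("SBP", 90.0, {"BMI": 1.25, "age": -0.4})
print(equation)  # SBP = 90.00 + 1.25*BMI - 0.40*age
```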

6. Report Details for Multiple Regression

Multiple regression requires more information than described above, such as the alpha level used to select variables in the univariable analyses (Boe et al., 2024), how collinearity was examined (Nieminen, 2022), and whether interaction effects were examined. The variable selection method should also be reported (e.g., forward selection, backward elimination, best subset) (Goldsmith-Pinkham et al., 2024). This information is vital both for reproducing the analysis and for understanding how model selection decisions were made.
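
A common way to examine collinearity is the variance inflation factor (VIF). As a minimal sketch, assume a model with exactly two predictors: in that case the VIF of either predictor is 1 / (1 - r²), where r is the correlation between the two. Models with more predictors need the R² from regressing each predictor on all the others, which a statistical package would normally supply. The function names and data below are hypothetical.

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    saa = sum((v - ma) ** 2 for v in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r2 = pearson(x1, x2) ** 2
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

# Hypothetical illustrative predictors.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(f"VIF = {vif:.2f}")
```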

7. Report Regression Coefficients and Statistics

A key element of reporting regression analyses is the regression coefficient (β) for each independent variable. Confidence intervals (CIs) and p-values should be reported for each coefficient, as these indicate the precision of the estimate and the statistical significance of the relationship between the variables (Collins et al., 2024). Nieminen (2022) illustrates the value of reporting standardised regression coefficients when studies are combined in meta-analyses.
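
The quantities named above can be sketched for a simple regression as follows. The code computes the slope, its standard error, a confidence interval, the t statistic, and the standardised slope. One assumption needs stating loudly: the default critical value of 1.96 is a large-sample normal approximation, so for small samples it should be replaced by the t quantile with n - 2 degrees of freedom, and an exact p-value likewise requires the t distribution, which a statistical package provides. The function name and data are hypothetical.

```python
import math
import statistics

def slope_report(x, y, critical=1.96):
    """Slope, SE, CI, t statistic and standardised slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)   # residual variance uses n - 2 df
    return {
        "slope": slope,
        "se": se,
        # critical=1.96 is a large-sample approximation (see lead-in).
        "ci": (slope - critical * se, slope + critical * se),
        "t": slope / se,
        # Unstandardised slope rescaled by SD(x) / SD(y).
        "standardised_slope": slope * math.sqrt(sxx / syy),
    }

# Hypothetical illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
report = slope_report(x, y)
low, high = report["ci"]
print(f"b = {report['slope']:.2f}, SE = {report['se']:.2f}, "
      f"approx. 95% CI [{low:.2f}, {high:.2f}]")
```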

8. Evaluate Model Fit and Validation

When describing how well the model explains variance in the outcome variable, report model fit statistics (R² for simple regression; adjusted R² for multiple regression; Boe et al., 2024). Also summarise any model validation techniques used (e.g., cross-validation, bootstrapping) that provide evidence of the model’s robustness (Whittaker & Schumacker, 2022). Validation is integral to building a generalisable regression model because it assesses performance not only in the sample at hand but also in other data from similar populations.
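
For reference, the two fit statistics mentioned here follow directly from the observed and fitted values: R² = 1 - SSE/SST, and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where p is the number of predictors. The sketch below is a minimal illustration; the function names and the observed and fitted values are hypothetical, and validation by cross-validation or bootstrapping is not shown.

```python
import statistics

def r_squared(observed, fitted):
    """Proportion of outcome variance explained: 1 - SSE / SST."""
    mean_y = statistics.mean(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean_y) ** 2 for o in observed)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, p):
    """Penalise R² for the number of predictors p, given sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical observed outcomes and fitted values from a one-predictor model.
observed = [2, 4, 5, 4, 5]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(observed, fitted)
adj = adjusted_r_squared(r2, n=len(observed), p=1)
print(f"R² = {r2:.3f}, adjusted R² = {adj:.3f}")
```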

9. Visual Representation

With simple regression, graphical displays such as scatter plots can help the reader interpret the relationship between the dependent variable and the independent variable. When using scatter plots, Lee (2022) suggests including the regression line, confidence intervals, and data boundaries. Although there is no quantitative evidence supporting the use of graphs to supplement the text, they may help the reader reason about the data and the model.

10. Name the Statistical Software Used

Disclose the statistical software or program used (e.g., SPSS, R, Stata, Python). This allows readers to replicate the analysis using the same tools and methods; knowing the tools is an important step in reproducibility (Zech et al., 2022).
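
When the analysis is run in Python, the disclosure sentence can be generated from the running interpreter so that it cannot drift from what was actually used. This is only a sketch, assuming the journal also wants version numbers: the function name is hypothetical, and `examplepkg` with version 1.2.3 is a placeholder, not a real package.

```python
import platform

def software_statement(packages=None):
    """Build a one-sentence software disclosure for a methods section."""
    parts = [f"Python {platform.python_version()}"]
    for name, version in (packages or {}).items():
        parts.append(f"{name} {version}")
    return "Analyses were performed using " + ", ".join(parts) + "."

# 'examplepkg' is a placeholder name used only for illustration.
statement = software_statement({"examplepkg": "1.2.3"})
print(statement)
```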

Conclusion

Reporting regression analyses in research manuscripts takes careful attention to detail and adherence to a few recommended conventions. From stating the purpose of the analysis to describing data handling, model assumptions, and coefficients, each of these steps helps ensure that the work is clearly stated, justifiable, and reproducible. Following these recommendations will help authors demonstrate the credibility of their research and may improve its chances of acceptance in peer-reviewed journals.

References

1. Lee, S. W. (2022). Regression analysis for continuous independent variables in medical research: statistical standard and guideline of Life Cycle Committee. Life Cycle, 2.

2. Boe, L., Vingan, P. S., Kim, M., Zhang, K. K., Rochlin, D., Matros, E., … & Nelson, J. A. (2024). Methods in regression analysis in surgical oncology research: best practice guidelines. Journal of Surgical Oncology, 129(1), 183-193.

3. Collins, G. S., Moons, K. G., Dhiman, P., Riley, R. D., Beam, A. L., Van Calster, B., … & Logullo, P. (2024). TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ, 385.

4. Whittaker, T. A., & Schumacker, R. E. (2022). A beginner’s guide to structural equation modeling. Routledge.

5. Zech, A., Hollander, K., Junge, A., Steib, S., Groll, A., Heiner, J., … & Rahlf, A. L. (2022). Sex differences in injury rates in team-sport athletes: a systematic review and meta-regression analysis. Journal of Sport and Health Science, 11(1), 104-114.

6. Andaur Navarro, C. L., Damen, J. A., Takada, T., Nijman, S. W., Dhiman, P., Ma, J., … & Hooft, L. (2022). Completeness of reporting of clinical prediction models developed using supervised machine learning: a systematic review. BMC Medical Research Methodology, 22, 1-13.

7. Nieminen, P. (2022). Application of standardized regression coefficient in meta-analysis. BioMedInformatics, 2(3), 434-458.

8. Goldsmith-Pinkham, P., Hull, P., & Kolesár, M. (2024). Contamination bias in linear regressions. American Economic Review, 114(12), 4015-4051.

9. Barat, M., Jannot, A. S., Dohan, A., & Soyer, P. (2022). How to report and compare quantitative variables in a radiology article. Diagnostic and Interventional Imaging, 103(12), 571-573.

10. Garcia, M., & Thompson, R. (2024). Centralized vs Federated: A Comparative Study of Machine Learning Models in Education. Journal of Artificial Intelligence and Education, 30(3), 200-218.

11. Patel, S., & Garcia, L. (2024). Federated Learning for Collaborative Mental Health Data Sharing. Journal of Education and Data Science, 15(1), 121-140.