Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki

On the solvability of the optimization problem for constructing quite interpretable linear regressions

https://doi.org/10.26907/2541-7746.2025.4.627-640

Abstract

This article is devoted to improving the technique for constructing interpretable regression models whose parameters are estimated by the ordinary least squares method. A definition of quite interpretable linear regressions is given. The main requirements on such regressions are the consistency between the signs of the parameter estimates and the substantive meaning of the variables, the statistical significance of the estimates, a low degree of multicollinearity, and high approximation quality. Whether a model belongs to the class of quite interpretable regressions depends on the chosen significance level. An optimization problem for constructing quite interpretable linear regressions, involving a fairly large number of linear constraints, is formulated in terms of mixed 0-1 integer linear programming, an area that has advanced considerably in recent years owing to progress in computing. The problem is proved to be solvable under certain conditions. The proposed mathematical framework can be successfully applied to big data processing because, unlike existing foreign analogues, the number of constraints in the formulated problem does not depend on the sample size.
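
As a brief illustration of the kind of reduction the abstract describes, below is a minimal sketch (not the paper's exact formulation) of OLS subset selection posed as a mixed 0-1 integer linear program via the normal equations, in the spirit of [5, 12]: once the normal equations of the selected regressors are enforced with big-M constraints, the residual sum of squares equals y'y - (X'y)'b, so maximizing the linear objective (X'y)'b minimizes it. A sign constraint on one coefficient illustrates the sign-consistency requirement of quite interpretable regressions. The synthetic data, the big-M constant M, and all names are illustrative assumptions, not the article's notation.

    import numpy as np
    import pulp  # open-source MILP modeler; CBC is its default solver

    rng = np.random.default_rng(0)
    n, p, subset_size, M = 100, 5, 3, 1e3  # M must dominate |b_j| and the slacks
    X = rng.normal(size=(n, p))
    y = X @ np.array([1.5, -2.0, 0.0, 0.0, 0.7]) + rng.normal(scale=0.1, size=n)

    A = X.T @ X  # Gram matrix entering the normal equations A b = c
    c = X.T @ y  # at an OLS solution on a subset, RSS = y'y - c'b

    prob = pulp.LpProblem("subset_ols", pulp.LpMaximize)
    b = [pulp.LpVariable(f"b{j}", -M, M) for j in range(p)]         # coefficients
    z = [pulp.LpVariable(f"z{j}", cat="Binary") for j in range(p)]  # 1 = selected

    prob += pulp.lpSum(float(c[j]) * b[j] for j in range(p))  # max c'b <=> min RSS

    for k in range(p):
        # the k-th normal equation is enforced only when regressor k is selected
        expr = pulp.lpSum(float(A[k, j]) * b[j] for j in range(p)) - float(c[k])
        prob += expr <= M * (1 - z[k])
        prob += expr >= -M * (1 - z[k])
    for j in range(p):
        prob += b[j] <= M * z[j]   # an excluded regressor forces b_j = 0
        prob += b[j] >= -M * z[j]

    prob += pulp.lpSum(z) == subset_size  # cardinality of the subset
    prob += b[0] >= 0  # hypothetical sign requirement on the first regressor

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    print([int(pulp.value(v)) for v in z], [round(pulp.value(v), 3) for v in b])

On this synthetic data the program should select regressors 0, 1, and 4 and recover coefficients close to the true values (1.5, -2.0, 0.7). Note that the number of constraints grows with the number of regressors p, not with the sample size n, which is the property the abstract emphasizes.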

About the Author

M. P. Bazilevskiy
Irkutsk State Transport University
Russian Federation

Mikhail P. Bazilevskiy, Cand. Sci. (Engineering), Associate Professor, Department of Mathematics



References

1. Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Lulu.com, 2020, 320 p.

2. Doshi-Velez F., Kim B. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608, ver. 2, 2017. https://doi.org/10.48550/arXiv.1702.08608.

3. Gorbach A.N., Tseitlin N.A. Pokupatel’skoe povedenie: analiz spontannykh posledovatel’nykh i regressionnykh modelei v marketingovykh issledovaniyakh [Buying Behavior: Analysis of Spontaneous Sequences and Regression Models in Marketing Research]. Kyiv, Osvita Ukrainy, 2011, 220 p. (In Russian)

4. Aivazyan S.A., Mkhitaryan V.S. Prikladnaya statistika i osnovy ekonometriki [Applied Statistics and Basics of Econometrics]. Moscow, Yuniti, 1998, 1005 p. (In Russian)

5. Konno H., Yamamoto R. Choosing the best set of variables in regression analysis using integer programming. J. Global Optim., 2009, vol. 44, no. 2, pp. 273–282. https://doi.org/10.1007/s10898-008-9323-9.

6. Miyashiro R., Takano Y. Mixed integer second-order cone programming formulations for variable selection in linear regression. Eur. J. Oper. Res., 2015, vol. 247, no. 3, pp. 721–731. https://doi.org/10.1016/j.ejor.2015.06.081.

7. Miyashiro R., Takano Y. Subset selection by Mallows’ Cp: A mixed integer programming approach. Expert Syst. Appl., 2015, vol. 42, no. 1, pp. 325–331. https://doi.org/10.1016/j.eswa.2014.07.056.

8. Tamura R., Kobayashi K., Takano Y., Miyashiro R., Nakata K., Matsui T. Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor. J. Global Optim., 2019, vol. 73, no. 2, pp. 431–446. https://doi.org/10.1007/s10898-018-0713-3.

9. Tamura R., Kobayashi K., Takano Y., Miyashiro R., Nakata K., Matsui T. Best subset selection for eliminating multicollinearity. J. Oper. Res. Soc. Jpn., 2017, vol. 60, no. 3, pp. 321–336. https://doi.org/10.15807/jorsj.60.321.

10. Bertsimas D., King A., Mazumder R. Best subset selection via a modern optimization lens. Ann. Stat., 2016, vol. 44, no. 2, pp. 813–852. https://doi.org/10.1214/15-AOS1388.

11. Bertsimas D., King A. OR forum—an algorithmic approach to linear regression. Oper. Res., 2016, vol. 64, no. 1, pp. 2–16. https://doi.org/10.1287/opre.2015.1436.

12. Bazilevskiy M.P. Reduction of the problem of selecting informative regressors when estimating a linear regression model by the method of least squares to the problem of partial Boolean linear programming. Model., Optim. Inf. Technol., 2018, vol. 6, no. 1 (20), pp. 108–117. (In Russian)

13. Koch T., Berthold T., Pedersen J., Vanaret C. Progress in mathematical programming solvers from 2001 to 2020. EURO J. Comput. Optim., 2022, vol. 10, art. 100031. https://doi.org/10.1016/j.ejco.2022.100031.

14. Bazilevskiy M.P. Subset selection in regression models with consideration of multicollinearity as a mixed 0-1 integer linear programming problem. Model., Optim. Inf. Technol., 2018, vol. 6, no. 2 (21), pp. 104–118. (In Russian)

15. Bazilevskiy M.P. Selection of informative regressors significant by Student’s t-test in regression models estimated using OLS as a partial Boolean linear programming problem. Proc. Voronezh State Univ. Series: Syst. Anal. Inf. Technol., 2021, no. 3, pp. 5–16. https://doi.org/10.17308/sait.2021.3/3731. (In Russian)

16. Bazilevskiy M.P. Optimization problems of subset selection in linear regression with control of its significance using F-test. Izv. Samara Sci. Cent. Russ. Acad. Sci., 2024, vol. 26, no. 6, pp. 200–207. (In Russian)

17. Chung S., Park Y.W., Cheong T. A mathematical programming approach for integrated multiple linear regression subset selection and validation. Pattern Recognit., 2020, vol. 108, art. 107565. https://doi.org/10.1016/j.patcog.2020.107565.

18. Bertsimas D., Li M.L. Scalable holistic linear regression. Oper. Res. Lett., 2020, vol. 48, no. 3, pp. 203–208. https://doi.org/10.1016/j.orl.2020.02.008.

19. Foerster E., Renz B. Metody korrelyatsionnogo i regressionnogo analiza [Methods of Correlation and Regression Analysis]. Moscow, Finansy i Statistika, 1983, 303 p. (In Russian)

20. Lebedeva A.V., Ryabov V.M. On the numerical solution of systems of linear algebraic equations with ill-conditioned matrices. Vestn. St. Petersb. Univ. Math. Mech. Astron., 2019, vol. 6, no. 4, pp. 619–626. https://doi.org/10.21638/11701/spbu01.2019.407. (In Russian)


For citations:


Bazilevskiy M.P. On the solvability of the optimization problem for constructing quite interpretable linear regressions. Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki. 2025;167(4):627-640. (In Russ.) https://doi.org/10.26907/2541-7746.2025.4.627-640

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2541-7746 (Print)
ISSN 2500-2198 (Online)