Verification of data integration in an integrated system of databases on the properties of inorganic substances and materials
https://doi.org/10.26907/2541-7746.2025.2.367-383
Abstract
Due to the increasing heterogeneity of data models and schemas in the modern world, robust data integration is a high-priority issue. Data integration systems have been extensively deployed across various domains, including astronomy, land use management, and materials science. However, data integration programs can be very complicated. Thus, formal verification of their correctness has emerged as an important task.
In this article, an approach to verify the correctness of data integration in an integrated system of databases on the properties of inorganic substances and materials is considered. The system, developed at the A.A. Baikov Institute of Metallurgy and Materials Science of the Russian Academy of Sciences, employs a two-stage data integration process: during the first stage, the source data marked for deletion, modification, or insertion are converted into an intermediate XML representation; in the second stage, the system invokes the corresponding procedures for XML elements in the target integrated database and updates it accordingly. The data integration programs are implemented by combining an imperative programming language with a declarative language of relational databases. Verification is performed by defining the semantics of the data schemas and data integration programs in a formal specification language and proving the correctness of data integration using automated provers.
About the Author
S. A. StupnikovRussian Federation
Sergey A. Stupnikov, Leading Researcher
Moscow
References
1. Masmoudi M., Ben Abdallah Ben Lamine S., Karray M.H., Archimede B., Zghal H.B. Semantic data integration and querying: A survey and challenges. ACM Comput. Surv., 2024, vol. 56, no. 8, art. 209. https://doi.org/10.1145/3653317.
2. The Binary Star Database. INASAN, 2025. https://bdb.inasan.ru/.
3. Major Basic Research Project. Current scientific tasks in adapting Russia’s land use potential to unprecedented global challenges (economic crisis, climate change, and crisis in global natural resource management). V.V. Dokuchaev Soil Sci. Inst., 2023. https://www.esoil.ru/activities/projects_programs/minobr/knp_2020/. (In Russian)
4. Integrated system of databases on the properties of inorganic substances and materials. A.A. Baikov Inst. Metall. Mater. Sci., Russ. Acad. Sci., 2005. https://imet-db.ru/. (In Russian)
5. White N., Matthews S., Chapman R. Formal verification: Will the seedling ever flower? Philos. Trans. R. Soc. A, 2017, vol. 375, no. 2104, art. 20150402. https://doi.org/10.1098/rsta.2015.0402.
6. Guagliardo P., Libkin L. A formal semantics of SQL queries, its validation, and applications. Proc. VLDB Endowment (PVLDB), 2017, vol. 11, no. 1, pp. 27–39. https://doi.org/10.14778/3151113.3151116.
7. Rahim L.Ab., Whittle J. A survey of approaches for verifying model transformations. Software Syst. Model., 2015, vol. 14, no. 2, pp. 1003–1028. https://doi.org/10.1007/s10270-013-0358-0.
8. Abrial J.-R. The B-Book: Assigning Programs to Meanings. New York, NY, Cambridge Univ. Press, 1996. xxxiv, 779 p. https://doi.org/10.1017/CBO9780511624162.
9. Atelier B: The industrial tool to efficiently deploy the B Method. 2025. http://www.atelierb.eu/.
10. Butler M., K¨orner P., Krings S., Lecomte T., Leuschel M., Mejia L.-F., Voisin L. The first twenty-five years of industrial use of the B-Method. In: ter Beek M.H., Ni˘ckovi´c D. (Eds.) Formal Methods for Industrial Critical Systems (FMICS 2020). Ser.: Lecture Notes in Computer Science. Vol. 12327. Cham, Springer, 2020, pp. 189–209. https://doi.org/10.1007/978-3-030-58298-2_8.
11. Stupnikov S. Semantics and verification of entity resolution and data fusion operations via transformation into a formal notation. In: Kalinichenko L., Kuznetsov S., Manolopoulos Y. (Eds.) Data Analytics and Management in Data Intensive Domains (DAMDID/RCDL 2016). Ser.: Communications in Computer and Information Science. Vol. 706. Cham, Springer, 2017, pp. 145–162. https://doi.org/10.1007/978-3-319-57135-5_11.
12. Stupnikov S.A. Query-driven verification of data integration in the RDF data model. Lobachevskii. J. Math., 2023, vol. 44, no. 1, pp. 205–218. https://doi.org/10.1134/S1995080223010389.
13. Kiselyova N.N., Dudarev V.A., Zemskov V.S. Computer information resources of inorganic chemistry and materials science. Russ. Chem. Rev., 2010, vol. 79, no. 2, pp. 145–166. https://doi.org/10.1070/RC2010v079n02ABEH004104.
14. Kiselyova N.N., Dudarev V.A., Stolyarenko A.V. Integrated system of databases on the properties of inorganic substances and materials. High Temp., 2016, vol. 54, no. 2, pp. 215–222. https://doi.org/10.1134/S0018151X16020085.
15. Kiselyova N.N., Dudarev V.A., Korzhuyev M.A. Database on the bandgap of inorganic substances and materials. Inorg. Mater.: Appl. Res., 2016, vol. 7, no. 1, pp. 34–39. https://doi.org/10.1134/S2075113316010093.
16. Steinberg D., Budinsky F., Paternostro M., Merks E. EMF: Eclipse Modeling Framework. The Eclipse Ser. Addison-Wesley, 2008. 744 p.
17. ATL – a model transformation technology. 2025. https://eclipse.org/atl/.
Review
For citations:
Stupnikov S.A. Verification of data integration in an integrated system of databases on the properties of inorganic substances and materials. Uchenye Zapiski Kazanskogo Universiteta. Seriya Fiziko-Matematicheskie Nauki. 2025;167(2):367-383. (In Russ.) https://doi.org/10.26907/2541-7746.2025.2.367-383