Individual and pair representativeness of sampling points selection in interpolation tasks of the heavy metals distribution in the topsoil

No. 3 (2023)

UDC 504.064.2.001.18
https://doi.org/10.47148/1609-364X-2023-3-63-70

Baglaeva E.M., Sergeev A.P., Shichkin A.V., Buevich A.G., Butorova A.S.

The article addresses the problem of choosing a representative training subset for an artificial neural network (ANN) in tasks of interpolating the distribution of metals in the topsoil. Environmental data, which are often used to build ANN models, are sets of measurements taken at irregularly spaced points. Traditionally, the input data are divided into training and test subsets at random, which leads to a number of problems. To guide the selection of points for the training subset, the question of the individual and collective (pairwise) representativeness of sampling points is posed: how much information each point carries about the content of the element in the soil of a given area. The points with the highest individual representativeness are those whose presence in the training subset reduces the ANN error and increases the correlation between the model calculations and field measurements on the test subset. The assessment of pairwise representativeness revealed effects of synergy (pairs of points whose joint inclusion yields higher model accuracy) and anti-synergy (pairs that carry less information about the element content than the points of the pair taken separately). Different sampling locations therefore carry different amounts of information and are of unequal importance for interpolating the feature.
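The abstract gives no formulas for the two kinds of representativeness, so the following Python sketch is only one possible reading of it, not the authors' procedure. It assumes that the individual representativeness of a point can be scored as the drop in test RMSE when the point is added to a fixed base training subset, and that pairwise synergy can be scored as the joint gain of two points minus the sum of their individual gains. The function names, the synthetic data, and the use of inverse distance weighting as a lightweight stand-in for the ANN are all assumptions made for illustration.

```python
import numpy as np


def idw_predict(train_xy, train_z, query_xy, power=2.0):
    """Inverse-distance-weighted interpolation (assumed stand-in for the ANN)."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power  # floor avoids division by zero
    return (w @ train_z) / w.sum(axis=1)


def rmse(train_idx, test_idx, xy, z):
    """Test RMSE of the interpolator fitted on the points in train_idx."""
    pred = idw_predict(xy[train_idx], z[train_idx], xy[test_idx])
    return float(np.sqrt(np.mean((pred - z[test_idx]) ** 2)))


def individual_gain(i, base_idx, test_idx, xy, z):
    """Hypothetical score: drop in test RMSE when point i joins the base subset."""
    return rmse(base_idx, test_idx, xy, z) - rmse(base_idx + [i], test_idx, xy, z)


def pair_synergy(i, j, base_idx, test_idx, xy, z):
    """Hypothetical score: joint gain minus the sum of the individual gains.

    Positive values suggest synergy, negative values anti-synergy.
    """
    joint = rmse(base_idx, test_idx, xy, z) - rmse(base_idx + [i, j], test_idx, xy, z)
    return (joint
            - individual_gain(i, base_idx, test_idx, xy, z)
            - individual_gain(j, base_idx, test_idx, xy, z))


# Synthetic example: 40 irregular points with a single concentration hotspot.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1.0, size=(40, 2))
z = np.exp(-8.0 * np.sum((xy - 0.5) ** 2, axis=1))

base_idx = list(range(0, 10))     # points always kept in the training subset
candidates = list(range(10, 30))  # points whose representativeness is scored
test_idx = list(range(30, 40))    # held-out points used to measure the error

gains = {i: individual_gain(i, base_idx, test_idx, xy, z) for i in candidates}
best = max(gains, key=gains.get)  # candidate with the largest individual gain
synergy = pair_synergy(candidates[0], candidates[1], base_idx, test_idx, xy, z)

print(f"most representative candidate: {best}, gain = {gains[best]:.4f}")
print(f"synergy of points {candidates[0]} and {candidates[1]}: {synergy:+.4f}")
```

Under these assumptions the scores depend on the chosen base and test subsets, so in practice they would likely be averaged over many random splits before points are ranked.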
Elena M. Baglaeva
Candidate of Physical and Mathematical Sciences
Senior Researcher
Institute of Industrial Ecology UB RAS
20, S. Kovalevskoy str., Ekaterinburg, 620990, Russia
e-mail: e.m.baglaeva@urfu.ru

Aleksandr P. Sergeev
Candidate of Physical and Mathematical Sciences
Leading Researcher, Acting Head of the Laboratory
Institute of Industrial Ecology UB RAS
20, S. Kovalevskoy str., Ekaterinburg, 620990, Russia
e-mail: sergeev@ecko.uran.ru

Andrey V. Shichkin
Researcher
Institute of Industrial Ecology UB RAS
20, S. Kovalevskoy str., Ekaterinburg, 620990, Russia
e-mail: and@ecko.uran.ru

Alexander G. Buevich
Researcher
Institute of Industrial Ecology UB RAS
20, S. Kovalevskoy str., Ekaterinburg, 620990, Russia
e-mail: bag@ecko.uran.ru

Anastasia S. Butorova
Research Engineer
Institute of Industrial Ecology UB RAS
20, S. Kovalevskoy str., Ekaterinburg, 620990, Russia
1st year Postgraduate Student
Institute of Radio Electronics and Information Technologies —
RtF of the Ural Federal University named after B.N. Yeltsin
19, Mira str., Ekaterinburg, 620002, Russia
e-mail: a.s.butorova@urfu.ru

Key words: representativeness of points, sampling, heavy metals, artificial neural networks, choice of training subset

Section: Geoecology