Comparing Approaches to Improving Representativity of Spectroscopic Data using Variational Autoencoders
Mushchina A. S.1,2, Isaev I. V.1, Sarmanova O. E.1,2, Burikov S. A.1,2, Dolenko T. A.1,2, Dolenko S. A.1
1Lomonosov Moscow State University, Skobeltsyn Institute of Nuclear Physics, Moscow, Russia
2Department of Physics, Lomonosov Moscow State University, Moscow, Russia
Email: anastasemusa@gmail.com
Determining component concentrations in multicomponent solutions by solving inverse problems of optical spectroscopy is a complex task. Artificial neural networks are an effective way to solve it, but a major obstacle is the limited representativity of experimental data, which stems from the complexity and high cost of large-scale physical experiments. This paper explores and compares algorithms that use variational autoencoders to generate additional model data and thereby enhance the representativity of the training dataset. The results show that the most promising approach is a standard (unconditioned) variational autoencoder that generates patterns by sampling from a uniform distribution in the latent space. Further research should focus on identifying the optimal latent-space distribution for generating patterns.
Keywords: data generation, inverse problem of spectroscopy, multicomponent solutions, artificial neural networks.
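The generation step named in the abstract, drawing latent vectors from a uniform distribution and passing them through the decoder of an unconditioned variational autoencoder, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the decoder is an untrained two-layer network with random weights standing in for a trained one, and the names `make_decoder` and `generate_uniform`, the latent dimension, the number of spectral channels, and the sampling interval `[-half_width, half_width]` are all hypothetical.

```python
import numpy as np

def make_decoder(latent_dim, n_channels, hidden=32, seed=0):
    """Return a stand-in decoder: an UNTRAINED two-layer MLP with random weights.

    In practice this would be the decoder of a trained VAE (assumption).
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(latent_dim, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=(hidden, n_channels))
    b2 = np.zeros(n_channels)

    def decode(z):
        # Map latent vectors of shape (n, latent_dim) to patterns (n, n_channels).
        h = np.tanh(z @ w1 + b1)
        return h @ w2 + b2

    return decode

def generate_uniform(decode, n_samples, latent_dim, half_width=2.0, seed=1):
    """Sample latent vectors uniformly from a hypercube and decode them.

    half_width is a hypothetical choice of the sampling range.
    """
    rng = np.random.default_rng(seed)
    z = rng.uniform(-half_width, half_width, size=(n_samples, latent_dim))
    return z, decode(z)

# Hypothetical sizes: 4 latent dimensions, 100 spectral channels, 500 new patterns.
decode = make_decoder(latent_dim=4, n_channels=100)
z, spectra = generate_uniform(decode, n_samples=500, latent_dim=4)
print(z.shape, spectra.shape)
```

Replacing `rng.uniform` with `rng.standard_normal` would instead sample from the standard normal prior usually associated with variational autoencoders, so the choice of latent-space distribution can be compared by changing a single line.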