Comparing Approaches to Improving Representativity of Spectroscopic Data using Variational Autoencoders
Mushchina A. S.1,2, Isaev I. V.1, Sarmanova O. E.1,2, Burikov S. A.1,2, Dolenko T. A.1,2, Dolenko S. A.1
1Lomonosov Moscow State University, Skobeltsyn Institute of Nuclear Physics, Moscow, Russia
2Department of Physics, Lomonosov Moscow State University, Moscow, Russia
Email: anastasemusa@gmail.com

Determining component concentrations in multicomponent solutions by solving inverse problems of optical spectroscopy is a complex task. Artificial neural networks are an effective way to solve such problems, but a major challenge is the limited representativity of experimental data, which stems from the complexity and high cost of large-scale physical experiments. This paper explores and compares algorithms that use variational autoencoders to generate additional model data and thereby enhance the representativity of the training dataset. The results show that the most promising approach is a standard (unconditioned) variational autoencoder with patterns generated from a uniform distribution in the latent space. Further research should focus on identifying the optimal latent-space distribution for generating patterns.
Keywords: data generation, inverse problem of spectroscopy, multicomponent solutions, artificial neural networks.
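As a rough illustration of the generation step named in the abstract (sampling latent vectors from a uniform distribution and decoding them into additional training patterns), the following sketch may help. It is not the authors' implementation: `toy_decoder` is a hypothetical stand-in for a trained variational autoencoder decoder, and the latent dimension, the sampling bounds `low` and `high`, and the number of spectral channels are all assumptions chosen only to make the example run.

```python
import math
import random


def sample_latent_uniform(n_samples, latent_dim, low=-2.0, high=2.0, seed=0):
    """Draw latent vectors uniformly from the box [low, high]^latent_dim.

    The bounds are an assumption; the abstract does not state them.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for _ in range(latent_dim)]
        for _ in range(n_samples)
    ]


def toy_decoder(z, n_channels=8):
    """Hypothetical stand-in for a trained VAE decoder.

    Maps a latent vector to a non-negative 'spectrum' using a fixed
    linear map followed by clipping at zero. A real decoder would be
    a trained neural network.
    """
    spectrum = []
    for i in range(n_channels):
        value = sum(zj * math.sin((i + 1) * (j + 1)) for j, zj in enumerate(z))
        spectrum.append(max(0.0, value))
    return spectrum


def augment(train_spectra, n_new, latent_dim, decoder):
    """Return the training set extended with n_new decoded patterns."""
    latent_vectors = sample_latent_uniform(n_new, latent_dim)
    generated = [decoder(z) for z in latent_vectors]
    return train_spectra + generated


if __name__ == "__main__":
    original = [[0.0] * 8]  # placeholder for measured spectra
    extended = augment(original, n_new=4, latent_dim=3, decoder=toy_decoder)
    print(len(extended), "patterns after augmentation")
```

In a real pipeline the decoder would come from an autoencoder trained on the measured spectra, and the uniform bounds would presumably be chosen to cover the region of latent space occupied by the encoded training data.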
Publisher:

Ioffe Institute
