An Application of Artificial Neural Networks for Voice Recognition in Bulgarian

  • Penka Valkova Georgieva, mathematics
  • Hasan Mehmedov Hasanov

Abstract

Natural language processing is one of the main areas of modern artificial intelligence. Voice recognition is a component of natural language processing that aims to transform spoken words into written text using various techniques. Researchers in this area face many challenges that arise from different sources.


This article proposes the Bulgarian Language Speech Recognition System 1.0 (BLSRS 1.0) and presents test results. BLSRS 1.0 is based on an artificial neural network trained to recognize the spectrograms that correspond to spoken words.

Author Biographies

Penka Valkova Georgieva, mathematics

Assoc. Prof. Dr. Penka Georgieva

Burgas Free University

Hasan Mehmedov Hasanov

Burgas Free University

Published
2017-12-23
How to Cite
GEORGIEVA, Penka Valkova; HASANOV, Hasan Mehmedov. An Application of Artificial Neural Networks for Voice Recognition in Bulgarian. HiTech Journal, [S.l.], v. 1, n. 1, p. 69-81, Dec. 2017. ISSN 2534-9996. Available at: <https://hitech.agency/hit/index.php/hit/article/view/25>. Date accessed: 16 July 2019.
Section
HiTech. Peer-reviewed scientific and technical publications