Part-of-Speech Tagging in Persian Language using Convolutional Neural Network

Rahmani, E.; Sarmadi, S.

doi:10.47176/TMI.2018.172

Document Type: Original Article

Authors

E. Rahmani ¹

S. Sarmadi ²

¹ Computer Engineering and Information Technology Department, Urmia University of Technology, Urmia, Iran

² Assistant Professor, Computer Engineering and Information Technology Department, Urmia University of Technology, Urmia, Iran

Abstract

Part-of-speech (POS) tagging assigns each word in a sentence its grammatical category, such as noun, verb, or adjective. It is a core step in many natural language processing (NLP) applications, including machine translation, syntactic parsing, spell-checking, and information retrieval. Although POS tagging has been studied extensively for many languages, Persian poses particular challenges because of its distinctive syntactic and morphological features. Persian is an inflectional language with a complex system of verb conjugations, noun declensions, and word order variations, which makes standard tagging techniques harder to apply. Earlier methods combined linguistic and statistical models to address these issues, but achieving high accuracy remains difficult. In this study, we propose a Convolutional Neural Network (CNN) for part-of-speech tagging in Persian. CNNs have been successful in a range of NLP tasks because they learn feature representations automatically from raw input, which suits tasks involving complex patterns. The proposed model was evaluated on a large Persian corpus, where it achieved an accuracy of 98.55%. This result indicates that deep learning techniques, and CNNs in particular, can address the challenges of Persian part-of-speech tagging. It also suggests that CNNs capture the syntactic and morphological patterns of Persian well enough to provide a reliable tagging method that could be extended to other languages with similar complexity.
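
To make the idea of convolutional tagging concrete, the following is a minimal sketch and not the architecture evaluated in the article, whose details are not given in this abstract. It assumes a single convolution of width 3 over word embeddings, one filter per tag, and token-level accuracy as the metric. The function names, the toy tag set, the transliterated tokens, and the random (untrained) weights are all hypothetical, so the predicted tags are arbitrary.

```python
import random

def conv1d_tag_scores(embeddings, filters, bias, width=3):
    """Score every tag for every token using a sliding window.

    embeddings: list of token vectors, each of dimension d
    filters:    one flattened weight vector of length width * d per tag
    bias:       one bias value per tag
    width:      odd window size; the sentence is zero-padded on both sides
    """
    d = len(embeddings[0])
    pad = [[0.0] * d] * (width // 2)
    padded = pad + embeddings + pad
    scores = []
    for i in range(len(embeddings)):
        # Concatenate the vectors of the tokens inside the window centred on i.
        window = [x for vec in padded[i:i + width] for x in vec]
        scores.append([
            sum(w * x for w, x in zip(f, window)) + b
            for f, b in zip(filters, bias)
        ])
    return scores

def predict_tags(scores, tagset):
    """Pick the highest-scoring tag for each token."""
    return [tagset[max(range(len(row)), key=row.__getitem__)] for row in scores]

def token_accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# Toy setup: every value below is illustrative, not taken from the article.
random.seed(0)
TAGSET = ["N", "V", "ADJ"]
DIM, WIDTH = 4, 3
sentence = ["man", "ketab", "mikhanam"]  # hypothetical transliterated tokens
embeddings = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in sentence]
filters = [[random.uniform(-1, 1) for _ in range(WIDTH * DIM)] for _ in TAGSET]
bias = [0.0] * len(TAGSET)

scores = conv1d_tag_scores(embeddings, filters, bias, WIDTH)
tags = predict_tags(scores, TAGSET)
print(list(zip(sentence, tags)))
```

In a trained model the embeddings and filters would be learned from a tagged corpus, and a figure such as the reported 98.55% would correspond to `token_accuracy` computed over held-out data, assuming accuracy is measured per token.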

Keywords

Part-of-Speech Tagging

Word Embedding

Natural Language Processing

Text Corpus

Volume 1, Issue 3
Summer 2018
Pages 172-181

Receive Date 02 June 2018
Revise Date 10 August 2018
Accept Date 17 September 2018

Transactions on Machine Intelligence
