An Efficient Sentiment Analysis Model for Crime Articles’ Comments using a Fine-tuned BERT Deep Architecture and Pre-Processing Techniques
Subject Areas : Natural Language Processing
Sovon Chakraborty 1 (Brac University)
Muhammad Borhan Uddin Talukdar 2 (Daffodil International University)
Portia Sikdar 3 (North Western University)
Jia Uddin 4 (Woosong University)
Keywords: BERT, BNLP, NLP, Sentiment Analysis, Bangla Sentiment Analysis.
Abstract :
The prevalence of social media allows users to exchange views on a multitude of events. Public comments on widely discussed crimes can be analyzed to understand how overall public sentiment changes over time. In this paper, a specialized dataset is developed and used, comprising public comments on contemporary crime events collected from various types of online platforms. The comments are manually annotated with one of three polarity values: positive, negative, or neutral. Before the data is fed to the model, pre-processing steps are applied to remove the dispensable parts of each comment. A deep Bidirectional Encoder Representations from Transformers (BERT) model is then used for sentiment analysis of the pre-processed crime data. To evaluate the model's performance, the F1 score, ROC curve, and heatmap are used. Experimental results show that the model achieves an F1 score of 89% on the tested dataset. In addition, the proposed model outperforms other state-of-the-art machine learning and deep learning models, exhibiting higher accuracy with fewer trainable parameters. Because it requires fewer trainable parameters and therefore has lower complexity than the other models, the proposed model may be a suitable option for portable IoT devices.
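As a rough illustration of the F1 evaluation mentioned in the abstract, the following minimal sketch computes per-class and macro-averaged F1 for the three polarity labels. The abstract does not state which averaging scheme was used, so macro-averaging is an assumption here; the function names and the toy labels are hypothetical and are not taken from the paper's dataset or code.

```python
from collections import Counter

# The three polarity values named in the abstract.
LABELS = ("positive", "negative", "neutral")


def per_class_f1(y_true, y_pred, labels=LABELS):
    """Return {label: F1} from paired gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class was wrong
            fn[gold] += 1  # true class was missed
    scores = {}
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy labels, invented for illustration only.
y_true = ["negative", "negative", "neutral", "positive", "negative", "neutral"]
y_pred = ["negative", "neutral", "neutral", "positive", "negative", "negative"]

print(per_class_f1(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4))
```

Macro-averaging weights each class equally, which matters when one polarity (for crime-related comments, plausibly negative) dominates the data; a weighted or micro average would give a different figure on the same predictions.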