An empirical study of Vietnamese sentiment analysis - Nguyen Hoang Quan




Tạp chí Nghiên cứu KH&CN quân sự (Journal of Military Science and Technology Research), No. 48, 04-2017, pp. 104-110
Section: Information Technology & Mathematical Foundations of Informatics
N. H. Quan, L. D. Son, N. Q. Uy, “An empirical study of Vietnamese sentiment analysis.”

AN EMPIRICAL STUDY OF VIETNAMESE SENTIMENT ANALYSIS

Nguyen Hoang Quan*, Le Dinh Son, Nguyen Quang Uy

Abstract: Sentiment analysis is the area of research that studies people’s opinions, sentiments, evaluations, attitudes and emotions from written text. It has become one of the most active research fields in Natural Language Processing. In recent years, sentiment analysis for Vietnamese text has received considerable attention. In the fourth international workshop on Vietnamese language and speech processing, there was a competition between various researchers in solving this problem based on a benchmark dataset provided by the workshop committee. However, each researcher addressed the problem with different methods. In this paper, we present a comparison of a large number of machine learning algorithms for tackling this problem. The results of the experiments help to gain insight into the ability of various machine learning algorithms when used for Vietnamese sentiment analysis.

Keywords: Sentiment analysis, Opinion mining, Machine learning.

1. INTRODUCTION

Sentiment analysis is the task of determining users’ opinions about products, movies, events, policies, etc. Within this topic, sentiment classification is one of the most important tasks; it aims to classify the opinion of a sentence or document into categories such as positive, negative and neutral. Predicting users’ sentiment is extremely important because users’ opinions are becoming more and more valuable. Public interest is the main factor that affects the profit of products such as movies and books. Consequently, this problem is of interest to both researchers and companies. For a comprehensive survey of sentiment analysis and opinion mining, readers are referred to [1].

The major tasks in sentiment analysis include:
• Subjectivity classification: aims to separate subjective documents from objective ones.
• Polarity sentiment classification: aims to classify a subjective document into one of three classes: “positive”, “negative” and “neutral”.
• Rating: aims to rate documents containing personal opinions from 1 star to 5 stars (very negative to very positive).

For sentiment analysis of the Vietnamese language, the VLSP 2016 (the fourth International Workshop on Vietnamese Language and Speech Processing) evaluation campaign is the first effort to provide benchmark data and to perform a systematic comparison between Vietnamese sentiment analysis systems. The scope of the VLSP 2016 campaign is polarity classification, in which participating systems need to classify Vietnamese reviews/documents into one of three categories: “positive”, “negative”, or “neutral”. The campaign attracted eight participating teams, and the five best results are published in the proceedings of the workshop [2, 3, 4, 5, 6].

Overall, various machine learning methods were applied by researchers to tackle this problem in the VLSP 2016 competition. However, each team used some of the popular algorithms with different parameter settings and feature extraction methods. Therefore, it is difficult to assess whether one method is superior to another when used for Vietnamese sentiment analysis. The objective of this paper is to systematically conduct a comparison between a large number of machine learning techniques in solving the Vietnamese sentiment analysis problem. We also carefully tune the parameter settings of the tested algorithms and report only the best result of each method. The results of this paper give better insight into the ability of different machine learning techniques in solving the Vietnamese sentiment analysis problem.
The remainder of this paper is organized as follows. Section 2 describes our system in detail. Section 3 describes the experimental setup. The results of the experiments are presented and discussed in Section 4. Section 5 concludes the paper and points to avenues for future work.

2. SYSTEM DESCRIPTION

Figure 1 illustrates the processing steps of our system for sentiment classification. After preprocessing the data by removing low-frequency words, the feature extraction step builds a sentence or document feature vector using TF and TF-IDF features. This feature vector is then input to a classifier, such as a Support Vector Machine or a Multilayer Neural Network, to determine the sentiment label of the sentence or document.

[Figure 1. Sentiment classification system: Document → Preprocess → Feature Extraction → Feature Vector → Classification → Sentiment Label.]

A. Features

In this paper, two methods are used to extract features from each sentence or each document:

a) TF (Term Frequency) [7]: term frequency describes how often a term occurs in a document. The simplest choice is to use the raw frequency of a term in a document, i.e. the number of times that term t occurs in document d. If we denote the raw frequency of t by $f_{t,d}$, then the simple TF scheme is $\mathrm{tf}(t,d) = f_{t,d}$.

b) TF-IDF (Term Frequency * Inverse Document Frequency) [8]: TF-IDF is commonly used in information retrieval to determine which words are important. It combines local (within-document) and global (collection-wide) information through the TF and IDF scores. In our experiments, we use the TF-IDF scores of unigrams (single words) to extract features from a sentence or a document.

B. Algorithms

A large number of machine learning algorithms are used in this paper for comparison. These algorithms include:

• Support Vector Machine (SVM) [9]: SVM is a classic supervised machine learning algorithm. The goal of SVM is to determine the separating hyperplane with the largest margin, i.e. the largest distance to the nearest training points (the support vectors). Owing to its effectiveness in classification, SVM is widely used in many areas such as handwriting recognition and opinion mining.
• Multilayer Neural Network (MLNN) [10]: In the neural network classifier, the sentence’s features are fed into a multi-layer fully connected network. The last layer uses the softmax function to classify the feature vector. In particular, the MLNN was trained with the stochastic gradient descent optimizer (learning rate 0.1) and uses the sigmoid function as the activation function in the hidden layer.
• K-Nearest Neighbors: In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification [11]. The input consists of the k closest training examples in the feature space. The output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small).
• Decision Tree [12]: A decision tree is a classification algorithm that uses a tree-like graph for making decisions [13]. Each internal node of the tree represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents an outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules.
• Random forests: Random forests [14] are a method that aims to correct decision trees' habit of overfitting to their training set.
Random forests are an ensemble learning method used for classification and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
• Passive-Aggressive: The passive-aggressive algorithms [15] are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter C in order to avoid overfitting.
• AdaBoost: AdaBoost, short for "Adaptive Boosting", is an ensemble method that can be used in conjunction with many other types of learning algorithms to improve their performance. The output of the other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers.

For all these algorithms, we used their implementations in the Scikit-learn library to conduct the experiments. Scikit-learn is a popular machine learning library written in Python [16].

3. EXPERIMENTAL SETTINGS

A. Dataset

To train and test our systems, we used the data provided for the sentiment analysis task of the VLSP evaluation campaign. It contains users’ reviews of technological devices in three categories: “negative”, “positive” and “neutral”. We divided the dataset into two parts: one for training and another for testing. The numbers of positive, negative and neutral samples and the total number of samples in the training and testing sets are shown in Table 1.

Table 1. Dataset for training and testing algorithms.

         Positive   Neutral   Negative   Total
Train        1400      1400       1400    4200
Test          300       300        300     900

B. Evaluation

The performance of the sentiment classification systems is evaluated using three popular metrics: precision, recall, and the F1 score. Let A be the set of reviews that the system predicted as positive and B the set of reviews with the positive label. The precision, recall, and F1 score of the positive label are then computed as follows (and similarly for the negative and neutral labels), where $|\cdot|$ denotes the number of elements in a set:

\[
\mathrm{precision} = \frac{|A \cap B|}{|A|}, \qquad
\mathrm{recall} = \frac{|A \cap B|}{|B|}, \qquad
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
\]

After calculating precision, recall and F1 for each label, the final precision, recall and F1 of a method are obtained by averaging the values over the three labels.

4. RESULTS AND DISCUSSION

For each method, we used the two features (TF and TF-IDF) presented in Section 2. We also tested each algorithm with various values of its parameters. After a number of experiments, we found that the results with the TF feature are often worse than the results with the TF-IDF feature. The reason could be that the TF feature ignores information relating to the length of the document, and this information may be useful for classifying the opinions contained in the document. Therefore, the results of the TF feature were discarded, and we only present and discuss the results of the TF-IDF feature in this section.

The best result of each algorithm, together with the parameters that achieved it, is shown in Table 2. Among the three performance metrics, F1 is the most important measure. Therefore, in this table, we focus on comparing the algorithms based on F1; precision and recall are given only for reference. In Table 2, the best algorithm (the highest value of F1) is printed in bold and the worst algorithm (the lowest value of F1) is printed in bold italic.

Table 2. The best results of each algorithm with its parameters.
Classifier                                                               Precision  Recall  F1-Score
Nearest Neighbor (K=4)                                                   0.54       0.52    0.50
SVM (Radial basis function kernel, Gamma=2, C=1)                         0.66       0.64    0.65
SVM (LinearSVC)                                                          0.64       0.64    0.63
SVM (Stochastic Gradient Descent)                                        0.57       0.56    0.56
Decision Tree (Max Depth=5)                                              0.54       0.45    0.42
Random Forest (Max Depth=5, n_estimators=10, max_features=1)             0.45       0.44    0.41
Neural Network (Multi-layer Perceptron, alpha=1, hidden_layer_size=100)  0.58       0.57    0.56
Passive Aggressive (C=1.0, n_iter=5)                                     0.63       0.63    0.62
Adaptive Boosting (n_estimators=50)                                      0.57       0.56    0.56

It can be seen from this table that the best result is achieved by the support vector machine with the radial basis function kernel (F1 = 0.65), while the worst result is obtained by random forest (F1 = 0.41). Overall, the results of the support vector machine variants are rather solid: the RBF-kernel SVM and LinearSVC obtain the two highest F1 values (0.65 and 0.63), and the SVM trained with stochastic gradient descent (0.56) is on par with the mid-ranked methods.

[Figure 2. Comparison between nine algorithms based on F1-score.]

The table also shows that tree-based algorithms did not perform well on this problem. The two tree-based algorithms, Decision Tree and Random Forest, are the worst of the nine tested methods. Among the four remaining algorithms, the passive-aggressive algorithm also gives very good results: it is ranked third among the nine algorithms (F1 = 0.62). The results of the neural network and AdaBoost are equal (F1 = 0.56) and lie in the middle of the tested algorithms. Finally, the result of Nearest Neighbor (F1 = 0.50) is not convincing: this method is better only than the two tree-based algorithms (Decision Tree and Random Forest). Figure 2 presents in detail the comparison between the nine tested algorithms based on their F1 values.

5. CONCLUSION

In this paper, we have examined the performance of various machine learning algorithms in solving the Vietnamese sentiment analysis problem. Nine popular algorithms were selected and tested.
A recently released data set for Vietnamese sentiment analysis (the VLSP 2016 dataset) was used in the experiments. The results of the experiments showed that the methods based on support vector machines achieved the best performance, while tree-based methods (Decision Tree and Random Forest) did not perform well. The results also showed that using the TF-IDF feature is better than using the TF feature. These results provide insight into the ability of various machine learning techniques in solving this problem.

There are a number of research areas for future work that arise from this paper. First, we would like to investigate better techniques for feature extraction. In this paper, we have shown that the TF-IDF feature is better than the TF feature. In the future, we will study word2vec methods to extract features for Vietnamese sentiment analysis. Second, recent research has shown that applying deep learning to natural language processing has brought significant improvements. Therefore, it will be interesting to examine whether deep learning can be useful for Vietnamese sentiment analysis. Last but not least, we are planning to collect a large Vietnamese dataset and conduct our research on it.

REFERENCES

[1]. K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: Tasks, approaches and applications”, Knowledge-Based Systems, vol. 89, pp. 14, 2015.
[2]. Le Anh Cuong, Ng. T. Minh Huyen, Ng. Viet Hung, “VLSP 2016 Shared Task: Vietnamese Analysis”, VLSP 2016, Ha Noi, 2016.
[3]. Vi Ngo Van, Minh Hoang Van, Tam Nguyen Thanh, “Sentiment Analysis for Vietnamese using Support Vector Machines with application to Facebook”, VLSP 2016, Ha Noi, 2016.
[4]. Hy Nguyen, Tung Le, Viet-Thang Luong, Dinh Dien, “A Simple Supervised Learning Approach to Sentiment Classification at VLSP 2016”, VLSP 2016, Ha Noi, 2016.
[5]. Minh Nhat Quang Pham, Tran The Trung, “A Lightweight Ensemble Method for Sentiment Classification Task”, VLSP 2016, Ha Noi, 2016.
[6]. Quynh-Trang Thi Pham, Xuan-Truong Nguyen, Van-Hien Tran, Thi-Cham Nguyen, Mai-Vu Tran, “DSKTLAB: Vietnamese Sentiment Analysis for Product Reviews”, VLSP 2016, Ha Noi, 2016.
[7]. A. Rajaraman and J. D. Ullman, “Data Mining”, in Mining of Massive Datasets, pp. 1–17, 2011.
[8]. Khoo Khyou Bun and M. Ishizuka, “Emerging Topic Tracking System”, Proceedings Third International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, WECWIS 2001.
[9]. C. Cortes and V. Vapnik, “Support-vector networks”, Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995.
[10]. F. Rosenblatt, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms”, Spartan Books, Washington DC, 1961.
[11]. N. S. Altman, “An introduction to kernel and nearest-neighbor nonparametric regression”, The American Statistician, vol. 46, no. 3, pp. 175–185, 1992.
[12]. L. Rokach and O. Maimon, “Data mining with decision trees: theory and applications”, World Scientific Pub Co Inc., ISBN 978-9812771711, 2008.
[13]. L. Rokach and O. Maimon, “Data mining with decision trees: theory and applications”, World Scientific Pub Co Inc., ISBN 978-9812771711, 2008.
[14]. Tin Kam Ho, “The Random Subspace Method for Constructing Decision Forests”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 8, pp. 832–844, 1998. doi:10.1109/34.709601.
[15]. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer, “Online Passive-Aggressive Algorithms”, School of Computer Science and Engineering, The Hebrew University, Jerusalem, 91904, Israel, 2006.
[16]. Luis Pedro Coelho, Willi Richert, “Building Machine Learning Systems with Python”, ISBN 978-1-78439-277-2, 2015.
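As an illustration of the TF and TF-IDF features described in Section 2.A, the following Python sketch computes both for whitespace-tokenized unigrams. It is not the authors' code. The paper does not state which IDF formula was used, so the smoothed form ln((1 + N) / (1 + df(t))) + 1 and the L2 normalization are assumptions, and the toy reviews and function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """Raw term frequency: tf(t, d) = number of times t occurs in d."""
    return Counter(tokens)


def inverse_document_frequencies(tokenized_docs):
    """Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df(t))) + 1."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def tfidf_vector(tokens, idf):
    """TF-IDF weights for one document, L2-normalized (an assumption)."""
    tf = term_frequencies(tokens)
    weights = {term: tf[term] * idf.get(term, 0.0) for term in tf}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return weights
    return {term: w / norm for term, w in weights.items()}


# Hypothetical toy reviews; splitting on whitespace is a simplification
# and does not perform Vietnamese word segmentation.
docs = [
    "pin tốt máy đẹp".split(),
    "pin yếu máy nóng".split(),
    "máy đẹp giá tốt".split(),
]
idf = inverse_document_frequencies(docs)
vectors = [tfidf_vector(tokens, idf) for tokens in docs]
```

In this toy collection, a term that appears in every review ("máy") receives the smallest IDF, while a term that appears in only one review ("nóng") receives a larger one, which is the local/global trade-off that Section 2.A refers to.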
SUMMARY

AN EMPIRICAL STUDY OF SENTIMENT ANALYSIS IN VIETNAMESE

Sentiment analysis covers the research areas concerned with people's opinions, feelings, evaluations, attitudes and emotions as expressed in text. It has become one of the most active research fields in natural language processing. In recent years, sentiment analysis for Vietnamese text has received considerable attention. At the fourth international workshop on Vietnamese language and speech processing, there was a competition among researchers to solve this problem on a benchmark dataset provided by the workshop organizers. However, each researcher addressed the problem with different methods. In this paper, we present a comparison of many machine learning algorithms for solving this problem. The experimental results give deeper insight into the capability of different machine learning algorithms when used for Vietnamese sentiment analysis.

Keywords: Sentiment analysis, Opinion mining, Machine learning.

Received 17 February 2017
Revised 4 April 2017
Accepted for publication 5 April 2017

Address: Military Technical Academy; *Email: [email protected]
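The evaluation protocol of Section 3.B (per-label precision, recall and F1, then an average over the three labels) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' evaluation script. The function names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is an assumption that the paper does not discuss.

```python
LABELS = ("positive", "negative", "neutral")


def per_label_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one label.

    A = reviews predicted as `label`, B = reviews whose gold label is `label`.
    """
    a = {i for i, pred in enumerate(y_pred) if pred == label}
    b = {i for i, gold in enumerate(y_true) if gold == label}
    overlap = len(a & b)
    # Assumption: a metric with an empty denominator is scored as 0.0.
    precision = overlap / len(a) if a else 0.0
    recall = overlap / len(b) if b else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred, labels=LABELS):
    """Average each metric over the labels, as described in Section 3.B."""
    scores = [per_label_scores(y_true, y_pred, label) for label in labels]
    return tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))


# Hypothetical gold labels and predictions for six reviews.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
precision, recall, f1 = macro_average(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 is averaged per label rather than recomputed from the averaged precision and recall, the reported F1 can be lower than both of them, as in the Decision Tree row of Table 2 (0.54, 0.45, 0.42).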
