Công nghệ thông tin & Cơ sở toán học cho tin học 
N. H. Quan, L. D. Son, N. Q. Uy, “An empirical study of Vietnamese sentiment analysis.” 104 
AN EMPIRICAL STUDY OF VIETNAMESE SENTIMENT ANALYSIS 
Nguyen Hoang Quan*, Le Dinh Son, Nguyen Quang Uy
Abstract: Sentiment analysis is the area of research that studies people’s opinions,
sentiments, evaluations, attitudes and emotions expressed in written text. It has become
one of the most active research fields in Natural Language Processing. In recent
years, sentiment analysis for Vietnamese text has received considerable attention. At
the fourth international workshop on Vietnamese language and speech processing,
researchers competed in solving this problem on a benchmark dataset provided by
the workshop committee. However, each researcher addressed the problem with
different methods. In this paper, we present a comparison of a large number of
machine learning algorithms for tackling this problem. The results of the
experiments give insight into the ability of various machine learning algorithms
when used for Vietnamese sentiment analysis.
Keywords: Sentiment analysis, Opinion mining, Machine learning. 
1. INTRODUCTION 
Sentiment analysis is the task of determining users’ opinions about products,
movies, events, policies, etc. Within this topic, sentiment classification is one of
the most important tasks; it aims to classify the opinion expressed in a sentence or
document into one of several categories such as positive, negative and neutral.
Predicting users’ sentiment is extremely important because users’ opinions are
becoming more and more valuable. Public interest is the main factor that affects the
profit of products such as movies and books. Consequently, this problem is of
interest to both researchers and companies.
For a comprehensive survey of sentiment analysis and opinion mining, readers
are referred to [1]. The major tasks in sentiment analysis include:
• Subjectivity classification: aims to distinguish subjective documents from
objective ones.
• Polarity sentiment classification: aims to classify a subjective document
into one of three classes: “positive”, “negative” and “neutral”.
• Rating: aims to rate documents containing personal opinions from 1 star to 5
stars (very negative to very positive).
For sentiment analysis of the Vietnamese language, the VLSP 2016 (The Fourth
International Workshop on Vietnamese Language and Speech Processing)
evaluation campaign is the first effort to provide benchmark data and to
perform a systematic comparison between Vietnamese sentiment analysis systems.
The scope of the VLSP 2016 campaign is polarity classification, in which
participating systems need to classify Vietnamese reviews/documents into one of
three categories: “positive”, “negative”, or “neutral”.
The campaign attracted eight participating teams, and the five best results are
published in the proceedings of the workshop [2, 3, 4, 5, 6]. Overall, various
machine learning methods were applied by researchers to tackle this problem in the
VLSP 2016 competition. However, each team used only some of the popular algorithms,
with different parameter settings and feature extraction methods. Therefore, it is
difficult to assess whether one method is superior to another when used for
Vietnamese sentiment analysis. The objective of this paper is to conduct a
systematic comparison of a large number of machine learning techniques for solving
the Vietnamese sentiment analysis problem. We carefully tuned the parameter settings
of the tested algorithms, and only the best result of each method is reported.
The results of this paper give better insight into the ability of different machine
learning techniques in solving the Vietnamese sentiment analysis problem.
Nghiên cứu khoa học công nghệ
Tạp chí Nghiên cứu KH&CN quân sự, Số 48, 04 - 2017 105
The remainder of this paper is organized as follows. Section 2 describes our
system in detail. Section 3 describes the experimental setup. The results of the
experiments are presented and discussed in Section 4. Section 5 concludes the
paper and points to avenues for future work.
2. SYSTEM DESCRIPTION 
Figure 1 illustrates the processing steps of our system for sentiment classification.
After preprocessing the data by removing low-frequency words, the feature extraction
step extracts a feature vector for each sentence or document using TF or TF-IDF
features. This feature vector is then input to a classifier, such as a Support Vector
Machine or a Multilayer Neural Network, to determine the sentiment label of the
sentence or document.
Figure 1. Sentiment classification system: Document → Preprocess → Feature
Extraction → Feature Vector → Classification → Sentiment Label.
A. Features 
In this paper, two methods are used to extract features from each sentence or
each document:
a) TF (Term Frequency) [7]: term frequency measures how often each term occurs
in a document. The simplest choice is to use the raw frequency of a term in a
document, i.e. the number of times that term t occurs in document d. If we denote
the raw frequency of t in d by $f_{t,d}$, then the simple TF scheme is
$\mathrm{tf}(t,d) = f_{t,d}$.
b) TF-IDF (Term Frequency × Inverse Document Frequency) [8]: TF-IDF is
commonly used in information retrieval to determine which words are important.
It combines local information (the TF score of a term within a document) with
global information (the IDF score of the term across the whole collection). In our
experiments, we use the TF-IDF scores of unigrams (single words) to extract the
features from a sentence or a document.
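The two feature types above can be sketched in a few lines of plain Python. This is only an illustration, not the implementation used in the experiments: the paper does not state which IDF variant was used, so the formula idf(t) = log(N / df(t)) is an assumption, and the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Raw term frequency: tf(t, d) = f_{t,d}, the count of term t in document d."""
    return Counter(tokens)


def inverse_document_frequency(documents):
    """idf(t) = log(N / df(t)); this particular IDF variant is an assumption."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """TF-IDF weight of each unigram in one document."""
    return {term: count * idf[term] for term, count in term_frequency(tokens).items()}


# Hypothetical, pre-tokenized toy reviews (not taken from the VLSP data).
docs = [
    ["pin", "tốt", "pin", "bền"],
    ["pin", "kém"],
    ["màn_hình", "tốt"],
]
idf = inverse_document_frequency(docs)
weights = tf_idf(docs[0], idf)
```

In this toy collection the word "pin" occurs in two of the three documents, so its IDF is lower than that of "bền", which occurs in only one; this is the sense in which IDF adds global information to the local term counts.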
B. Algorithms 
A large number of machine learning algorithms are used in this paper for
comparison. These algorithms include:
• Support Vector Machine (SVM) [9]: SVM is a classic supervised machine
learning algorithm. The goal of SVM is to determine the hyperplane that has the
largest distance to the support vectors. Owing to its effectiveness in classification,
SVM is widely used in many areas such as handwriting recognition and opinion
mining.
• Multilayer Neural Network (MLNN) [10]: In the neural network classifier, the
features of a sentence are fed into a multi-layer fully connected network. The
last layer uses the softmax function to classify the feature vector. In particular,
the MLNN was trained with the stochastic gradient descent optimizer (learning rate
0.1) and uses the sigmoid function as the activation function in the hidden layer.
• K-Nearest Neighbors: In pattern recognition, the k-nearest neighbors 
algorithm (k-NN) is a non-parametric method used for classification [11]. The 
input consists of the k closest training examples in the feature space. The output is 
a class membership. An object is classified by a majority vote of its neighbors, 
with the object being assigned to the class most common among its k nearest 
neighbors (k is a positive integer, typically small). 
• Decision Tree [12]: A decision tree is a classification algorithm that uses a
tree-like graph for making decisions [13]. Each internal node of a tree represents a
"test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch
represents the outcome of the test and each leaf node represents a class label
(the decision taken after computing all attributes). The paths from root to leaf
represent classification rules.
• Random forests: Random forests [14] are a method that aims to correct for
decision trees' habit of overfitting to their training set. Random forests are an
ensemble learning method used for classification and other tasks that operates by
constructing a multitude of decision trees at training time and outputting the class
that is the mode of the classes (classification) or the mean prediction (regression)
of the individual trees.
• Passive-aggressive: The passive-aggressive algorithms [15] are a family of
algorithms for large-scale learning. They are similar to the Perceptron in that they
do not require a learning rate. However, contrary to the Perceptron, they include a
regularization parameter C in order to avoid overfitting.
• AdaBoost: AdaBoost, short for "Adaptive Boosting", is an ensemble method
that can be used in conjunction with many other types of learning algorithms to
improve their performance. The output of the other learning algorithms ('weak
learners') is combined into a weighted sum that represents the final output of the
boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners
are tweaked in favor of those instances misclassified by previous classifiers.
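As a concrete illustration of one of the listed methods, the majority vote of the k-nearest neighbors classifier can be sketched as follows. This is a minimal sketch and not the Scikit-learn implementation: the Euclidean distance, the function name `knn_predict` and the two-dimensional toy vectors are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    distances = [
        (math.dist(vector, query), label)  # Euclidean distance (an assumption)
        for vector, label in zip(train_vectors, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # most_common(1) returns the label with the highest vote count;
    # ties are broken by first occurrence among the nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2-D feature vectors standing in for document feature vectors.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
y = ["negative", "negative", "positive", "positive", "positive"]
label = knn_predict(X, y, (1.0, 0.9), k=3)
```

With real TF-IDF vectors the same vote applies; only the dimensionality of the vectors changes.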
For all these algorithms, we used their implementations in the Scikit-learn library
to conduct the experiments. Scikit-learn is a popular machine learning library written
in Python [16].
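A pipeline of this kind might look as follows in Scikit-learn. This is a hedged sketch rather than the authors' code: the toy reviews and their labels are hypothetical, only two of the nine classifiers are shown, and the parameter values (RBF kernel with gamma=2 and C=1, and 4 neighbors) are the ones reported later in Table 2.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy reviews; the experiments in the paper use the VLSP 2016 data.
train_texts = [
    "pin tốt màn_hình đẹp",
    "máy chạy nhanh rất tốt",
    "pin kém máy chậm",
    "màn_hình xấu rất tệ",
    "máy bình_thường",
    "giá bình_thường không có gì nổi_bật",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

classifiers = {
    "svm_rbf": SVC(kernel="rbf", gamma=2, C=1),
    "knn": KNeighborsClassifier(n_neighbors=4),
}

predictions = {}
for name, classifier in classifiers.items():
    # Unigram TF-IDF features followed by the classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 1)), classifier)
    model.fit(train_texts, train_labels)
    predictions[name] = model.predict(["pin tốt", "máy chậm"]).tolist()
```

Other estimators can be added to the `classifiers` dictionary in the same way, for example `LinearSVC()` or `DecisionTreeClassifier(max_depth=5)`.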
3. EXPERIMENTAL SETTINGS 
A. Dataset 
To train and test our systems, we used the data provided for the sentiment analysis
task of the VLSP evaluation campaign. It contains users’ reviews of technological
devices, each labeled with one of three categories: “negative”, “positive” and
“neutral”. We divided the dataset into two parts: one for training and another for
testing. The numbers of positive, negative and neutral samples and the total number
of samples in the training and testing sets are shown in Table 1.
Table 1. Dataset for training and testing algorithms.
Set    Positive  Neutral  Negative  Total
Train  1400      1400     1400      4200
Test   300       300      300       900
B. Evaluation 
The performance of the sentiment classification systems is evaluated using
three popular metrics: precision, recall, and the F1 score. Let A be the set of
reviews that the system predicted as positive and B the set of reviews with a
positive label. The precision, recall, and F1 score of the positive label are then
computed as follows (and similarly for the negative and neutral labels):
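$$\text{precision} = \frac{|A \cap B|}{|A|}, \qquad \text{recall} = \frac{|A \cap B|}{|B|}, \qquad F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$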
After calculating precision, recall and F1 for each label, the final values of
precision, recall and F1 of a method are obtained by averaging the values over the
three labels.
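The evaluation procedure described above can be sketched in plain Python as follows. The function names and the toy label sequences are hypothetical, and returning 0.0 when a denominator is empty is an assumption, since the paper does not say how that case is handled.

```python
def per_label_scores(gold, predicted, label):
    """Precision, recall and F1 for one label from the set sizes |A|, |B|, |A ∩ B|."""
    a = {i for i, p in enumerate(predicted) if p == label}  # A: predicted as `label`
    b = {i for i, g in enumerate(gold) if g == label}       # B: truly labeled `label`
    overlap = len(a & b)
    precision = overlap / len(a) if a else 0.0  # 0.0 on empty A is an assumption
    recall = overlap / len(b) if b else 0.0     # 0.0 on empty B is an assumption
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(gold, predicted, labels=("positive", "negative", "neutral")):
    """Average each metric separately over the three labels."""
    scores = [per_label_scores(gold, predicted, label) for label in labels]
    return tuple(sum(metric) / len(labels) for metric in zip(*scores))


# Hypothetical gold labels and system predictions for six reviews.
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
precision, recall, f1 = macro_average(gold, pred)
```

Because F1 is computed per label and then averaged, the averaged F1 does not have to lie between the averaged precision and the averaged recall; in this toy example it is lower than both.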
4. RESULTS AND DISCUSSION 
For each method, we used the two features (TF and TF-IDF) presented in Section 2.
We also tested each algorithm with various values of its parameters. After a number
of experiments, we found that the results with the TF feature are often worse than
the results with the TF-IDF feature. The reason could be that the TF feature ignores
information related to the length of the document, and this information may be
useful for classifying the opinions contained in the document. Therefore, the
results of the TF feature were discarded, and we only present and discuss the
results of the TF-IDF feature in this section.
The best result of each algorithm, together with the parameters that achieved it,
is shown in Table 2. Among the three performance metrics, F1 is the most important
measure. Therefore, we compare the algorithms based on F1; precision and recall are
given only for reference. In Table 2, the best algorithm (the highest value of F1)
is printed in bold and the worst algorithm (the lowest value of F1) is printed in
bold italics.
Table 2. The best results of each algorithm with its parameters.
Classifier                                                               Precision  Recall  F1-Score
Nearest Neighbor (K=4)                                                   0.54       0.52    0.50
SVM (Radial basis function kernel, Gamma=2, C=1)                         0.66       0.64    0.65
SVM (LinearSVC)                                                          0.64       0.64    0.63
SVM (Stochastic Gradient Descent)                                        0.57       0.56    0.56
Decision Tree (Max Depth=5)                                              0.54       0.45    0.42
Random Forest (Max Depth=5, n_estimators=10, max_features=1)             0.45       0.44    0.41
Neural Network (Multi-layer Perceptron, alpha=1, hidden_layer_size=100)  0.58       0.57    0.56
Passive Aggressive (C=1.0, n_iter=5)                                     0.63       0.63    0.62
Adaptive Boosting (n_estimators=50)                                      0.57       0.56    0.56
It can be seen from this table that the best result is achieved by the support vector
machine with the radial basis function kernel, while the worst result is obtained by
random forest. Overall, the results of the support vector machine variants are
rather solid. The F1 values achieved by the three support vector machine variants
are always among the best results of all methods.
Figure 2. Comparison between nine algorithms based on F1-score. 
The table also shows that tree-based algorithms did not perform well on this
problem. The two tree-based algorithms, Decision Tree and Random Forest, are
the worst methods among the nine tested methods. Among the remaining four
algorithms, the results of passive aggressive are also very good: this algorithm is
ranked third among the nine algorithms. The results of the neural network and
AdaBoost are equal, and both are in the middle of the tested algorithms. Finally,
the result of Nearest Neighbor is not convincing: this method is only better than
the two tree-based algorithms (Decision Tree and Random Forest). Figure 2 presents
in detail the comparison between the nine tested algorithms based on their F1 values.
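The ranking discussed above can be reproduced directly from the F1 column of Table 2. The following sketch only sorts the reported values; the shortened classifier names used as dictionary keys are chosen for the example.

```python
# F1 scores as reported in Table 2.
f1_scores = {
    "Nearest Neighbor": 0.50,
    "SVM (RBF kernel)": 0.65,
    "SVM (LinearSVC)": 0.63,
    "SVM (SGD)": 0.56,
    "Decision Tree": 0.42,
    "Random Forest": 0.41,
    "Neural Network": 0.56,
    "Passive Aggressive": 0.62,
    "Adaptive Boosting": 0.56,
}

# Sort from best to worst; Python's sort is stable, so ties keep the table order.
ranking = sorted(f1_scores, key=f1_scores.get, reverse=True)
for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {f1_scores[name]:.2f}")
```

The three methods with an F1 of 0.56 are tied, so their relative order in the printed list carries no meaning.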
5. CONCLUSION 
In this paper, we have examined the performance of various machine learning
algorithms in solving the Vietnamese sentiment analysis problem. Nine popular
algorithms were selected and tested. A recently released dataset for Vietnamese
sentiment analysis (the VLSP 2016 dataset) was used in the experiments. The results
of the experiments showed that the methods based on the support vector machine
achieved the best performance, while tree-based methods (Decision Tree and
Random Forest) did not perform well. The results also showed that using the TF-IDF
feature is better than using the TF feature. These results provide insight into the
ability of various machine learning techniques in solving this problem.
There are a number of research areas for future work that arise from this
paper. First, we would like to investigate better techniques for feature
extraction. In this paper, we have shown that the TF-IDF feature is better than the
TF feature. In the future, we will study word2vec methods for extracting features for
Vietnamese sentiment analysis. Second, recent research has shown that applying
deep learning to natural language processing has yielded significant improvements.
Therefore, it will be interesting to examine whether deep learning can be useful for
Vietnamese sentiment analysis. Last but not least, we are planning to collect a
larger Vietnamese dataset and conduct our research on it.
REFERENCES 
[1]. K. Ravi and V. Ravi, “A survey on opinion mining and sentiment analysis: Tasks, 
approaches and applications” Knowledge-Based Systems, vol. 89, pp. 14, 2015. 
[2]. Le Anh Cuong, Ng. T. Minh Huyen, Ng. Viet Hung, “VLSP 2016 Shared 
Task: Vietnamese Analysis”, VLSP 2016, Ha Noi, 2016. 
[3]. Vi Ngo Van, Minh Hoang Van, Tam Nguyen Thanh: “Sentiment Analysis for 
Vietnamese using Support Vector Machines with application to Facebook”, 
VLSP 2016, Ha Noi, 2016. 
[4]. Hy Nguyen, Tung Le, Viet-Thang Luong, Dinh Dien: “A Simple Supervised 
Learning Approach to Sentiment Classification at VLSP 2016”, VLSP 2016, 
Ha Noi, 2016. 
[5]. Minh Nhat Quang Pham, Tran The Trung: “A Lightweight Ensemble Method 
for Sentiment Classification Task”, VLSP 2016, Ha Noi, 2016. 
[6]. Quynh-Trang Thi Pham, Xuan-Truong Nguyen, Van-Hien Tran, Thi-Cham 
Nguyen, Mai-Vu Tran: “DSKTLAB: Vietnamese Sentiment Analysis for 
Product Reviews”, VLSP 2016, Ha Noi, 2016. 
[7]. Rajaraman, A.; Ullman, J. D. (2011). "Data Mining". Mining of Massive
Datasets. pp. 1–17.
[8]. Bun, Khoo Khyou; Ishizuka, M. "Emerging Topic Tracking System". Proceedings
Third International Workshop on Advanced Issues of E-Commerce and Web-Based
Information Systems. WECWIS 2001.
[9]. C. Cortes and V. Vapnik, “Support-vector networks”, Mach. Learn., vol. 20,
no. 3, pp. 273–297, Sep. 1995.
[10]. Rosenblatt, Frank. “Principles of Neurodynamics: Perceptrons and the
Theory of Brain Mechanisms”. Spartan Books, Washington DC, 1961.
[11]. Altman, N. S. (1992). "An introduction to kernel and nearest-neighbor
nonparametric regression". The American Statistician. 46 (3): 175–185.
[12]. Rokach, Lior; Maimon, O. (2008). “Data mining with decision trees: theory 
and applications”. World Scientific Pub Co Inc. ISBN 978-9812771711. 
[13]. Rokach, Lior; Maimon, O. (2008). “Data mining with decision trees: theory 
and applications”. World Scientific Pub Co Inc. ISBN 978-9812771711. 
[14]. Ho, Tin Kam (1998). "The Random Subspace Method for Constructing
Decision Forests". IEEE Transactions on Pattern Analysis and
Machine Intelligence. 20 (8): 832–844. doi:10.1109/34.709601.
[15]. Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, Yoram 
Singer, “Online Passive-Aggressive Algorithms”, School of Computer Science 
and Engineering The Hebrew University Jerusalem, 91904, Israel, 2006 
[16]. Luis Pedro Coelho, Willi Richert, “Building Machine Learning Systems with
Python”, ISBN 978-1-78439-277-2, 2015.
SUMMARY
AN EMPIRICAL STUDY
OF SENTIMENT ANALYSIS IN VIETNAMESE
Sentiment analysis covers the areas of research on people's opinions,
feelings, evaluations, attitudes and emotions based on text. It has become
one of the most active research fields in natural language processing. In
recent years, sentiment analysis for Vietnamese text has received
considerable attention. At the fourth international workshop on Vietnamese
language and speech processing, there was a competition among researchers
in solving this problem on a benchmark dataset provided by the workshop
organizers. However, each researcher addressed the problem with different
methods. In this paper, we present a comparison of many machine learning
algorithms for solving this problem. The experimental results give deeper
insight into the ability of different machine learning algorithms when
used for sentiment analysis in Vietnamese.
Keywords: Sentiment analysis, Opinion mining, Machine learning.
Received 17 February 2017
Revised 04 April 2017
Accepted for publication 05 April 2017
Address: Military Technical Academy; 
 *Email: 
[email protected]