Abstract:
Social networks are currently among the largest means of communication, generating all
kinds of information about people, events, places, and other subjects. Users express their
opinions freely on these networks, exposing their personalities and preferences to the
world. As a result, sentiment analysis
in social networks is becoming more frequent, since this type of information may be
important for a company seeking to discover its customers' preferences regarding
its products and services. This study proposes an approach to estimate sentiment in
social networks for the Portuguese language, focusing on Twitter. The method uses a
machine learning approach called a committee, which combines
the predictions of a set of six algorithms and takes as the predicted value the class with
the most votes, where each algorithm's vote is weighted.
The method was tested on a dataset of Portuguese tweets
already labeled as negative (-1), neutral (0), or positive (1). To evaluate
the performance of the techniques, the following metrics were used: accuracy,
precision, recall, F1-score, and error. The test set was classified with each algorithm, and
the results were analyzed individually according to these metrics. In addition,
two commercially available sentiment analysis services, IBM Watson and
Microsoft Text Analytics, were tested. The proposed method obtained an accuracy of approximately
86%, higher than that of the other approaches. A statistical
analysis was then performed to verify whether the proposed method differs significantly
from the other approaches presented. It was concluded that the method differs
significantly only from the following techniques: decision tree, IBM Watson, and Microsoft
Text Analytics, and is therefore statistically equivalent to the remaining techniques. The
results of these tests were crucial for determining where the proposed method differs
significantly from the other techniques.
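
The weighted committee vote described above can be illustrated with a minimal sketch. It assumes each of the six algorithms has already produced a label for a tweet; the function name weighted_vote, the example weights, and the tie-breaking rule are hypothetical, since the abstract does not state how the weights are chosen or how ties are resolved.

```python
LABELS = (-1, 0, 1)  # negative, neutral, positive, as in the labeled tweet dataset


def weighted_vote(predictions, weights):
    """Return the label whose votes carry the largest total weight.

    predictions: one predicted label per committee member, for a single tweet.
    weights: one non-negative weight per member (hypothetical values; the
             abstract does not specify how the weights are defined).
    """
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have the same length")
    totals = {label: 0.0 for label in LABELS}
    for label, weight in zip(predictions, weights):
        if label not in totals:
            raise ValueError(f"unexpected label: {label!r}")
        totals[label] += weight
    # Assumption: ties go to the label listed first in LABELS.
    return max(LABELS, key=lambda label: totals[label])


# Six committee members vote on one tweet (illustrative values only).
votes = [1, 1, 0, -1, 0, 0]
weights = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2]
committee_label = weighted_vote(votes, weights)
print(committee_label)
```

In this example an unweighted majority would choose neutral (three votes), but the two positive votes carry more total weight, so the committee predicts positive.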