Reading 'Sentiment Analysis'

What is a "Conference Sweep"?

Conference Sweep is a practice that has proven to be VERY useful to me. The general process I follow is:
1) Select a conference
2) Download all relevant papers from the conference (in my case, the sentiment analysis papers).
3) Read them carefully. Don't get stuck on details, but read each paper enough to get a reasonable understanding of it.
4) Write your findings in a document.

In other words, I often go on these crazy gobbling-up sessions where I read all the sentiment analysis papers that appeared at a conference. The output is a conference sweep document. This way, each time I want to go through past work, the document acts as a useful reference.

=====

Here's my "conference sweep" for SA@NAACL 2015.

1) LCCT: A Semi-supervised Model for Sentiment Classification; Yang et al.

Corpus-based approaches to sentiment analysis are based on classifiers, while lexicon-based approaches are rule-based implementations. This paper combines the two using co-training. The rule-based approach uses a lexicon that is created using a sentiment-aware LDA model. The corpus-based approach uses deep learning to create a classifier. The co-training proceeds as follows: train approach 1 on some labeled documents and collect the test examples it labels most confidently; do the same for approach 2; then retrain. Three English and Chinese datasets are used for experimentation, and they do better than several past works.
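To make the co-training loop concrete for my own reference, here is a minimal sketch. It is not the paper's system: the lexicon induction rule, the Naive Bayes classifier (standing in for the deep-learning model), the shared labeled pool, and all names and toy documents are my assumptions.

```python
from collections import Counter
import math


class LexiconModel:
    """Rule-based stand-in: sums word polarities from a lexicon."""

    def __init__(self, seed_lexicon):
        self.lexicon = dict(seed_lexicon)

    def fit(self, docs, labels):
        # Toy lexicon induction (an assumption, not the paper's LDA model):
        # a new word seen under only one label inherits that label.
        seen = {}
        for doc, y in zip(docs, labels):
            for word in set(doc.split()):
                seen.setdefault(word, set()).add(y)
        for word, ys in seen.items():
            if len(ys) == 1 and word not in self.lexicon:
                self.lexicon[word] = next(iter(ys))

    def predict(self, doc):
        # Returns (label, confidence); confidence is the absolute score.
        score = sum(self.lexicon.get(w, 0) for w in doc.split())
        return (1 if score >= 0 else -1), abs(score)


class NaiveBayesModel:
    """Corpus-based stand-in for the paper's deep-learning classifier."""

    def fit(self, docs, labels):
        self.words = {1: Counter(), -1: Counter()}
        self.docs_per_label = Counter(labels)
        for doc, y in zip(docs, labels):
            self.words[y].update(doc.split())
        self.vocab = set(self.words[1]) | set(self.words[-1])

    def _log_prob(self, doc, y):
        n_docs = sum(self.docs_per_label.values())
        logp = math.log((self.docs_per_label[y] + 1) / (n_docs + 2))
        denom = sum(self.words[y].values()) + len(self.vocab)
        for w in doc.split():
            logp += math.log((self.words[y][w] + 1) / denom)  # add-one smoothing
        return logp

    def predict(self, doc):
        # Returns (label, confidence); confidence is the log-odds magnitude.
        diff = self._log_prob(doc, 1) - self._log_prob(doc, -1)
        return (1 if diff >= 0 else -1), abs(diff)


def co_train(labeled, unlabeled, model_a, model_b, rounds=3, k=1):
    """Grow the labeled set with each model's k most confident predictions."""
    labeled, pool = list(labeled), list(unlabeled)

    def retrain():
        docs, ys = zip(*labeled)
        model_a.fit(docs, ys)
        model_b.fit(docs, ys)

    retrain()
    for _ in range(rounds):
        if not pool:
            break
        for model in (model_a, model_b):
            ranked = sorted(pool, key=lambda d: model.predict(d)[1], reverse=True)
            for doc in ranked[:k]:
                labeled.append((doc, model.predict(doc)[0]))
                pool.remove(doc)
        retrain()
    return labeled


# Toy usage with made-up documents; labels are +1 (positive) and -1 (negative).
seed = {"great": 1, "loved": 1, "terrible": -1, "hated": -1}
model_a, model_b = LexiconModel(seed), NaiveBayesModel()
result = co_train(
    [("great movie loved it", 1), ("terrible plot hated it", -1)],
    ["loved the great acting", "hated the terrible pacing",
     "acting was superb", "pacing was awful"],
    model_a, model_b,
)
```

The point of the sketch is the loop shape: both models are trained, each contributes its most confident labels, and both are retrained on the enlarged set.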

2) Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets; Refaee et al.

This looks like an early paper in Arabic sentiment analysis. SA of Arabic tweets is performed by translating them with an MT tool and then running an English-trained sentiment classifier on the output. They compare this with other approaches: using Arabic lexicons, using English translated Arabic lexicons, etc. Lots of graphs, tables and examples.

3) Sub-sentential Sentiment on a Shoestring: A Cross-lingual Analysis of Compositional Classification; Michael Haas et al.

Compositional sentiment is how the sentiment of larger phrases is composed from the sentiment of smaller phrases. This paper builds on the classic deep learning approach by Socher et al. (2013). They create a German sentiment treebank, the Heidelberg Sentiment Treebank. They translate it, and align words using a word alignment tool. Finally, they perform crosslingual sentiment analysis using an English-trained system.

4) Sentiment after Translation: A Case Study on Arabic Social Media Posts; Saif Mohammad et al.

Crosslingual sentiment analysis assumes that sentiment is preserved across languages. This paper checks whether this holds true for English-Arabic, including in terms of the degree of sentiment. It compares different automatic and manual methods of translation.

5) On the Automatic Learning of Sentiment Lexicons; Severyn et al.

The paper learns sentiment lexicons automatically as follows: (a) Obtain tweets with hashtag-based supervision for sentiment. (b) Train an SVM with unigrams and bigrams as features. (c) Use the weights associated with the unigrams and bigrams as the sentiment lexicon and its scores! The lexicons thus generated are combined with other manual lexicons in a rule-based SA system.
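Steps (a) to (c) can be sketched in a few lines. This is only my illustration of the idea, not the authors' setup: the hashtags, the toy tweets, the hyperparameters, and the hand-rolled bias-free linear SVM (subgradient descent on the L2-regularised hinge loss) are all assumptions.

```python
from collections import defaultdict

# Hypothetical hashtags used as distant supervision.
POSITIVE_TAGS, NEGATIVE_TAGS = {"#happy"}, {"#sad"}


def hashtag_label(tweet):
    """(a) Turn a raw tweet into (text without hashtags, +1/-1), or None."""
    tokens = tweet.lower().split()
    if any(t in POSITIVE_TAGS for t in tokens):
        label = 1
    elif any(t in NEGATIVE_TAGS for t in tokens):
        label = -1
    else:
        return None
    return " ".join(t for t in tokens if not t.startswith("#")), label


def ngram_features(text):
    """Unigrams plus bigrams of a whitespace-tokenised tweet."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def learn_lexicon(tweets, epochs=50, lr=0.1, reg=0.01):
    """(b) Train a linear SVM; (c) return its weights as {ngram: score}."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in tweets:
            feats = ngram_features(text)
            margin = label * sum(weights[f] for f in feats)
            for f in weights:              # L2 regularisation step
                weights[f] *= 1.0 - lr * reg
            if margin < 1:                 # hinge loss is active
                for f in feats:
                    weights[f] += lr * label
    return dict(weights)


raw_tweets = [
    "love this phone #happy",
    "not good at all #sad",
    "really good camera #happy",
    "hate this phone #sad",
]
tweets = [pair for pair in map(hashtag_label, raw_tweets) if pair is not None]
lexicon = learn_lexicon(tweets)
```

Entries with positive weights act as positive lexicon items and entries with negative weights as negative ones; note that bigrams such as "not good" get their own score, separate from "good".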

6) Do We Really Need Lexical Information? Towards a Top-down Approach to Sentiment Analysis of Product Reviews; Otmakhova et al.

This is my favorite paper from NAACL 2015. The paper says that, given the rating of a document and knowledge of discourse markers, sentence-level sentiment classification can be achieved. This paper takes into consideration things like 'flow of sentiment within a document'. They use a CRF where each sentence is a variable, and the goal is to identify the sentiment of a sentence. Sentences are connected to each other using discourse connectors.
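To remind myself how a chain over sentences can work without any lexical information, here is a toy decoder. It is not a trained CRF and not the paper's model: the two relation types ('same' for connectors like 'and', 'contrast' for connectors like 'but'), the hand-set scores, and the function name are my assumptions. It only shows Viterbi decoding over a linear chain where the document rating pulls every sentence toward one label and the discourse connectors link neighbouring sentences.

```python
def decode_sentiments(connectors, doc_rating, unary=1.0, pairwise=2.0):
    """Return the best +1/-1 label per sentence.

    connectors[i] is the assumed relation between sentence i and i+1:
    'same' or 'contrast'. doc_rating is +1 or -1.
    """
    labels = (1, -1)
    # best[y]: best score of any labelling of the sentences so far ending in y.
    best = {y: unary * (y == doc_rating) for y in labels}
    back = []
    for rel in connectors:
        new_best, pointers = {}, {}
        for y in labels:
            def trans(prev):
                # Reward label pairs that match the discourse relation.
                agree = prev == y
                return pairwise if agree == (rel == "same") else 0.0
            prev = max(labels, key=lambda p: best[p] + trans(p))
            new_best[y] = best[prev] + trans(prev) + unary * (y == doc_rating)
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    y = max(labels, key=lambda label: best[label])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]


# A positive review with four sentences: "... and ... and ... but ..."
print(decode_sentiments(["same", "same", "contrast"], doc_rating=1))
# -> [1, 1, 1, -1]
```

With these made-up scores, the decoder keeps the first three sentences in line with the positive rating and flips only the sentence introduced by the contrast.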

7) MPQA 3.0: An Entity/Event-Level Sentiment Corpus; Janyce Wiebe et al.

MPQA is a popular sentiment corpus where annotations for sentences are organized in a nested structure. This short paper introduces MPQA 3.0, gives examples, and as expected, presents agreement studies. The new addition to the corpus is that the annotations now capture event targets and entity targets. This, they hope, will be useful for target-specific ('Australia beat Sri Lanka to win the World Cup') or event-specific ('The School Principal condemned the sale of cigarettes on the school premises') sentiment analysis.

8) Entity/Event-Level Sentiment Detection and Inference; Lingjia Deng.

This paper uses the annotations provided in MPQA 3.0 and implements an entity/event-level SA system. The rules are inference-based and look a lot like Prolog-style inference rules.

General trends:

1) Two papers on Arabic sentiment analysis

2) (7) and (8) share an author, and the papers are closely related.

3) Crosslingual SA seems to be the flavor of SA papers at NAACL.

Note: These observations are my personal impressions. Please form your own before you trust these notes completely.

Sunday, September 6, 2015

Conference Sweep: SA@NAACL 2015
