Sunday, September 6, 2015

Conference Sweep: SA@NAACL 2015

What is a "Conference Sweep"?

Conference Sweep is a practice that has proven to be VERY useful to me. The general process I follow is:
1) Select a conference
2) Download all relevant papers from the conference (in my case, papers on sentiment analysis).
3) Read them carefully. Don't get stuck on details, but read each paper closely enough to get a reasonable understanding of it.
4) Write your findings in a document.

In other words, I often go on crazy gobbling-up sessions in which I read every sentiment analysis paper that appeared at a conference. The output is a conference sweep document, which serves as a handy reference whenever I want to go back through past work.


Here's my "conference sweep" for SA@NAACL 2015.

1) LCCT: A Semi-supervised Model for Sentiment Classification; Yang et al.

Corpus-based approaches to sentiment analysis rely on trained classifiers, while lexicon-based approaches are rule-based implementations. This paper combines the two using co-training. The rule-based approach uses a lexicon created with a sentiment-aware LDA model, and the corpus-based approach uses deep learning to build a classifier. Co-training proceeds as follows. First, train approach 1 on some labeled documents and take the test examples it labels most confidently. Do the same for approach 2. Then retrain with these newly labeled examples. Experiments on three English and Chinese datasets show that the model does better than several past works.
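To make the co-training loop concrete, here is a minimal Python sketch built on my own assumptions. It is not the LCCT implementation. Both views are toy stand-ins. A word-count lexicon replaces the sentiment-aware LDA lexicon, and multinomial naive Bayes replaces the deep-learning classifier. All function names are hypothetical. In each round, each view labels the unlabeled documents it is most confident about, and those documents join the shared training set before both views are retrained.

```python
import math
from collections import Counter

# Toy stand-ins for the two views (assumptions, not the paper's models).

def train_lexicon(docs, labels):
    """Score each word in [-1, 1] by how much more often it occurs in positive (1) docs."""
    pos, neg = Counter(), Counter()
    for doc, y in zip(docs, labels):
        (pos if y == 1 else neg).update(doc.split())
    return {w: (pos[w] - neg[w]) / (pos[w] + neg[w]) for w in set(pos) | set(neg)}

def predict_lexicon(lexicon, doc):
    """Rule-based view: sum word scores; confidence is the magnitude of the sum."""
    score = sum(lexicon.get(w, 0.0) for w in doc.split())
    return (1 if score >= 0 else 0), abs(score)

def train_nb(docs, labels):
    """Corpus-based view: word counts per class for multinomial naive Bayes."""
    counts = {0: Counter(), 1: Counter()}
    for doc, y in zip(docs, labels):
        counts[y].update(doc.split())
    return counts, Counter(labels), set(counts[0]) | set(counts[1])

def predict_nb(model, doc):
    """Add-one smoothed naive Bayes; confidence is the gap between class log-probabilities."""
    counts, priors, vocab = model
    n_docs = sum(priors.values())
    logp = {}
    for y in (0, 1):
        denom = sum(counts[y].values()) + len(vocab)
        logp[y] = math.log((priors[y] + 1) / (n_docs + 2))
        logp[y] += sum(math.log((counts[y][w] + 1) / denom) for w in doc.split())
    return (1 if logp[1] >= logp[0] else 0), abs(logp[1] - logp[0])

def co_train(labeled, unlabeled, rounds=3, k=1):
    """Each round, each view pseudo-labels its k most confident unlabeled docs;
    those docs are added to the shared training set, then both views retrain."""
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        lex, nb = train_lexicon(docs, labels), train_nb(docs, labels)
        views = (lambda d: predict_lexicon(lex, d), lambda d: predict_nb(nb, d))
        for predict in views:
            ranked = sorted(pool, key=lambda d: predict(d)[1], reverse=True)
            for d in ranked[:k]:
                docs.append(d)
                labels.append(predict(d)[0])
                pool.remove(d)
    return train_lexicon(docs, labels), train_nb(docs, labels)
```

For example, `co_train([("good great", 1), ("bad awful", 0)], ["good movie", "awful plot", "great fun", "bad acting"])` starts from two labeled documents and returns both views retrained on all six.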

2) Benchmarking Machine Translated Sentiment Analysis for Arabic Tweets; Refaee et al.

This looks like an early paper in Arabic sentiment analysis. Arabic tweets are first translated into English with an MT tool and then labeled by a sentiment classifier trained on English data. The authors compare this with other approaches, such as using Arabic lexicons and using English-translated Arabic lexicons. The paper has lots of graphs, tables and examples.
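The translate-then-classify pipeline can be sketched in a few lines of Python. This is only an illustration under my own assumptions. The "MT tool" here is a hypothetical three-word Arabic-to-English dictionary, and the "English-trained classifier" is a hypothetical lexicon sum. In a real setup, both would be replaced by an actual MT system and a trained English classifier.

```python
# Hypothetical toy stand-ins: a word-for-word "MT system" and an English
# lexicon classifier. A real pipeline would call an actual MT tool here.
TOY_AR_EN = {"الفيلم": "the movie", "جميل": "beautiful", "سيء": "bad"}
EN_LEXICON = {"beautiful": 1, "good": 1, "bad": -1}

def translate(tweet):
    """Translate token by token; unknown tokens pass through unchanged."""
    return " ".join(TOY_AR_EN.get(tok, tok) for tok in tweet.split())

def classify_english(text):
    """English-side classifier stand-in: sign of the summed lexicon scores."""
    score = sum(EN_LEXICON.get(tok, 0) for tok in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def mt_then_classify(tweet):
    """Step 1: machine-translate the Arabic tweet. Step 2: classify the English text."""
    return classify_english(translate(tweet))
```

Because the two stages are separate functions, you can swap in a better translator or classifier without touching the rest. That separation is also what makes it possible to benchmark the MT step on its own.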

3) Sub-sentential Sentiment on a Shoestring: A Cross-lingual Analysis of Compositional Classification; Michael Haas et al.

Compositional sentiment describes how the sentiment of a larger phrase is composed from the sentiment of its smaller phrases. This paper builds on the well-known deep learning approach of Socher et al. (2013). The authors create a German sentiment treebank, the Heidelberg Sentiment Treebank. They translate it into English and align words using a word alignment tool. Finally, they perform cross-lingual sentiment analysis using an English-trained system.
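One way to picture how the word alignment connects the two languages is with a small Python sketch. It projects a German phrase onto its aligned English span and asks an English system for that span's sentiment. This is my own simplified assumption of the idea, not the paper's procedure. The English "system" is a hypothetical toy rule: word polarities are summed, and "not" flips the next sentiment word. It stands in for Socher et al.'s model. Alignments are given as (German index, English index) pairs.

```python
# Toy English system (hypothetical stand-in for Socher et al.'s model):
# sum word polarities, with "not" flipping the polarity of the next sentiment word.
EN_POLARITY = {"good": 1, "great": 1, "bad": -1, "boring": -1}

def english_phrase_sentiment(tokens):
    score, negate = 0, False
    for tok in tokens:
        if tok == "not":
            negate = True
        elif tok in EN_POLARITY:
            score += -EN_POLARITY[tok] if negate else EN_POLARITY[tok]
            negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def project_span(de_span, alignment):
    """Map a half-open German token span to the smallest English span that
    covers every English token aligned to it; None if nothing is aligned."""
    start, end = de_span
    en_idx = [e for d, e in alignment if start <= d < end]
    if not en_idx:
        return None
    return min(en_idx), max(en_idx) + 1

def german_phrase_sentiment(de_span, alignment, en_tokens):
    """Label a German phrase with the English system's label for its aligned span."""
    en_span = project_span(de_span, alignment)
    if en_span is None:
        return "neutral"
    return english_phrase_sentiment(en_tokens[en_span[0]:en_span[1]])
```

Take "der Film ist nicht gut" aligned word by word to "the film is not good". The German phrase "nicht gut" (tokens 3 to 5) projects onto "not good" and comes out negative. The single word "gut" projects onto "good" and comes out positive.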

4) Sentiment after Translation: A Case-Study on Arabic Social Media Posts; Saif Mohammad et al.

Cross-lingual sentiment analysis assumes that sentiment is preserved across languages. This paper tests whether this holds for Arabic and English, including whether the degree of sentiment is preserved. It compares several automatic and manual translation methods.
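Checking whether sentiment "survives" translation comes down to comparing two sets of annotations for the same posts. Here is a minimal Python sketch with two simple measures. The first is the fraction of labels that stay the same. The second is the average change in a real-valued intensity score, for the degree of sentiment. Both measures are my own illustrative choices, and the paper's exact evaluation may differ. The example labels and scores below are invented, not the paper's data.

```python
def label_agreement(original, translated):
    """Fraction of posts whose sentiment label is unchanged after translation."""
    if len(original) != len(translated) or not original:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(a == b for a, b in zip(original, translated)) / len(original)

def mean_intensity_shift(original, translated):
    """Average absolute change in a real-valued sentiment score (degree of sentiment)."""
    if len(original) != len(translated) or not original:
        raise ValueError("need two equal-length, non-empty score lists")
    return sum(abs(a - b) for a, b in zip(original, translated)) / len(original)
```

With invented labels `["pos", "neg", "neu", "pos"]` before translation and `["pos", "neg", "pos", "pos"]` after, the agreement is 0.75. A post can keep its label and still lose intensity, which is why the degree needs a separate measure.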

5) On the Automatic Learning of Sentiment Lexicons; Severyn et al.

The paper learns sentiment lexicons automatically as follows: (a) collect tweets whose sentiment labels come from hashtags (hashtag-based supervision); (b) train an SVM with unigrams and bigrams as features; (c) use the weights the SVM assigns to the unigrams and bigrams as the lexicon entries and their sentiment scores! The resulting lexicons are combined with manually created lexicons in a rule-based SA system.
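Steps (a) to (c) can be sketched in plain Python. The seed hashtag sets are hypothetical. As a stand-in for a real SVM solver, I train a hinge-loss linear classifier by plain subgradient descent, with no regularization and no bias term. That simplification is my assumption, not the paper's setup. Hashtags are removed from the features so that the label does not leak into the lexicon.

```python
from collections import defaultdict

# Hypothetical seed hashtags used as noisy labels (distant supervision).
POS_TAGS = {"#happy", "#love"}
NEG_TAGS = {"#sad", "#fail"}

def hashtag_label(tweet):
    """Step (a): +1 / -1 from seed hashtags; None if absent or conflicting."""
    tokens = set(tweet.lower().split())
    if tokens & POS_TAGS and not tokens & NEG_TAGS:
        return 1
    if tokens & NEG_TAGS and not tokens & POS_TAGS:
        return -1
    return None

def features(tweet):
    """Unigrams and bigrams, with hashtags removed so the label does not leak."""
    toks = [t for t in tweet.lower().split() if not t.startswith("#")]
    return toks + [a + " " + b for a, b in zip(toks, toks[1:])]

def learn_lexicon(tweets, epochs=20, lr=0.1):
    """Steps (b) and (c): fit a hinge-loss linear model (SVM stand-in, no
    regularization or bias) and read each weight off as a lexicon score."""
    data = []
    for t in tweets:
        y = hashtag_label(t)
        if y is not None:
            data.append((features(t), y))
    w = defaultdict(float)
    for _ in range(epochs):
        for feats, y in data:
            if y * sum(w[f] for f in feats) < 1:  # margin violated
                for f in feats:
                    w[f] += lr * y
    return dict(w)
```

On four toy tweets such as "great game today #happy" and "traffic again #sad", the word "great" ends up with a positive score and "again" with a negative one. This shows how noisy words pick up polarity from the hashtags they co-occur with.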

6) Do We Really Need Lexical Information? Towards a Top-down Approach to Sentiment Analysis of Product Reviews; Otmakhova et al.

This is my favorite paper from NAACL 2015. It argues that sentence-level sentiment classification can be achieved using only a document's rating and knowledge of discourse markers. The paper takes into account phenomena such as the 'flow of sentiment' within a document. It uses a CRF in which each sentence is a variable whose value is that sentence's sentiment. Sentences are connected to each other through discourse connectors.
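To see how a document rating plus discourse markers alone can yield sentence labels, here is a small Python sketch of MAP inference (Viterbi) over a linear chain of sentences. It involves several assumptions of mine that the paper does not confirm. The chain is linear. The scores are hand-set rather than learned CRF weights. The set of contrastive markers is hypothetical. No lexical features are used, in keeping with the paper's top-down idea. The rating supplies a per-sentence prior. A contrastive opener such as "But" rewards switching labels, and any other opener rewards keeping the previous label.

```python
LABELS = ("pos", "neg")
# Hypothetical set of contrastive discourse markers.
CONTRAST = {"but", "however", "although", "still"}

def unary(rating, label):
    """Document-level prior only (no lexical cues): map a 1-5 rating to [-1, 1]."""
    lean = (rating - 3) / 2.0
    return lean if label == "pos" else -lean

def pairwise(sentence, prev_label, label):
    """Hand-set transition scores keyed on the sentence's first word."""
    words = sentence.lower().split()
    first = words[0].strip(",;:") if words else ""
    same = label == prev_label
    if first in CONTRAST:
        return -2.0 if same else 2.0
    return 1.0 if same else -1.0

def viterbi(sentences, rating):
    """Highest-scoring label sequence for a linear chain of sentences."""
    if not sentences:
        return []
    best = {y: (unary(rating, y), [y]) for y in LABELS}
    for sent in sentences[1:]:
        new_best = {}
        for y in LABELS:
            score, path = max(
                ((best[p][0] + pairwise(sent, p, y), best[p][1]) for p in LABELS),
                key=lambda item: item[0],
            )
            new_best[y] = (score + unary(rating, y), path + [y])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]
```

Consider a 4-star review: "The camera is solid. The lens is sharp. But the battery dies fast. Still, I love it." Under these toy scores it decodes to pos, pos, neg, pos, even though no sentiment word is ever looked up.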

7) MPQA 3.0: An Entity/Event-Level Sentiment Corpus; Janyce Wiebe et al.

MPQA is a popular sentiment corpus in which sentence annotations are organized in a nested structure. This short paper introduces MPQA 3.0, gives examples, and, as expected, presents agreement studies. The new addition is that the annotations now capture event targets and entity targets. The authors hope this will be useful for target-specific ('Australia beat Sri Lanka to win the World Cup') or event-specific ('The School Principal condemned the sale of cigarettes on the school premises') sentiment analysis.

8) Entity/Event-Level Sentiment Detection and Inference; Lingjia Deng.

This paper uses the annotations in MPQA 3.0 to implement an entity/event-level SA system. Its rules are inference-based and look a lot like Prolog-style inference rules.
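To give a feel for what Prolog-style sentiment inference looks like, here is a minimal forward-chaining sketch in Python. The rule it encodes is an illustrative assumption of mine and is not taken from the paper. Sentiment toward an event that is good for something carries over to that thing. Sentiment toward an event that is bad for something flips. The fact format and the "goodFor"/"badFor" names are likewise hypothetical. The loop keeps applying the rule until no new facts appear, much as a Prolog query would chain rules.

```python
FLIP = {"positive": "negative", "negative": "positive"}

def infer(facts):
    """Forward-chain to a fixpoint with one hypothetical rule:
      sentiment(H, P, E), effect(E, goodFor, T) => sentiment(H, P, T)
      sentiment(H, P, E), effect(E, badFor, T)  => sentiment(H, flip(P), T)
    Facts are tuples ("sentiment", holder, polarity, target) or
    ("effect", event, kind, theme)."""
    known = set(facts)
    while True:
        sentiments = [f for f in known if f[0] == "sentiment"]
        effects = [f for f in known if f[0] == "effect"]
        derived = set()
        for _, holder, polarity, target in sentiments:
            for _, event, kind, theme in effects:
                if target == event:
                    new_pol = polarity if kind == "goodFor" else FLIP[polarity]
                    derived.add(("sentiment", holder, new_pol, theme))
        if derived <= known:  # nothing new: fixpoint reached
            return known
        known |= derived
```

Suppose "the principal praised the ban on cigarettes", and the ban is bad for cigarettes. The rule then derives that the principal is negative toward cigarettes. A further fact that cigarettes are good for tobacco firms chains one step further, giving a negative sentiment toward the firms.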

General trends:

1) Two papers on Arabic sentiment analysis.

2) Papers (7) and (8) share an author and are closely related.

3) Cross-lingual SA seems to be the popular flavor among SA papers at NAACL 2015.

Note: These observations are my personal impressions. Please form your own before you trust these notes completely. 

The opening post

Sentiment analysis is the task of predicting opinion/sentiment in text

My exploration of sentiment analysis and opinion mining began in 2009 and continues to date. It is a fascinating field that touches three areas of research: psychology, linguistics and computer science. In addition to its useful applications, sentiment analysis has its roots firmly in the classic Turing test, proposed by Alan Turing in 1950.

I start this blog with multiple intentions:
1) Publish my notes on papers that I read. For a long time I have wanted one place to put them, and to find them later.
2) Put out fundamental content on sentiment analysis and opinion mining.


Aditya Madhav Joshi