Description
Tasks to perform
i. Run 5-fold cross-validation on training.txt using the 5 learning algorithms.
Report the average precision, average recall and average F1 score. You can do this
quite easily using the cross_val_score() function in sklearn, which computes one metric
per call through its scoring parameter; cross_validate() accepts several metrics at once.
For each algorithm, explore different parameter settings to achieve the best
possible results (this step is largely experimental, so try to automate as much as
possible). Parameters that you should try to change include:
a. In neural networks, change the number of hidden layers and the number of units in
each layer
b. In SVMs, change the penalty parameter C and the kernel type
c. In AdaBoost, change the number of estimators (n_estimators)
d. In logistic regression, change the penalty: L1 regularization (which can also perform
feature selection) or L2. Also change the regularization parameter C (in sklearn,
C is the inverse of the regularization strength)
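The cross-validation and parameter exploration in task i can be sketched as follows.
This is only a minimal illustration, not a required approach: the toy corpus, the
TF-IDF features, the choice of an SVM and the grid values are all assumptions made for
the example, and the real reviews would have to be parsed from training.txt first.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy stand-in corpus (assumption): replace with the reviews and ratings
# parsed from training.txt, whose format is not specified here.
positive = ["great", "excellent", "amazing", "wonderful", "delicious"]
negative = ["awful", "terrible", "horrible", "bland", "disgusting"]
texts = ([f"the food was {w}" for w in positive]
         + [f"the service was {w}" for w in positive]
         + [f"the food was {w}" for w in negative]
         + [f"the service was {w}" for w in negative])
labels = [5] * 10 + [1] * 10

# Vectorizer and classifier in one pipeline, so each fold fits its own vocabulary.
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", SVC())])

# 5-fold cross-validation with macro-averaged precision, recall and F1.
metrics = ["precision_macro", "recall_macro", "f1_macro"]
scores = cross_validate(pipeline, texts, labels, cv=5, scoring=metrics)
averages = {m: scores[f"test_{m}"].mean() for m in metrics}
print(averages)

# Automated search over the SVM penalty parameter C and the kernel type.
# The grid values are illustrative only.
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```

The same pattern should carry over to the other algorithms by swapping the "clf" step
and the parameter grid, for example n_estimators for AdaBoostClassifier or
hidden_layer_sizes for MLPClassifier.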
ii. Incorporate some additional knowledge into your model. Specifically, not all words are
useful in predicting rating scores. Words that express sentiment are more likely to
be useful. Use the sentiment words in pos_words.txt and neg_words.txt to filter the
review text so that only sentiment words are kept, and then evaluate the algorithms
once again. These sentiment words are taken from
https://www.cs.uic.edu/~liub/FBS/sentimentanalysis.html. What changes did you observe?
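One possible way to apply the sentiment-word filter in task ii is sketched below. The
helper names load_lexicon and filter_review are hypothetical, and the file format (one
word per line, with lines starting with ';' treated as comments) and the latin-1
encoding are assumptions that should be checked against the actual pos_words.txt and
neg_words.txt. The demo uses a small inline lexicon so that it runs without the files.

```python
import re


def load_lexicon(path):
    """Read one word per line, skipping blank lines and ';' comment lines.

    The file layout and the latin-1 encoding are assumptions.
    """
    words = set()
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            word = line.strip().lower()
            if word and not word.startswith(";"):
                words.add(word)
    return words


def filter_review(text, lexicon):
    """Keep only the tokens of a review that appear in the sentiment lexicon."""
    tokens = re.findall(r"[a-z][a-z'\-]*", text.lower())
    return " ".join(token for token in tokens if token in lexicon)


# With the real files this would be:
# lexicon = load_lexicon("pos_words.txt") | load_lexicon("neg_words.txt")
lexicon = {"great", "friendly", "slow", "terrible"}  # toy lexicon for the demo

filtered = filter_review("Great pizza, but the service was SLOW.", lexicon)
print(filtered)  # great slow
```

The filtered strings can then be passed to the same vectorizer and classifiers as
before, which keeps the two evaluations directly comparable.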
iii. Evaluate on the test dataset using the optimal parameter settings that
were obtained from the training set. How did each algorithm perform? Report its
precision, recall and F1 score. Which types of reviews were the hardest to predict?
To answer this, for the best performing algorithm, compute precision and recall for
every rating score separately (e.g. rating score 1 precision/recall, rating score 2
precision/recall, etc.).
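The per-rating breakdown in task iii can be computed along these lines. This is a
minimal sketch: the y_true and y_pred lists are made-up placeholders, and in practice
they would be the test-set ratings and the predictions of the best performing algorithm.

```python
from sklearn.metrics import precision_recall_fscore_support

# Placeholder ratings and predictions (assumption): substitute the real
# test labels and the output of the best model's predict().
y_true = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
y_pred = [1, 1, 2, 3, 3, 3, 4, 5, 5, 5]
ratings = [1, 2, 3, 4, 5]

# average=None returns one value per rating instead of a single average.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=ratings, average=None, zero_division=0
)

for rating, p, r, f, n in zip(ratings, precision, recall, f1, support):
    print(f"rating {rating}: precision={p:.2f} recall={r:.2f} f1={f:.2f} (n={n})")
```

Ratings with the lowest precision or recall in this table are the ones the model finds
hardest to predict.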
iv. Discuss some ideas that could help improve your predictions (it is fine if you did
not implement them)
What to submit on dropbox?
1. A report that describes your experimental results (I would expect this to be around 2
or 3 pages at most)
2. Source code
Policies
Please do not plagiarize from the internet or any other source. You can discuss with other
groups, but please do not share any code. Plagiarism is treated very seriously by the
department and the university and is subject to harsh penalties.

