Resource type
Corpora
Description
COAH is a corpus of hotel reviews for document-level polarity classification. It comprises 1816 reviews from TripAdvisor, each scored on a scale from 1 (negative) to 5 (positive). The number of opinions per class is:
| Number of opinions | 1816 |
| Number of tokens | 272446 |
| Number of words | 239749 |
| Number of unique words | 154297 |
| Lexical diversity | 0.6435 |
| Number of characters | 1372737 |
| Number of characters without whitespace | 1135306 |
| Number of nouns | 55530 |
| Number of verbs | 40318 |
| Number of adjectives | 19935 |
| Number of adverbs | 16629 |
| Number of lemmas | 239749 |
| Number of unique lemmas | 138549 |
| Lemma diversity | 0.577 |
| Number of senses | 106205 |
| Number of unique senses | 77397 |
| Mean sentence length | 23.245 |
| Mean of nouns | 0.231 |
| Mean of verbs | 0.168 |
| Mean of adjectives | 0.083 |
| Mean of adverbs | 0.069 |
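The ratio rows in the table appear to be derived from the count rows. The page does not state the formulas, so the following is only a sketch under the assumption that each ratio is the corresponding count divided by the number of words (239749). The names `WORDS`, `counts`, and `ratio` are illustrative and not part of the corpus distribution.

```python
# Counts copied from the statistics table above.
WORDS = 239749
counts = {
    "unique words": 154297,   # assumed numerator of "Lexical diversity"
    "unique lemmas": 138549,  # assumed numerator of "Lemma diversity"
    "nouns": 55530,           # assumed numerator of "Mean of nouns"
    "verbs": 40318,
    "adjectives": 19935,
    "adverbs": 16629,
}


def ratio(count: int, total: int = WORDS) -> float:
    """Return count / total, the assumed definition of each ratio row."""
    return count / total


ratios = {name: ratio(n) for name, n in counts.items()}

for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Under this assumption the computed values agree with the published ones to within one thousandth, which suggests the published figures were rounded or truncated.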
How to cite
Molina-González, M. D., Martínez-Cámara, E., Martín-Valdivia, M. T., Ureña-López, L. A. (2014). Cross-domain sentiment analysis using Spanish opinionated words. Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 8455, pp. 214-219. Springer International Publishing. DOI: 10.1007/978-3-319-07983-7_28
For any questions about the corpus, send an email to M. Dolores Molina or Eugenio Martínez.
Link