Showing posts with label corpora. Show all posts
Wednesday, May 19, 2010
SIRCA
I have been able to contact Diccon Close at SIRCA, who can supply access to some much-needed ASX equities data, including full order book and tick information. Access to this data would be critical for evaluating the real-world performance of the sentiment indicators provided by the system. SIRCA also has distribution rights to Reuters corpora. I've been thinking that it may be a good idea to crawl parts of this Reuters corpus to incorporate more recent information, even if only to obtain tf-idf scores for equities news. Also, timestamps have been removed from the original corpus, which makes it difficult to see the exact timing of an information release relative to the stock price.
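As a rough illustration of what obtaining tf-idf scores for equities news could look like, here is a minimal sketch in plain Python. It assumes the news items have already been tokenized into lists of lowercase words; the function name `tf_idf` and the unsmoothed weighting (term frequency times log of inverse document frequency) are choices made for this example, not something prescribed by SIRCA or the Reuters corpus.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf for a list of tokenized documents.

    docs: list of token lists, one per news item (assumed pre-tokenized).
    Returns a list of {term: score} dicts, one per document.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = relative frequency in this document,
            # idf = log(N / df), with no smoothing.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores
```

With this weighting, a term that appears in every document (a company name in a single-company feed, for example) scores zero, while terms concentrated in a few items score higher.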
Labels:
corpora,
reuters,
sirca,
tf-idf,
tick information
Tuesday, December 8, 2009
Meeting Minutes
Meeting with Rafael Calvo, Robert Dale
- Academic paper: Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews, P. D. Turney, July 2002
- Much work has been done in the past on sentiment in product/movie reviews, but it may not transfer to the domain of finance, and some finance domain knowledge is necessary in the analysis.
- AFR titles may tend to be more extreme (sensationalized) than their actual content
- Necessary to find sources of positive/negative related terms, and to check how well they correlate with the data set/annotations and vice versa.
- Word frequency counts alone (after taking out company names and terms such as "the", "a", etc.) may not be enough.
- Could use synonym trees to help with the large term set (lexical database such as WordNet)
- Found information on some research done in this area by SIRCA
- Interesting abstract and presentation given by C. Robertson: "Enabling Sophisticated Financial Text Mining"
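To make the word-frequency idea from the minutes concrete, here is a minimal sketch of a lexicon-based count that strips stop words and company names first. The word lists `STOP_WORDS`, `POSITIVE` and `NEGATIVE` are tiny hypothetical placeholders, not a real finance lexicon, and the scoring rule is an assumption for illustration only. As noted above, a count like this is probably too crude on its own.

```python
import re
from collections import Counter

# Hypothetical placeholder lists; a real system would need a
# finance-specific lexicon validated against the annotations.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "on", "for", "is"}
POSITIVE = {"gain", "profit", "upgrade", "surge"}
NEGATIVE = {"loss", "downgrade", "plunge", "warning"}


def tokenize(text, company_names=()):
    """Lowercase, split into alphabetic tokens, and drop stop words
    and (single-word) company names."""
    excluded = STOP_WORDS | {name.lower() for name in company_names}
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in excluded]


def lexicon_score(text, company_names=()):
    """Return (pos - neg) / (pos + neg) in [-1, 1], or 0.0 if no
    lexicon term occurs in the text."""
    counts = Counter(tokenize(text, company_names))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

A synonym resource such as WordNet could be used to expand the two term sets, but that step is left out of this sketch.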