I have put together a couple of example sentences, showing the desired system output, which sum up the problem:
Friday, May 21, 2010
Sentence Examples
I believe the language model of finance to be inherently different to that of other domains, which seems justified: finance is a distinct language domain with its own vocabulary. I propose a language model based on a combination of several factors: entities (companies, CEOs, management), financial terms (net income, profit, price etc), industry-specific terms (technology, resources such as iron ore and aluminium etc), quantitative values ($ million, $ billion, per cent), directions (positive: rise, increase, outperform; negative: decrease, decline, fall) and the general sentiment of regular English words (SentiWordNet: good, bad, successful, poor). Combinations of these factors are expected to have a significant impact on the polarity of the news abstracts, which are annotated. Direction and values are included specifically because the sentiment of these abstracts depends largely on magnitude (larger percentage increases, bumper profits of $100 million). Of course, as in any natural language, every general rule comes with exceptions. In addition, each abstract typically covers a single topic (takeovers, reports, government, resources etc).
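To make the proposed combination of factors more concrete, here is a minimal sketch of a feature extractor for one abstract. It is an illustration under stated assumptions, not the model itself: the tiny lexicons (including the company names `bhp` and `rio`), the toy sentiment scores standing in for SentiWordNet, and the helper names `extract_features` and `polarity_score` are all hypothetical. The scoring rule, where direction sets the sign and a larger percentage move amplifies it, is one simple way to encode the observation that sentiment depends on magnitude.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicons for the factor categories described above.
# A real system would need much larger curated lists; GENERAL_SENTIMENT is a
# made-up stand-in for SentiWordNet scores, not values taken from it.
ENTITY_TERMS = {"ceo", "management", "bhp", "rio"}
FINANCIAL_TERMS = {"net", "income", "profit", "price", "revenue"}
INDUSTRY_TERMS = {"technology", "resources", "iron", "ore", "aluminium"}
POSITIVE_DIRECTION = {"rise", "rose", "increase", "increased", "outperform"}
NEGATIVE_DIRECTION = {"decrease", "decreased", "decline", "declined", "fall", "fell"}
GENERAL_SENTIMENT = {"good": 0.6, "successful": 0.7, "bad": -0.6, "poor": -0.5}
SCALE = {"million": 1e6, "billion": 1e9}

# Captures an optional "$", a number, and an optional unit:
# "$100 million", "$2.5 billion", "12 per cent", "5%".
QUANTITY_RE = re.compile(r"(\$?)(\d+(?:\.\d+)?)\s*(million|billion|per cent|%)?")


@dataclass
class AbstractFeatures:
    counts: dict = field(default_factory=dict)  # hits per term category
    direction: int = 0        # positive direction words minus negative ones
    sentiment: float = 0.0    # summed toy general-English sentiment
    max_dollars: float = 0.0  # largest dollar amount mentioned
    max_percent: float = 0.0  # largest percentage mentioned


def extract_features(text: str) -> AbstractFeatures:
    lowered = text.lower()
    tokens = re.findall(r"[a-z]+", lowered)
    feats = AbstractFeatures()
    feats.counts = {
        "entity": sum(t in ENTITY_TERMS for t in tokens),
        "financial": sum(t in FINANCIAL_TERMS for t in tokens),
        "industry": sum(t in INDUSTRY_TERMS for t in tokens),
    }
    feats.direction = (sum(t in POSITIVE_DIRECTION for t in tokens)
                       - sum(t in NEGATIVE_DIRECTION for t in tokens))
    feats.sentiment = sum(GENERAL_SENTIMENT.get(t, 0.0) for t in tokens)
    for dollar, number, unit in QUANTITY_RE.findall(lowered):
        value = float(number)
        if unit in ("per cent", "%"):
            feats.max_percent = max(feats.max_percent, value)
        elif dollar:
            feats.max_dollars = max(feats.max_dollars, value * SCALE.get(unit, 1.0))
        # Bare numbers such as years are ignored.
    return feats


def polarity_score(f: AbstractFeatures) -> float:
    # Hypothetical combination: direction gives the sign, a larger percentage
    # move scales it up, and general word sentiment is added on top.
    magnitude = 1.0 + f.max_percent / 10.0
    return f.direction * magnitude + f.sentiment


if __name__ == "__main__":
    for abstract in ("BHP net income rose 12 per cent to $100 million, a successful result.",
                     "Iron ore price fell 5%, a poor quarter."):
        feats = extract_features(abstract)
        print(feats.counts, feats.direction, round(polarity_score(feats), 2))
```

In practice the feature values (category counts, direction, magnitudes, sentiment) would more likely be fed to a classifier trained on the annotated abstracts than combined with a hand-set rule; the rule above only shows how direction and magnitude can interact.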
This is a great separation of the various chunks in financial sentences. I recently ran some experiments on movie reviews and got reasonably better accuracy by using prefixes as features, compared with the bag-of-words and bigram/trigram features I also tried (using both relative-frequency and presence/absence feature vectors). Do you have any data on this, or is there a published paper?