I have written a couple of example sentences showing the desired system output, which sum up the problem:
Friday, May 21, 2010
I believe the language model of finance to be inherently different from that of other domains, which is justified because finance is a unique language domain with its own vocabulary. I propose a language model based on a combination of several factors: entities (companies, CEOs, management), financial terms (net income, profit, price etc), industry-specific terms (technology; resources such as iron ore and aluminium), quantitative values ($ million, $ billion, per cent), directions (positive: rise, increase, outperform; negative: decrease, decline, fall) and the general sentiment of regular English words (SentiWordNet: good, bad, successful, poor). Combinations of these factors are expected to have a significant impact on the polarity of the annotated news abstracts. Direction and values are included specifically because the sentiment of these abstracts depends largely on the amounts involved (larger percentage increases, bumper profits of $100 million). Of course, as in any natural language, the general rule always comes with exceptions. In addition, each abstract typically deals with a single subject (takeover, reports, government, resource etc).
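To make the idea of combining factors more concrete, here is a minimal sketch of how an abstract might be turned into feature counts. Everything in it is an assumption for illustration: the tiny word lists are hand-made stand-ins (a real system would presumably use a proper entity recogniser and SentiWordNet scores), "acme" is a made-up company name, and the function and feature names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, hand-made lexicons for illustration only.
LEXICONS = {
    "entity": {"acme", "ceo", "management"},  # "acme" is a made-up company
    "financial_term": {"profit", "income", "price"},
    "industry_term": {"technology", "iron", "ore", "aluminium"},
    "direction_pos": {"rise", "rose", "increase", "increased", "outperform"},
    "direction_neg": {"decrease", "decline", "declined", "fall", "fell"},
    "sentiment_pos": {"good", "successful"},  # stand-in for SentiWordNet
    "sentiment_neg": {"bad", "poor"},         # stand-in for SentiWordNet
}

# Matches amounts such as "$100 million", "$1.5 billion", "12 per cent", "12%".
QUANTITY = re.compile(
    r"\$\d+(?:\.\d+)?\s*(?:million|billion)|\d+(?:\.\d+)?\s*(?:per cent|%)"
)


def extract_features(abstract):
    """Count how often each factor occurs in one news abstract."""
    text = abstract.lower()
    features = Counter()
    features["quantity"] = len(QUANTITY.findall(text))
    for token in re.findall(r"[a-z]+", text):
        for name, words in LEXICONS.items():
            if token in words:
                features[name] += 1
    # One example of a combination: a financial term together with a direction.
    for direction in ("direction_pos", "direction_neg"):
        if features["financial_term"] and features[direction]:
            features["financial_term&" + direction] = 1
    return features


if __name__ == "__main__":
    print(extract_features("Acme profit rose 12 per cent to $100 million"))
```

The resulting counts could then be fed to any classifier trained on the annotated abstracts. The conjunction feature is only one way to capture combinations; which combinations actually matter would have to be found experimentally.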