Ogmios: a scalable NLP platform for annotating large web document collections
While NLP tools are now widely available, using them can be problematic: their input and output formats are heterogeneous, the granularity of the information they provide varies, processing large numbers of documents in a reasonable time is difficult, and they are hard to tune to a specific domain. To address these problems, we propose a configurable platform that combines NLP tools to enrich very large collections of specialized documents in French and English. The platform is a modular, tunable framework. Each module carries out one annotation step (named entity recognition, sentence and word segmentation, lemmatization, part-of-speech tagging, term tagging, or parsing) using existing NLP tools, and can be tuned to a domain by adding domain-specific resources. Linguistic annotations are recorded in a stand-off XML format. To handle very large document collections, we focus on the robustness of the annotation process and distribute the processing across several machines.
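A minimal sketch of the modular, stand-off design described above, assuming each module is a Python function that reads the unchanged raw text and returns `(start, end, attributes)` character-offset spans for its own annotation layer. The module functions, the regex-based segmenters, the term list, and the XML element names (`document`, `text`, `layer`, `annotation`) are hypothetical illustrations, not Ogmios's actual tools or schema.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Document:
    """Raw text plus stand-off annotation layers; the text itself is never modified."""
    doc_id: str
    text: str
    # layer name -> list of (start, end, attrs) character-offset spans
    layers: dict = field(default_factory=dict)


def segment_sentences(doc):
    # Toy rule (hypothetical): a sentence runs from a non-space character
    # to the next '.', '!' or '?' (or to the end of the text).
    return [(m.start(), m.end(), {}) for m in re.finditer(r"\S[^.!?]*[.!?]?", doc.text)]


def segment_words(doc):
    # Toy tokenizer (hypothetical): runs of word characters, or single punctuation marks.
    return [(m.start(), m.end(), {}) for m in re.finditer(r"\w+|[^\w\s]", doc.text)]


def make_term_tagger(terms):
    # Domain tuning: the module is parameterized by a domain-specific term list.
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}

    def tag_terms(doc):
        spans = [(m.start(), m.end(), {"term": t})
                 for t, p in patterns.items() for m in p.finditer(doc.text)]
        return sorted(spans, key=lambda s: (s[0], s[1]))

    return tag_terms


def run_pipeline(doc, pipeline):
    # Each step adds one layer; later steps could also read earlier layers.
    for layer_name, module in pipeline:
        doc.layers[layer_name] = module(doc)
    return doc


def to_standoff_xml(doc):
    # Annotations point into the text by offsets instead of wrapping it in tags.
    root = ET.Element("document", id=doc.doc_id)
    ET.SubElement(root, "text").text = doc.text
    for layer_name, spans in doc.layers.items():
        layer_el = ET.SubElement(root, "layer", name=layer_name)
        for i, (start, end, attrs) in enumerate(spans):
            ET.SubElement(layer_el, "annotation", id=f"{layer_name}{i}",
                          start=str(start), end=str(end), **attrs)
    return ET.tostring(root, encoding="unicode")


doc = Document("d1", "Gene expression was measured. The gene expression data is public.")
pipeline = [
    ("sentence", segment_sentences),
    ("word", segment_words),
    ("term", make_term_tagger({"gene expression"})),
]
run_pipeline(doc, pipeline)
xml_out = to_standoff_xml(doc)
print(xml_out)
```

Because each layer stores only offsets into the original text, another module (for instance a part-of-speech tagger reading the `word` layer) could be appended to `pipeline` without altering the text or the existing layers; this is the main practical advantage of stand-off annotation over inline markup.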
In the Alvis project (www.alvis.info/alvis), we tested the scalability of the platform on two collections, @card@ biomedical web documents (@card@ million words) and @card@ Search Engine News documents (@card@ million words), using @card@ computers. Both collections were annotated up to the term tagging step, in @card@ hours and 3 hours respectively.
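The distribution of annotation work over several machines can be sketched as below, assuming documents are annotated independently of one another. A thread pool stands in for the cluster, and `annotate_collection`, `count_tokens` and the retry policy are hypothetical; the abstract does not say how Ogmios actually dispatches documents. What the sketch illustrates is robustness: a failing document is retried and then recorded, rather than aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def annotate_collection(docs, annotate, n_workers=4, max_retries=1):
    """Annotate {doc_id: text} in parallel and return (results, failures).

    Hypothetical sketch: worker threads stand in for separate annotation
    machines. A failing document is retried up to max_retries times and then
    logged in `failures` instead of stopping the other documents.
    """

    def attempt(text):
        last_error = None
        for _ in range(max_retries + 1):
            try:
                return annotate(text)
            except Exception as exc:  # one bad document must not stop the batch
                last_error = exc
        raise last_error

    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {pool.submit(attempt, text): doc_id for doc_id, text in docs.items()}
        for future in as_completed(futures):
            doc_id = futures[future]
            try:
                results[doc_id] = future.result()
            except Exception as exc:
                failures[doc_id] = repr(exc)
    return results, failures


def count_tokens(text):
    # Toy annotator (hypothetical): fails on empty documents.
    if not text.strip():
        raise ValueError("empty document")
    return len(text.split())


docs = {"a": "one two three", "b": "", "c": "four five"}
results, failures = annotate_collection(docs, count_tokens, n_workers=2)
print(results, failures)
```

Threads are adequate when each worker mostly waits on external annotation tools run as subprocesses; for pure-Python workloads or real multi-machine runs, the pool would be replaced by processes or remote hosts, while the per-document isolation of failures stays the same.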