Sunday, November 6, 2011
List of Japanese NLP tools
I haven't tried out all of these so I don't have comments for everything, but hopefully this list will come in useful for someone.
Itadaki: a Japanese processing module for OpenOffice. I've done a tiny bit of work and issue documentation on a fork here, and someone forked that to work with a Japanese/German dictionary here. 
GoSen: A pure Java version of ChaSen, built on Sen and distributed as part of Itadaki. See my previous post for where to download it. 
MeCab: This page also contains a comparison of MeCab, ChaSen, JUMAN, and Kakasi. 
ChaSen 
JUMAN 
CaboCha: Uses support vector machines for morphological and dependency structure analysis. 
Gomoku 
Igo 
Kuromoji: Donated to Apache and used in Solr. Looks nice. 
Hypermedia Corpus 
TüBa-J/S: Japanese treebank from the University of Tübingen. Not as heavily annotated as I'd hoped. You have to send them an agreement to download it, but it's free. 
GSK: Not free, but very cheap. 
LDC: Expensive unless your institution is a member. 
Kakasi: Gives readings for kanji compounds. 
WordNet: Still under development by NiCT. The sense numbers are cross-indexed with those in the English WordNet, so it could be useful for translation. Unlike the English WordNet, it has no verb frames. 
LCS Database: From Okayama University 
FrameNet: Unfortunately, it can only be browsed online. 
Chakoshi: Online collocation search engine. 
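To make the WordNet note above a bit more concrete, here is a rough sketch of why cross-indexed sense numbers help with translation: if both wordnets key their synsets by the same ID, a word in one language can be mapped to candidate words in the other by going through the shared ID. Everything here is an assumption for illustration: the IDs and word lists are made-up toy data, and `build_index` and `translate` are hypothetical helpers, not part of any WordNet distribution or API.

```python
# Toy data only: these synset IDs and word lists are invented for illustration
# and are not taken from the real Japanese or English WordNet.
JA_SYNSETS = {
    "00000001-n": ["犬", "イヌ"],
    "00000002-n": ["猫"],
}
EN_SYNSETS = {
    "00000001-n": ["dog", "domestic dog"],
    "00000002-n": ["cat"],
}


def build_index(synsets):
    """Invert a synset table into a word -> list of synset IDs mapping."""
    index = {}
    for synset_id, words in synsets.items():
        for word in words:
            index.setdefault(word, []).append(synset_id)
    return index


def translate(word, source_synsets, target_synsets):
    """Return candidate translations by following shared synset IDs."""
    index = build_index(source_synsets)
    candidates = []
    for synset_id in index.get(word, []):
        for target_word in target_synsets.get(synset_id, []):
            if target_word not in candidates:  # keep order, drop duplicates
                candidates.append(target_word)
    return candidates


if __name__ == "__main__":
    print(translate("犬", JA_SYNSETS, EN_SYNSETS))
    print(translate("cat", EN_SYNSETS, JA_SYNSETS))
```

A word with several senses would return candidates from every matching synset, so in practice some kind of sense disambiguation would still be needed before picking one.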