Sunday, November 6, 2011
List of Japanese NLP tools
I haven't tried out all of these, so I don't have comments on everything, but hopefully this list will come in useful for someone.
Itadaki: a Japanese processing module for OpenOffice. I've done a tiny bit of work and issue documentation on a fork here, and someone forked that to work with a Japanese/German dictionary here.
GoSen: A pure Java version of ChaSen, built on Sen and included in Itadaki. See my previous post on where to download it from.
MeCab: This page also contains a comparison of MeCab, ChaSen, JUMAN, and Kakasi.
ChaSen
JUMAN
CaboCha: Uses support vector machines for dependency structure analysis, working from the output of a morphological analyzer such as MeCab.
Gomoku
Igo
Kuromoji: Donated to Apache and used in Solr. Looks nice.
Hypermedia Corpus
TüBa-J/S: A Japanese treebank from the University of Tübingen. It is not as heavily annotated as I'd hoped. You have to send them an agreement to download it, but it's free.
GSK: Not free, but very cheap.
LDC: Expensive unless your institution is a member.
Kakasi: Gives readings for kanji compounds.
WordNet: Still under development by NICT. The sense numbers are cross-indexed with those in the English WordNet, so it could be useful for translation. Unlike the English WordNet, however, it has no verb frames.
LCS Database: From Okayama University.
FrameNet: Unfortunately, you can only browse it online.
Chakoshi: Online collocation search engine.