Linguistics Miscellany: List of Japanese NLP tools

I haven't tried out all of these, so I don't have comments for everything, but hopefully this list will come in useful for someone.

Morphological analyzers/tokenizers

Itadaki: a Japanese processing module for OpenOffice. I've done a tiny bit of work and issue documentation on a fork here, and someone forked that to work with a Japanese/German dictionary here.

GoSen: A pure Java version of ChaSen that uses sen as a base and is part of Itadaki. See my previous post on where to download it from.

MeCab: This page also contains a comparison of MeCab, ChaSen, JUMAN, and Kakasi.

ChaSen

JUMAN

CaboCha: Uses support vector machines for morphological and dependency structure analysis.

Gomoku

Igo

Kuromoji: Donated to Apache and used in Solr. Looks nice.

Corpora

Hypermedia Corpus

TüBa-J/S: Japanese treebank from the University of Tübingen. Not as heavily annotated as I'd hoped. You have to send them an agreement to download it, but it's free.

GSK: Not free, but very cheap.

LDC: Expensive unless your institution is a member.

Other lexical resources

Kakasi: Gives readings for kanji compounds.

WordNet: Still under development by NiCT. The sense numbers are cross-indexed with those in the English WordNet, so it could be useful for translation. Unlike the English WordNet, though, it has no verb frames.

LCS Database: From Okayama University

FrameNet: Unfortunately you can only do online browsing.

Chakoshi: Online collocation search engine.
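
To make the WordNet note above a bit more concrete, here is a rough sketch of how cross-indexed sense numbers could be used to look up translation candidates. Everything in it is an assumption for illustration: the two dictionaries are toy data I made up, the sense IDs are placeholders rather than real WordNet identifiers, and the function name is hypothetical. It does not use the actual WordNet distribution or any real API.

```python
# Toy data only: the sense IDs and entries below are invented for illustration
# and do not come from the Japanese or English WordNet.
JA_TO_SENSES = {
    "犬": ["s0001-n"],
    "走る": ["s0002-v"],
}
EN_SENSE_LEMMAS = {
    "s0001-n": ["dog", "domestic dog"],
    "s0002-v": ["run"],
}


def translation_candidates(ja_word):
    """Return English lemmas that share a sense ID with the Japanese word."""
    candidates = []
    for sense_id in JA_TO_SENSES.get(ja_word, []):
        for lemma in EN_SENSE_LEMMAS.get(sense_id, []):
            # Keep first-seen order and skip duplicates across senses.
            if lemma not in candidates:
                candidates.append(lemma)
    return candidates


if __name__ == "__main__":
    print(translation_candidates("犬"))
```

The idea is only that a shared sense ID acts as a pivot between the two languages; with the real resource you would load these mappings from its data files instead of hard-coding them.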