Installing Jakarta Lucene
To build a full-text search program with Lucene + JapaneseAnalyzer, the following software is required.
■Apache Ant
■Perl
■sen
■Lucene-ja
/**
* Prerequisites
*/
//////////////////////
// 1.Ant
//////////////////////
$ which ant
/cygdrive/d/apache-ant-1.6.0/bin/ant
//////////////////////
// 2.Perl
//////////////////////
$ which perl
/usr/local/bin/perl
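The two `which` checks above can be combined into a single step. The following is a minimal sketch, assuming a POSIX shell; the function name `check_tools` is hypothetical and is not part of Ant, Perl, or sen.

```shell
#!/bin/sh
# Report, for each command name given, whether it can be found on PATH.
# Returns non-zero if at least one command is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# The two tools this page needs before the sen dictionary can be built.
check_tools ant perl || echo "install the missing tools before continuing"
```

The path printed for perl is the value that matters later, because the sen dictionary build needs to be told where perl lives.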
//////////////////////
// 3.sen
//////////////////////
Download from the following site:
https://sen.dev.java.net/servlets/ProjectDocumentList?folderID=755&expandFolder=755&folderID=0
$ wget https://sen.dev.java.net/files/documents/1373/31864/sen-1.2.2.1.zip
$ ls -l sen-1.2.2.1.zip
-rw-r--r-- 1 KIOSK mkgroup-l-d 468390 Dec 9 07:58 sen-1.2.2.1.zip
$ unzip sen-1.2.2.1.zip
Archive: sen-1.2.2.1.zip
creating: sen-1.2.2.1/
creating: sen-1.2.2.1/.settings/
creating: sen-1.2.2.1/bin/
creating: sen-1.2.2.1/conf/
creating: sen-1.2.2.1/demo/
creating: sen-1.2.2.1/dic/
creating: sen-1.2.2.1/docs/
creating: sen-1.2.2.1/docs/api/
creating: sen-1.2.2.1/docs/api/class-use/
creating: sen-1.2.2.1/docs/api/net/
creating: sen-1.2.2.1/docs/api/net/java/
creating: sen-1.2.2.1/docs/api/net/java/sen/
creating: sen-1.2.2.1/docs/api/net/java/sen/class-use/
creating: sen-1.2.2.1/docs/api/net/java/sen/io/
(remaining output omitted)
//////////////////////
// 4.Lucene-ja
//////////////////////
Download from the following site:
https://sen.dev.java.net/servlets/ProjectDocumentList?folderID=755&expandFolder=755&folderID=0
$ wget https://sen.dev.java.net/files/documents/1373/11260/lucene-ja-1.4.3sen1.2-2.zip
$ ls -l lucene-ja-1.4.3sen1.2-2.zip
-rw-r--r-- 1 KIOSK mkgroup-l-d 1002447 Feb 8 2005 lucene-ja-1.4.3sen1.2-2.zip
$ unzip lucene-ja-1.4.3sen1.2-2.zip
Archive: lucene-ja-1.4.3sen1.2-2.zip
creating: lucene-ja/
creating: lucene-ja/bin/
inflating: lucene-ja/bin/mkhtmlindex.bat
inflating: lucene-ja/bin/mkhtmlindex.sh
inflating: lucene-ja/bin/mktextindex.bat
inflating: lucene-ja/bin/mktextindex.sh
inflating: lucene-ja/bin/search.bat
inflating: lucene-ja/bin/search.sh
inflating: lucene-ja/bin/simplelog.properties
creating: lucene-ja/docs-ja/
inflating: lucene-ja/docs-ja/demo.html
inflating: lucene-ja/docs-ja/demo2.html
inflating: lucene-ja/docs-ja/demo3.html
inflating: lucene-ja/docs-ja/gettingstarted.html
inflating: lucene-ja/docs-ja/index.html
inflating: lucene-ja/docs-ja/powered.html
inflating: lucene-ja/docs-ja/resources.html
inflating: lucene-ja/docs-ja/whoweare.html
creating: lucene-ja/lib/
inflating: lucene-ja/lib/commons-logging.jar
inflating: lucene-ja/lib/lucene-1.4.3.jar
inflating: lucene-ja/lib/lucene-demos-1.4.3.jar
inflating: lucene-ja/lib/lucene-ja.jar
inflating: lucene-ja/lib/sen.jar
inflating: lucene-ja/LICENSE.txt
inflating: lucene-ja/lucene-ja-src.jar
inflating: lucene-ja/readme.txt
creating: lucene-ja/webapp/
inflating: lucene-ja/webapp/configuration.jsp
inflating: lucene-ja/webapp/footer.jsp
inflating: lucene-ja/webapp/header.jsp
inflating: lucene-ja/webapp/index.jsp
inflating: lucene-ja/webapp/results.jsp
creating: lucene-ja/webapp/WEB-INF/
creating: lucene-ja/webapp/WEB-INF/lib/
inflating: lucene-ja/webapp/WEB-INF/lib/commons-logging.jar
inflating: lucene-ja/webapp/WEB-INF/lib/lucene-1.4.3.jar
inflating: lucene-ja/webapp/WEB-INF/lib/lucene-demos-1.4.3.jar
inflating: lucene-ja/webapp/WEB-INF/lib/lucene-ja.jar
inflating: lucene-ja/webapp/WEB-INF/lib/sen.jar
////////////////////////////////
// 5.Installing sen
////////////////////////////////
■ Move to the sen dictionary directory
$ pwd
/cygdrive/d/MyDevelopment/Lucene/sen-1.2.2.1/dic
$ ls
build.xml compound.pl dictionary.properties ipa2mecab.pl
$ ant -Dperl.bin=/usr/local/bin/perl
This failed under Cygwin, so I ran it from a DOS prompt instead, as follows:
D:\MyDevelopment\Lucene\sen-1.2.2.1\dic>ant -Dperl.bin=D:\perl\bin\perl.exe
Buildfile: build.xml
prepare-proxy:
prepare-archive:
prepare-dics0:
prepare-dics:
download:
melt:
prepare:
dics0:
[exec] ipadic-2.6.0/Adj.dic ...
[exec] ipadic-2.6.0/Adnominal.dic ...
[exec] ipadic-2.6.0/Adverb.dic ...
[exec] ipadic-2.6.0/Auxil.dic ...
[exec] ipadic-2.6.0/Conjunction.dic ...
[exec] ipadic-2.6.0/Filler.dic ...
[exec] ipadic-2.6.0/Interjection.dic ...
[exec] ipadic-2.6.0/Noun.adjv.dic ...
[exec] ipadic-2.6.0/Noun.adverbal.dic ...
[exec] ipadic-2.6.0/Noun.demonst.dic ...
[exec] ipadic-2.6.0/Noun.dic ...
[exec] ipadic-2.6.0/Noun.nai.dic ...
[exec] ipadic-2.6.0/Noun.name.dic ...
[exec] ipadic-2.6.0/Noun.number.dic ...
[exec] ipadic-2.6.0/Noun.org.dic ...
[exec] ipadic-2.6.0/Noun.others.dic ...
[exec] ipadic-2.6.0/Noun.place.dic ...
[exec] ipadic-2.6.0/Noun.proper.dic ...
[exec] ipadic-2.6.0/Noun.verbal.dic ...
[exec] ipadic-2.6.0/Others.dic ...
[exec] ipadic-2.6.0/Postp-col.dic ...
[exec] ipadic-2.6.0/Postp.dic ...
[exec] ipadic-2.6.0/Prefix.dic ...
[exec] ipadic-2.6.0/Suffix.dic ...
[exec] ipadic-2.6.0/Symbol.dic ...
[exec] ipadic-2.6.0/Verb.dic ...
create:
[java] [INFO] MkSenDic - (1/7): reading connection matrix ...
[java] [INFO] MkSenDic - connection file = connect.csv
[java] [INFO] MkSenDic - charset = EUC_JP
[java] [INFO] MkSenDic - (2/7): building type dictionary ...
[java] [INFO] MkSenDic - (3/7): writing conection matrix (5 x 1281 x 701 = 4489905) ...
[java] [INFO] MkSenDic - (4/7): reading morpheme information ...
[java] [INFO] MkSenDic - load dic: dic.csv
[java] [INFO] MkSenDic - 50000...
[java] [INFO] MkSenDic - 100000...
[java] [INFO] MkSenDic - 150000...
[java] [INFO] MkSenDic - 200000...
[java] [INFO] MkSenDic - 250000...
[java] [INFO] MkSenDic - 300000...
[java] [INFO] MkSenDic - 350000...
[java] [INFO] MkSenDic - (5/7): sorting lex...
[java] [INFO] MkSenDic - (6/7): writing token...
[java] [INFO] MkSenDic - key size = 378227
[java] [INFO] MkSenDic - (7/7): building Double-Array (size = 325254) ...
[java] [INFO] DoubleArrayTrie - save time = 0.571[s]
[java] [INFO] MkSenDic - total time = 375[ms]
BUILD SUCCESSFUL
Total time: 7 minutes 39 seconds
* To find out why it failed under Cygwin, check the perl-related parts of build.xml:
$ grep -n perl build.xml
11: <property name="perl.bin" value="c:/usr/cygwin/bin/perl.exe"/>
13: <property name="perl.bin" value="/usr/bin/perl"/>
88: <exec executable="${perl.bin}">
136: <exec executable="${perl.bin}">
The perl path in build.xml simply did not match this machine. After correcting it to /usr/local/bin/perl, I just ran the following:
$ ant
Buildfile: build.xml
prepare-proxy:
prepare-archive:
prepare-dics0:
prepare-dics:
download:
melt:
prepare:
dics0:
create:
[java] [INFO] MkSenDic - (1/7): reading connection matrix ...
[java] [INFO] MkSenDic - connection file = connect.csv
[java] [INFO] MkSenDic - charset = EUC_JP
[java] [INFO] MkSenDic - (2/7): building type dictionary ...
[java] [INFO] MkSenDic - (3/7): writing conection matrix (5 x 1281 x 701 = 4489905) ...
[java] [INFO] MkSenDic - (4/7): reading morpheme information ...
[java] [INFO] MkSenDic - load dic: dic.csv
[java] [INFO] MkSenDic - 50000...
[java] [INFO] MkSenDic - 100000...
[java] [INFO] MkSenDic - 150000...
[java] [INFO] MkSenDic - 200000...
[java] [INFO] MkSenDic - 250000...
[java] [INFO] MkSenDic - 300000...
[java] [INFO] MkSenDic - 350000...
[java] [INFO] MkSenDic - (5/7): sorting lex...
[java] [INFO] MkSenDic - (6/7): writing token...
[java] [INFO] MkSenDic - key size = 378227
[java] [INFO] MkSenDic - (7/7): building Double-Array (size = 325254) ...
[java] [INFO] DoubleArrayTrie - save time = 0.811[s]
[java] [INFO] MkSenDic - total time = 399[ms]
BUILD SUCCESSFUL
Total time: 6 minutes 42 seconds
This time it succeeded!
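The perl path mismatch found with grep above can be checked before running Ant. The following is a minimal sketch, assuming a POSIX shell and that build.xml declares perl.bin on single lines in the form shown in the grep output; the function name `list_perl_bin` is hypothetical.

```shell
#!/bin/sh
# Print every perl.bin value declared in an Ant build file,
# marking whether each path is an executable file on this machine.
list_perl_bin() {
    buildfile=$1
    # Pull the value="..." attribute out of each <property name="perl.bin" .../> line.
    sed -n 's/.*<property name="perl\.bin" value="\([^"]*\)".*/\1/p' "$buildfile" |
    while IFS= read -r path; do
        if [ -x "$path" ]; then
            echo "ok: $path"
        else
            echo "not found: $path"
        fi
    done
}

# Run from the sen dic directory, where build.xml lives.
if [ -f build.xml ]; then
    list_perl_bin build.xml
fi
```

If every line reads "not found", either edit the value in build.xml or pass the real location with -Dperl.bin=... on the ant command line.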
////////////////////////////////
// 6.Installing Lucene
////////////////////////////////
The archive has already been downloaded and unpacked,
so no further work is required.
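To compile or run a program against these jars, they need to be on the Java classpath. The following is a minimal sketch, assuming a POSIX shell and the lucene-ja/lib layout shown in the unzip listing; the function name `build_classpath` is hypothetical. The separator is an assumption to adjust: `:` for a Unix JVM, `;` when a Windows JVM is launched from Cygwin.

```shell
#!/bin/sh
# Join every .jar in a directory into one classpath string.
# $1 = directory holding the jars, $2 = separator (defaults to ":").
build_classpath() {
    libdir=$1
    sep=${2:-:}
    classpath=""
    for jar in "$libdir"/*.jar; do
        [ -e "$jar" ] || continue   # skip the unexpanded pattern when no jars exist
        classpath="${classpath:+$classpath$sep}$jar"
    done
    echo "$classpath"
}

# Assumed to be run from the directory that contains the unpacked lucene-ja/.
build_classpath lucene-ja/lib ';'
```

The resulting string can be passed to java or javac with the -classpath option.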