Lucene Study Summary

susiya



I. Lucene Principles

Lucene is an efficient, Java-based full-text search library.

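The data we encounter in everyday life falls into two broad kinds: structured and unstructured.
1. Structured data has a fixed format or a bounded length, for example database records and metadata.
2. Unstructured data has variable length or no fixed format, for example emails and Word documents.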

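There are two main ways to run a full-text search for a piece of information:
a. Serial scanning: search the full text of every file, one file at a time. This is slow, because each query has to read all of the data again.
b. Index lookup: reorganize the unstructured data into an index once, then run searches against the index. This is the approach Lucene uses for full-text search.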
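
The difference between the two approaches can be sketched in plain Java, without Lucene. This is only an illustration under stated assumptions, not how Lucene is implemented: the class name InvertedIndexSketch, the tokenize helper (lowercase, then split on non-alphanumeric characters), and the sample documents are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Hypothetical analyzer: lowercase the text and split on non-alphanumeric characters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Serial scanning: every query re-reads and re-tokenizes every document.
    public static Set<Integer> serialScan(List<String> docs, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        for (int id = 0; id < docs.size(); id++) {
            if (tokenize(docs.get(id)).contains(wanted)) hits.add(id);
        }
        return hits;
    }

    // Index creation: done once, maps each term to the ids of the documents containing it.
    public static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : tokenize(docs.get(id))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Index lookup: a single map access, independent of the number of documents.
    public static Set<Integer> indexLookup(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene is a full-text search library",
                "A database stores structured data",
                "Lucene builds an index before search");
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(serialScan(docs, "Lucene"));   // [0, 2]
        System.out.println(indexLookup(index, "Lucene")); // [0, 2]
    }
}
```

Both calls return the same document ids. The index costs time and memory up front, and in exchange each later query avoids reading the documents again.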

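Full-text search in Lucene consists of two broad phases: index creation (Indexing) and index search (Search).
a. Index creation: extract information from structured and unstructured data and build an index from it.
b. Index search: take the user's query, search the index that was built, and return the results.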
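
The two phases can be outlined as a pair of methods on one small class. This is a rough sketch in plain Java rather than Lucene code: the class TwoPhaseSearchSketch, its analyze helper, and the scoring rule (sum of raw term counts) are assumptions chosen to keep the example short, and Lucene's real scoring is more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TwoPhaseSearchSketch {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new HashMap<>();

    // Hypothetical analyzer shared by both phases: lowercase, split on non-alphanumerics.
    private static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Phase 1, index creation: extract terms from the document and record them.
    public void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Phase 2, index search: analyze the query, look up each term, rank the matches.
    public List<Integer> search(String queryText) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : analyze(queryText)) {
            Map<Integer, Integer> hits = postings.getOrDefault(term, Collections.emptyMap());
            for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties are broken by the smaller document id.
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Integer.compare(a, b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        TwoPhaseSearchSketch index = new TwoPhaseSearchSketch();
        index.addDocument(0, "Lucene creates an index");
        index.addDocument(1, "Search the index, then search again");
        index.addDocument(2, "Email and Word documents are unstructured data");
        System.out.println(index.search("search index")); // [1, 0]
        System.out.println(index.search("database"));     // []
    }
}
```

The query text goes through the same analyze step as the documents, so a term typed in a different case still matches what was indexed.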

The following figure, from Lucene in Action, shows Lucene's retrieval process alongside the general full-text search process.

Reference: http://www.cnblogs.com/forfuture1978/archive/2010/06/13/1757479.html

II. Lucene Examples
The following are simple examples of building an index and searching for a file.
a. Building the index

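    // Imports for this snippet (they belong at the top of the source file).
    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.*;

    private static void indexFiles() throws IOException {
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        // Create the index if it is missing; otherwise add to the existing one.
        config.setOpenMode(OpenMode.CREATE_OR_APPEND);

        Path path = Paths.get("C:\\shuxiang\\tmp\\Edit5");
        // try-with-resources closes the file, the writer, and the directory even on failure.
        try (Directory dir = FSDirectory.open(Paths.get("C:\\shuxiang\\tmp\\lucene6"));
             IndexWriter writer = new IndexWriter(dir, config);
             BufferedReader contents = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8))) {
            Document doc = new Document();
            // "path" is stored but not tokenized, so search hits can print it back.
            doc.add(new StringField("path", path.toString(), Field.Store.YES));
            // "contents" is tokenized and indexed from the reader, but not stored.
            doc.add(new TextField("contents", contents));
            writer.addDocument(doc);
        }
    }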

b. Searching for a file

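    // Imports for this snippet (they belong at the top of the source file).
    import java.io.IOException;
    import java.nio.file.Paths;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.*;

    private static void searchFile() throws IOException, ParseException {
        // try-with-resources closes the reader and the directory when the search is done.
        try (Directory dir = FSDirectory.open(Paths.get("C:\\shuxiang\\tmp\\lucene6"));
             IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);

            // Parse the query with the same analyzer that was used at index time.
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser parser = new QueryParser("contents", analyzer);
            Query query = parser.parse("92646KHJ4");
            System.out.println("Searching for: " + query.toString("contents"));

            // Fetch at most 100000 hits; only the stored "path" field can be read back.
            TopDocs topDocs = searcher.search(query, 100000);
            System.out.println(topDocs.totalHits + " total matching documents");
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                Document hitDoc = searcher.doc(scoreDoc.doc);
                System.out.println("hit:" + hitDoc.get("path"));
            }
        }
    }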

The official Lucene site has two good examples:
https://lucene.apache.org/core/6_2_1/demo/src-html/org/apache/lucene/demo/IndexFiles.html

https://lucene.apache.org/core/6_2_1/demo/src-html/org/apache/lucene/demo/SearchFiles.html


2016-10-19 14:40