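1. Overview
Tantivy is a full-text search engine library written in Rust and inspired by Apache Lucene.
If you are looking for an alternative to Elasticsearch or Apache Solr, check out Quickwit, a distributed search engine built on top of Tantivy.
Tantivy is closer to Apache Lucene than to Elasticsearch or Apache Solr: it is not an off-the-shelf search engine server, but a library that can be used to build one.
Tantivy performs very well; see the benchmark chart below:
[Figure: Tantivy benchmark chart]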
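2. Features
- Full-text search
- Configurable tokenizers (stemming available for 17 Latin languages), with third-party support for Chinese (tantivy-jieba and cang-jie), Japanese (lindera, Vaporetto, and tantivy-tokenizer-tiny-segmenter), and Korean (lindera + lindera-ko-dic-builder)
- Fast (check out the 🐎 ✨benchmark✨ 🐎)
- Tiny startup time (<10ms), which makes it a good fit for command-line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. (michael AND jackson) OR "king of pop")
- Phrase queries (e.g. "michael jackson")
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes less than 3 minutes on a desktop machine)
- Mmap directory
- SIMD integer compression when the platform/CPU supports the SSE2 instruction set
- Single-valued and multi-valued u64, i64, and f64 fast fields (the equivalent of doc values in Lucene)
- &[u8] fast fields
- Text, i64, u64, f64, date, IP, bool, and hierarchical facet fields
- Compressed document store (LZ4, Zstd, None)
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)
- JSON fields
- Aggregation collector: histogram, range buckets, average, and stats metrics
- LogMergePolicy with deletes
- Searcher warmer API
- A cheesy logo with a horse
Note: distributed search is out of scope for Tantivy, but if you are looking for that capability, check out Quickwit.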
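To make the "BM25 scoring" item in the list above more concrete, here is a minimal, std-only sketch of the textbook BM25 formula for a single term. It is not Tantivy's internal code: the function names, the parameter layout, and the constants `K1 = 1.2` and `B = 0.75` (commonly used defaults) are assumptions chosen for illustration.

```rust
/// Commonly used BM25 defaults (assumed for this sketch, not read from Tantivy).
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Inverse document frequency: terms that appear in fewer documents weigh more.
/// `doc_count` is the total number of documents, `doc_freq` the number of
/// documents that contain the term.
fn idf(doc_count: u64, doc_freq: u64) -> f64 {
    let n = doc_count as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// Contribution of one term to one document's BM25 score.
/// `term_freq` is how often the term occurs in the document, `doc_len` the
/// document length in tokens, `avg_doc_len` the average length over the index.
fn bm25_term_score(
    term_freq: u32,
    doc_len: u32,
    avg_doc_len: f64,
    doc_count: u64,
    doc_freq: u64,
) -> f64 {
    let tf = term_freq as f64;
    // Longer-than-average documents are penalised, shorter ones are favoured.
    let length_norm = K1 * (1.0 - B + B * doc_len as f64 / avg_doc_len);
    idf(doc_count, doc_freq) * tf * (K1 + 1.0) / (tf + length_norm)
}
```

The score of a multi-term query is, in this simplified view, the sum of `bm25_term_score` over the query terms. Repeating a term raises the score with diminishing returns, and a match in a short document counts for more than the same match in a long one.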
3. A small Tantivy example
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, Index, IndexWriter, ReloadPolicy};
use tempfile::TempDir;

fn main() -> tantivy::Result<()> {
    // Store the index in a temporary directory.
    let index_path = TempDir::new()?;

    // Define the schema: "title" is indexed and stored, "body" is only indexed.
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    // Create the index and a writer with a 50 MB memory budget.
    let index = Index::create_in_dir(&index_path, schema.clone())?;
    let mut index_writer: IndexWriter = index.writer(50_000_000)?;

    let title = schema.get_field("title").unwrap();
    let body = schema.get_field("body").unwrap();

    // Build a document field by field.
    let mut old_man_doc = TantivyDocument::default();
    old_man_doc.add_text(title, "The Old Man and the Sea");
    old_man_doc.add_text(
        body,
        "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone \
         eighty-four days now without taking a fish.",
    );
    index_writer.add_document(old_man_doc)?;

    // The doc! macro is a shorter way to build a document.
    index_writer.add_document(doc!(
        title => "Of Mice and Men",
        body => "A few miles south of Soledad, the Salinas River drops in close to the hillside \
                 bank and runs deep and green. The water is warm too, for it has slipped twinkling \
                 over the yellow sands in the sunlight before reaching the narrow pool. On one \
                 side of the river the golden foothill slopes curve up to the strong and rocky \
                 Gabilan Mountains, but on the valley side the water is lined with trees—willows \
                 fresh and green with every spring, carrying in their lower leaf junctures the \
                 debris of the winter’s flooding; and sycamores with mottled, white, recumbent \
                 limbs and branches that arch over the pool"
    ))?;

    // A field can hold several values: this document has two titles.
    index_writer.add_document(doc!(
        title => "Frankenstein",
        title => "The Modern Prometheus",
        body => "You will rejoice to hear that no disaster has accompanied the commencement of an \
                 enterprise which you have regarded with such evil forebodings. I arrived here \
                 yesterday, and my first task is to assure my dear sister of my welfare and \
                 increasing confidence in the success of my undertaking."
    ))?;

    // Documents become searchable only after a commit.
    index_writer.commit()?;

    // Open a reader that reloads shortly after each commit.
    let reader = index
        .reader_builder()
        .reload_policy(ReloadPolicy::OnCommitWithDelay)
        .try_into()?;
    let searcher = reader.searcher();

    // Parse a query against the "title" and "body" fields and fetch the top 10 hits.
    let query_parser = QueryParser::for_index(&index, vec![title, body]);
    let query = query_parser.parse_query("sea whale")?;
    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
    for (_score, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
        println!("{}", retrieved_doc.to_json(&schema));
    }

    // Run a boosted query and print how the best hit's score was computed.
    let query = query_parser.parse_query("title:sea^20 body:whale^70")?;
    let (_score, doc_address) = searcher
        .search(&query, &TopDocs::with_limit(1))?
        .into_iter()
        .next()
        .unwrap();
    let explanation = query.explain(&searcher, doc_address)?;
    println!("{}", explanation.to_pretty_json());

    Ok(())
}
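The second query in the example, `title:sea^20 body:whale^70`, uses the `^` suffix to boost individual clauses. The sketch below illustrates the idea using only the standard library, under the simplifying assumption that each clause's score is multiplied by its boost and that the scores of optional clauses are added together. The `Clause` type, `combined_score`, and the input scores are hypothetical and are not part of Tantivy's API.

```rust
/// One scored clause of a query, e.g. `title:sea^20` (hypothetical type).
struct Clause {
    /// Score the clause would get for a document without any boost.
    base_score: f32,
    /// Multiplier written after `^` in the query string.
    boost: f32,
}

/// Sum of the boosted clause scores for a single document.
/// A clause that does not match contributes a base score of 0.0.
fn combined_score(clauses: &[Clause]) -> f32 {
    clauses.iter().map(|c| c.base_score * c.boost).sum()
}
```

Under this model, a document that matches only `title:sea` is still returned, and its score is the title clause's score scaled by 20. The output of `query.explain` in the example shows how Tantivy itself arrived at the score of the best hit.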
Copyright belongs to the original author, Hello.Reader. If there is any infringement, please contact us and we will remove it.