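Revisiting ES

Preface

I first studied ES in 2014 while working at a startup, and went over it again in 2015 after moving to Xiamen. Now I actually need to use it, hence the title "Revisiting".

I don't yet have a good approach for organizing these notes, so I'll do it the plainest way: work through the official documentation one article at a time and take notes as I go.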

Blog

IT Veteran Blog (IT老兵博客)

Main text

Elasticsearch is a distributed document store. Instead of storing information as rows of columnar data, Elasticsearch stores complex data structures that have been serialized as JSON documents. When you have multiple Elasticsearch nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately from any node.

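ES does not store data as rows and columns, which presumably refers to the way a relational database such as MySQL stores it. Instead, ES serializes data structures into JSON documents and stores those.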
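
To make the contrast with rows and columns concrete, here is a minimal sketch in Python of a nested record serialized into a single JSON document. The field names and values are invented for illustration, and the code only uses the standard `json` module; it does not talk to a real ES cluster.

```python
import json

# A hypothetical nested record; a relational schema would typically
# spread this over several tables (user, address, tags).
user = {
    "name": "Zhang San",
    "age": 30,
    "address": {"city": "Xiamen", "country": "China"},
    "tags": ["search", "notes"],
}

# Serialize the whole structure into one JSON document, the form
# in which ES stores data.
doc = json.dumps(user, ensure_ascii=False)

# Round-trip to show that nothing is lost by serializing.
restored = json.loads(doc)
```

The nested `address` object and the `tags` list stay inside the one document instead of being normalized into separate tables.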

If your ES nodes are part of a cluster, the stored documents are distributed across that cluster. So does each node hold a complete copy of the data?

When a document is stored, it is indexed and fully searchable in near real-time—​within 1 second. Elasticsearch uses a data structure called an inverted index that supports very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.

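Once a document is stored, it is indexed and becomes fully searchable in near real time, within 1 second. ES uses a data structure called an inverted index to support fast full-text search: it lists every unique word and the documents in which that word occurs.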
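
The idea of an inverted index can be sketched in a few lines of Python. This is a toy version under simplifying assumptions (whitespace tokenization, lowercase only, made-up sample documents), not how ES or Lucene implements it internally.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each unique lowercase word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the sorted IDs of documents that contain the word."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample documents keyed by ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}

index = build_inverted_index(docs)
```

A lookup such as `search(index, "quick")` is a single dictionary access rather than a scan over every document, which is where the speed of full-text search comes from.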

An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data. By default, Elasticsearch indexes all data in every field and each indexed field has a dedicated, optimized data structure. For example, text fields are stored in inverted indices, and numeric and geo fields are stored in BKD trees. The ability to use the per-field data structures to assemble and return search results is what makes Elasticsearch so fast.

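An index can be thought of as an optimized collection of documents, and each document is a collection of fields, that is, key-value pairs holding your data.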

Elasticsearch also has the ability to be schema-less, which means that documents can be indexed without explicitly specifying how to handle each of the different fields that might occur in a document. When dynamic mapping is enabled, Elasticsearch automatically detects and adds new fields to the index. This default behavior makes it easy to index and explore your data—​just start indexing documents and Elasticsearch will detect and map booleans, floating point and integer values, dates, and strings to the appropriate Elasticsearch datatypes.

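ES can be schema-less. In other words, even if you add a new field, which effectively breaks the previous schema, ES can still index the document: with dynamic mapping enabled, it detects the new field and adds it to the index.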
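
A rough way to picture dynamic mapping is a function that looks at each JSON value and picks a field type. The sketch below is a toy imitation written for these notes, not ES's actual detection logic; the helper names `guess_es_type` and `infer_mapping` are hypothetical, and the date rule (only ISO-8601-like strings) is an assumption.

```python
from datetime import datetime

def guess_es_type(value):
    """Roughly guess an ES field type for a JSON value (toy rules only)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            # Assumption: only ISO-8601-like strings count as dates here.
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            return "text"
    return "unknown"

def infer_mapping(doc):
    """Guess a type for every top-level field of a document."""
    return {field: guess_es_type(value) for field, value in doc.items()}

# Hypothetical document with one field of each kind.
mapping = infer_mapping({
    "title": "Revisiting ES",
    "views": 42,
    "rating": 4.5,
    "published": True,
    "created": "2020-05-01",
})
```

Here `mapping` ends up pairing `title` with `text`, `views` with `long`, `rating` with `float`, `published` with `boolean` and `created` with `date`.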

Ultimately, however, you know more about your data and how you want to use it than Elasticsearch can. You can define rules to control dynamic mapping and explicitly define mappings to take full control of how fields are stored and indexed.

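However, you know more about your data, and about how you want it to be used, than ES can. So the mappings should ultimately be defined explicitly by you.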

Defining your own mappings enables you to:

Distinguish between full-text string fields and exact value string fields
Perform language-specific text analysis
Optimize fields for partial matching
Use custom date formats
Use data types such as geo_point and geo_shape that cannot be automatically detected

It’s often useful to index the same field in different ways for different purposes. For example, you might want to index a string field as both a text field for full-text search and as a keyword field for sorting or aggregating your data. Or, you might choose to use more than one language analyzer to process the contents of a string field that contains user input.
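
As a sketch of what an explicit mapping might look like, the Python dictionary below builds the JSON body that could be sent when creating an index. The field names (`title`, `created`, `location`) and the `raw` sub-field name are made up for illustration, and the layout assumes a recent ES version in which `properties` sits directly under `mappings`. The code only builds the body and does not send any request.

```python
import json

# Hypothetical index body; the field names are invented for illustration.
index_body = {
    "mappings": {
        "properties": {
            # Same string indexed two ways: full-text search on "title",
            # sorting and aggregating on the "title.raw" keyword sub-field.
            "title": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
            # A custom date format instead of the default detection.
            "created": {"type": "date", "format": "yyyy/MM/dd HH:mm:ss"},
            # geo_point cannot be detected automatically, so it is declared here.
            "location": {"type": "geo_point"},
        }
    }
}

# The JSON text that would go into the body of the create-index request.
request_body = json.dumps(index_body, indent=2)
```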

The analysis chain that is applied to a full-text field during indexing is also used at search time. When you query a full-text field, the query text undergoes the same analysis before the terms are looked up in the index.

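The analysis chain applied to a full-text field at index time is also used at search time. When you query a full-text field, the query text goes through the same analysis before its terms are looked up in the index.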
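
Why the same analysis has to run on both sides can be shown with a toy analyzer. The sketch below assumes a very simple chain (lowercase, then split on non-alphanumeric characters); real ES analyzers are configurable and do much more, and the names `analyze` and `matches` are hypothetical.

```python
import re

def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Index time: analyze the document text and store the resulting terms.
doc_text = "The Quick Brown Fox!"
indexed_terms = set(analyze(doc_text))

def matches(query):
    """Search time: run the query through the same analyzer, then look up its terms."""
    return all(term in indexed_terms for term in analyze(query))
```

The index holds `quick`, not `Quick`, so a query for `QUICK fox` only matches because it is lowercased by the same analyzer before the lookup.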

Summary

This article uses the word "indices", which is another plural form of "index"; the last reference below discusses it.

References

https://www.elastic.co/guide/en/elasticsearch/reference/current/documents-indices.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-creation
https://www.nasdaq.com/articles/indexes-or-indices-whats-the-deal-2016-05-12