Installing Milvus Standalone with Docker Compose, and Basic Milvus Operations

Install Ubuntu 22.04 LTS

Run PowerShell as administrator.

wsl

wsl --list --online

Installing Ubuntu 22.04 LTS is optional, but WSL must be updated.

wsl --install -d Ubuntu-22.04

wsl.exe --update

If the operation times out, try enabling a proxy.
Restart the computer.

Set a username and password.

Start the Milvus container from PowerShell

Install Docker Desktop

https://hub.docker.com/

Restart the computer.

Download the YAML file

In PowerShell, run the following command to download the YAML file to the target directory and save it as docker-compose.yml:

Invoke-WebRequest -Uri https://github.com/milvus-io/milvus/releases/download/v2.3.2/milvus-standalone-docker-compose.yml -OutFile E:\codes\milvus\docker-compose.yml

Alternatively, download the file directly from the following link:
https://github.com/milvus-io/milvus/releases/download/v2.3.2/milvus-standalone-docker-compose.yml

Start the Milvus container

cd E:\codes\milvus

docker-compose up -d

After a few moments, check the container status:

docker compose ps
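Before connecting from Python, it can help to confirm that the Milvus port is reachable. The sketch below is not part of the original walkthrough: `port_is_open` is a hypothetical helper built only on Python's standard `socket` module, and it assumes Milvus is published on localhost:19530, the address used in the connection step later in this article.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is closed
        # again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Assumption: Milvus standalone is published on localhost:19530.
print("Milvus port reachable:", port_is_open("localhost", 19530))
```

A reachable port only shows that something is listening there; it does not prove that Milvus itself has finished starting.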

Attu, a management tool for Milvus

https://github.com/zilliztech/attu/releases
https://github.com/zilliztech/attu/releases/download/v2.3.2/attu-Setup-2.3.2.exe

Connect to Milvus from Python

https://milvus.io/docs/example_code.md

Setup

Python environment

conda create -n milvus-env python=3.9

conda env list

conda activate milvus-env

pip install ipykernel -i https://pypi.tuna.tsinghua.edu.cn/simple/

python -m ipykernel install --name milvus-env

pip install pymilvus==2.3.2 -i https://pypi.tuna.tsinghua.edu.cn/simple/
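To confirm that the environment picked up the pinned client version, the installed package metadata can be read back. This is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); `installed_version` is a hypothetical helper name, not something provided by PyMilvus.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("pymilvus")
# The walkthrough pins pymilvus==2.3.2, the same version number as the
# Milvus release downloaded earlier.
if found is None:
    print("pymilvus is not installed in this environment")
else:
    print(f"pymilvus {found} is installed")
```

Run it inside the activated milvus-env environment so that it inspects the right interpreter.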

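Download wget

https://eternallybored.org/misc/wget/
Place wget.exe in C:\Windows\System32, then download the example script. The leading ! runs the command from a Jupyter notebook cell: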

!wget https://raw.githubusercontent.com/milvus-io/pymilvus/master/examples/hello_milvus.py
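If installing wget on Windows is inconvenient, the same file can be fetched from Python. The following is a sketch under that assumption, using only the standard library; `download` is a hypothetical helper, and the URL is the one from the wget command above.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the wget command in this article.
EXAMPLE_URL = (
    "https://raw.githubusercontent.com/milvus-io/pymilvus/"
    "master/examples/hello_milvus.py"
)


def download(url: str, dest: str) -> Path:
    """Stream the resource at url into the file dest and return its path."""
    target = Path(dest)
    # urlopen returns a file-like response; copyfileobj streams it in chunks
    # instead of reading the whole body into memory.
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage (needs network access, so it is left commented out here):
# saved = download(EXAMPLE_URL, "hello_milvus.py")
# print(f"saved {saved} ({saved.stat().st_size} bytes)")
```

This route avoids copying an executable into C:\Windows\System32, at the cost of a few more lines in the notebook.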

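Example

The following walks through using the PyMilvus library to connect to Milvus, create a collection, insert data, build an index, run searches, queries, and paginated queries, and finally drop the collection.

Import the required modules and classes

connections: a PyMilvus module for establishing connections to Milvus.
utility: a PyMilvus module with helper functions for managing and operating Milvus.
FieldSchema, CollectionSchema, DataType, Collection: PyMilvus classes for defining field structure, data types, and the collection schema, and for performing collection operations.

The variables fmt, search_latency_fmt, num_entities, and dim are used to format the output and to set the number of entities and the vector dimension used in the example.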

import time

import numpy as np
from pymilvus import (
    connections,
    utility,
    FieldSchema, CollectionSchema, DataType,
    Collection,
)

# Output templates and the size of the demo data set.
fmt = "\n=== {:30} ===\n"
search_latency_fmt = "search latency = {:.4f}s"
num_entities, dim = 3000, 8

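Connect to Milvus

Connect to Milvus and check whether a collection named "hello_milvus" exists.

The connect function of the connections module establishes the connection, given a connection alias and the host and port of the Milvus server. Here the alias is "default", the server address is "localhost", and the port is "19530".
The has_collection function of the utility module checks whether a collection named "hello_milvus" exists. It returns True if it does and False otherwise.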

print(fmt.format("start connecting to Milvus"))
connections.connect("default", host="localhost", port="19530")

has = utility.has_collection("hello_milvus")
print(f"Does collection hello_milvus exist in Milvus: {has}")

output:

=== start connecting to Milvus     ===

Does collection hello_milvus exist in Milvus: False

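Create the Milvus collection "hello_milvus"

Create a Milvus collection named "hello_milvus" and define its fields and schema.

fields: a list describing the collection's fields. Each field is a FieldSchema object specifying the field name, data type, whether it is the primary key, whether primary keys are auto-generated, and other attributes. This example defines three fields:
- "pk": the primary key field, of type VARCHAR, with primary keys not auto-generated (auto_id=False) and a maximum length of 100 characters.
- "random": a double-precision floating-point field, of type DOUBLE.
- "embeddings": a float vector field, of type FLOAT_VECTOR, whose dimension (dim) is the value of the dim variable defined earlier (8).
schema: a CollectionSchema object that defines the collection's schema, combining the field definitions with a description of the collection.
Creating the collection: a Collection object is created with the collection name ("hello_milvus"), the schema (the schema object), and the consistency level ("Strong"). The consistency level controls how up to date the data visible to reads must be; "Strong" requires reads to see the latest writes.

| # | field name | field type | other attributes | field description |
|---|---|---|---|---|
| 1 | "pk" | VarChar | is_primary=True, auto_id=False | "primary field" |
| 2 | "random" | Double | | "a double field" |
| 3 | "embeddings" | FloatVector | dim=8 | "float vector with dim 8" |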
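The field definitions above imply a few constraints on the data that is inserted later: one value per field for every entity, string primary keys no longer than 100, and vectors with exactly `dim` components. As a rough illustration only, the check below expresses those constraints in plain Python. `check_columns` is a hypothetical helper, not a PyMilvus API, and it treats `max_length` as a character count, which is an assumption.

```python
from typing import List, Sequence


def check_columns(
    pks: Sequence[str],
    randoms: Sequence[float],
    embeddings: Sequence[Sequence[float]],
    dim: int = 8,
    max_length: int = 100,
) -> List[str]:
    """Return a list of problems found in column-based data; empty means OK."""
    problems = []
    # Column-based layout: every field must supply one value per entity.
    if not (len(pks) == len(randoms) == len(embeddings)):
        problems.append("columns have different lengths")
    # With auto_id=False the caller supplies the primary keys; duplicate
    # keys are flagged here as a precaution.
    if len(set(pks)) != len(pks):
        problems.append("primary keys are not unique")
    # Assumption: max_length is treated as a character count.
    for pk in pks:
        if len(pk) > max_length:
            problems.append(f"pk {pk[:10]!r}... exceeds max_length={max_length}")
    # Every vector must have exactly `dim` components.
    for i, vec in enumerate(embeddings):
        if len(vec) != dim:
            problems.append(f"embedding {i} has {len(vec)} values, expected {dim}")
    return problems


good = check_columns(["0", "1"], [0.1, 0.9], [[0.0] * 8, [1.0] * 8])
bad = check_columns(["0", "0"], [0.1, 0.9], [[0.0] * 8, [1.0] * 7])
print(good)  # []
print(bad)   # two problems: duplicate keys and a 7-value vector
```

The code that actually defines the fields and creates the collection with PyMilvus follows.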
```
fields = [
    FieldSchema(name="pk", dtype=DataType.VARCHAR, is_primary=True, auto_id=False, max_length=100),
    FieldSchema(name="random", dtype=DataType.DOUBLE),
    FieldSchema(name="embeddings", dtype=DataType.FLOAT_VECTOR, dim=dim),
]

schema = CollectionSchema(fields, "hello_milvus is the simplest demo to introduce the APIs")

print(fmt.format("Create collection hello_milvus"))
hello_milvus = Collection("hello_milvus", schema, consistency_level="Strong")


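#### Insert data

Insert records into the Milvus collection "hello_milvus".

1. ```entities```: a list holding the data to insert, organized by field, with one sub-list per field:
   - The first sub-list, ```[str(i) for i in range(num_entities)]```, holds the values of the primary key field "pk" as strings. Because ```auto_id``` is set to False, the primary key values must be supplied.
   - The second sub-list, ```rng.random(num_entities).tolist()```, holds the values of the double field "random", produced by a random number generator and converted to a list.
   - The third sub-list, ```rng.random((num_entities, dim))```, holds the values of the float vector field "embeddings", also randomly generated, with the dimension given by the ```dim``` variable defined earlier.
2. The collection's ```insert``` method inserts the records. Afterwards, ```insert_result``` contains information about the insert, such as the primary key values.
3. ```flush()```: flushes the collection so that the inserted data is persisted to disk. In Milvus, data is typically handled in memory first and then persisted by a flush.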

print(fmt.format("Start inserting entities"))
rng = np.random.default_rng(seed=19530)
entities = [
    # provide the pk field because auto_id is set to False
    [str(i) for i in range(num_entities)],
    rng.random(num_entities).tolist(),  # field random, only supports list
    rng.random((num_entities, dim)),    # field embeddings, supports numpy.ndarray and list
]

insert_result = hello_milvus.insert(entities)

hello_milvus.flush()
print(f"Number of entities in Milvus: {hello_milvus.num_entities}")  # check the num_entities


output:

=== Start inserting entities ===

Number of entities in Milvus: 3000


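#### Create an index

Create an IVF_FLAT index on the float vector field "embeddings" of the collection "hello_milvus".

1. ```index```: a dictionary of index parameters. This example sets:
   - "index_type": "IVF_FLAT", an inverted-file index suited to float vector fields.
   - "metric_type": "L2", meaning Euclidean distance is used to measure similarity between vectors.
   - "params": a dictionary of index-specific parameters; here "nlist", the number of inverted lists (clusters) the vectors are partitioned into, is set to 128.
2. The collection's ```create_index``` method builds the index. The first argument, "embeddings", names the field to index, and the ```index``` dictionary supplies the configuration.

The index speeds up similarity search, which matters most for collections holding large numbers of float vectors. The index type and parameters can be tuned to the workload.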

print(fmt.format("Start Creating index IVF_FLAT"))
index = {
    "index_type": "IVF_FLAT",
    "metric_type": "L2",
    "params": {"nlist": 128},
}

hello_milvus.create_index("embeddings", index)


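#### Search based on vector similarity

1. ```hello_milvus.load()```: loads the collection "hello_milvus" into memory. A collection must be loaded before it can be searched or queried.
2. ```vectors_to_search = entities[-1][-2:]```: takes the query vectors from ```entities```. ```entities[-1]``` is the last sub-list (the "embeddings" field), and ```[-2:]``` selects its last two vectors. Because these two vectors were themselves inserted (as pk "2998" and "2999"), the closest hit for each query is the vector itself, at distance 0.0.
3. ```search_params```: a dictionary of search parameters. This example sets:
   - "metric_type": "L2", meaning Euclidean distance is used to measure similarity between vectors.
   - "params": a dictionary of search-specific parameters; here "nprobe", the number of inverted lists (clusters) probed for each query, is set to 10.
4. ```search()```: the collection's ```search``` method runs the similarity search. Its arguments are the query vectors (```vectors_to_search```), the field to search ("embeddings"), the search parameters (```search_params```), the maximum number of results per query (limit=3), and the output fields to return ("random").
5. The result holds one list of hits per query vector; the loop prints every hit together with its "random" field.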

print(fmt.format("Start loading"))
hello_milvus.load()

# search based on vector similarity
print(fmt.format("Start searching based on vector similarity"))
vectors_to_search = entities[-1][-2:]
search_params = {"metric_type": "L2", "params": {"nprobe": 10}}

start_time = time.time()
result = hello_milvus.search(vectors_to_search, "embeddings", search_params, limit=3, output_fields=["random"])
end_time = time.time()

for hits in result:
    for hit in hits:
        print(f"hit: {hit}, random field: {hit.entity.get('random')}")
print(search_latency_fmt.format(end_time - start_time))


output:

=== Start loading ===

=== Start searching based on vector similarity ===

hit: id: 2998, distance: 0.0, entity: {'random': 0.9728033590489911}, random field: 0.9728033590489911
hit: id: 1262, distance: 0.08883658051490784, entity: {'random': 0.2978858685751561}, random field: 0.2978858685751561
hit: id: 1265, distance: 0.09590047597885132, entity: {'random': 0.3042039939240304}, random field: 0.3042039939240304
hit: id: 2999, distance: 0.0, entity: {'random': 0.02316334456872482}, random field: 0.02316334456872482
hit: id: 1580, distance: 0.05628091096878052, entity: {'random': 0.3855988746044062}, random field: 0.3855988746044062
hit: id: 2377, distance: 0.08096685260534286, entity: {'random': 0.8745922204004368}, random field: 0.8745922204004368
search latency = 0.3700s


#### Query with a scalar filter

Query with a scalar filter expression, then paginate the results.

1. ```hello_milvus.query(expr="random > 0.5", output_fields=["random", "embeddings"])```: the collection's ```query``` method selects the records that satisfy "random > 0.5" and returns the output fields "random" and "embeddings".
2. ```result``` is a list of query results, each a dictionary of the returned fields. The example prints the first one.
3. ```hello_milvus.query(expr="random > 0.5", limit=4, output_fields=["random"])```: a paginated query. ```limit=4``` caps the result at the first 4 matching records, and only the "random" field is requested.
4. ```hello_milvus.query(expr="random > 0.5", offset=1, limit=3, output_fields=["random"])```: another paginated query. With ```offset=1``` and ```limit=3``` it returns the 2nd through 4th matching records, again with the "random" field.

print(fmt.format("Start querying with random > 0.5"))

start_time = time.time()
result = hello_milvus.query(expr="random > 0.5", output_fields=["random", "embeddings"])
end_time = time.time()

print(f"query result:\n-{result[0]}")
print(search_latency_fmt.format(end_time - start_time))

# pagination
r1 = hello_milvus.query(expr="random > 0.5", limit=4, output_fields=["random"])
r2 = hello_milvus.query(expr="random > 0.5", offset=1, limit=3, output_fields=["random"])
print(f"query pagination(limit=4):\n\t{r1}")
print(f"query pagination(offset=1, limit=3):\n\t{r2}")


output:

=== Start querying with random > 0.5 ===

query result:
-{'random': 0.6378742006852851, 'embeddings': [0.20963514, 0.39746657, 0.12019053, 0.6947492, 0.9535575, 0.5454552, 0.82360446, 0.21096309], 'pk': '0'}
search latency = 0.4006s
query pagination(limit=4):
[{'random': 0.6378742006852851, 'pk': '0'}, {'random': 0.5763523024650556, 'pk': '100'}, {'random': 0.9425935891639464, 'pk': '1000'}, {'random': 0.7893211256191387, 'pk': '1001'}]
query pagination(offset=1, limit=3):
[{'random': 0.5763523024650556, 'pk': '100'}, {'random': 0.9425935891639464, 'pk': '1000'}, {'random': 0.7893211256191387, 'pk': '1001'}]


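#### Hybrid search: vector similarity plus a scalar filter

1. ```hello_milvus.search(vectors_to_search, "embeddings", search_params, limit=3, expr="random > 0.5", output_fields=["random"])```: the collection's ```search``` method runs a hybrid search. Its arguments are the query vectors (```vectors_to_search```), the field to search ("embeddings"), the search parameters (```search_params```), the result limit (limit=3), and the scalar filter expression (```expr="random > 0.5"```). Only records that satisfy the scalar condition are considered, and among them the most similar vectors are returned.
2. The result again holds one list of hits per query vector, and the loop prints every hit.

Hybrid search is useful when results must be similar to the query vector and also meet scalar conditions.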

print(fmt.format("Start hybrid searching with random > 0.5"))

start_time = time.time()
result = hello_milvus.search(vectors_to_search, "embeddings", search_params, limit=3, expr="random > 0.5", output_fields=["random"])
end_time = time.time()
for hits in result:
    for hit in hits:
        print(f"hit: {hit}, random field: {hit.entity.get('random')}")
print(search_latency_fmt.format(end_time - start_time))


output:

=== Start hybrid searching with random > 0.5 ===

hit: id: 2998, distance: 0.0, entity: {'random': 0.9728033590489911}, random field: 0.9728033590489911
hit: id: 747, distance: 0.14606499671936035, entity: {'random': 0.5648774800635661}, random field: 0.5648774800635661
hit: id: 2527, distance: 0.1530652642250061, entity: {'random': 0.8928974315571507}, random field: 0.8928974315571507
hit: id: 2377, distance: 0.08096685260534286, entity: {'random': 0.8745922204004368}, random field: 0.8745922204004368
hit: id: 2034, distance: 0.20354536175727844, entity: {'random': 0.5526117606328499}, random field: 0.5526117606328499
hit: id: 958, distance: 0.21908017992973328, entity: {'random': 0.6647383716417955}, random field: 0.6647383716417955
search latency = 0.3875s


#### Delete records by primary key

1. ```insert_result.primary_keys```: reads the primary key values of the inserted records from ```insert_result```. Because ```auto_id``` is False, these are the values that were supplied at insert time.
2. ```expr = f'pk in ["{ids[0]}" , "{ids[1]}"]'```: builds a boolean expression that selects the records to delete by primary key.
3. ```hello_milvus.query(expr=expr, output_fields=["random", "embeddings"])```: queries with ```expr``` before deleting, returning the "random" and "embeddings" fields, to show that the records exist.
4. ```hello_milvus.delete(expr)```: the collection's ```delete``` method deletes the records that match ```expr```.
5. ```hello_milvus.query(expr=expr, output_fields=["random", "embeddings"])```: runs the same query again after deleting. Since the records are gone, the result should be empty.

ids = insert_result.primary_keys

expr = f'pk in ["{ids[0]}" , "{ids[1]}"]'
print(fmt.format(f"Start deleting with expr {expr}"))

result = hello_milvus.query(expr=expr, output_fields=["random", "embeddings"])
print(f"query before delete by expr={expr} -> result: \n-{result[0]}\n-{result[1]}\n")

hello_milvus.delete(expr)

result = hello_milvus.query(expr=expr, output_fields=["random", "embeddings"])
print(f"query after delete by expr={expr} -> result: {result}\n")


output:

=== Start deleting with expr pk in ["0" , "1"] ===

query before delete by expr=pk in ["0" , "1"] -> result:
-{'embeddings': [0.20963514, 0.39746657, 0.12019053, 0.6947492, 0.9535575, 0.5454552, 0.82360446, 0.21096309], 'pk': '0', 'random': 0.6378742006852851}
-{'embeddings': [0.52323616, 0.8035404, 0.77824664, 0.80369574, 0.4914803, 0.8265614, 0.6145269, 0.80234545], 'pk': '1', 'random': 0.43925103574669633}

query after delete by expr=pk in ["0" , "1"] -> result: []


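#### Drop the Milvus collection

The drop_collection function of the utility module drops the collection named "hello_milvus". Dropping a collection permanently deletes all of its records and indexes and releases the associated resources.

Use it to free resources and storage once a collection is no longer needed.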

print(fmt.format("Drop collection hello_milvus"))
utility.drop_collection("hello_milvus")


## Stop all Docker containers

docker stop $(docker ps -q)

```
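To make the similarity search and hybrid search steps earlier in this article more concrete, the sketch below reproduces the idea without Milvus: an exhaustive L2 search over random vectors with an optional scalar filter. It is only an illustration in plain Python. `squared_l2`, `top_k`, and their parameters are hypothetical names, the data comes from the standard `random` module rather than the NumPy generator used above, and the distance computed is the squared Euclidean distance. An IVF_FLAT index approximates this exhaustive scan by probing only some of the clusters.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    scalars: List[float],
    k: int = 3,
    keep: Optional[Callable[[float], bool]] = None,
) -> List[Tuple[int, float]]:
    """Return (index, distance) for the k nearest vectors passing the filter."""
    candidates = [
        (i, squared_l2(query, vec))
        for i, vec in enumerate(vectors)
        # Scalar filtering: skip entities whose scalar value fails `keep`.
        if keep is None or keep(scalars[i])
    ]
    # A smaller distance means a more similar vector under the L2 metric.
    candidates.sort(key=lambda pair: pair[1])
    return candidates[:k]


rng = random.Random(19530)
num_entities, dim = 3000, 8
vectors = [[rng.random() for _ in range(dim)] for _ in range(num_entities)]
scalars = [rng.random() for _ in range(num_entities)]

query = vectors[-1]  # a stored vector, so its nearest neighbour is itself
plain = top_k(query, vectors, scalars, k=3)
hybrid = top_k(query, vectors, scalars, k=3, keep=lambda r: r > 0.5)

print(plain[0])  # (2999, 0.0): the query matches itself exactly
print(hybrid)    # only entities whose scalar is greater than 0.5
```

This mirrors why the first hit for each query vector in the search output above has distance 0.0: the query vectors were taken from the inserted data.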

To be continued


This article is reposted from: https://blog.csdn.net/weixin_45735391/article/details/134227074
Copyright belongs to the original author, 蓝净云.
