We have a cluster deployment of Milvus on Kubernetes, and our dataset is on the order of 150 million vectors. The query nodes are distributed across 16 replicas with 40G of memory each. The current bottleneck for us is the partition load times, and we are exploring ways to improve them. We were referring to https://milvus.io/docs/chunk_cache.md and tried setting this up in our deployment. However, we do not see any noticeable improvement in the load times, and from the logs there is no indication that the chunk cache is being loaded. We have a few questions in this area:

  • Is chunk caching supposed to speed up partition load times in general, or does it only come into play during vector search?
  • How can we verify that the chunk cache is working as expected?
  • Is the cache stored in the local storage of the query nodes under the localStorage directory, which in our case is /var/lib/milvus/data and is currently empty?
  • Which Grafana panel can be used as an indication that the cache is being used?
  • Which logs specifically indicate that the chunk cache is working?

The user.yaml file on the query nodes has the following configuration:

> kubectl exec -it qa-milvus-querynode -- cat /milvus/configs/user.yaml
Defaulted container "querynode" out of: querynode, config (init)
common:
  security:
    authorizationEnabled: true
proxy:
  maxUserNum: 500
  maxRoleNum: 100
queryNode:
  cache:
    enabled: true
    warmup: async
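
For reference, a minimal pymilvus sketch of the kind of partition load we are timing is below; the collection name, partition name, and connection details are placeholders:

import time
from pymilvus import connections, Collection, utility

# authorizationEnabled is on in our deployment, so credentials go on the connection
connections.connect(host="qa-milvus-proxy", port="19530", user="user", password="***")

collection = Collection("my_collection")           # placeholder collection name
start = time.time()
collection.load(partition_names=["my_partition"])  # blocks until the partition is loaded
print(f"partition load took {time.time() - start:.1f}s")

# progress can also be polled from another session while the load is running
print(utility.loading_progress("my_collection", partition_names=["my_partition"]))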

1 Answer

Milvus manages data by segments, and each segment has an independent index that is built asynchronously. If a segment's index is not ready, the query node loads that segment's original vector data into memory and performs a brute-force search on it.
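
One way to check this from the client side (assuming pymilvus; the collection name is a placeholder) is to compare indexed rows against total rows before loading, so you know whether some segments would still be searched by brute force:

from pymilvus import connections, utility

connections.connect(host="localhost", port="19530")  # adjust to your Milvus endpoint

# A gap between indexed_rows and total_rows means some segments' indexes
# are not ready yet and their raw vectors would be loaded for brute force.
print(utility.index_building_progress("my_collection"))
# e.g. {'total_rows': 150000000, 'indexed_rows': 148500000}

# Optionally block until index building has caught up before calling load().
utility.wait_for_index_building_complete("my_collection")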

The time required to load a collection or partition primarily depends on the number of segments and indexes within it. When chunk cache warmup is enabled, the load process also downloads the original vector files to the query node's local disk. Once these files are stored locally, search operations that return vectors through output_fields read the data from the local files instead of from S3.
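
To make that concrete, the kind of request that touches the warmed-up local copy is a search that returns the raw vector through output_fields; a minimal pymilvus sketch, where the field name, dimension, and parameters are illustrative:

from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("my_collection")

results = collection.search(
    data=[[0.1] * 768],              # one query vector (NQ = 1); dim 768 is illustrative
    anns_field="embedding",          # placeholder vector field name
    param={"metric_type": "L2", "params": {"ef": 64}},
    limit=10,
    output_fields=["embedding"],     # returning the vector forces a read of the original data
)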

The size of the index relative to the original vectors varies by index type (a brief index-creation sketch follows this list):

  • IVF_FLAT/HNSW: A bit larger than the original vector size.
  • IVF_SQ8/IVF_PQ: 25% to 30% of the original vector size.
  • DISKANN: 1/4 to 1/6 of the original vector size.
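
For reference, the index type in the list above is whatever was declared at index-creation time; a minimal sketch, where the field name and build parameters are illustrative rather than tuned recommendations:

from pymilvus import connections, Collection

connections.connect(host="localhost", port="19530")
collection = Collection("my_collection")

# HNSW: graph index whose memory footprint is a bit larger than the raw vectors.
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "HNSW", "metric_type": "L2",
                  "params": {"M": 16, "efConstruction": 200}},
)

# DISKANN keeps most of the index on local disk; it would replace HNSW
# (only one vector index per field) if memory is the tighter constraint.
# collection.create_index(
#     field_name="embedding",
#     index_params={"index_type": "DISKANN", "metric_type": "L2"},
# )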

The time taken to return vector data through output_fields depends on the number of vectors read. For instance, a search request with NQ = 1 and topk = 1 reads only one vector from storage, which has minimal impact on search performance. However, setting topk = 1000 requires reading NQ * 1000 vectors, which significantly impacts search performance.
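
As a back-of-the-envelope illustration (the dimension and float32 element size are assumptions, not values from the question):

# Rough size of the raw vector data one request must fetch when the
# vector field is included in output_fields.
nq, topk, dim = 1, 1000, 768          # dim 768 and float32 vectors are illustrative
bytes_per_vector = dim * 4            # 4 bytes per float32 component
fetched = nq * topk * bytes_per_vector
print(f"~{fetched / (1024 * 1024):.1f} MiB of raw vectors read")   # ~2.9 MiB
# With topk = 1 the same request reads only ~3 KiB.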
