Interesting solution! According to the numbers in the "query latency" chapter, a query over cold data that selects samples for 497 time series over a 6-hour time range takes 15 seconds when the queried data isn't available in the cache. This means that typical queries over historical data will take an eternity to execute ;(
Yes, this is a current issue. There are a few ways to address it:
1. the reason it's slow as you select more series over longer periods of time is that the series has to be pulled for each time bucket in the range, and then the samples have to be pulled for each bucket. By compacting older buckets and merging samples together, historical queries should be pretty comparable to 'more recent' cold queries.
2. We don't pre-cache all the metadata today. If we did that, then we could parallelize sample loads much more efficiently, lowering latency.
3. There is a lot of room to do better batching and tune the parallelism of cold reads.
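To make the first point concrete, here's a toy back-of-the-envelope model of why compacting time buckets helps (the function, bucket sizes and one-read-per-object cost model are made up for illustration, not the actual storage engine):

```python
def objects_to_fetch(range_hours, bucket_hours, num_series):
    """Toy model: each (series, time bucket) pair costs one object-store read."""
    buckets = -(-range_hours // bucket_hours)  # ceiling division
    return buckets * num_series

# The query from the post: 497 series over a 6-hour range.
print(objects_to_fetch(6, 1, 497))  # 1-hour buckets: 2982 reads
print(objects_to_fetch(6, 6, 497))  # one compacted 6-hour bucket: 497 reads
```

Fewer, larger objects also make it easier to batch and parallelize the remaining reads (points 2 and 3).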
We've only been at this for a couple of months. The techniques to improve latency on object storage are well known; we just have to implement them.
Another benefit is this: all the data is on S3, so spinning up more optimized readers to transform older data to do more detailed analysis is also an option with this architecture.
The other solution is to aggressively size your disk cache and keep effectively the full working set on disk, using object storage just as a durability layer. Then the main benefit is operational simplicity because you have a true shared-nothing architecture between the read replicas (there's no quorum or hash ring to maintain and no deduplication on read). Obviously you'll have a more expensive deployment topology if you do so, but it's still compelling IMO because you have the knobs to tune whether you want to cache on disk or not.
+1 to what @agavra said. It's awesome to see you here @valyala. Your writing and talks about time series databases were a great inspiration for us. I recall one of your earlier talks about the data layout design of VM. Opendata Timeseries has emulated a lot of it.
There is another approach for solving this issue: use the increase_pure() function from MetricsQL - https://docs.victoriametrics.com/metricsql/#increase_pure . Of course, you'd need to switch to VictoriaMetrics, since Mimir doesn't support this function.
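For illustration, it is used the same way as the standard increase() rollup function (the metric name below is hypothetical):

```
increase_pure(http_requests_total[1h])
```

See the linked docs for its exact semantics.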
It is interesting that Airbnb uses vmagent for streaming aggregation, yet didn't switch from Mimir to VictoriaMetrics. That could save them a lot of infrastructure and operations costs, as in the cases of Roblox, Spotify, Grammarly and others - https://docs.victoriametrics.com/victoriametrics/casestudies...
This is a good alternative for those who want to store petabytes of historical logs, metrics or traces in VictoriaLogs, VictoriaMetrics and VictoriaTraces, and want to save 2x-4x on persistent storage pricing (compare EBS pricing to S3 pricing).
If the server cannot keep up with the given workload because of some bottleneck (CPU, network, disk IO), then it cannot guarantee any response times - incoming queries will be either rejected or queued in a long wait queue, which leads to awfully long response times. This doesn't depend on the programming language or the framework the server is written in.
If you want response time guarantees, make sure the server has enough free resources for processing the given workload.
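A textbook M/M/1 queue makes this concrete: mean response time is 1 / (service_rate - arrival_rate), so latency explodes as load approaches capacity no matter what language the server is written in (a toy sketch with illustrative numbers):

```python
def mean_response_time(service_rate, arrival_rate):
    """M/M/1 queue: mean response time in seconds, rates in requests/sec."""
    if arrival_rate >= service_rate:
        return float("inf")  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

# A server that can process 100 req/s:
print(mean_response_time(100, 50))   # 0.02 s at 50% load
print(mean_response_time(100, 99))   # 1.0 s at 99% load
print(mean_response_time(100, 120))  # inf - overloaded
```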
If such news makes you wary of observability solutions that require object storage for production setups (i.e. Thanos, Loki, Mimir, Tempo), then try alternatives without this requirement, such as VictoriaMetrics, VictoriaLogs and VictoriaTraces. They scale to petabytes of data on regular block storage, and they provide higher performance and availability than systems that depend on manually managed object storage such as MinIO.
Why do you need a non-trivial dependency on object storage for a logs database in the first place?
Object storage has advantages over regular block storage when it is managed by a cloud provider and has a proven record on durability, availability and "infinite" storage space at low cost, such as S3 at Amazon or GCS at Google.
Object storage has zero advantages over regular block storage if you run it yourself:
- It doesn't provide "infinite" storage space - you need to regularly monitor it and manually add new physical storage.
- It doesn't provide high durability and availability. It has lower availability compared to regular locally attached block storage because of the complicated coordination of the object storage state between storage nodes over the network. It usually has lower durability than object storage provided by a cloud host. If some data is corrupted or lost on the underlying hardware, there is little chance a DIY object storage will properly and automatically recover it.
- It is more expensive because of higher overhead (and, probably, half-baked replication) compared to locally attached block storage.
- It is slower than locally attached block storage because of much higher network latency. The difference is about 1000x: ~100ms for an object storage request vs ~0.1ms for a local block storage access.
- It is much harder to configure, operate and troubleshoot than block storage.
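The latency gap compounds when reads are serialized, which is why batching and parallelism matter so much; a back-of-the-envelope illustration using the figures above (the read counts are invented):

```python
OBJECT_MS = 100.0  # ~latency per object-storage request (from above)
BLOCK_MS = 0.1     # ~latency per local block-storage read (from above)

def sequential_read_seconds(num_reads, latency_ms):
    """Total time if reads are issued one after another."""
    return num_reads * latency_ms / 1000.0

print(sequential_read_seconds(1000, OBJECT_MS))  # 100.0 - over a minute on object storage
print(sequential_read_seconds(1000, BLOCK_MS))   # ~0.1 s on local disk
```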
So I'd recommend taking a look at other databases for logs that do not require object storage for large-scale production setups. For example, VictoriaLogs scales to hundreds of terabytes of logs on a single node, and it can scale to petabytes of logs in cluster mode. Both modes are open source and free to use.
Disclaimer: I'm the core developer of VictoriaLogs.
> Object storage has zero advantages over regular block storage if you run it yourself
Worth adding, this depends on what's using your block storage / object storage. For Loki specifically, there are known edge-cases with large object counts on block storage (this isn't related to object size or disk space) - this obviously isn't something I've encountered & I probably never will, but they are documented.
For an application I had written myself, I can see clearly that block storage is going to trump object storage for all self-hosted use cases, but for 3P software I'm merely administering, I have less control over its quirks & those pros -vs- cons are much less clear cut.
Initially I was just following recommendations blindly - I've never run Loki off-cloud before, so my typical approach to learning a system would be to start with defaults & tweak/add/remove components as I learn it. Grafana's docs use object storage everywhere, so it's a lot easier when you're aligned: you can rely more heavily on config parity.
While I try to avoid complexity, idiomatic approaches have their advantages; it's always a trade-off.
That said, my first instinct when I saw MinIO's status was to use file storage, but the rustfs setup has been pretty painless so far. I might still remove it, we'll see.
You can expose Unikernel application metrics in the Prometheus text exposition format on a `/metrics` HTTP page and collect them with Prometheus or any other collector that can scrape Prometheus-compatible targets. Alternatively, you can push metrics from the Unikernel to a centralized metrics database for further investigation. Both pull-based and push-based metrics collection are supported by popular client libraries for metrics such as https://github.com/VictoriaMetrics/metrics .
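A minimal sketch of what this looks like in practice - rendering the text exposition format by hand and serving it over HTTP with only the standard library (the metric name and port are made up; a real app would normally use a client library like the one linked above):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUESTS_TOTAL = 0  # incremented by the application on each handled request

def render_metrics():
    """Render current values in the Prometheus text exposition format."""
    return (
        "# HELP app_requests_total Total handled requests.\n"
        "# TYPE app_requests_total counter\n"
        f"app_requests_total {REQUESTS_TOTAL}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To expose the page: HTTPServer(("", 8080), MetricsHandler).serve_forever()
```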
You can emit logs from the Unikernel app and send them to a centralized database for logs via the syslog protocol (or any other protocol) for further analysis. See, for example, how to set up collecting logs via the syslog protocol with VictoriaLogs - https://docs.victoriametrics.com/victorialogs/data-ingestion...
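For a flavor of what 'send via syslog' means on the wire, here's a sketch that formats an RFC 5424 syslog message and ships it over UDP (the host, port and app name are placeholder assumptions; check the linked docs for the ports VictoriaLogs actually listens on):

```python
import socket
from datetime import datetime, timezone

def format_rfc5424(msg, app="myapp", hostname="myhost", facility=16, severity=6):
    """Build an RFC 5424 syslog line; 16 = local0 facility, 6 = informational."""
    pri = facility * 8 + severity  # <134> for local0.info
    ts = datetime.now(timezone.utc).isoformat()
    return f"<{pri}>1 {ts} {hostname} {app} - - - {msg}"

def send_syslog(msg, host="logs.example.com", port=514):
    """Fire-and-forget UDP delivery; TCP is also commonly used for syslog."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(format_rfc5424(msg).encode(), (host, port))
```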
You can expose various debug endpoints via HTTP at the Unikernel application for debugging assistance. For example, if the application is written in Go, it is recommended to expose endpoints for collecting CPU, memory and goroutine profiles from the running application.
There is no need for an operating system to run Unikernels. Every Unikernel includes the parts of an operating system needed for interacting with the underlying hardware. So Unikernels can run on bare metal if they know how to interact with the underlying hardware (i.e. if they have drivers for it). Usually Unikernels target virtual machines, because virtual machines have unified virtualised hardware. This allows running the same Unikernel on virtual machines across multiple cloud providers, since they provide similar virtual hardware.
It monitors all the applications in your network and automatically detects the most common issues with the applications. It also collects metrics, traces, logs and CPU profiles for the monitored applications, so you could quickly investigate the root cause of various issues if needed.
I like that Coroot works out of the box without the need for complicated configuration.