5 Comments
Tao Wan:

Impressive post. Regarding the LogProc cost ($0.75 per 100 GiB): Azure's Cool blob storage costs $1 per 100 GiB according to its pricing page (https://azure.microsoft.com/en-us/pricing/details/storage/blobs/). Do you have to use only Cool and Archive in order to achieve $0.75 per 100 GiB?

Gaurav Nolkha:

Great observation. The cost we calculated is for uncompressed data. Moreover, your mileage will vary based on your cloud provider discounts.
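Since the $0.75 figure applies to *uncompressed* data, the gap with the $1/100 GiB Cool-tier list price is explained by compression before the data lands in blob storage. A minimal sketch of that arithmetic (the prices and compression ratios here are illustrative assumptions, not figures from the post):

```python
def effective_cost_per_100_gib_raw(price_per_100_gib_stored: float,
                                   compression_ratio: float) -> float:
    """Cost to store 100 GiB of *raw* logs when they compress by
    `compression_ratio` before being written to blob storage.

    Prices are illustrative; real bills depend on tier mix, transactions,
    and provider discounts."""
    return price_per_100_gib_stored / compression_ratio

# At the $1/100 GiB Cool-tier list price, a ~1.33x compression ratio
# already yields ~$0.75 per 100 GiB of raw logs; 4:1 yields $0.25.
print(effective_cost_per_100_gib_raw(1.00, 4.0))
```

So the quoted number does not necessarily require Archive-tier storage; compression alone can push the effective raw-data cost below the Cool list price.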

Rainbow Roxy:

Brilliant. Man, having logging eat 17% of your cloud bill and then dealing with degraded OpenSearch clusters, that's gotta sting. But leveraging blob stores for LogProc and getting that 50x cost reduction with reliability is seriously impressive stuff.

Derek Shaw:

Just curious if you compared this to OpenTelemetry (https://opentelemetry.io/docs/) + the Grafana stack (Loki, Tempo, Mimir), specifically log support using an OTel Collector and Loki with blob storage as the backing store. A lot of how it's built seems very similar to what you have documented: chunking logs, good indexing, blob storage backend, ability to scale through automatic replication, simple sharding, no clusters. It looks to me like your Ingester would be similar to the Collector in the OTel stack. If you did look at it and decided it was not going to work, I would be curious about why.

Morten G:

Very nice blog post! A lot of interesting techniques used here to achieve HA and low costs. Thanks for sharing!

You mentioned that if an ingester goes down, then queries from the past 15 minutes will be incomplete.

Do you have a way of detecting this and notifying the querying developer? If so, how precisely can you pinpoint the missing data? For example, if I query the past 30 minutes, I presume half of it would come from blob storage and the other half from the ingester. Could you detect that half of the window is missing simply from the block itself?