Hacker News: gulcin_xata's comments

For many production setups, taking a database snapshot involves transferring significant amounts of data over the network. The standard way to do this efficiently is to process data in batches. Batching reduces per-request overhead and helps maximize throughput, but it also introduces an important tuning problem: choosing the right batch size.

A batch size that works well in a low-latency environment can become a bottleneck when snapshots run across regions or under less predictable network conditions. A static batch size configuration assumes a stable network, which rarely reflects reality.

In this blog post we describe how we used automatic batch size tuning to optimize data throughput for Postgres snapshots, the constraints we worked under, and how we validated that the approach actually improves performance in production-like environments for our open-source pgstream tool.
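The post itself details pgstream's algorithm; as a rough illustration of what automatic batch size tuning can look like, here is a minimal sketch of one common approach (AIMD-style adjustment driven by measured batch latency). All names, thresholds, and the algorithm choice here are hypothetical, not pgstream's actual implementation.

```python
# Hypothetical sketch of latency-driven batch size tuning (not pgstream's
# actual algorithm): grow the batch additively while transfers stay under
# a target latency, and shrink it multiplicatively when they exceed it.

class BatchSizeTuner:
    def __init__(self, initial=1_000, minimum=100, maximum=100_000,
                 target_latency_s=1.0):
        self.size = initial
        self.minimum = minimum
        self.maximum = maximum
        self.target_latency_s = target_latency_s

    def record(self, batch_latency_s):
        """Adjust the next batch size based on the last batch's latency."""
        if batch_latency_s <= self.target_latency_s:
            # Additive increase: probe for more throughput on a fast network.
            self.size = min(self.maximum, self.size + self.minimum)
        else:
            # Multiplicative decrease: back off quickly when the network slows.
            self.size = max(self.minimum, self.size // 2)
        return self.size

tuner = BatchSizeTuner()
tuner.record(0.5)   # fast batch: size grows from 1000 to 1100
tuner.record(2.0)   # slow batch: size halves from 1100 to 550
```

The appeal of an AIMD-style rule is that it needs no prior knowledge of the network: the same configuration converges toward a large batch on a fast local link and a small one over a slow cross-region link.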


Thanks for the feedback!


Ah, thanks for sharing this insight.


I agree, it is not so straightforward to find out.


Hey!

(Disclaimer: I work at Xata.) Just wanted to mention that we also support anonymization, in case that’s something you're looking into: https://xata.io/postgres-data-masking


Recently we launched Xata Agent, an open-source AI agent which helps diagnose issues and suggest optimizations for PostgreSQL databases.

To make sure that Xata Agent still works well after modifying a prompt or switching LLMs, we decided to test it with an Eval. In this blog post, we explain how we used Vercel's AI SDK and Vitest to build an Eval in TypeScript.


Tried it with a video I previously watched and got a good enough summary; also liked the UI. Did you implement RAG?


Thanks! Cool that you like the UI, since I feel like I'm not very 'artistic', if that makes sense.

No, no RAG. I use Gemini 1.5 Flash as an LLM, and it has a very long context window (1M tokens). Because of that, I can feed the entire transcript into Gemini's context. I feel that's important to get good results.
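The decision described above boils down to a simple fit check: if the whole transcript fits in the model's context window, there is no need for chunking or retrieval. A minimal sketch of that check, using a rough chars-per-token heuristic (the helper names and the 4:1 ratio are assumptions for illustration, not the app's actual code):

```python
# Sketch of the "no RAG" decision: if the entire transcript fits in the
# model's context window, send it verbatim instead of chunking/retrieving.
# The 4-characters-per-token ratio is a crude estimate for English text.

GEMINI_FLASH_CONTEXT_TOKENS = 1_000_000  # Gemini 1.5 Flash context window
CHARS_PER_TOKEN = 4

def fits_in_context(transcript: str, reserved_for_prompt: int = 2_000) -> bool:
    """Estimate whether the full transcript plus prompt fits in context."""
    estimated_tokens = len(transcript) // CHARS_PER_TOKEN
    return estimated_tokens + reserved_for_prompt <= GEMINI_FLASH_CONTEXT_TOKENS

def build_prompt(transcript: str) -> str:
    # Feed the entire transcript; no chunking or retrieval step needed.
    return f"Summarize the following video transcript:\n\n{transcript}"
```

Even a multi-hour transcript (a few hundred thousand characters) passes this check comfortably, which is why a long-context model can replace a RAG pipeline for this use case.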


I see you're using pg-schema-diff for schema diffing; I hadn't come across it before, so thanks for mentioning it!

Have you seen pgroll? https://github.com/xataio/pgroll It is a Postgres schema migration tool for zero-downtime, minimal-locking schema changes. Thought it might be interesting for you. I also checked the unsupported operations in pg-schema-diff, and from a quick look, pgroll seems to cover more migration types: https://pgroll.com/docs/v0.8.0/getting-started


Have you seen pgstream? https://github.com/xataio/pgstream It is similar to pg_replicate and could be a good fit for streaming CDC data from Postgres. There is no built-in output plugin specifically for DuckDB, but it might help you build something lightweight and custom for your use case.


Thanks, I'll take a look!


Is this an open-source product? How can we give feedback?


It is not an open-source product at this time. However, I welcome any feedback on the product idea or suggestions for survey software features.

