I think you are looking too hard for a secret purpose. I think they just want to keep making bets on controlling the social network space. They see Twitter failing and it seems like a great time to try and snag the market. They use ActivityPub as a selling point and Instagram users as an existing network, and hope to replace Twitter. Instagram still seems pretty popular, but without a text-based network they are clearly missing some part of the market, so they are both expanding and diversifying. Facebook is already circling the drain (although it will probably be circling that drain for a decade) and they don't know if Instagram will continue to be popular.
I'm sure harvesting data for LLMs is a great side benefit, but if that were their main goal they would probably just run a crawler. They could even crawl ActivityPub without having their own instance if they wanted. It is a public API. They would just miss out on private posts, which are probably a small fraction.
There is no easy way to get the number of “reads” of an article, so the task was quite challenging. Instead, I looked at engagement across link-sharing and discussion sites (Hacker News, Reddit, and X). Then, I used some Python and Jupyter to build the final list.
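To give a rough idea of what that last step can look like, here is a minimal sketch, not the exact notebook code. The `(url, source, score)` input shape, the `rank_articles` name, and the choice to scale each source by its own maximum before summing are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_articles(mentions, top_n=10):
    """Rank URLs by combined engagement.

    `mentions` is an iterable of (url, source, score) tuples, e.g.
    ("https://example.com/post", "hn", 120). This shape is hypothetical.
    """
    mentions = list(mentions)

    # Largest score seen per source, so each source lands on a 0..1 scale
    # and one site's bigger raw numbers cannot dominate the others.
    max_per_source = defaultdict(int)
    for _url, source, score in mentions:
        max_per_source[source] = max(max_per_source[source], score)

    # Sum the scaled scores per URL across all sources.
    totals = defaultdict(float)
    for url, source, score in mentions:
        if max_per_source[source] > 0:
            totals[url] += score / max_per_source[source]

    # Highest combined score first; ties are broken by URL for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Scaling per source is only one option; weighting sources differently or using ranks instead of raw scores would be just as reasonable.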
Neither a book nor a blog, but I'm building a newsletter that aggregates the latest articles from company engineering blogs. I think you might find it useful: http://bigtechdigest.substack.com
Have a look at a newsletter called "Big Tech Digest". Every two weeks it sends you the latest articles from hundreds of engineering blogs at companies like Meta, Google, Uber, Airbnb, DoorDash, etc. Very useful for building knowledge around system architecture.
Pure FP is fantastic when you're doing Domain-Driven Design. Optics are brilliant for working with aggregates. Functional core, imperative shell lets you structure the dependencies within the application and put the domain at the center. Monads and applicatives give you different styles of component composition.
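To make the first two points concrete, here is a small sketch of a lens over an immutable aggregate, with a pure core function and a thin shell around it. The `Lens` class and the `Order`/`Customer`/`Address` domain are hypothetical and hand-rolled for illustration; they are not taken from any particular optics library.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")
A = TypeVar("A")


@dataclass(frozen=True)
class Lens(Generic[S, A]):
    """A getter/setter pair focused on one part of an immutable value."""
    get: Callable[[S], A]
    set: Callable[[S, A], S]

    def then(self, inner: "Lens") -> "Lens":
        # Compose two lenses so the result focuses one level deeper.
        return Lens(
            get=lambda s: inner.get(self.get(s)),
            set=lambda s, b: self.set(s, inner.set(self.get(s), b)),
        )


@dataclass(frozen=True)
class Address:
    city: str


@dataclass(frozen=True)
class Customer:
    name: str
    address: Address


@dataclass(frozen=True)
class Order:  # the aggregate root in this toy domain
    id: int
    customer: Customer


customer = Lens(lambda o: o.customer, lambda o, c: replace(o, customer=c))
address = Lens(lambda c: c.address, lambda c, a: replace(c, address=a))
city = Lens(lambda a: a.city, lambda a, v: replace(a, city=v))
order_city = customer.then(address).then(city)


def relocate(order: Order, new_city: str) -> Order:
    """Functional core: pure, returns a new aggregate, no I/O."""
    return order_city.set(order, new_city)


def main() -> None:
    """Imperative shell: the only place that talks to the outside world."""
    order = Order(1, Customer("Ada", Address("London")))
    print(relocate(order, "Paris"))
```

The nested `replace` calls you would otherwise write by hand are hidden inside `order_city`, and `relocate` stays trivially testable because all side effects live in `main`.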