This is true for guards, middle relays, and of course exits, but less so for bridges, since their IPs are not published globally the way other relays' are. The people maintaining block lists can't block what they don't know about.
That's extremely insidious. I suppose I never encountered the issue because I almost always call `asyncio.gather(*tasks)`, which both makes keeping a collection of tasks natural and holds strong references to them for you.
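For anyone who hasn't hit this: the event loop keeps only a weak reference to tasks, so a fire-and-forget `create_task()` whose return value you discard can be garbage-collected before it finishes. A minimal sketch of the safe pattern (the `fetch` coroutine here is just a stand-in):

```python
import asyncio

async def fetch(i):
    # Stand-in for real async work.
    await asyncio.sleep(0)
    return i * 2

async def main():
    # Risky pattern (don't do this): the loop holds only a weak
    # reference, so the task may be garbage-collected mid-flight:
    #     asyncio.create_task(fetch(1))  # result discarded, task may vanish
    #
    # gather() keeps strong references to every awaitable it wraps,
    # so all of the tasks survive until they complete:
    return await asyncio.gather(*(fetch(i) for i in range(3)))

print(asyncio.run(main()))  # [0, 2, 4]
```

If you do need fire-and-forget tasks, the usual workaround is stashing them in a set and removing them in a done-callback, which is exactly the bookkeeping `gather()` spares you.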
Hello, I work on this project! For the most part, we use Common Crawl to discover which websites carry the most CC content, and then integrate each platform either through its API, if one is available, or by putting together a bespoke scraper. If you put your content on one of these integrated platforms, eventually your work will appear in our collection.
In my mind, the dream is to have the user embed an asset from our servers on their web page (like an updated version of these old CC license buttons [0]), read the referrer headers from our server logs, and then dispatch crawlers that read the ccREL [1] data embedded on the page, which would let us index content as soon as it is published. Broad web crawls hunting for ccREL data are also possible, but probably not what we're looking to do in the near term.
We have a ways to go before we are able to do this, since there's no easy way for end users to create and embed ccREL at the moment, and there are of course lots of other unanswered questions about how we would moderate incorrect attribution, how these tools might be abused, etc.
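For readers unfamiliar with ccREL: in its simplest HTML form it's a `rel="license"` link pointing at the license URL. A minimal, stdlib-only sketch of what a crawler extracting that markup could look like (the sample page and class name are illustrative, not our actual code; real ccREL also carries richer RDFa properties like attribution):

```python
from html.parser import HTMLParser

class LicenseFinder(HTMLParser):
    """Collect href values of rel="license" links, the simplest
    ccREL convention for marking a page's license."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # rel may hold several space-separated tokens, so split first.
        if "license" in a.get("rel", "").split():
            self.licenses.append(a.get("href"))

# Hypothetical page snippet with embedded license markup.
page = ('<p>Photo by me, <a rel="license" '
        'href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>.</p>')
finder = LicenseFinder()
finder.feed(page)
print(finder.licenses)  # ['https://creativecommons.org/licenses/by/4.0/']
```

A real pipeline would of course fetch the page, walk full RDFa, and verify the license URL against a known list rather than trusting it blindly.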