This is awesome! Had been meaning to open-source parts of my startup [1] that is built almost entirely on Workers right now. Glad we can do that now without making massive changes to the code-base.
Amazed by what OP and the Workers team have done over the years. Took a while for us to get used to the Workers paradigm. But once we did, feature velocity has been great.
Last wish list item: a Postgres service on Workers (D2?) becoming available in the not too distant future
I went to use your product and connect Google Analytics to give it a shot, but you haven't verified the app w/ Google, so you get a big warning. I closed out of the browser. Might want to look into that.
We've mostly been using the GA plugin for use with beta users. Had completely forgotten about the need to verify the app with Google. Will do that this week.
They do have relational DB connectors [1], which have been working great for us. But maintaining a centralised DB is still a bit of a pain, and latency can be high depending on where users are accessing it from. A managed, distributed SQL service would be easier to run and would likely have much lower latency. Their SQLite DB, D1 [2], looks interesting. But it would be awesome to have Postgres' more complete feature set as a managed, low-latency service.
Backend: Cloudflare Durable Objects for the consumer-facing app, Python cronjobs on a GCP hosted VM for background task processing, FastAPI for self-hosted vector search
Frontend: Next.js. Antd as UI framework, Highcharts for charts. Hosted on Vercel
This looks pretty cool! Is this basically efficient/scalable fuzzy object matching?
IMO, it would be super useful to have some performance benchmarks: how fast is this for 1k/100k objects, and how does that compare to other approaches?
Not sure how feasible these are, but features I would find super useful:
- string matching across languages in different scripts (with something like unidecode maybe? [1])
- fuzzy matching that includes continuous variables like lat/long, age etc
Excited about using this – will be following the repo very closely!
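A rough sketch of the kind of normalisation meant in the first bullet, in Python and using only the standard library (the function names are hypothetical, not part of Zingg or unidecode): NFKD-decompose, drop combining marks, casefold, then score with difflib. This only folds accents and case; it does not transliterate between scripts, so Cyrillic vs Latin spellings still score 0. That remaining gap is what a library like unidecode would fill.

```python
import unicodedata
from difflib import SequenceMatcher


def fold(text: str) -> str:
    """Casefold and strip combining marks (accents) after NFKD decomposition.

    Only removes diacritics; it does NOT transliterate between scripts
    (e.g. Cyrillic to Latin), which is what unidecode would add.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def name_similarity(a: str, b: str) -> float:
    """difflib ratio in [0, 1] computed on the folded strings."""
    return SequenceMatcher(None, fold(a), fold(b)).ratio()


print(name_similarity("José Müller", "Jose Muller"))  # 1.0
print(name_similarity("Москва", "Moskva"))            # 0.0, different scripts
```

Casefolding (rather than lower()) also maps "Straße" and "STRASSE" to the same string, which plain lower-casing would not.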
We see performance vary depending on:
a) Number of attributes to match
b) Size of data
c) Type of matching and the features we compute for each
d) Hardware and cluster size
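To put the 1k/100k question in perspective, a back-of-the-envelope sketch of why (b) dominates: scoring every record against every other needs n(n-1)/2 comparisons. Comparing only within groups that share a blocking key is a common mitigation in entity resolution; the key used here (first letter of a surname) is hypothetical and only for illustration, not a claim about how Zingg does it.

```python
from collections import Counter
from math import comb
from typing import Iterable


def all_pairs(n: int) -> int:
    """Comparisons needed to score every record against every other: n choose 2."""
    return comb(n, 2)


def blocked_pairs(block_keys: Iterable[str]) -> int:
    """Comparisons needed if records are only compared within the same block.

    `block_keys` holds one (hypothetical) blocking key per record.
    """
    block_sizes = Counter(block_keys).values()
    return sum(comb(size, 2) for size in block_sizes)


for n in (1_000, 100_000):
    print(f"{n:>7} records -> {all_pairs(n):,} all-pairs comparisons")

surnames = ["Smith", "Smyth", "Jones", "Johnson", "Brown"]
print(blocked_pairs(name[0] for name in surnames))  # S:2, J:2, B:1 -> 1 + 1 + 0 = 2
```

So 100x more records means roughly 10,000x more all-pairs comparisons (499,500 vs 4,999,950,000), which is why the size of the data and how candidates are pruned matter more than raw per-comparison speed.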
Although we do not do matching across languages like English with Chinese, we have tested Zingg quite rigorously with Chinese, Japanese, Hindi, German and other languages, and it seems to work out of the box. This is likely due to Java's built-in Unicode support and the ML-based learning.
You make a great point about continuous variables like lat/long, age etc. Age seems to work, again due to integer differences and the learning. Have not tried lat/long yet. Would you have any dataset you could recommend for testing?
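For lat/long, a sketch of one way to turn coordinates into a similarity feature, assuming you compare great-circle (haversine) distance rather than raw coordinate differences, since a degree of longitude covers less ground the further you are from the equator. The exponential decay and the `scale_years` / `scale_km` defaults are hypothetical choices for illustration; age is handled the same way for comparison.

```python
import math
from typing import Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def age_similarity(a: float, b: float, scale_years: float = 5.0) -> float:
    """1.0 for equal ages, decaying exponentially with the absolute difference.

    `scale_years` (hypothetical default) is the gap at which similarity falls to 1/e.
    """
    return math.exp(-abs(a - b) / scale_years)


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding pushing h slightly above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def location_similarity(p: LatLon, q: LatLon, scale_km: float = 1.0) -> float:
    """1.0 for identical points, decaying exponentially with distance."""
    return math.exp(-haversine_km(p, q) / scale_km)


# One degree of longitude at the equator vs. at 60°N:
print(round(haversine_km((0, 0), (0, 1)), 1))    # 111.2
print(round(haversine_km((60, 0), (60, 1)), 1))  # 55.6
```

The same 1-degree difference in longitude is about half the distance at 60°N, which is the case where simply subtracting coordinates (as works for age) would mislead a matcher.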
Try to make something that works in the real world as soon as you can (a web-page, a mobile app, a web-app, or whatever is most relevant to what you're learning).
Being able to create something that works, no matter how simple, can be a far stronger motivator than simply checking off a curriculum's requirements.
> I'm not suggesting that everything is hunky-dory, just that we bear in mind the proportions of the problem. Also Pinker may well be off the mark here, as others have pointed out in [1].
The data seems unrepresentative, though. While data on suicide rates is fairly clear, it might be more interesting to look at revealed preferences instead of self-reported ones. To this end, indicators for "lives of despair" (drug OD deaths, hospitalisation for drug/alcohol abuse etc) might be more appropriate.
1. Innovation has continued and accelerated in the world of bits, but has plateaued in the world of stuff
2. If you go to a room and get rid of all the screens, how do you know you’re not in 1979?
3. Since the Great Depression, we’ve been managing economic metrics. But the technological and economic tailwinds haven’t been there at all.
4. In a healthy system, you can have wild dissent and it's not threatening, because everyone knows that the system is healthy. In an unhealthy system, dissent becomes much more dangerous. Very few people openly criticise the unhealthy systems they are part of.
5. In late modernity (which we are living in), there's simply too much knowledge for an individual to understand all of it. In the 1800s, Goethe could understand all of everything. In the 1900s, Hilbert could understand all of mathematics. But now, the degree of specialisation we have is much harder to get a handle on.
6. If you believe that productivity and growth are over, you don't want to emphasise merit. Instead, you focus on simply making sure that each group has its share of seats at the table. It's not about wealth creation; it's about receiving the wealth that's already there.
"The following content may not be eligible for monetization:
...
Tragedy & Conflict
Content that focuses on real world tragedies, including but not limited to depictions of death, casualties, physical injuries, even if the intention is to promote awareness or education. For example, situations like natural disasters, crime, self-harm, medical conditions and terminal illnesses.
Debated Social Issues
Content that is incendiary, inflammatory, demeaning or disparages people, groups, or causes is not eligible for ads. Content that features or promotes attacks on people or groups is generally not eligible for ads, even if in the context of news or awareness purposes.
..."
Facebook has the right to do whatever it wants on its platform, but this will have bad social consequences. Publishers will have no incentive to cover conflicts in war-torn regions, attacks by radical religious groups on minorities in third-world countries, or any news that is not happy and advertiser-friendly under these guidelines.
The only groups that will now have an incentive to cover these issues will be those with a political agenda - including political pages and fake/misleading news outlets.
Terrifying for the future of the discourse in the world :/
I agree. I did not agree with everything he wrote, but every one of his articles was insightful, and I came out a better person for having read and thought about them.
[1] https://datanarratives.com/