zopf's comments | Hacker News

10/10, would invest


HyperTalk was my first programming language!

I helped a friend build a choose-your-own-adventure murder mystery game called Blood Hotel, and found myself obsessed with the feeling of inventive power that programming enabled.

I ended up building an animated space invaders game, and even tried my hand at writing a "virus" in HyperTalk that would infect other stacks with its code.

Ah, the good old days. Lovely to see this at the top of HN!


I made a thing that turns my Instagram posts into ambient soundscapes by understanding the content of the images, searching for relevant sounds, and mixing them into looped audio scenes.

It was made for the Monthly Music Hackathon in NYC held at Spotify, but it ended up not being terribly musical, and more just about having fun with audio and convnets :)

https://zopf.github.io/ambiance
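
(If anyone's curious about the mixing step, the core of it is roughly the shape below. This is a toy sketch, not the actual project code: the tag list, the tag-to-sound lookup, and the file paths are all made up.)

    # Toy sketch of the ambiance pipeline: image tags -> sound files -> looped mix.
    # Requires pydub; the tags and the sound-file mapping are invented.
    from pydub import AudioSegment

    # Pretend an image classifier already produced these tags for one photo.
    tags = ["beach", "seagull"]

    # Hypothetical mapping from tags to downloaded sound effects.
    sound_files = {"beach": "sounds/waves.wav", "seagull": "sounds/gulls.wav"}

    SCENE_MS = 30_000  # one 30-second looped scene

    def looped(path, length_ms):
        """Tile a clip until it covers length_ms, then trim."""
        clip = AudioSegment.from_wav(path)
        out = clip
        while len(out) < length_ms:
            out += clip
        return out[:length_ms]

    # Overlay every looped layer into a single ambient scene.
    scene = AudioSegment.silent(duration=SCENE_MS)
    for tag in tags:
        layer = looped(sound_files[tag], SCENE_MS) - 6  # duck each layer ~6 dB
        scene = scene.overlay(layer)

    scene.export("scene.wav", format="wav")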

Oh yeah! And for another edition of the same meetup, I teamed up with a guy who was great with audio synthesis; I hooked an Arduino, a gyroscope, and a microphone up to my drumstick, and we made a wireless throat-singing, spatially-aware percussion instrument:

https://www.youtube.com/watch?v=G-g1GBaTevk
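
(The software half of that was simpler than it sounds: read orientation from the Arduino over serial, map it to a pan position, trigger a sample on each hit. A rough sketch with pyserial; the port name and the "yaw,hit" line format are assumptions, not the actual firmware protocol.)

    # Rough sketch: read "yaw,hit" lines from the Arduino over serial and map
    # yaw to a stereo pan position. The port name and the CSV line format are
    # assumptions, not the real firmware protocol.
    import serial  # pyserial

    port = serial.Serial("/dev/ttyUSB0", 115200, timeout=1)

    def pan_from_yaw(yaw_degrees):
        """Map yaw in [-90, 90] degrees to pan in [-1.0 (left), 1.0 (right)]."""
        return max(-1.0, min(1.0, yaw_degrees / 90.0))

    while True:
        line = port.readline().decode("ascii", errors="ignore").strip()
        if not line:
            continue
        yaw, hit = line.split(",")  # e.g. "42.5,1" when the mic detects a hit
        if hit == "1":
            print("trigger sample at pan", pan_from_yaw(float(yaw)))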


First off: isn't this from June of 2016?

Second: I don't get it. The primary example they use to illustrate the need has almost nothing to do with model building or selection, and everything to do with selecting and painstakingly cleaning data. This mirrors my experience with data science so far.

"A recent exercise conducted by researchers from New York University illustrated the problem. The goal was to model traffic flows as a function of time, weather and location for each block in downtown Manhattan, and then use that model to conduct “what-if” simulations of various ride-sharing scenarios and project the likely effects of those ride-sharing variants on congestion. The team managed to make the model, but it required about 30 person-months of NYU data scientists’ time and more than 60 person-months of preparatory effort to explore, clean and regularize several urban data sets, including statistics about local crime, schools, subway systems, parks, noise, taxis, and restaurants."

So - the meta part isn't such a big deal. But if DARPA has found a way to properly automate the painstaking process of selecting, cleaning, validating, and normalizing data, well THEN we'll really have something to be impressed about.
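
(To make the pain concrete: before any modeling can start, even joining two of those urban datasets per block per hour means normalizing column names, parsing inconsistent timestamps, and dropping junk rows. A toy pandas sketch, with every column name invented:)

    # Toy sketch of the unglamorous prep work: two "simple" urban datasets
    # rarely even agree on column naming. All column names here are invented.
    import pandas as pd

    taxi = pd.read_csv("taxi_trips.csv")    # columns like "Block_ID", "Pickup Datetime"
    noise = pd.read_csv("noise_311.csv")    # columns like "block id", "Created Date"

    # Step 1: reconcile the naming conventions.
    for df in (taxi, noise):
        df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

    # Step 2: parse timestamps, drop unusable rows, bucket to the hour.
    taxi["pickup_datetime"] = pd.to_datetime(taxi["pickup_datetime"], errors="coerce")
    noise["created_date"] = pd.to_datetime(noise["created_date"], errors="coerce")
    taxi = taxi.dropna(subset=["pickup_datetime", "block_id"])
    noise = noise.dropna(subset=["created_date", "block_id"])
    taxi["hour"] = taxi["pickup_datetime"].dt.floor("h")
    noise["hour"] = noise["created_date"].dt.floor("h")

    # Step 3: only now can the sources be joined per block per hour.
    trips = taxi.groupby(["block_id", "hour"]).size().rename("trips").reset_index()
    complaints = noise.groupby(["block_id", "hour"]).size().rename("complaints").reset_index()
    merged = trips.merge(complaints, on=["block_id", "hour"], how="left")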


Unsure if it's CNN-based, but check out http://www.wowtune.net/


Very impressive, even if a bit out of tune on the Autumn Leaves demo!

They have a very good team with actual audio industry experience. Looking forward to seeing more demos.

It sounds like a phoneme-based speech/singing synthesizer, similar to Yamaha Vocaloid. I wonder how much training data is required to extract the phonemes to create a "voice".
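
(No idea what they're actually doing under the hood, but the concatenative flavor of this, roughly the Vocaloid approach, is easy to caricature: record a clip per phoneme, then splice them with short crossfades. A toy sketch; the phoneme set and file names are invented, and real systems add diphone units plus pitch/duration modeling on top.)

    # Caricature of concatenative synthesis: one recorded clip per phoneme,
    # spliced with short crossfades. Phoneme set and file names are invented.
    from pydub import AudioSegment

    # "hello" as an ARPAbet-ish phoneme sequence, one pre-recorded sample each.
    phonemes = ["HH", "EH", "L", "OW"]
    clips = [AudioSegment.from_wav(f"voice/{p}.wav") for p in phonemes]

    word = clips[0]
    for clip in clips[1:]:
        word = word.append(clip, crossfade=30)  # 30 ms crossfade at each joint

    word.export("hello.wav", format="wav")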



Where'd you get your images? Food-101 plus something else?

We're working on a similar component for our commercial behavioral-economics-driven app suite, targeted toward patients with chronic diseases.

Do you use location as a way to filter the set of possible foods, as in Google's im2calories project/paper last year?

They do some pretty awesome depth calculation stuff too: https://www.google.com/?ion=1&espv=2#q=im2calories+type:pdf

Edit: oops, should have read further. I see that you pulled appropriate terms from WordNet and used ImageNet to gather training images from Flickr. Cool!
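
(For anyone else reading: if I remember the im2calories paper right, the location trick is to use GPS to shortlist nearby restaurants and restrict the classifier to their menu items. A toy version of the idea as a label-space prior; the class list and menu data below are invented:)

    # Toy version of location-filtered food recognition: keep only labels
    # plausible near the GPS fix, then renormalize the classifier's output.
    # The class list and "nearby menu" set are invented for illustration.
    import numpy as np

    classes = ["burger", "pad thai", "sushi", "caesar salad"]
    softmax = np.array([0.40, 0.35, 0.15, 0.10])  # raw classifier scores

    # Hypothetical shortlist of dishes served by restaurants near the user.
    nearby_menu_items = {"pad thai", "sushi"}

    mask = np.array([c in nearby_menu_items for c in classes], dtype=float)
    filtered = softmax * mask
    filtered /= filtered.sum()  # renormalize over the surviving labels

    print(dict(zip(classes, filtered.round(2).tolist())))
    # -> {'burger': 0.0, 'pad thai': 0.7, 'sushi': 0.3, 'caesar salad': 0.0}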


Only 4 years behind Bitcasa...


These guys are using similar technology to the Boahen lab's:

http://web.stanford.edu/group/brainsinsilicon/index.html

In fact, I think some of the alumni now work at IBM, on this very project.


Cool analysis. I wonder if you could show something like a LOESS curve fitted across all the articles' timeseries? Or if they're all roughly linear descents, I wonder if you could show the distribution of slopes - do some descend faster than others? Why?
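
Something like this would do it (a sketch using statsmodels' lowess plus a per-article linear fit; the column names are guesses about your data layout):

    # Sketch: one LOESS curve pooled over all articles' rank trajectories,
    # plus a per-article linear slope to compare descent rates. Assumes a
    # dataframe with columns article_id, minutes_since_peak, rank.
    import numpy as np
    import pandas as pd
    from statsmodels.nonparametric.smoothers_lowess import lowess

    df = pd.read_csv("rank_timeseries.csv")

    # Pooled LOESS fit; smoothed[:, 0] is time, smoothed[:, 1] the fitted rank.
    smoothed = lowess(df["rank"], df["minutes_since_peak"], frac=0.2)

    # Per-article slope in ranks per minute (positive = sliding down the page).
    slopes = df.groupby("article_id").apply(
        lambda g: np.polyfit(g["minutes_since_peak"], g["rank"], 1)[0]
    )
    print(slopes.describe())  # the distribution of descent rates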

And then, a bone to pick:

Need a beefy RDBMS for 15mm rows? Maybe if you want to store the whole denormalized table in memory, but if you're just indexing a small field (or even partial-indexing a larger field) you should have no problem. The table will just spill to disk and page in as necessary, and you're mostly appending anyway so you shouldn't have much trouble. Plus, you could normalize the data: store the (large) article title in an Articles table with an id (hash of title?) and then just store the ranks in a Ranks table for less overall storage than the NoSQL database (thus needing a less-beefy machine).
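
Concretely, the two-table layout I mean (sqlite here for brevity, and the table and column names are just a suggestion; the same DDL works in MySQL/PostgreSQL):

    # Sketch of the normalized layout: titles stored once in "articles",
    # rank observations appended to a narrow "ranks" table.
    import hashlib
    import sqlite3

    conn = sqlite3.connect("hn_ranks.db")
    conn.executescript("""
    CREATE TABLE IF NOT EXISTS articles (
        id    TEXT PRIMARY KEY,        -- hash of the title
        title TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS ranks (
        article_id  TEXT NOT NULL REFERENCES articles(id),
        observed_at INTEGER NOT NULL,  -- unix timestamp of the scrape
        position    INTEGER NOT NULL   -- rank on the front page
    );
    CREATE INDEX IF NOT EXISTS ranks_by_article ON ranks (article_id, observed_at);
    """)

    def record_rank(title, observed_at, position):
        """Append one observation; insert the article row on first sight."""
        article_id = hashlib.sha1(title.encode()).hexdigest()
        conn.execute("INSERT OR IGNORE INTO articles VALUES (?, ?)", (article_id, title))
        conn.execute("INSERT INTO ranks VALUES (?, ?, ?)", (article_id, observed_at, position))
        conn.commit()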

Nothing against modern Not-only-SQL solutions or document stores, but don't discount RDBMS. Schemas aren't so scary or unwieldy that you should never use them.

Anyway, thanks for an informative post!


>Need a beefy RDBMS for 15mm rows? Maybe if you want to store the whole denormalized table in memory, but if you're just indexing a small field (or even partial-indexing a larger field) you should have no problem.

Good point. Honestly, I don't have that much experience with using row-based RDBMS for analytics purposes (my background is mostly in finance where folks use expensive proprietary columnar databases) and Hadoop. Any good resources on testing the limits of using MySQL/PostgreSQL for analytics?


I've spoken to friends who've played with billion+ row Oracle RDBMS installs, and we (at Next Big Sound) have an offline snapshot MySQL instance with tables of up to about a hundred million rows (with over a hundred columns).

That said, I agree that distributed columnar stores end up being much more useful for large-scale analytics, and the power of high computation parallelism seals the deal. We've mostly moved on from those snapshot MySQL databases to Impala running on top of our Hadoop cluster, so you're preaching to the choir :)

That said, a hell of a lot of analytics can be done in a properly-structured SQL database, and schema changes aren't a big deal as long as you don't need to do them online in a production system.

More info: http://stackoverflow.com/questions/14733462/can-mysql-handle...


Thanks a lot!

Yea, I felt like a total n00b when I came to the web startup world a few years ago. This sounds ridiculous, but the one and only database I had used until that point was kdb+ (kx.com). I had no idea about the performance/tradeoffs of any other databases.

I agree with you that properly-structured SQL databases can scale horizontally/vertically. That said, I've noticed that the set of people who know SQL performance well and the set of data analysts/statistically inclined folks do not overlap much (myself included), and frankly, data analysts should be able to focus on analysis, not SQL optimizations.

In a way, this is the problem Impala (and other MPP databases) solves at many companies: it's not that their data analysis cannot be handled with MySQL/Oracle, but it's cheaper and quicker to throw all the data in HDFS and query via Impala (sans some cost associated with setting up/maintaining Impala).


He works for Treasure Data. This post, while providing some information, is most likely a shill for their NoSQL platform.

If not, I genuinely hope the rest of the NoSQL crowd isn't so incredibly ignorant about what an RDBMS is capable of, and doesn't possess such a strong aversion to what would be a very straightforward normalized schema.


You can say it's a bit of advertisement, but I won't call it a "shill". If I were doing that, I would have pretended that I magically stumbled on Treasure Data.

Honestly, for data of this scale, I could have totally used any RDBMS. Scale really would have not been an issue. But I do like the schema-flexibility that Treasure Data provides.

Then again, I could have used MongoDB (and this point is clearly indicated in the post).

