Hacker News | past | comments | ask | show | jobs | submit | abeppu's comments

So I'm actually confused: in the little image of his run in the article, it seems he's often making absolute progress in the direction opposite to the ship's motion for part of each lap. Was the ship going unusually slowly?


In their little algorithm box on Chain Distillation, they have at step 2b some expression that involves multiplying and dividing by `T`, and then they say "where α = 0.5, T = 1.0".

I think someone during the copy-editing process told them this needed to look more complicated?


tl;dr it makes sense once you see there are hidden softmaxes in there; it's just the explicit formula written out and then applied with the common param values

Bloody hell, I am so unfamiliar with ML notation:

    L = (1 - α) · CE(M_k(x), y) + α · T² · KL(M_k(x)/T ‖ M_{k-1}(x)/T)
So CE is cross-entropy and KL is Kullback-Leibler, but then division by T is kind of silly there since it falls out of the KL formula. So considering the subject, this is probably the conversion from logits to probabilities as in Hinton's paper https://arxiv.org/pdf/1503.02531

But that means there's a hidden softmax there not specified. Very terse, if so. And then the multiplication makes sense because he says:

> Since the magnitudes of the gradients produced by the soft targets scale as 1/T² it is important to multiply them by T² when using both hard and soft targets.

I guess to someone familiar with the field they obviously insert the softmax there and the division by T goes inside it, but boy is it confusing if you're not familiar (and I am not). Particularly because they're being so explicit about writing out the full loss formula just to set T to 1 in the end. That's all consistent, though. Writing out the formula for probabilities q_i from logits M_k(x)_i:

    q_i = exp(M_k(x)_i / T) / sum_j exp(M_k(x)_j / T)
Hinton says

> where T is a temperature that is normally set to 1. Using a higher value for T produces a softer probability distribution over classes.

So the real formula is

    L = (1 - α) · CE(softmax(M_k(x)), y) + α · T² · KL(softmax(M_k(x)/T) ‖ softmax(M_{k-1}(x)/T))
And then they're using the usual form of setting T to 1. The reason they specify the full thing is just because that's the standard loss function, and it must be the case that people in this field frequently assume softmaxes where necessary to turn logits into probabilities. In this field this must be such a common operation that writing it out just hurts readability. I would guess one of them reading this would be like "yeah, obviously you softmax, you can't KL a vector of logits".
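To make the hidden softmaxes concrete, here's the loss written out as a small sketch in plain Python (names like `distill_loss` are mine, and the KL argument order follows the formula above as written; classic Hinton-style distillation often puts the teacher distribution first):

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: divide logits by T before normalizing.
    z = [v / T for v in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, y, alpha=0.5, T=1.0):
    """(1 - a) * CE(softmax(student), y) + a * T^2 * KL(student_T || teacher_T)."""
    # Hard-label term: cross-entropy against the true class y, at T = 1.
    p = softmax(student_logits)
    ce = -math.log(p[y])
    # Soft-label term: both sets of logits go through a T-scaled softmax,
    # and the term is multiplied by T^2 to keep its gradient magnitude
    # comparable to the hard-label term (per Hinton's paper).
    q_s = softmax(student_logits, T)
    q_t = softmax(teacher_logits, T)
    kl = sum(a * math.log(a / b) for a, b in zip(q_s, q_t))
    return (1 - alpha) * ce + alpha * T ** 2 * kl
```

With T = 1 the temperature-scaled softmax reduces to the ordinary one, which is why setting T = 1.0 at the end makes the divisions look vacuous even though the formula is written in full generality.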

Good question. I just sort of skipped over that when reading but what you said made me think about it.


the T stands for tea :)


Ah, so it's a source of randomness! Presumably 1.0 corresponds to a really hot cup of fresh tea.


I get that the norms lean conservative and that's a good thing. But if someone says you should do a recall and the actual lab tests saying whether your product actually has toxin-producing bacteria haven't finished running yet, I can understand the desire to wait until the evidence is in.


They've got some evidence: 7 known cases across three states, all linked to the same product. The producer's history of problems makes it seem more likely to be true. A lot of companies would rather have their customers throw away the product and buy it again from a different batch than risk customers getting violently sick, or dying, from their food, because people who get sick and survive can end up with a very strong aversion to the brand and/or product going forward. And voluntarily recalling the product just to be safe is good from a PR stance, since it looks like you actually care about your customers.


I think some of it was just a belief that work you can see being done by a floor of people talking with their mouths and looking at screens in the same room is more real than the slightly less visible conversations in slack while looking at screens in their own rooms.

Open plan offices continue to be designed more for seeing the work happen than for doing the work. I spend a lot of mental energy on ignoring the distractions around me. No job has ever offered me a private office with a door that closes in exchange for being in the office 5 days a week.


> Meanwhile, you're a week behind on that Jira ticket for issuing JSON Web Tokens because there's no actively maintained JWT libraries for Gooby.

> I've seen buggy pre-1.0 libraries used for critical production software simply because it was a wrapper library written in a functional language, as opposed to a more stable library written in an imperative language. Not "critical" as in "if it goes down it's really annoying", but "critical" as in "checks and stores passwords" or "moves money from one bank account to another".

I do think if you're using a niche language in an industrial capacity, you need to know how to work with libraries outside the language. Admittedly this is easier for some languages than others. But I've run into candidates who wanted to interview in Clojure but who didn't know how to call into pre-provided Java libraries.


This. Almost all languages can at least call into C, which should allow you to do whatever you want.


I don't have any deep background in econ, but do we not need to switch from talking about GDP to talking about a version of Net Domestic Product where "net" includes:

- changes to the value of natural and ecosystem resources (e.g. if I clear a forest to sell timber, we must acknowledge some lost value for the forest)

- amount of economic transactions in service of mitigating problems created by externalities from other activity (e.g. if my pollution gets into your groundwater, you paying to remediate the pollution isn't "value created")

I.e. growth of _actual net value_ still sounds like a good thing to pursue but we let our politicians run around doing anything to maximize GDP without talking about what the "gross" is hiding.


Also this isn't only a gap about environmental issues. If you pay X for child daycare and work but only make X+taxes, and have a dumb pointless job, GDP says the economy is at least 2X+tax larger than if you took care of your own child during the day (because your employer paid you and you paid your daycare). This seems dumb at an accounting level, even before we consider that you probably get a greater emotional benefit from being with your child than the daycare worker does.


But they’re not measuring nor optimizing for the contentedness of you, or your kids.

They’re measuring money generated for shareholders, they’re measuring tax base.


I think the point they were trying to make is that they are measuring and optimizing for the wrong thing


I am agreeing with that point, and providing speculation about why it's unlikely to change.


It's just very hard to measure.

On the corporate scale, see the whole carbon / ESG / impact measurement industry: Lifecycle Analysis, supply chain extrapolation, Bill of Materials analysis.

You only get some relatively crude estimate and a lot of missing data points, whereas economic growth can conveniently assign a dollar value on everything.

I think it only gets worse as you scale up.


As an example, a forest managed for productivity won't really lose value from a harvest.

You'd have to price the conversion of it to that management strategy.


But we have a lot of sources of information already available that do not seem to be incorporated into any kind of top-level number that we grade ourselves on.

- when we have an estimate of how many hundreds of billions it costs to rebuild after a hurricane that would not have happened but for climate change, existing economic processes generate that number

- when insurers raise rates throughout a region, this reflects an expectation on the cost of damage, and the change over time reflects the increase in risk we've created

- when a heatwave kills a bunch of people, we already have a range of ways of estimating a monetary value for those lives from insurance, healthcare and liability litigation.

Further ... suppose your elderly relative left you a bunch of jewelry. You don't know how much it's worth and getting it appraised can actually be a bit complicated and doesn't give you complete certainty over value. But it would be _bonkers_ to continually take unappraised jewelry out into the marketplace, liquidate it, and pretend that the whole sales price was _earnings_. After the transaction, you don't have a thing you had before. You didn't know what it was worth initially but that doesn't mean that it was worthless, and you probably got scammed. Yes, measuring the full environmental impact of all our industries is hard, but pretending it's 0 is silly.


It's kind of an accounting problem. What you really want is human happiness and abundant nature, but gardening and playing with the kids may produce happiness and no GDP, whereas enlarging a chemicals plant to make even larger SUVs produces much GDP.

Trouble is it's hard to account for that kind of stuff but maybe we could make a flawed but functional accounting thing with AI?


A huge part of such stuff is deliberately hidden to avoid getting the government too involved in day to day lives.

Case in point: for a while we had an arrangement with our neighbour that we'll pick up their child from preschool and stay with her until her parents get home and in exchange they would prepare dinner for us.

No money exchanged hands, so no GDP generated, yet everyone's quality of life improved.


I guess a lot of the 'free market' stuff is also about avoiding too much government involvement. It tends to be a pain in the neck when you have to file tax returns and apply for permits.


Mark Carney's book "Values" pitches a system such as this.

In better times, perhaps we have the collective will to try.


You should also include who is profiting. Is it the wealthiest 1% or the entire population?


> Moreover is the fact that they're 100% automated a material fact to the consumer?

I do think that for a meaningful fraction of first time customers, the choice to try it is about the novelty of it being automated. In SF I do often see people explaining waymo to out of town visitors, and the uniqueness of "driverless" vs "remote controlled" is part of the appeal.


But that's not what they're paying for. You're hoping to get the automated experience but you aren't paying for the automated experience. This is like going to Hooters to buy a meal and then suing because the girl you wanted to see didn't serve you.


https://x.com/Waymo/status/1890083513531084973

Here's a waymo ad from a year ago. In like 10 seconds they repeat "it's driving itself" 3 times.

https://www.youtube.com/watch?v=0kJPDg207oc

Here's another one. The closing screen says "Autonomous rides 24/7". They talk about the robot

Here's a blogpost from 2021 in which they insist that their messaging from there forward will talk about "fully autonomous driving", and not merely self-driving. https://waymo.com/blog/2021/01/why-youll-hear-us-say-autonom...

Here's a post from this year where as part of their expansion to new cities they say " we continue our accelerated growth and welcome the first public riders into our _fully autonomous_ ride-hailing service in four new cities" (emphasis mine). https://waymo.com/blog/#:~:text=Waymo%20will%20begin%20fully...

I haven't read the TOS in the app and I'm sure they didn't legally commit that no human will ever be involved even in unusual circumstances (which would probably be irresponsible). But they have been advertising on the basis of being autonomous, they're presenting that as part of their value prop to new users. Maybe it's up to lawyers to decide whether that's "material". But they are repeatedly, loudly, proudly advertising and marketing on the basis of it being fully autonomous.


April 1 is an interesting choice for a big event that will be news if it goes well and bigger news if it goes badly.


They don't really have a choice. The launch window is small and they either make it or they don't.


There is a window on the 2nd. But you don't aim for the second half of the launch period and hope you make it; you aim for the start, to allow time to resolve issues without waiting for the next window (which is at the end of the month).


What factors are there for the lunar launch window?

It can't be weather, here, right? That's too far ahead.

Is it perigee?

If this window is missed, when is the next one?


The position of the moon relative to the earth and the sun. The windows are about a month apart.


Well at least there’s a 50% probability of success


"April fools, your space shuttle just disintegrated!"


I remember hearing somewhere on this site that medical imaging got pretty good at building systems that recycle helium. Does chip manufacturing not do this, or are the losses at their scale still large enough that you need a substantial constant supply?


The big problem is purity. Fabs use grade 5 and 6 helium where contaminants are 1-10 parts per billion. The infrastructure to get it that pure becomes very specialized and any time the helium goes through a process it picks up so much contamination that recycling it would require the entire purifying and quality control infrastructure for pressure or temperature swing adsorption.

Some fabs are starting to reuse helium in downstream processes but there’s only so much they can do without expanding their core competency into yet another complex chemical manufacturing process.

MRI machines don’t need high purity helium and the contamination doesn’t “gunk up” all the tools so it’s not an issue to recycle it there.


Now I'm imagining a procedural cop show where they bust an illegal helium dealer, and one of the cops takes a huff to gauge what they're dealing with, and then squeaks out "that's the good stuff".



> The infrastructure to get it that pure becomes very specialized

I think some of the most advanced fab infrastructure is the ultra pure water system. Water becomes quite aggressive chemically when it has no dissolved ions in it. You have to use exotic or highly processed materials simply to transport it around. If the factory didn't need such massive quantities of it, trucking it in would likely be preferable.


I guess you watch asianometry?

https://www.youtube.com/watch?v=C3RzODSR3gk


No. I worked at Samsung Austin Semiconductor during a large UPW upgrade project many years ago.


Ultrahigh density polyethylene isn't that exotic.


What about microplastics?


What about them?


They are toxic.


Well, good thing you aren't drinking it then, because the complete lack of electrolytes would kill you far faster than the microplastics. Surely if they can chemically purify the water to chip-making standards they can filter out the microplastics (when they are done with it)? At least one can hope.


From the article I thought the helium was used mostly for cooling (where I imagine the purity wouldn't be that important)

But what other processes do the fabs use the helium for then?


It is used a lot for cooling, but in many systems it's used without a classical heat exchanger that would isolate the helium from the workpiece.

Helium is pumped beneath the wafer to keep it cool, so any impurities can leak through the chuck seal into the chamber above and disrupt the process. The flow is also very precisely controlled, so impurities change the uniformity of the thermal conductivity of the gas, creating hot spots on the wafer.

In EUV it's used both to cool the optics and as a buffer gas to manage debris from the plasma, so any contaminants in it can deposit on the optics. At 13.5nm even a single layer of hydrocarbon molecules can create problems, and the light bounces many times between mirrors, so the error compounds.

There are many places where helium doesn't have to be as pure, but contamination events and surprise maintenance are so expensive that it's not worth the extra savings (or the risk of mislabeling and using dirty helium in the sensitive parts).


Thanks a lot for the info. Yeah, it makes sense that if you need pure helium anyway, you probably wouldn't create an entire second supply chain for impure helium.


You seem to have a very deep knowledge of aluminum and helium supply chains. Curious, what industry do you work in?


Do we have a process to make new helium from hydrogen?


If you come up with a process to do that efficiently, the helium will be a lovely bonus but not remotely the most important result. :D


If you want to make new helium, it's far easier to go the other way.

You just need quite a bit of polonium, thorium, or radon. Put it in a pool and then wait a while. You just gotta collect what bubbles to the surface.


Yeah, but it gets quite warm


Nuclear fusion?


We usually take it from natural gas deposits instead.


Some of the fabs do recycle as effectively as they can, but MRIs use it in a single process, in liquid form, in a relatively constrained container. Fabs use it for a variety of processes, ranging from wafer cooling to purging environments, to making ultra ultra clean chambers. The scale of what they use is higher, too, so even if an individual process is more efficiently recapturing helium, they might go through a few tons a day, with an MRI only using a few liters and losing 5% or less.


Also, fab companies have had to learn to be incredibly conservative about seemingly meaningless changes.

Most famously illustrated by Intel's "Copy Exactly!" methodology. https://duckduckgo.com/?q="copy%20exactly"+Intel

An adjacent IBM story that kinda explains why:

  During the year 1986, there was an anomalous increase in LSI memory problems. Electronics in early 1987 appeared to have problem rates approaching 20 times higher than predicted. In contrast, identical LSI memories being manufactured in Europe showed no anomalous problems. Because of knowledge of the radioactivity problem with the Intel 2107 RAMs, it was thought that the LSI package probably was at fault, since the IBM chips were mounted on similar ceramic materials. LSI ceramic packages made by IBM in Europe and in the U.S. were exchanged, but the European computer modules (with European chips and U.S. packaging) showed no fails, while the U.S. chips with European packages still failed at a high rate. This indicated that the problem was undoubtedly in the U.S.-manufactured LSI chips. In April 1987, significant design changes had been made to the memory chip with the most problems, a 4Kb bipolar RAM. The newer chip had been given the nickname Hera, and so at an early stage the incident became known as the "Hera problem."
  By June 1987, the problem was very serious. A group was organized to investigate the problem. The first breakthrough in understanding occurred with the analysis of "carcasses" from the memory chips (the term carcasses refers to the chips on an LSI wafer which do not work correctly, and are not used but saved in case some problem occurs at a future time). Some of these carcasses were shown to have significant radioactivity.
  Six weeks was spent in the manufacturing process lines, looking for radioactivity, and traces were found inside various processing units. However, it could not be determined whether these traces came from the raw materials used, or whether they were transferred from the chips themselves, which might have been contaminated earlier in their processing. Further, it was discovered that radioactive filaments (containing radioactive thorium) were commonly used in some evaporators. A detailed analysis by T. Zabel of some of the "hot" chips revealed that the radioactive contamination came from a single source: Po210 This isotope is found in the uranium decay chain, which contains about twelve different radioactive species. The surprising fact was that Po210 was the only contaminant on the LSI chips, and all the other expected decay-chain elements were missing. Hundreds of chips were analyzed for radioactivity, and Po210 contamination was found going back more than a year. Then it was found that whatever caused the radioactivity problem disappeared on all wafers started after May 22, 1987. After this precise date, all new wafers were free of contamination, except for small amounts which probably were contaminated by other older chips being processed by the same equipment. Since it takes about four months for chips to be manufactured, the pipeline was still full of "hot" chips in July and August 1987. Further sweeps of the manufacturing lines showed trace radioactivity, but the plant was essentially clean. The contamination had appeared in 1985, increased by more than 1000 times until May 22, 1987, and then totally disappeared!
  Several months passed, with widespread testing of manufacturing materials and tools, but no radioactive contamination was discovered. All memory chips in the manufacturing lines were spot-screened for radioactivity, but they were clean. The radioactivity reappeared in the manufacturing plant in early December 1987, mildly contaminating several hundred wafers, then disappeared again. A search of all the materials used in the fabrication of these chips found no source of the radioactivity. With further screening, and a lot of luck, a new and unused bottle of nitric acid was identified by J. Hannah as radioactive. One surprising aspect of this discovery was that, of twelve bottles in the single lot of acid, only one was contaminated. Since all screening of materials assumed lot-sized homogeneity, this discovery of a single bad sample in a large lot probably explained why previous scans of the manufacturing line had been negative. The unopened bottle of radioactive nitric acid led investigators back to a supplier's factory, and it was found that the radioactivity was being injected by a bottle-cleaning machine for semiconductor-grade acid bottles. This bottle cleaner used radioactive Po210 material to ionize an air jet which was used to dislodge electrostatic dust inside the bottles after washing. The jets were leaking radioactivity because of a change in the epoxy used to seal the Po210 inside the air jet capsule. Since these jets gave off infrequent and random bursts of radioactivity, only a few bottles out of thousands were contaminated.
An excerpt from: Ziegler, James F., et al. "IBM experiments in soft fails in computer electronics (1978–1994)." IBM journal of research and development 40.1 (1996): 3-18

Polonium is debuggable. More subtle statistical aberrations would be exponentially harder.


this story would make a killer asianometry video


CSI parody style?

I'm most familiar with software and home electronics debugging, but it would be wonderful to hear some stories from other disciplines where a culprit is found, and also about the forensic tools specific to other domains.


Good to find another fan of the Asianometry channel ;)

I agree, this story above would be perfect for another Asianometry video.


What a horror story. Incredible detective work.


> That breaks down when there isn't open discussion on campus. Communists were jeered but essentially allowed on campus in the 60s and 70s, even at the height of the cold war.

I think that's a misleading telling of the history. During the 40s and 50s a lot of people were fired for suspected or real links to communism and some schools even demanded loyalty oaths. Courts struck down a bunch of laws that were used to fire people but many rulings didn't land until the 60s. Angela Davis was famously fired in 1969.

