A lot of people are going to dump on this billion-dollar valuation, but we don't have the terms. Presumably there is a liquidation preference. According to Crunchbase [1], the total capital raised is about $160 million including this round, so the investors don't need this company to be worth $1 billion to come out fine. Sam Altman made the point a few days ago that a lot of these late-stage private financings are kind of debt-like [2, around paragraphs 8-9 and footnote #2], with the result that the valuations aren't really meaningful.
I wouldn't be so down on vocational training for software... It is a field with an unusually high requirement for ongoing learning. And when you have to learn some new tech, you usually have to do it now, and in place, not next September in the nearest college town. Given that, I think there is a pretty clear need for training that's delivered where you are and when you need it, and traditional schools won't be able to do that.
Oh. To be clear: I think ongoing training is great, especially given the pace of change. And things like automated evaluation of code actually work pretty well in MOOCs, in my experience. (Unlike peer grading.) However, the argument for platforms like Udacity hasn't generally been that they're a great advancement for vocational training, but that they disrupt higher ed generally. And that's probably a different valuation metric.
That depends on how sensitive higher ed is to competition from vocational training for things like programmers, right? Something like Udacity isn't going to replace a traditional CS degree, but something more like an IT degree could just be high-end vocational training offered by a traditional four-year institution, and as college costs rise, it could start to eat into the lower end.
I feel that the ongoing learning requirements of software lead to developers becoming VERY good at learning new software... which means they don't need training, because they can self-teach. At least that has been my experience.
How are people who are starting from zero on this supposed to understand it in 60 days? This disclosure is only a little more than a transparency fig leaf.
By the usual means that complicated issues are explained to the public: professional journalists poring over it for clauses of public concern. How many people actually read through the ACA, or the myriad budgets and semi-budgets passed in the US over the past decade with public-disclosure periods much shorter than this one?
The ACA wasn't negotiated in secret. Budgets are normally continuing resolutions; they only negotiate the delta from previous years, which is usually small. And again, budget negotiations don't involve years of secret negotiations.
I'm sorry, continuing resolutions are now our ideal? And even then, the full process is on a similar timescale to just the public disclosure period of this treaty.
Same with respect to full budgets - they're long, they're complicated, but the period during which they're publicly available is on a similar timescale to this 60-day period.
>>I'm sorry, continuing resolutions are now our ideal?
I did not say that continuing resolutions are an ideal. My point was that although budgets are complex, only a relatively small part of them changes from one cycle to the next. People don't have to digest the whole budget, because they already know what was in it before. They only have to understand the changes.
The point about the late-stage investments being not-really-equity is a great point. So my question is: why do financial journalists just about always miss it when they write about tech unicorns?
In the post, PG states that First Round's study is evidence of gender bias in VC financing. But footnote [2] is important: Uber was excluded as an outlier. Now...excluding Uber is reasonable (it is sort of an outlier), but so is not excluding it (it was a company that First Round invested in). When the conclusion from a data analysis depends on which way you go on something like this - which of two reasonable alternatives you pick - then the results are fragile and they don't really support either conclusion very well.
I've played around with this some, and the recognition isn't perfect, but I'm very impressed with how well it can pick up a new pattern from even just one example.
It seems that the main issue here is that with pre-registration, study authors have to pick a single measure of primary benefit at the outset, whereas before, they might have made that choice after getting the results back. The original study is at PLOS ONE, and it is not a difficult read [1]. From that source:
>>Prior to 2000, investigators had a greater opportunity to measure a range of variables and to select the most successful outcomes when reporting their results... Among the 25 preregistered trials published in 2000 or later, 12 reported significant, positive effects for cardiovascular-related variables other than the primary outcome.
That is, in most cases, there are large effects for some outcome, and if they get to choose the primary outcome after looking at some results, they could have been cherry-picking the outcome variables.
> It seems that the main issue here is that with pre-registration, study authors have to pick a single measure of primary benefit at the outset, whereas before, they might have made that choice after getting results back
Right. And then you'd have to use proper statistical reasoning for that state of affairs. Which nobody ever does, because they're not statisticians, it's complicated, and it would reduce the chance of 'statistical significance'.
So they just use a standard calculation of statistical significance -- which is based on the assumption that you have picked a single hypothesis in advance and then done your test. So it's completely invalid to use it how everyone typically does.
Imagine you flip a coin 50 times. Then you see, okay, did I ever get 10 heads in a row? Nope? Okay, how about 5 heads followed by 5 tails? Nope. Okay.... try a couple dozen other things, oh, look, I got exactly 3 tails followed by exactly 3 heads followed by exactly 3 tails again! Let's run my test of statistical significance to see if that was just chance, or is likely significant -- oh hey, it's significant, this is likely a magic coin not random at all!
Nope. If you test everything you can think of, _something_ will come up as 'statistically significant', but it isn't really. Those tests of statistical significance -- which calculate how likely it is that the results you got happened by random chance vs. an actual correlation likely to be repeatable -- are no longer valid if you go hunting for significance like that.
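The inflation described above is easy to demonstrate with a short simulation. Here is a sketch (the numbers -- 20 tests per "study", a 0.05 threshold -- are illustrative assumptions, not from any cited study): every hypothesis is tested on pure noise, yet most studies find at least one "significant" result.

```python
import random

random.seed(1)

# Simulate "studies" that each test 20 independent hypotheses on pure
# noise. Under the null hypothesis, each test's p-value is uniform on
# [0, 1], so each test has a 5% chance of landing below 0.05 by luck.
TRIALS = 20_000
TESTS_PER_STUDY = 20
ALPHA = 0.05

# Count studies where at least one of the 20 null tests looks "significant".
false_positive_studies = sum(
    any(random.random() < ALPHA for _ in range(TESTS_PER_STUDY))
    for _ in range(TRIALS)
)

print(f"empirical rate:          {false_positive_studies / TRIALS:.3f}")
print(f"closed form 1-(1-a)^n:   {1 - (1 - ALPHA) ** TESTS_PER_STUDY:.3f}")
```

The empirical rate lands near the closed-form value of about 0.64: even though nothing real is going on, roughly two-thirds of these hypothetical studies would report a "significant" finding.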
if they get to choose the primary outcome after looking at some results, they could have been cherry-picking the outcome variables
Not just could, would. Choosing your hypothesis after you run the experiment is (or at least should be) a cardinal sin in science for good reason. At the standard p-value cutoff of .05, even when there's absolutely no effect going on the probability of getting a spurious positive result when you do n comparisons is equal to 1 - (.95^n).
So that 5% chance of a type I error if you only look at one test statistic jumps to 40% if you look at ten, and to 72% if you look at 25.
I have a question. Would I be on firm statistical footing if I started looking for effects after the study, as long as I choose a p-value such that (1-((1-p)^n) < 0.05?
In other words, I run a study to determine if jellybeans cause acne [1]. The result is inconclusive (p > .05). Now -- after the results are collected -- I wish to check the correlation between color and acne. There are 20 colors. Would it be statistically sound to "correct" for my cherry-picking by setting p = 0.0025? That would result in an overall false-positive probability of 1 - (1 - 0.0025)^20 ≈ 0.049, just under 0.05.
> Would I be on firm statistical footing if I started looking for effects after the study, as long as I choose a p-value such that (1-((1-p)^n) < 0.05?
No. For one thing, how do you actually know n? If in your analysis you can make c independent binary choices, n would be on the order of 2^c. For most reasonable sequences of choices, n would be impractically large. And if you think you know n, how would you convince everyone else that your value of n is reasonable and trustworthy?
The xkcd case (which is solved by applying the Bonferroni correction) is special because it's a single known test run across 20 experiments, so n is known and the correction is straightforward.
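For the 20-color case, the two standard fixes give nearly the same per-test threshold; this quick check (plain arithmetic, no study data assumed) shows why the 0.0025 figure above works out:

```python
# Family-wise error rate we want across all 20 jellybean-color tests.
alpha, n = 0.05, 20

# Bonferroni: split alpha evenly across the n tests (conservative).
bonferroni = alpha / n

# Sidak: solve 1 - (1 - p)^n = alpha exactly for the per-test p.
sidak = 1 - (1 - alpha) ** (1 / n)

print(f"Bonferroni per-test p: {bonferroni:.5f}")   # 0.00250
print(f"Sidak per-test p:      {sidak:.5f}")        # 0.00256
print(f"FWER with Bonferroni:  {1 - (1 - bonferroni) ** n:.4f}")
```

With the Bonferroni threshold the family-wise error rate comes out to about 0.0488, just under the desired 0.05; the exact Šidák solution is only slightly less strict. Neither helps, though, when n itself is unknown, which is the objection above.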
There is a good paper about this garden of forking paths by Gelman and Loken [1].
> At the standard p-value cutoff of .05, even when there's absolutely no effect going on the probability of getting a spurious positive result when you do n comparisons is equal to 1 - (.95^n).
That's only the probability of getting a positive result by chance; the total probability of getting a positive result even if there is no effect is much higher.
"Probability of getting a positive result by chance" and "probability of getting a positive result when there is no effect" are the same thing. This is what the p-value measures. (The full phrase is generally a combination of the two: "probability of getting a positive result by chance when there is no effect"; but that's so long that people shorten it.)
There are lots of reasons you could get a false positive result that have nothing to do with chance: e.g. non-representative sample, miscalibrated equipment, biased methodology, researcher fraud, flawed statistical analysis, etc.
Indeed, but the comment that you quoted was referring to the case where the methodology and p-values are sound, and the only issue is testing multiple hypotheses without correction.
Cardinals and sin are the realm of religion. Not sure why it's being brought up here.
Choosing a hypothesis after the experiment is run is perfectly valid as long as your experiment is valid for that hypothesis. Besides, you would always run a new experiment anyway.
So perhaps it's worthwhile to trot out the idea of exploratory vs. confirmatory research.
In exploratory research you collect a bunch of data, and then mine it for interesting associations that might merit further study. It's an essential part of the scientific process, but anything you find from doing it needs to be treated as extremely tentative because it's liable to produce spurious results at least as often as it finds genuine effects.
But that's not what this paper's talking about. It's talking about experiments that are being used to support the approval of new treatments and drugs. That's confirmatory research. In that realm you absolutely must paint the target on the wall before you throw your darts.
Those are ways of looking at it, but experiments have definite structure, and that structure can be exploited to create experiments that potentially reveal more information than others would.
If the experiment is constructed correctly, it can support validation of multiple hypotheses.
I would completely believe that most experimentation being performed currently is not structured to make further exploitation possible.
I should have clarified that statement as I don't think it's quite correct.
It should read:
I would completely believe that most experimentation being performed currently is not structured to make further exploitation, of the type desired/wished, possible.
There are almost certainly facts available that are not discovered/discussed from past experiments. Many of them are likely trivial and/or not what researchers would wish or hope that their data could tell them. However, they can still be mined.
In any case though, you would still re-run experiments to further validate/reject the hypotheses. That is simply basic science.
Yes, you can always choose any hypothesis you want. It's largely irrelevant.
Every experiment will support analysis through a set of hypotheses. Just because you didn't select all of those hypotheses before the experiment ran doesn't mean you can't select it after the experiment.
Imagine that an experiment has been run, but you do not know the results (or even what was done). Now you select a hypothesis; if the experiment required to validate that hypothesis is the same as what was run previously, you can look at and use the results.
A hypothesis is like running a query against a database. Many queries are valid, even though the data may not have changed.
>Otherwise, why are you re-running it?
Science requires it. Doctrine from one-off experimentation is religion (hard to dump).
It's fine if you run a separate experiment to justify the hypothesis. If you choose the hypothesis after conducting the experiment, then the p-values you obtain for that hypothesis are invalid.
To be fair, Viagra was originally being developed as a high blood pressure medication. They decided to switch to erectile dysfunction when they found out why study participants were hoarding the pills.
Pre-registration would not hinder them doing that discovery, but they would have had to do another registration and study to get it to the market with the new use. They would likely still be able to use the outcome of the original study to assess the safety of the drug.
Which actually makes an important but subtle point: just because the rate of positive effects identified went down doesn't mean the effects that would otherwise have been identified were all false. It just means we don't know. There may be extremely strong evidence of effects that would overcome any amount of multiple-testing correction, but they still aren't allowed to use it. It might take years or even decades for them to do a follow-up study to validate the result, which means a significant number of people are harmed by not having access to the drug in the interim. Just playing devil's advocate to make the point that it's not a given that we're getting a better overall outcome by being this stringent.
> [...], which means a significant number of people are harmed by not having access to the drug in the interim.
Yes, that might happen. The more likely outcome though is that we are saved from a lot of drugs that don't work any better than chance (or even worse).
Fair enough... but the issue here is about findings of an effect versus no effect. It's about statistical significance. In order for single-comparison p-values and the like to be valid, there has to be a single comparison. There is a way to do 'any-of-k' testing, but the required effect sizes get larger.
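To make "the required effect sizes get larger" concrete, here is a sketch using a simple two-sided z-test (an illustrative setup I'm assuming, not anything from the thread): as you Bonferroni-correct the per-test alpha for k tests, the critical z-value you must clear grows.

```python
from statistics import NormalDist

def critical_z(alpha: float) -> float:
    """Two-sided critical z-value for a given per-test alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

family_alpha = 0.05
for k in (1, 5, 20):
    per_test = family_alpha / k  # Bonferroni-corrected per-test threshold
    print(f"k={k:2d}  per-test alpha={per_test:.4f}  critical z={critical_z(per_test):.2f}")
```

The thresholds come out to roughly z ≈ 1.96, 2.58, and 3.02 for k = 1, 5, and 20: a real effect has to be that much larger (or the sample that much bigger) to clear the corrected bar.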
The blog post seems to be getting modified at this moment. When I first saw it, it didn't have anything about the misclassifications, but that has been added now.
[1] https://www.crunchbase.com/organization/udacity#/entity
[2] http://blog.samaltman.com/the-tech-bust-of-2015