Wait, but we're doing that already, and it works well (Qwen 2.5 VL)? If need be, you can always resort to structured generation to enforce schema conformity.
In the demo, O1 implements an incorrect version of the "squirrel finder" game.
The instructions state that the squirrel icon should spawn after three seconds,
yet it spawns immediately in the first game (also noted by the guy doing the demo).
Yeah, now that you mention it I also see that. It was clearly meant to spawn after 3 seconds. Seems on successive attempts it also doesn't quite wait 3 seconds.
I'm kind of curious if they did a little bit of editing on that one. Almost seems like the time it takes for the squirrel to spawn is random.
If you don't have to pay for child care because you can just take off the time to pick up your kids from school, you are saving money your job otherwise forced you to spend.
Being forced to commute at peak time costs more - on a retail salary the difference between a peak and off-peak ticket can mean that the first hour or two of working is essentially pointless, as you're just paying back the cost of getting there in the first place.
You may also have to pay a surcharge to visit the dentist, because you can only go on a Saturday when you need to be at work the other days. Flexible working would let you just take the Tuesday off and go when it's cheaper.
I'm sure there are lots of other examples that apply to different lifestyles.
Hours with your child aren't fungible. You can't pay the babysitter to go see the dance recital for you if you want to be the parent instead of the babysitter being the parent. All the money in the world isn't going to make up for missing the soccer game where your kid makes the winning goal.
Well to be pedantic, with all the money in the world you wouldn't be working for Ikea and the problem wouldn't exist so really that's a problem also solved by money...
Generally, though, higher-paid employees tend to have more sway within a company structure and likely don't need to miss these important events. The win here is that something that was generally true for mid-management and up at most companies now extends down through all the ranks.
You can manage things in your life when they occur instead of spending money to displace them or risk losing your job because of them.
Put another way, if you present a worker with a choice between two jobs with the same hourly rate, one with flexible working hours and the other without, which would you expect to be the more likely choice? You can then measure the value of that flexibility by varying the hourly rates of the two jobs until the choices change, which gives you an estimate of how much "more money" it appears to be "worth."
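As a rough sketch of that measurement, here is a small Python simulation. Everything in it is hypothetical: the `premiums` list stands in for what each worker privately thinks flexibility is worth per hour (in a real study you would only observe their choices), and the function names are mine. It raises the pay of the rigid job until about half the workers switch, and treats that pay gap as the estimated value of flexibility.

```python
def share_choosing_flexible(premiums, rate_gap):
    """Fraction of workers who pick the flexible job when the rigid job
    pays `rate_gap` more per hour. A worker picks the flexible job only
    if they value flexibility at more than the gap."""
    return sum(1 for p in premiums if p > rate_gap) / len(premiums)


def indifference_gap(premiums, lo=0.0, hi=20.0, tol=0.01):
    """Bisect on the hourly-rate gap until roughly half the workers
    choose each job. Assumes the share falls as the gap grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if share_choosing_flexible(premiums, mid) > 0.5:
            lo = mid  # most still prefer flexibility: the gap is too small
        else:
            hi = mid  # most now take the higher rate: the gap is too large
    return (lo + hi) / 2


# HYPOTHETICAL per-worker valuations of flexibility, in currency units per hour.
premiums = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]

gap = indifference_gap(premiums)
print(f"Estimated value of flexibility: about {gap:.2f} per hour")
```

With these made-up numbers the search settles near the median valuation, which is the point where the group is split evenly between the two jobs.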
It looks like Runpod currently (checked right now) has "Low" availability of 8x MI300 SXM (8x$4.89/h), H100 NVL (8x$4.39/h), and H100 (8x$4.69/h) nodes for anyone w/ some time to kill that wants to give the shootout a try.
You're joking/trolling, right? There are literally tens of thousands of H100s available on gpulist right now. Does that mean there's no cloud demand for Nvidia GPUs? (I notice from your comment history that you seem to be some sort of bizarre NVDA stan account, but come on, be serious.)
In Mixtral 8x7B, the 8 means that the model uses Mixture-of-Experts (MoE) layers with 8 experts. The 7B means that if you were to remove 7 of the 8 experts in each layer, you would end up with a 7B model (which would have exactly the same architecture as Mistral 7B). Therefore, a 1x7B model has 7B params, and an 8x7B model has 1 * 7B + (8-1) * sz_expert params, where sz_expert is the constant number of params the MoE layers gain for each added expert. Mixtral 8x7B's size is 46.3GB; reading that as roughly 46.3B params gives sz_expert ≈ (46.3 - 7) / 7 ≈ 5.6B.
If these assumptions port over to 8x22B, then at 281GB, which is about 140.5B params if that figure is a 16-bit checkpoint, sz_expert ≈ (140.5 - 22) / 7 ≈ 16.9B.
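For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above, total = base + (n_experts - 1) * sz_expert, solved for sz_expert. The function names are mine. Two things are assumptions rather than facts from the figures themselves: the 46.3 number is read directly as billions of params, and the 281GB number is read as a 2-bytes-per-param checkpoint. A different reading of either size gives a different result.

```python
def expert_size(total_params_b, base_params_b, n_experts):
    """Solve total = base + (n_experts - 1) * sz_expert for sz_expert.
    All quantities are in billions of params."""
    return (total_params_b - base_params_b) / (n_experts - 1)


def params_from_checkpoint(size_gb, bytes_per_param):
    """Convert a checkpoint size in GB to billions of params,
    taking 1 GB as 1e9 bytes."""
    return size_gb / bytes_per_param


# Mixtral 8x7B. ASSUMPTION: 46.3 is already a count in billions of params.
sz_8x7b = expert_size(46.3, 7.0, 8)

# 8x22B. ASSUMPTION: 281GB is a 16-bit checkpoint (2 bytes per param).
total_8x22b = params_from_checkpoint(281.0, 2)
sz_8x22b = expert_size(total_8x22b, 22.0, 8)

print(f"8x7B:  sz_expert ~ {sz_8x7b:.1f}B")
print(f"8x22B: sz_expert ~ {sz_8x22b:.1f}B (from {total_8x22b:.1f}B total)")
```

The `bytes_per_param` argument is there so the same check can be rerun for an 8-bit or 32-bit reading of a file size.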