> I don't know how to force this issue as a European. There are just too many levels of abstraction between me and Brussels.
> EU moves so much faster when it comes to regulations like forcing all of us in Denmark to use timesheets, annoying lids on our bottles, and invasive surveillance laws.
Rediscovering the principle of subsidiarity from first principles...
> I'll need to investigate further but it doesn't seem promising.
That's what I meant by "waiting a few days for updates" in my other comment. Around the Qwen 3.5 release, I remember a lot of complaints along the lines of "tool calling isn't working properly" etc.
That was fixed shortly after: there was some template parsing work in llama.cpp, and unsloth pulled some models and re-released better ones to improve something else I can't quite remember, better quantization or something...
The model does call tools successfully with sensible parameters, but it doesn't seem to pick the right ones in the right order.
I'll try again in a few days. It's great to be able to test it already a few hours after the release. It's the bleeding edge, as I had to pull the latest from main. And with all the supply chain issues happening everywhere, bleeding edge is always riskier from a security point of view.
There is always also the option of fine-tuning the model later to make sure it can complete the custom task correctly. But the code for doing a LoRA on Gemma 4 is probably not available yet. The 50% extra speed seems really tempting.
(Comparing Q3.5-27B to G4 26B A4B and G4 31B specifically)
I'd assume Q3.5-35B-A3B would perform worse than the Q3.5 deep 27B model, but the cards you pasted above somehow show that for ELO and TAU2 it's the other way around...
Very impressed by unsloth's team releasing the GGUF so quickly, if that's like the qwen 3.5, I'll wait a few more days in case they make a major update.
Overall great news if it's at parity or slightly better than Qwen 3.5 open weights, hope to see both of these evolve in the sub-32GB-RAM space. Disappointed in Mistral/Ministral being so far behind these US & Chinese models
Gemma 4 31B has now wiped out several of those models from the pareto frontier, now that it has pricing. Gemma 4 26B A4B has an Elo, but no pricing, so it still isn't on that chart. The Gemma 4 E2B/E4B models still aren't on the arena at all, but I expect them to move the pareto frontier as well if they're ever added, based on how well they've performed in general.
That Pareto plot doesn't seem to include the Gemma 4 models anywhere (not just not at the frontier), likely because pricing wasn't available when the chart was generated. At least, I can't find the Gemma 4 models there. So, not particularly relevant until it is updated for the models released today.
> Very impressed by unsloth's team releasing the GGUF so quickly, if that's like the qwen 3.5, I'll wait a few more days in case they make a major update.
Same here. I can't wait until mlx-community releases MLX optimized versions of these models as well, but happily running the GGUFs in the meantime!
absolute n00b here, very confused by the many variations; it looks like the Mac-optimized MLX versions aren't available in Ollama yet (I mostly use Claude Code with this)
the benchmarks showing the "old" Chinese Qwen models performing basically on par with this fancy new release kinda have me thinking the Google models are DOA, no? what am I missing?
What you're describing here isn't really "using tech". You're talking about learning some basic computer skills (such as using a word processor, Excel, reading email, some basic website building, using a printer, and some amount of programming).
For those, obviously you need a computer, and I completely agree that those are important skills to learn... But you maybe need to spend 1h/week during the last 2 years of middle school on those in the computer lab (as has been done since the 90s in many schools around the world).
But for any other course, such as Math, English (or whichever primary language in your country), second languages, history, etc.: that's where using tech is a mistake.
A bit of tech is OK, but it cannot be "everyone does their homework and reads the lesson on an iPad/Chromebook".
I am pretty skeptical about the value of learning to build websites. I think it is too tempting for students to devote significant time to something that is not foundational knowledge and where they won't get any valuable feedback anyway.
It makes me think back to my writing assignments in grades 6-12. I spent considerable time making sure the word processor had the exact perfect font, spacing, and formatting, with cool headers, footers, footnotes, etc. Yet I wouldn't even bother to proofread the final text before handing it in. What a terrible waste of a captive audience that could have helped me refine my arguments and writing style, rather than wasting their time on things like careless grammatical errors.
Anyway, I do agree with the idea of incorporating Excel, and even RStudio for math and science as tools, especially if they displace Ed-tech software that adds unnecessary abstractions, or attempts to replace interaction with knowledgeable teachers. One other exception might be Anki or similar, since they might move rote memorization out of the classroom, so that more time can be spent on critical thinking.
Building websites, I agree, has little value, but using it as a way to explain the basics of how the web works is pretty valuable. The web likely isn't going anywhere for a long time, and having some basic knowledge of how it works is useful for a lot of people. I hate the idea of any more MS apps like Excel being regularly incorporated, but basic usage of something similar can definitely help with learning a useful tool/computer skill. Even in the early 90s we had computer labs for learning computer skills, which I think there is value in. But forcing tech everywhere into teaching is an issue IMO.
The beautiful thing about programming (which also makes edtech such an appealing dream to chase) is that you get immediate feedback from the computer and don't have to wait for someone whose attention is at least semi-scarce to mark your paper.
re: Anki. It is not as optimized but you can do SRS with physical flash-cards.
* Have something like 5 bins, numbered 1-5.
* Every day you add your new cards to bin nr. 1, shuffle, and review. Correct cards go to bin nr. 2; incorrect cards stay in bin nr. 1.
* Every other day do the same with bins nr. 1 and 2, every fourth day with bins nr. 1, 2 and 3, etc., except incorrect cards go in the bin below. More complex scheduling algorithms exist.
* In a classroom setting the teacher can print out the flashcards and hand out a review schedule for the week (e.g. Monday: add these 10 new cards and review box 1; Tuesday: 10 new cards and review boxes 1 and 2; Wednesday: no new cards and review boxes 1 and 3; etc.)
* If you want to be super fancy, the flash card publisher can add audio-chips to the flash-cards (or each box-set plus QR code on the card).
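For anyone curious, the box schedule above fits in a few lines of code. A minimal, hypothetical sketch (class and method names are mine), assuming box k is reviewed every 2^(k-1) days and days are counted from 1:

```java
import java.util.ArrayList;
import java.util.List;

public class LeitnerSchedule {
    // Returns the (1-based) box numbers due for review on a given day:
    // box 1 every day, box 2 every other day, box 3 every fourth, etc.
    static List<Integer> boxesDueOn(int day, int numBoxes) {
        List<Integer> due = new ArrayList<>();
        for (int box = 1; box <= numBoxes; box++) {
            int interval = 1 << (box - 1); // 1, 2, 4, 8, ... days
            if (day % interval == 0) {
                due.add(box);
            }
        }
        return due;
    }

    // On a correct answer the card advances one box; on a miss it goes
    // back one box (some variants send it all the way back to box 1).
    static int nextBox(int currentBox, boolean correct, int numBoxes) {
        if (correct) return Math.min(currentBox + 1, numBoxes);
        return Math.max(currentBox - 1, 1);
    }
}
```

In the paper-and-bins classroom version, only the boxesDueOn part is needed to print the weekly schedule; nextBox is what students do by hand as each card is answered.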
Would it be a mistake to use Desmos in a math classroom, or 3Blue1Brown style animations, to build up visual intuition? Should we not teach basic numerical and statistical methods in Python? Should kids be forced to use physical copies of newspapers and journal articles instead of learning how to look things up in a database?
I'm all for going back to analog where it makes sense, but it seems wrongheaded to completely remove things that are relevant skills for most 21st century careers.
> Would it be a mistake to use Desmos in a math classroom, or 3Blue1Brown style animations, to build up visual intuition?
I don't think there's anything wrong with showing kids some videos every now and then. I still have fond memories of watching Bill Nye.
> Should we not teach basic numerical and statistical methods in Python?
No. Those should be done by hand, so kids can develop an intuition for it. The same way we don't allow kids learning multiplication and division to use calculators.
>> Should we not teach basic numerical and statistical methods in Python?
> No. Those should be done by hand, so kids can develop an intuition for it. The same way we don't allow kids learning multiplication and division to use calculators.
I would think that it would make sense to introduce Python in the same way that calculators, and later graphing calculators are introduced, and I believe (just based on hearing random anecdotes) that this is already the case in many places.
I'm a big proponent of the gradual introduction of abstraction, which my early education failed at, and something Factorio and some later schooling did get right, although the intent was rarely communicated effectively.
First, learn what and why a thing exists at a sufficiently primitive level of interaction, then once students have it locked in, introduce a new layer of complexity by making the former primitive steps faster and easier to work with, using tools. It's important that each step serves a useful purpose though. For example, I don't think there's much of a case for writing actual code by hand and grading students on missing a semicolon, but there's probably a case for working out logic and pseudocode by hand.
I don't think there's a case for hand-drawing intricate diagrams and graphs, because it builds a skill and level of intimacy with the drawing aspect that's just silly, and tests someone's drawing ability rather than their understanding of the subject, but I suppose everyone has their own opinion on that.
That last one kind of crippled me in various classes. I already knew better tools and methods existed for doing weather pattern diagrams or topographical maps, but it was so immensely tedious and time-consuming that it totally derailed me, to the point where I'd fail uni labs despite the content not being very difficult, only because the prof wanted to teach it like it was the '50s.
FWIW, calculators were banned in my school. I only started to use one in university, and there it also didn't really help with anything, as the math was already more complex.
I was allowed to use calculators when I started algebra in seventh grade.
I found that calculators didn't help all that much once you got into symbolic stuff. They were useful for the final reductions, obviously, but for algebra the lion's share of the work is symbolic and at least the relatively cheap two-line TI calculator I was using couldn't do anything symbolic.
I know that there are calculators that can do Computer Algebra System stuff, and those probably should be held off on until at least calculus.
Until most kids are about 12 - 14 years old, they're learning much more basic concepts than you're describing. I don't think anyone is trying to take intro to computer science out of high schools or preventing an advanced student younger than that from the same.
I would rather a teacher have to draw a concept on a board than have each student watch an animation on their computer. Obviously, the teacher projecting the animation should be fine, but it seems like some educators and parents can't handle that and it turns into a slippery slope back to kids using devices.
So for most classrooms full of students in grades prior to high school, the answer to your list of (presumably rhetorical) questions is "Yes."
There's an in-between point my math teacher loved using: an overhead projector. Hand-drawn transparencies that could be made beforehand or on the fly, projected large so everyone could see, without hiding the teacher behind a computer - they'd still stand at the front of the class facing the students.
Those are great examples. Not familiar with Desmos, but 3Blue1Brown style animations are great.
The problem is that people seem to want to go to extremes. Either go all out on doing everything in tablets or not use any technology in education at all.
It's not just work skills; it's also the better understanding that is gained from things such as the maths animations you mentioned.
> The problem is that people seem to want to go to extremes. Either go all out on doing everything in tablets or not use any technology in education at all.
I think the latter is mostly a reaction to the former. I think there is a way to use technology appropriately in theory in many cases, but the administrators making these choices are largely technically illiterate and it's too tempting for the teachers implementing them to just hand control over to the students (and give themselves a break from actually teaching).
>Would it be a mistake to use Desmos in a math classroom
Maybe. Back in the day I had classes where we had to learn the rough shape of a number of basic functions, which built intuition that helped. This involved drawing a lot of them by hand. Initially by calculating points and estimating, and later by being given an arbitrary function and graphing it. Using Desmos too early would've prevented building these skills.
Once the skills are built, using it doesn't seem a major negative.
I think of it like a calculator. Don't let kids learning basic arithmetic use a four-function calculator, but once you hit algebra, that's fine (graphing calculators still aren't, though).
Best might be to mix it up, some with and some without, but no calculator is preferable to always calculator.
> (as it's been done since the 90s in many schools around the world)
I had computer lab in a catholic grade school in the mid-late 80's. Apple II's and the class was once a week and a mix of typing, logo turtle, and of course, The Oregon Trail.
The options from big companies to run untrusted open source code are:
1) a-la-Google: Build everything from source. The source is mirrored/copied over from the public repo. (Audit/trust the source every time)
2) Only allow imports from a company-managed mirror. All imported packages need to be signed in some way.
Here, only (1) would be safe. (2) would only be safe if dependencies aren't updated too aggressively and/or internal automated or manual scanning on version bumps would catch the issue.
For small shops & individuals: kind of out of luck, best mitigation is to pin/lock dependencies and wait long enough for hopefully folks like Fibonar to catch the attack...
Bazel would be one way to let you do (1), but realistically, if you don't have the bandwidth to build everything from source, you'd rely on external sources with rules_jvm_external or rules_python locked to specific pip versions, so if the specific packages you depend on are affected, you're out of luck.
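For small shops, the "pin and verify" mitigation boils down to something like the following hypothetical helper (class and method names are mine): before using a downloaded artifact, compare its SHA-256 against a hash pinned in your own repo, so a silently replaced package gets rejected.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

public class ArtifactPin {
    // Hex-encoded SHA-256 of a byte array (HexFormat needs Java 17+).
    static String sha256Hex(byte[] data) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        return HexFormat.of().formatHex(md.digest(data));
    }

    // True only if the artifact on disk matches the hash recorded
    // in the lockfile checked into your own repository.
    static boolean matchesPin(Path artifact, String pinnedSha256Hex) throws Exception {
        return sha256Hex(Files.readAllBytes(artifact)).equalsIgnoreCase(pinnedSha256Hex);
    }
}
```

Build tools already offer this (Maven checksum verification, Gradle dependency verification, pip's --require-hashes); the point of the sketch is just that the pinned hash lives in your repo, not in the registry you're trying to distrust.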
javac, for better or worse, is aggressively against doing optimizations, to the point of producing the most ridiculously bad code. The belief tends to be that the JIT will do a better job fixing it if it has bytecode that's as close as possible to the original code. But this only helps if a) the code ever gets JIT'd at all (rarely true for e.g. class initializers), and b) the JIT has the budget to do that optimization. Although JITs have the advantage of runtime information, they are also under immense pressure to produce any optimizations as fast as possible. So they rarely do the level of deep optimization of an offline compiler.
Why should the compiler optimize obviously dumb code? If the developer wants to create billions of heap objects, the compiler should respect that. Optimizing dumb code is what made C++ unbearable: you write one piece of code and the compiler generates something completely different.
No, in the example they provided, the programmer wrote obviously stupid code. It has nothing to do with necessity:
    Long sum = 0L;
    for (Long value : values) {
        sum += value;
    }
I also want to highlight that there are plenty of collections utilizing primitive types. They're not generic but they do the job, so if you have a bottleneck, you can solve it.
That said, TBH I think that adding autoboxing to the language was an error. It makes bad code look too innocent. Without autoboxing, this code would look like a mess and probably would have been caught earlier.
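To make the contrast concrete, here's a hedged sketch of the boxed loop above next to the primitive-accumulator version it would have to be without autoboxing (class and method names are mine):

```java
import java.util.List;

public class BoxingDemo {
    // The boxed version from the snippet above: every += unboxes sum
    // and value, adds, then re-boxes the result into a new Long.
    static Long boxedSum(List<Long> values) {
        Long sum = 0L;
        for (Long value : values) {
            sum += value; // unbox, add, re-box on every iteration
        }
        return sum;
    }

    // The primitive accumulator: no boxing for the running sum at all,
    // just one unboxing per element read from the list.
    static long primitiveSum(List<Long> values) {
        long sum = 0L;
        for (Long value : values) {
            sum += value;
        }
        return sum;
    }
}
```

Note that Long.valueOf caches values from -128 to 127, so the allocation cost of the boxed version only shows up once the running sum leaves that range.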
People complaining about how hard it is to get a simple answer don't appreciate the complexity of figuring out the optimal model...
There are so many knobs to tweak; it's a non-trivial problem:
- Average/median length of your Prompts
- prompt eval speed (tok/s)
- token generation speed (tok/s)
- Image/media encoding speed for vision tasks
- Total amount of RAM
- Max bandwidth of ram (ddr4, ddr5, etc.?)
- Total amount of VRAM
- "-ngl" (amount of layers offloaded to GPU)
- Context size needed (you may need sub 16k for OCR tasks for instance)
- Total parameter count (billions)
- Active parameter count (billions) for MoE models
- Acceptable level of Perplexity for your use case(s)
- How aggressive a quantization you're willing to accept (to maintain low enough perplexity)
- even finer grain knobs: temperature, penalties etc.
Also, tok/s as a metric isn't enough, because there's:
- thinking vs non-thinking: which mode do you need?
- models that are much more "chatty" than others in the same class (I remember testing a few models that maxed out my modest desktop specs: Qwen 2.5 non-thinking was so much faster than the equivalent Ministral non-thinking, even though they had equivalent tok/s... Qwen would respond to the point quickly)
In the end, the final questions are: are you satisfied with how long getting an answer took? And was the answer good enough?
The same exercise exists with paid APIs too; there are obviously fewer knobs, but depending on your use case, there are still differences between providers and models. You can abstract away a lot of the knobs; just add "are you satisfied with how much it cost?" on top of the other two questions.
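The first few knobs combine in a simple way: a rough end-to-end latency estimate is prompt tokens over prompt-eval speed, plus output tokens over generation speed. A back-of-the-envelope sketch (all numbers and names are made-up placeholders):

```java
public class LatencyEstimate {
    // Rough wall-clock time for one request: prefill plus decode, each
    // at its own throughput. Ignores batching, KV-cache reuse, etc.
    static double secondsPerAnswer(int promptTokens, double promptEvalTokSec,
                                   int outputTokens, double genTokSec) {
        return promptTokens / promptEvalTokSec + outputTokens / genTokSec;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 2000-token prompt at 200 tok/s prefill,
        // 500-token answer at 20 tok/s generation => 10 + 25 = 35 s.
        System.out.println(secondsPerAnswer(2000, 200.0, 500, 20.0));
    }
}
```

This also makes the "chatty model" point concrete: at identical tok/s, a model that emits twice as many output tokens (e.g. a long thinking trace) roughly doubles the decode time.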
That's a big flaw of LLMs, not limited to RAGs: they lack the fundamental understanding of "good and bad", as Richard Sutton said on that Dwarkesh podcast.
So if you flood the Internet with "of course the moon landing didn't happen" or "of course the earth is flat" or "of course <latest 'scientific fact' lacking verifiable, definitive proof> is true", you then get a model that's repeating you the same lies.
This makes curating the input data extremely important, but it also remains an unsolved problem for topics where there's no consensus.
> That's a big flaw of LLMs, not limited to RAGs: they lack the fundamental understanding of "good and bad", as Richard Sutton said on that Dwarkesh podcast.
After participating in social media since the beginning, I think this problem is not limited to LLMs.
There are certain things we can debunk all day, every day, and the only outcome is that it happens again the next day. This has been a problem since long before AI, and I personally think it started before social media as well.
> After participating in social media since the beginning, I think this problem is not limited to LLMs.
Yup, but for LLMs the problem is worse... many more people trust LLMs and their output much more than they trust Infowars. And with basic media literacy education, you can fix people trusting bad sources... but you fundamentally can't fix an LLM, it cannot use preexisting knowledge (e.g. "Infowars = untrustworthy") or cues (domain recently registered, no imprint, bad English) on its own, neither during training nor during inference.
"water is wet" kind of study, as tariffs are precisely supposed to increase price for consumers for imported goods... But the last 3 paragraphs are interesting:
- Importers raised the price more than needed (i.e. blame tarifs to increase their profit margin)
- Price increases took one year to fully reflect to the customers, and persisted nearly one year after the tariffs expired.
- chicken-tax-like loopholes implemented wherever possible (for wine apparently it's raising the ABV to more than 14%)
You remind me of the fact that humans do not in fact have sensors in the skin to detect specifically wetness.
I think, given the amount of ideas floating around, it is occasionally good to revisit things that are "known", just in case some underlying assumption changed, especially for economics, which is harder to get right as it deals a lot with what humans want and do.
I can't see how anyone can think "the exporters pay the tariff" makes any sense. TBH, we'll never know how many people thought it made sense because it didn't matter.
In the end, money moves around. If, for example, the government were to just give the citizens the money from the tariffs in equal shares (not that I'm suggesting they would, but it's technically possible), it would be like taking from the citizens who consume more and giving to the citizens who consume less.
So, yes, it is correct in a practical immediate sense that "the exporters pay the tariff", but that excludes many relevant issues, like how prices evolve (which is paid by consumers), what the government does with the money (it could share it or not), and what others decide to produce (to avoid tariffs). But definitely many people didn't think of all that...
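The equal-share rebate idea above can be made concrete with a toy calculation (all numbers and the netPosition helper are hypothetical): with full pass-through of the tariff to consumers, the heavy importer ends up a net payer and the light importer a net receiver.

```java
public class TariffRebate {
    // Net position of one citizen if all tariff revenue is rebated in
    // equal shares: rebate received minus tariff effectively paid on
    // their own imports (assuming full pass-through to consumers).
    static double netPosition(double myImports, double totalImports,
                              int nCitizens, double rate) {
        double rebate = rate * totalImports / nCitizens;
        return rebate - rate * myImports;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a 10% tariff, citizen A imports 1000,
        // citizen B imports 200, revenue split equally between the two.
        System.out.println(netPosition(1000.0, 1200.0, 2, 0.10)); // A pays in net
        System.out.println(netPosition(200.0, 1200.0, 2, 0.10));  // B receives net
    }
}
```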
Your first 2 points make me extra bitter about COVID.
Less store hours. Higher prices. Inflation. People in school got a terrible education and it affected my workforce. (But hey 1% of people died, as predicted if we did nothing at all... )
It only reinforces the importance of competition over protectionism.
I used to be a walmart fan, but my local store is cheaper now. I didn't bother to look at prices until things were getting silly.
> But hey 1% of people died, as predicted if we did nothing at all
Nope. Compare the death rates of Sweden vs its neighbours in the Nordics (the closest comparisons we have with similar weather/culture/etc.). Or if you don't care about minimising variables, in the US between states that did lockdowns and mask mandates and those that didn't. In every comparable (e.g. excluding rural vs urban) case, there were more deaths in "doing nothing" than implementing the same basic public health axioms that have held true for centuries.
> Inflation
That was also helped by Russia invading Ukraine, which increased global prices of multiple important raw materials. But yes, inflation after a period of deflation/economic contraction/restricted travel and consumption was to be expected.
> People in school got a terrible education and it affected my workforce
It's definitely a bigger issue for them than it is for you. And yeah, it sucks for them. Would have been pretty terrible to tell teachers (who overwhelmingly skew older) they should risk their lives just to keep kids occupied too.
> It only reinforces the importance of competition over protectionism.
The thing too many forget is that if we didn't flatten the curve our entire medical system was going to collapse. It's insane that people don't yet understand this concept and can't even empathize with medical professionals. Yes, we all struggled, but try talking to medical professionals to see how they did.
When something doesn't happen because enough measures were taken, then it wasn't worth it because it didn't happen?
> The thing too many forget is that if we didn't flatten the curve our entire medical system was going to collapse
Yep, if things were going well there wouldn't have been makeshift morgues with refrigerated trucks, sick people having to be moved around to different countries, the military deploying field hospitals, corpses piling in the streets. Those examples are from a variety of countries, which shows how bad the situation was globally.
You had 6 weeks of staying at home, and then quarantines for international travellers after that. In return, you had no COVID-19 at all for several years. Seems a fair trade.
Norway had that too; without lockdown. Curfews would require a change in the constitution and the last time they happened was during WWII which makes them doubly unpopular.
Sweden all-cause mortality was indeed higher if an immediate pre-pandemic year is taken as a base. However, pre-pandemic years in Sweden show a substantial dip in all-cause mortality, something that neighboring countries did not see. It is not that simple.
On my 32GB Ryzen desktop (recently upgraded from 16GB before the RAM prices went up another +40%), did the same setup of llama.cpp (with Vulkan extra steps) and also converged on Qwen3-Coder-30B-A3B-Instruct (also Q4_K_M quantization)
On the model choice: I've tried latest gemma, ministral, and a bunch of others. But qwen was definitely the most impressive (and much faster inference thanks to MoE architecture), so can't wait to try Qwen3.5-35B-A3B if it fits.
I've no clue which quantization to pick, though... I picked Q4_K_M at random; was your choice of quantization more educated?
Quant choice depends on your vram, use case, need for speed, etc. For coding I would not go below Q4_K_M (though for Q4, unsloth XL or ik_llama IQ quants are usually better at the same size). Preferably Q5 or even Q6.