Hacker News | kruador's comments

Blame the European regulators who decided that it was no longer necessary to have standard pack sizes.

Pack sizes were regulated in 1975 for volume measures (wine, beer, spirits, vinegar, oils, milk, water, and fruit juice) and in 1980 for weights (butter, cheese, salt, sugar, cereals [flour, pasta, rice, prepared cereals], dried fruits and vegetables, coffee, and a number of other things). In 2007, all of that was repealed - and member states were now forbidden from regulating pack sizes!

I think the rationale was that now the unit price (price per unit of measurement) was mandatory to display, consumers would still know which of two different packs on the same shelf was better value. But standard pack sizes don't just provide value-for-money comparisons, as this article shows.


Ironically it seems (from memory, I've not researched it deeply) that continental butter has not changed from 250g, whereas the British brands have moved to 200g. I could understand if they had switched to 225g as essentially a half-pound block, but 200g isn't any closer to a useful Imperial measure than 250g.


It wasn't possible on the 386. Ken Shirriff discusses how the Intel 80386's register file was built at https://www.righto.com/2025/05/intel-386-register-circuitry..... Only four of the registers are built to allow 32-, 16- or 8-bit writes. Reads output the entire register onto the bus and the ALU does the appropriate masking. The twist is the 'high' half-registers of the legacy 16-bit registers (AH, BH, CH, DH) - themselves really a legacy of the 8080, and of the requirement to be able to directly translate 8080 code opcode-for-opcode. The output of these has to be shifted down 8 bits to be in the right place for the ALU, and then those bits have to be selected.

AMD seem to have decided to regularise the instruction set for 64-bit long mode, making all the registers consistently able to operate as 64-bit, 32-bit, 16-bit, and 8-bit, using the lowest bits of each register. This only occurs if using a REX prefix, usually to select one of the 8 additional architectural registers added for 64-bit mode. To achieve this, the bits that are used to select the 'high' part of the legacy 8086 registers in 32- or 16-bit code (and when not using the REX prefix) are used instead to select the lowest 8 bits of the index and pointer registers.

From the "Intel 64 and IA-32 Architectures Software Developer's Manual":

"In 64-bit mode, there are limitations on accessing byte registers. An instruction cannot reference legacy high-bytes (for example: AH, BH, CH, DH) and one of the new byte registers at the same time (for example: the low byte of the RAX register). However, instructions may reference legacy low-bytes (for example: AL, BL, CL, or DL) and new byte registers at the same time (for example: the low byte of the R8 register, or RBP). The architecture enforces this limitation by changing high-byte references (AH, BH, CH, DH) to low byte references (BPL, SPL, DIL, SIL: the low 8 bits for RBP, RSP, RDI, and RSI) for instructions using a REX prefix."

In 64-bit code there is very little reason at all to be using bits 15:8 of a longer register.

This possibly puts another spin on Intel's desire to remove legacy 16- and 32-bit support (termed 'X86S'). It would remove the need to support AH, BH, CH and DH - and therefore some of the complex wiring from the register file to support the shifting. If that's what it currently does.

Actually, looking at Agner Fog's optimisation tables (https://www.agner.org/optimize/instruction_tables.pdf) it appears there is significant extra latency in using AH/BH/CH/DH, which suggests to me that the processor actually implements shifting into and out of the high byte using extra micro-ops.


> In 64-bit code there is very little reason at all to be using bits 15:8 of a longer register.

I disagree: there exists only BSWAP r32 (and, in 64-bit mode, BSWAP r64): https://www.felixcloutier.com/x86/bswap

No BSWAP r16 exists. Why? In 32-bit mode it was not needed, because you could simply use

XCHG r/m8, r8

with, say, cl and ch (to swap the endianness of cx).

In 64-bit mode, you can thus swap the endianness of a 16-bit value in one instruction only for the "old" registers ax, cx, dx, bx. If you want to swap the 16-bit part of one of the "new" registers, you at least have to do a logical right shift by 16 (SHR) after a BSWAP r32 (EDIT: jstarks pointed out that you could also use ROL r/m16, 8 to do this in one instruction on x86-64). By the way: this solution has a pitfall compared with BSWAP: BSWAP preserves the flags register, while SHR does not.


What about ROL r/m16, 8?


This would indeed work (and is likely the better solution), but unlike BSWAP and XCHG, it also changes the flags.


In the meantime I also read somewhere that this feature is only available in long mode


No, SDRAM means Synchronous DRAM, where the data is clocked out of the DRAM chips instead of just appearing on the bus some time after the Column Address Strobe is asserted. Clocking it means that the data doesn't appear before the CPU (or other bus master) is ready to receive it, and that it doesn't disappear before the CPU has read it.

Static RAM (SRAM) is a circuit that retains its data as long as the power is supplied to it. Dynamic RAM (DRAM) must be refreshed frequently. It's basically a large array of tiny capacitors which leak their stored charge through imperfect transistor switches, so a charged capacitor must be regularly recharged. You would think that you would need to read the bit and rewrite its value in a second cycle, but it turns out that reading the value is itself a destructive operation and requires the chip to internally recharge the capacitors.

Further, the chip is organised in rows and columns - generally there are the same number of Sense Amplifiers as columns, with a whole row of cells discharging into their corresponding Sense Amplifiers on each read cycle, the Sense Amplifiers then being used to recharge that row of cells. The column signals select which Sense Amplifier is connected to the output. So you don't need to read every row and column of a chip, just some column on every row. The Sense Amplifier is a circuit that takes the very tiny charge from the cell transistor and brings it up to a stable signal voltage for the output.

So why use DRAM at all if it has this need to be constantly refreshed? Because the Static RAM circuit requires 4-6 transistors per cell, while DRAM only requires 1. You get close to 4-6 times as much storage from the same number of transistors.


The Sinclair ZX80 and ZX81 have static RAM internally, which you wouldn't expect for a) a computer that's designed to be as cheap as possible and b) one that uses a Zilog Z80, which has built-in refresh circuitry.

The reason is that the designers saved a few chips by repurposing the Z80's refresh circuit as a counter/address generator, when generating the video signal. Specifically, it uses the instruction fetch cycle to read the character code from RAM, then it uses the refresh cycle to read the actual line of character data from the ROM. The ZX80 nominally clocks the Z80 at 3.25MHz, but a machine cycle is four clocks (two for fetch, two for refresh), so it's effectively the same speed as a 0.8125 MHz 6502.

I wrote a long section here about how the ZX80 uses the CPU to generate the screen and the extra logic that involves, but it was getting too long :) The ZX81 is basically just a cost-reduced ZX80 where all the discrete logic chips are moved into one semi-custom chip.

Doing this makes external RAM packs more expensive too. You couldn't use the real refresh address coming from the Z80 because the video generator would be hopping around a small range of addresses in the ROM, rather than covering the whole of RAM (or at least each row of the DRAM). The designer has two options:

1. Use static RAM in the external RAM pack, making it substantially more expensive for the RAM itself;

2. Use DRAM in the external RAM pack, and add extra refresh circuitry to refresh the DRAM while the main computer is using the refresh cycle for its video madness.

I think most RAM packs did the second option.


Most 8-bit CPUs didn't even have a hardware multiply instruction. To multiply on a 6502, for example, or a Z80, you have to add repeatedly. You can multiply by a power of 2 by shifting left, so you can get a bigger result by switching between shifting and adding or subtracting. Although, again, on these earlier CPUs you can only shift by one bit at a time, rather than by a variable number of bits.

There's also the difference between multiplying by a hard-coded value, which can be implemented with shifts and adds, and multiplying two variables, which has to be done with an algorithm.

The 8086 did have multiply instructions, but they were implemented as a loop in the microcode, adding the multiplicand, or not, once for each bit in the multiplier. More at https://www.righto.com/2023/03/8086-multiplication-microcode.... Multiplying by a fixed value using shifts and adds could be faster.

The prototype ARM1 did not have a multiply instruction. The architecture does have a barrel shifter which can shift one of the operands by any number of bits. For a fixed multiplication, it's possible to multiply by a power of two, by (a power of two plus 1), or by (a power of two minus 1) in a single instruction. The latter is why ARM has both a SUB (subtract) instruction, computing rd := rs1 - Operand2, and an RSB (Reverse SuBtract) instruction, computing rd := Operand2 - rs1. The second operand goes through the barrel shifter, allowing you to write an instruction like 'RSB R0, R1, R1, LSL #4', meaning 'R0 := (R1 << 4) - R1', or in other words '(R1 * 16) - R1', or R1 * 15.

ARMv2 added MUL and MLA (MuLtiply and Accumulate) instructions. The hardware ARM2 implementation uses a Booth encoder to multiply 2 bits at a time, taking up to 16 cycles for 32 bits. It can exit early if the remaining bits are all 0s.

Later ARM cores implemented an optional wider multiplier (that's the 'M' in 'ARM7TDMI', for example) that could multiply more bits at a time, therefore executing in fewer cycles. I believe ARM7TDMI was 8-bit, completing in up to 4 cycles (again, offering early exit). Modern ARM cores can do 64-bit multiplies in a single cycle.


The base RISC-V instruction set does not include hardware multiply instructions. Most implementations do include the M (or related) extensions that provide them, but if you are building a processor that doesn't need it, you don't need to include it.


This is, in some ways, reintroducing something that other source control systems forced on you (and you can see it in one of the videos that Scott linked, about using BitKeeper - Ep.4 Bits and Booze, https://www.youtube.com/watch?v=MPFgOnACULU). The previous tools I used (SourceGear Vault, Microsoft Team Foundation Server) required you to have a separate working tree for each branch - the two were directly tied together. That's sometimes useful if you need to have the two versions running concurrently, but for short-lived topic branches or, as you say, working on multiple topics at the same time, it can be very inconvenient.

Initially it was jarring to not get a different working directory for each branch, but I soon got used to it. Working in the same directory for multiple branches means that untracked files stay around, which can be helpful for things like IDE workspace configuration - specific to me and the project, but not to the branch.

You can of course have multiple clones of the repository - even clones of clones - but pushing/pulling branches from one to another is a lot more work than just checking out a branch in a different worktree.

My general working practice now is to keep release versions in their own worktrees, and to use the default worktree (where the .git directory lives) for development on the main branch. That means I don't need to keep resyncing my external dependencies (node_modules, for example) when switching between working on different releases. But I can see a good overview of my branches, and everything on the remote, from any worktree.


I've seen a suggestion that they're using ccTLDs.

Which might explain why the British Indian Ocean Territory - population, one US military base - has such a high tariff. The BIOT, aka Diego Garcia, has the ccTLD .io.


In that case, where is the tariff rate for USSR (.su)?


I can't replicate the initial problem, at least pushing to Bitbucket. I'm using Windows, so I didn't use `touch` - instead I used 'echo' to create a new file in a shallow clone of my repo. That repo is 126 MB on Bitbucket, and the shallow clone downloaded 6395 objects taking 40.68 MB.

I've tried with a new file both having content ('Test shallow clone push'), and again with an empty file. In both cases it pushed 3 objects, and in the empty file case it reused one (it turns out my repo already has some empty files in it).

It's always possible that this is (or was) a GitHub bug - I haven't tried it there.


See my top-level response, but basically nothing is mangled. Instead Git internally treats it as a 'graft' and knows not to look for parents of the prior commit.

I started that comment as a reply to you but I realised that a) it may just have been a bug that might already be fixed and b) it looks like the Stack Overflow answer was speculative and not tested!


It isn't mangled. The commit is there as-is. Instead the repository has a file, ".git/shallow", which tells it not to look for the parents of any commit listed there. If you do a '--depth 1' clone, the file will list the single commit that was retrieved.

This is similar to the 'grafts' feature. Indeed 'git log' says 'grafted'.

You can test this using "git cat-file -p" with the commit that got retrieved, to print the raw object.

> git clone --depth 1 https://github.com/git/git

> git log

commit 388218fac77d0405a5083cd4b4ee20f6694609c3 (grafted, HEAD -> master, origin/master, origin/HEAD)
Author: Junio C Hamano <gitster@pobox.com>
Date:   Mon Feb 10 10:18:17 2025 -0800

    The ninth batch

    Signed-off-by: Junio C Hamano <gitster@pobox.com>
> git cat-file -p 388218fac77d0405a5083cd4b4ee20f6694609c3

tree fc620998515e75437810cb1ba80e9b5173458d1c
parent 50e1821529fd0a096fe03f137eab143b31e8ef55
author Junio C Hamano <gitster@pobox.com> 1739211497 -0800
committer Junio C Hamano <gitster@pobox.com> 1739211512 -0800

The ninth batch

Signed-off-by: Junio C Hamano <gitster@pobox.com>

I can't reproduce the problem pushing to Bitbucket, using the most recent Git for Windows (2.47.1.windows.2). It only sent 3 objects (which would be the blob of the new file, the tree object containing the new file, and the commit object describing the tree), not the 6000+ in the repository I tested it on.

It may be that there was a bug that has now been fixed. Or it may be something that only happens/happened with GitHub (i.e. a bug at the receiving end, not the sending one!)

I note that the Stack Overflow user who wrote the answer left a comment underneath saying

"worth noting: I haven't tested this; it's just some simple applied math. One clone-and-push will tell you if I was right. :-)"

