Did you see the referenced articles [1][2] from 2006 and 2017 that already argue that recursion is a security problem? It's not new, just not well-known.
>> I would argue that the title is misleading and overly alarmist here. This particular bug may have involved recursion and a stack overflow, but that's like saying "malloc kills" in the title of an article about a heap overflow bug.
Let's see what the article [1] you cited says:
>> Rule 3: Do not use dynamic memory allocation after initialization.
>> Rationale: This rule appears in most coding guidelines for safety-critical software. The reason is simple: memory allocators, such as malloc, and garbage collectors often have unpredictable behavior that can significantly impact performance.
If you think recursion is a known security problem, do you also think using the heap is a known security problem?
Arguably, Stack Clash is just a compiler bug--recursive code shouldn't be able to jump the guard pages. This was fixed in Clang in 2021 [1], in GCC even earlier, and in MSVC earlier than that.
Recursion per se isn't an issue; unbounded stack use is. If you either know your input size is bounded (e.g. it's not user-generated) or use tail-recursion (which should get compiled to a loop), it's fine.
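To illustrate the tail-recursion point, here's a minimal Python sketch (the `sum_to_*` names are made up for the example; note CPython never eliminates tail calls, so in Python you'd write the loop form by hand — the point is that the two are mechanically equivalent):

```python
# Tail-recursive form: stack use grows linearly with n in any
# implementation that doesn't do tail-call elimination.
def sum_to_rec(n, acc=0):
    if n == 0:
        return acc
    return sum_to_rec(n - 1, acc + n)

# Equivalent loop: the shape a TCO-capable compiler would emit.
# Constant stack use regardless of n.
def sum_to_iter(n, acc=0):
    while n != 0:
        n, acc = n - 1, acc + n
    return acc
```

`sum_to_rec(10**6)` would blow the interpreter's recursion limit, while `sum_to_iter(10**6)` runs in constant stack space.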
If your algorithm does unbounded heap allocations instead, you're still going to get OOM-killed. The actual vulnerability is not enforcing per-request resource limits. Things like XML bombs then exacerbate this by expanding a highly compressed request, so a small amount of attacker work generates a large amount of receiver work.
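A rough sketch of that amplification, using a tiny billion-laughs-style document fed to Python's stdlib expat bindings (the `bomb` helper is a made-up name for the example; real attacks use far more levels, and recent libexpat versions add amplification limits that only kick in above a multi-megabyte threshold):

```python
import xml.parsers.expat as expat

def bomb(levels=4, fanout=10):
    # Each entity expands to `fanout` copies of the previous one,
    # so the expanded size grows as fanout**(levels-1).
    ents = ['<!ENTITY e0 "lol">']
    for i in range(1, levels):
        refs = ("&e%d;" % (i - 1)) * fanout
        ents.append('<!ENTITY e%d "%s">' % (i, refs))
    return '<!DOCTYPE r [%s]><r>&e%d;</r>' % ("".join(ents), levels - 1)

doc = bomb()
expanded = 0

def count(data):
    # Receives the *expanded* character data, possibly in chunks.
    global expanded
    expanded += len(data)

p = expat.ParserCreate()
p.CharacterDataHandler = count
p.Parse(doc, True)
# A ~240-byte document expands to 3000 characters of output here;
# each additional level multiplies the output by `fanout`.
```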
Exactly. The article would have been much more informative if it had detailed why the usual approaches to limiting resource usage wouldn't work to prevent DoS here.
That idea works in general but causes false positives: no artificial limit you pick is "right", and the false positives can be avoided by getting rid of the recursion altogether.
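The usual way to "get rid of the recursion" is an explicit heap-allocated stack, which trades the fixed call-stack limit for memory that scales with the input. A minimal Python sketch (`max_depth` over nested lists is a stand-in for whatever tree the parser builds):

```python
def max_depth(node):
    # Iterative DFS with an explicit stack on the heap: no call-stack
    # growth, so no arbitrary depth limit to pick. Memory use is
    # bounded by the input size instead.
    stack = [(node, 0)]
    deepest = 0
    while stack:
        n, d = stack.pop()
        if isinstance(n, list):
            d += 1
            deepest = max(deepest, d)
            stack.extend((c, d) for c in n)
    return deepest

# e.g. max_depth([1, [2, [3]]]) == 3
```

The same function written recursively would crash on input nested a few tens of thousands of levels deep; this version just allocates a longer list.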
PS: It's not one single function, and it's indirect rather than direct recursion.
Sure, if it's indirect, I agree it will get messy fast, with a dozen functions suddenly needing to handle an additional parameter. But unrelated to that... I'd really like to know who needs recursion for this that's deeper than 3 or 4 levels. What's the use case? Such XML would surely be unreadable and unwritable for humans, but if it's used as some form of exchange format between systems, what would that be? How would it end up with such deeply nested entities? It sounds like something you deliberately implement that way to show how "smart" you are, not like "hey, that seems the reasonable thing to do here".
This makes me wonder: do any of the popular XML libs have a sort of safe mode, where custom entities and similar features are disabled, schema URLs ignored, and namespaces just flattened (and whatever else I forgot or don't even know about)? You know, for when I know I only need to parse simple XML files that should contain a couple of plain tags and attributes, and I want to reduce the attack surface.
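Not a built-in safe mode exactly, but with Python's stdlib expat bindings you can get close by rejecting the DOCTYPE outright, which forbids custom entities before any can be declared. A hedged sketch (`parse_simple` and `DoctypeForbidden` are made-up names; defusedxml is the established package built around this idea):

```python
import xml.parsers.expat as expat

class DoctypeForbidden(ValueError):
    pass

def parse_simple(xml_text):
    """Parse 'plain' XML into (tag, attrs) pairs, rejecting any DTD."""
    tags = []
    p = expat.ParserCreate()

    def reject(*_args):
        # Any <!DOCTYPE ...> at all is refused, so entity declarations
        # (and thus entity-expansion bombs) can never appear.
        raise DoctypeForbidden("DTDs and entity declarations not allowed")

    p.StartDoctypeDeclHandler = reject
    p.StartElementHandler = lambda name, attrs: tags.append((name, attrs))
    p.Parse(xml_text, True)
    return tags
```

For example, `parse_simple('<a x="1"><b/></a>')` yields the two elements, while anything carrying a DOCTYPE raises before a single entity is defined.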
There are parsers that only implement a tiny subset of XML. And Expat has compile-time flags to disable some of that machinery where it's not needed. Arguably it's no longer XML then, though.
That really depends. A segmented stack makes it equivalent to the heap, and the heap might or might not fail gracefully. If the OS permits overcommit and you've mapped a large region, then an arbitrary process on the machine could trigger the OOM killer when writing to a supposedly allocated (but not previously written) piece of memory.
Presumably you configure resource limits in production, but that just means the "correct" process gets unexpectedly killed instead of an arbitrary one.
Code handling arbitrary input needs to carefully limit resource usage. There's no avoiding it.
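In practice, "carefully limit resource usage" starts with something mundane: cap the input before it ever reaches the parser. A minimal Python sketch (the name `read_bounded` and the 1 MiB figure are arbitrary choices for the example):

```python
import io

MAX_REQUEST_BYTES = 1 << 20  # 1 MiB; pick a limit that fits real traffic

def read_bounded(stream, limit=MAX_REQUEST_BYTES):
    # Read at most limit+1 bytes so an over-limit request is detected
    # without ever buffering an arbitrarily large body.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise ValueError("request body exceeds %d bytes" % limit)
    return data

# read_bounded(io.BytesIO(b"small"), limit=10) -> b"small"
```

This complements, rather than replaces, OS-level limits (setrlimit, cgroups): the size cap bounds attacker input, while the OS limits bound whatever your code does with it.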
People who have spent hours looking for the right security contacts at companies, without luck, would likely disagree. The key failure is not the single missing file, but that security contacts are too hard to find, and the effect that has.