- 100MB 'image' (i.e. executable code: the executable itself plus all the OS libraries loaded)
- 40MB heap
- 50MB "mapped file", mostly fonts opened with mmap() or the windows equivalent
- 45MB stack (each thread gets 2MB)
- 40MB "shareable" (no idea)
- 5MB "unusable" (appears to be address space that's not usable because of fragmentation, not actual RAM)
Generally if something's using a lot of RAM, the answer will be bitmaps of various sorts: draw buffers, decompressed textures, fonts, other graphical assets, and so on. In this case it's just allocated but not yet used heap+stacks, plus 100MB for the code.
Edit: I may be underestimating the role of binary code size. Visual Studio "devenv.exe" is sitting at 2GB of 'image'. Zoom is 500MB. VSCode is 300MB. Much of which are app-specific, not just Windows DLLs.
muskstinks 1 days ago [-]
Tx for the breakdown. I will play around with it later on my Windows machine.
But isn't it crazy how we throw out so much memory just because of random buffers? It feels wrong to me
pjc50 1 days ago [-]
As pointed out below, quite a lot of that isn't in RAM - see "working set".
There's a common noob complaint about "Linux using all my RAM!" where people are confused about the headline free/buffers numbers. If there's a reasonable chance data could be used again soon it's better to leave it in RAM; if the RAM is needed for something else, the current contents will get paged out. Having a chunk of RAM be genuinely unallocated to anything is doing nothing for you.
fluoridation 1 days ago [-]
Nitpick: what you're describing is the disk cache. If a process requests more memory than is free, the OS will not page out pages used for the cache; it will simply either release them (if they're in the read cache) or flush them (if they're in the write cache).
HappMacDonald 8 hours ago [-]
Of course it's doing something for you. Room to defrag other areas of RAM, room to load something new without moving something else out of the way first.
Your perspective sounds like the concept that space in a room does nothing for you until/unless you cram it full of hoarded items.
Melatonic 58 minutes ago [-]
If we're talking about a storage locker that's instantly reconfigurable then it is probably better to be approximately filling it.
Why would anyone buy a locker 5x the size of their needs ?
groby_b 19 hours ago [-]
If you didn't have the "random" buffers, you'd complain how slow it is. Syntax highlighting? Needs a boatload of caching to be efficient. Code search? Hey, you want a cached code index. Plugins? Gotta run your python code somewhere.
Run vi/nano/micro/joe - they're optimizing for memory to some extent. vi clocks in at under 8 MB. You're giving up a lot of "nice" things to get there.
wat10000 1 days ago [-]
Turning these numbers into "memory consumption" gets complicated to the point of being intractable.
The portions that are allocated but not yet used might just be page table entries with no backing memory, making them free. Except for the memory tracking the page table entries. Almost free....
A lot of "image" will be mmapped and clean. Anything you don't actually use from that will be similarly freeish. Anything that's constantly needed will use memory. Except if it's mapped into multiple processes, then it's needed but responsibility is spread out. How do you count an app's memory usage when there's a big chunk of code that needs to sit in RAM as long as any of a dozen processes are running? How do you count code that might be used sometime in the next few minutes or might not be depending on what the user does?
gmueckl 1 days ago [-]
This assumes that executable code pages can be shared between processes. I'm skeptical that this is still a notable optimization on modern systems because dynamic linking writes to executable memory to perform relocations in the loaded code. So this would counteract copy on write. And at least with ASLR, the result should be different for each process anyway.
cataphract 23 hours ago [-]
ld writes to the GOT. The executable segment where .text lives is not written to (it's position independent code in dynamic libraries).
ASLR is not an obstacle -- the same exact code can be mapped into different base addresses in different processes, so they can be backed by the same actual memory.
manwe150 19 hours ago [-]
That’s true on most systems (modern or not), but actually never been true on Windows due to PE/COFF format limitations. But also, that system doesn’t/can’t do effective ASLR because of the binary slide being part of the object file spec.
gmueckl 19 hours ago [-]
I can't reconcile this with the code that GCC generates for accessing global variables. There is no additional indirection there, just a constant 0 address that needs to be replaced later.
cataphract 18 hours ago [-]
Assuming the symbol is defined in the library, when the static linker runs (ld -- we're not talking ld.so), it will decide whether the global variable is preemptable or not, that is, if it can be resolved to a symbol outside the dso. Generally, by default it is, though this depends on many things -- visibility attributes, linker scripts, -Bsymbolic, etc. If it is, ld will have the final code reach into the GOT. If not, it can just use instruction (PC) relative offsets.
gmueckl 18 hours ago [-]
I've never observed a (non-LTO) linker exchange instructions. I want to see an example before I can believe this.
cataphract 2 hours ago [-]
I'm not sure if you're just trolling, but I'll give the same example I gave before (you can get even wilder simplifications -- called relaxations -- with TLS, since there are four levels of generality there). I'm not sure what you meant by "changing instructions", but in the first case the linker did the fixup indicated by the relocation, and in the second it reduced the generality of the reference (one less level of indirection, by changing mov to lea) because it knew the symbol could not be preempted (more exactly, the R_X86_64_REX_GOTPCRELX relocation allows the linker to do the relaxation if it can determine that it's safe to)
OK, I spent a few additional minutes digging into this. It's been too long since I looked at those mechanisms. Turns out my brain was stuck in pre-PIE world.
Global variables in PIC shared libraries are really weird: the shared library's variable is placed into the main program image data segment and the relocation is happening in the shared library, which means that there is an indirection generated in the library's machine code.
wat10000 19 hours ago [-]
Are you looking at the code before or after the static linker runs?
wat10000 23 hours ago [-]
Dynamic linking doesn't have to write to code. I'm not familiar with other platforms, but on macOS, relocations are all in data, and any code that needs a relocation will indirect through non-code pages. I assume it's similar on other OSes.
This optimization is essential. A typical process maps in hundreds of megabytes of code from the OS. There are hundreds of processes running at any given time. Eyeballing the numbers on an older Mac I have here (a newer one would surely be worse) I'd need maybe 50GB of RAM just to hold the code of all the running processes if the pages couldn't be shared.
Capricorn2481 1 days ago [-]
But I have Sublime Text open with a hundred files and it's using 12MB.
pjc50 1 days ago [-]
And how does that break down in vmmap? I'm guessing that's working set vs. the whole virtual memory allocation (which is definitely always an overestimate and not the same as RAM)
wild_egg 24 hours ago [-]
Virtual memory doesn't matter at all. It's virtual. You can take 2TB of address space, use 5MB of it, and nothing on the system cares.
corysama 20 hours ago [-]
Some ten years ago I used an earlier version of https://unity.com/how-to/analyze-memory-usage-memory-profili... to accidentally discover a memory leak that was due to some 3rd party code with a lambda that captured an ancient, archived version of Microsoft's C# vector which had a bug. There were multiple layers of impossibility of me finding that through inspection. But, with a functional tool, it was obvious.
Ten years before that I worked on a bespoke commercial game engine that had its own memory tracker. First thing we did with it was fire up a demo program, attach the memory analyzer to it, then attach a second instance of the memory analyzer to the first one and found a memory error in the memory analyzer.
Now that I'm out of gamedev, I feel like I'm working completely blind. People barely acknowledge the existence of debuggers. I don't know how y'all get anything to work.
A quick google for open-source C++ solutions turns up https://github.com/RudjiGames/MTuner which happens to have been updated today. From a game developer, of course XD
inetknght 1 days ago [-]
> I look at memory profiles of rnomal apps and often think "what is burning that memory".
As a corollary to this: I look at CPU utilization graphs. Programs are completely idle. "What is burning all that CPU?!"
I remember using a computer with RAM measured in two-digit amounts of MiB. CPU measured in low hundreds of MHz. It felt just as fast -- sometimes faster -- as modern computers. Where is all of that extra RAM being used?! Where is all of that extra performance going?! There's no need for it!
ThrowawayR2 1 days ago [-]
Next time you see someone on HN blithely post "CPU / RAM is cheaper than developer time", it's them. Those are the coders who are collectively wasting our CPU and RAM.
shepherdjerred 1 days ago [-]
If you ran a business, would you rather your devs work on feature X that could bring in Y revenue, or spend that same time reducing CPU/RAM/storage utilization by Z% and gives the benefit of ???
Melatonic 56 minutes ago [-]
Why not both ?
didgetmaster 15 hours ago [-]
That is why we have slow, bloated software. The companies that create the software do not have to pay any of the operational costs to run it.
If you have to buy extra RAM or pay unnecessary electrical or cooling expenses because the code is bad, it's not their problem. There is no software equivalent of MPG ratings for cars, where efficient engine designs are rewarded at the time of purchase.
kardos 1 days ago [-]
There is probably some low hanging fruit to be harvested in terms of memory optimizations, and it could be a selling point for the next while as the memory shortage persists
bigstrat2003 20 hours ago [-]
You work on both. Sometimes you need to prioritize one, sometimes the other. And the benefit of the second option is "it makes our product higher quality, both because that is our work ethic but also because our customers will appreciate a quality product".
shepherdjerred 16 hours ago [-]
The business is only going to care about the bottom line. If it's not slow enough to cause business problems, they are not going to say "here's a week to make software faster"
Likewise engineers are only going to care about doing their job. If the business doesn't reward them from taking on optimization work, why would they do it?
This is not true of all engineers and all businesses. Some businesses really do 'get it' and will allow engineers to work on things that don't directly help stated goals. Some engineers are intrinsically motivated and will choose to work on things despite that work not helping their career.
What I'm really getting at is: yes, engineers choose "slower" technologies (e.g. Electron, React) because there are other benefits, e.g. being able to get work done faster. This is a completely rational choice even if it does lead to "waste" and poor performance.
anthk 1 days ago [-]
Even an editor running under Inferno, plus Inferno itself, would be far lighter than current editors. And that's with the VM's overhead counted in. And Limbo is a somewhat high-level language...
pjc50 1 days ago [-]
> I remember using a computer with RAM measured in two-digit amounts of MiB
Yes, so do I. It was limited to 800x600x16 color mode or 320x200x256. A significant amount of memory gets consumed by graphical assets, especially in web browsers which tend to keep uncompressed copies of images around so they can blit them into position.
But a lot is wasted, often by routing things through single bottlenecks in the whole system. Antivirus programs. Global locks. Syncing to the filesystem at the wrong granularity. And so on.
badsectoracula 22 hours ago [-]
FWIW a two digit amount of MB is usually at least 16MB (though with low hundreds of MHz it was probably at least 32MB if not 64MB) and most such systems could easily do 1024x768 at 16bit, 24bit or 32bit color. At least my mid-90s PC could :-P (24bit color specifically, i had some slow Cirrus Logic adapter that stored the framebuffer in triplets of R,G,B, probably to save RAM but at the cost of performance).
VorpalWay 1 days ago [-]
I too wonder that. And it is true on an OS level as well. The only worthwhile change in desktop environments since the early 2000s has been search as you type launchers. Other than that I would happily use something equivalent to Windows XP or (more likely) Linux with KDE 3. It seems everything else since then has mostly been bloat and stylistic design changes. The latter being a waste of time in my opinion.
Of course, some software other than desktop environments have seen important innovation, such as LSPs in IDEs which allows avoiding every IDE implementing support for every language. And SSDs were truly revolutionary in hardware, in making computers feel faster. Modern GPUs can push a lot more advanced graphics as well in games. And so on. My point above was just about your basic desktop environment. Unless you use a tiling window manager (which I tried but never liked) nothing much has happened for a very long time. So just leave it alone please.
_dain_ 1 days ago [-]
>The only worthwhile change in desktop environments since the early 2000s has been search as you type launchers.
Add to that: unicode handling, support for bigger displays, mixed-DPI, networking and device discovery is much less of a faff, sound mixing is better, power management and sleep modes much improved. And some other things I'm forgetting.
ziml77 22 hours ago [-]
There are some people who would exclude all of those as enhancements because they don't care about them (yes, even Unicode; I've seen some people on here argue against supporting anything other than ASCII)
VorpalWay 21 hours ago [-]
Unicode is a fair point, I do speak a language that has a couple of letters that are affected. And of course many many more people across the world are way more affected by that. I didn't really consider that part of the desktop environment though, but I could see the argument for why it might (the file manager for example will need to deal with it, as would translations in the menus etc).
I was primarily thinking about enhancements in the user interactions -- things you see on a day-to-day basis. You really don't see whether text is Unicode, ASCII, ISO-8859-something, Shift JIS, etc. (except when transferring information between systems).
Cold_Miserable 14 hours ago [-]
That's me. 1 byte ASCII is all that's needed for text.
HPsquared 1 days ago [-]
Work expands to fill the available time. This applies to CPU time just as it does to project management.
gwbas1c 1 days ago [-]
Basically, the short answer is that most memory managers allocate more memory than a process needs, and then reuse it.
IE, in a JVM (Java) or dotnet (C#) process, the garbage collector allocates some memory from the operating system and keeps reusing it as it finds free memory and the program needs it.
These systems are built with the assumption that RAM is cheap and CPU cycles aren't, so they are highly optimized CPU-wise, but otherwise are RAM inefficient.
ben-schaaf 1 days ago [-]
Completely agree, it would be very helpful to get even just a breakdown of what the ram is being used for. It's unfortunately a lot of work to instrument.
> sublime consumes 200mb. I have 4 text files open. What is it doing?
To add to what others have said: depending on the platform, a good amount will be the system itself, plus various buffers and caches. If you have a folder open in the side bar, Sublime Text will track and index all the files in there. There's also the undo history, which is kept in RAM with no limit.
There's also the possibility that that 200MB includes the subprocesses, meaning the two python plugin hosts and any processes your plugins spawn - which can include heavy LSP servers.
senfiaj 1 days ago [-]
It's partly because there are layers of abstractions (frameworks, libraries / runtimes / VM, etc). Also, today's software often has other pressures, like development time, maintainability, security, robustness, accessibility, portability (OS / CPU architecture), etc. It's partly because the complexity / demand has increased.
200MB for Sublime does not seem so bad when compared to Postman using 4GB on my machine...
veunes 1 days ago [-]
Part of the problem is that modern apps aren't really "one thing" anymore
BiteCode_dev 11 hours ago [-]
rnomal ?
toss1 20 hours ago [-]
>>I'm always confused as hell how little insight we have in memory consumption.
>>I look at memory profiles of rnomal apps and often think "what is burning that memory".
Because companies starting with Microsoft approach it as an infinite resource, and have done so literally for generations of programmers — it is now ancient tradition.
Back in the x86 days when both memory and memory handles were constrained (64k of them, iirc) I went to a MS developer conference. One problem starting to plague everyone was users' computers running out of memory when actual memory in use was less than half, and the problem was not that memory was used, but all available handles were consumed.
I randomly ended up talking to the (at the time) leader of the Excel team, so I thought I'd ask him about good practices, asking "Does it make sense to have the software look at the task and make an estimate of the full amount of RAM required and allocate it off one handle and track our usage ourselves within that block?" I was speechless when he answered: "Sure, if you wanted to optimize the snot out of it — we just allocate another handle."
That two-line answer just blew my mind and instantly explained so much about problems I saw at the time, and since.
It also made sense in the context of another talk they gave at a previous conference where the message was they anticipate the increased power of the next generation of hardware and write their new version for that hardware, not the then-current hardware. It makes sense, but in the new light, it seems almost like a cousin of planned obsolescence — "How can we squander all the new power Intel is giving us?". And the result was decades after word processing and spreadsheets had usable performance on 640K DOS machines, new machines with orders of magnitude more power and RAM, actually run slower from a user perspective.
I'm hoping this memory crunch (having postponed a memory upgrade for my daily driver and now noticing it is 10x the price) will at least have the benefit of driving developers to maybe get back some craft of designing in optimization.
Melatonic 46 minutes ago [-]
Software engineers seem to be more and more abstracted from the hardware they use. Back in the day you also had to worry about things like IRQ assignments and optimising away tiny amounts of latency; you rarely do now.
Personally I am fine with programmers not spending tons of time optimising down to every last piece because we do have so much more ram and compute relative to the old days. My bigger issue is that things are also a laggy mess even when there is plenty of resources available. I understand these things go hand in hand but I would much rather see more optimisations for the things users will actually notice than just going for metrics. A nice combo of the two would be ideal.
That being said, what's probably most appalling is how often some modern programs hard-crash even when they have plenty of resources.
Capricorn2481 1 days ago [-]
> sublime consumes 200mb. I have 4 text files open. What is it doing?
Huh? Sublime Text? I have like 100 files open and it uses 12MB. Sublime is extremely lean.
Do you have plugins installed?
muskstinks 1 days ago [-]
I do not have plugins installed and I have only a handful of files open on macOS.
Memory statistics say 200MB, with a past peak of 750MB (for whatever reason)
Capricorn2481 1 days ago [-]
Is that in Task Manager, or is that not a reliable place to look for these statistics?
Edit: From what I can tell, Sublime is allocated 100MB of virtual memory even if it's only using about 10MB in practice.
Twirrim 1 days ago [-]
A lot of programs over-allocate virtual memory but don't actually use it, and the OS is smart enough to just pretend it allocated it. There's probably some justification for it somewhere, but it's hard not to see it as an absurd, organically achieved agreement. Developers used to ask for more memory than their applications actually needed and caused all sorts of OOM problems for end users. OS developers realised this and made the OS lie to the app, telling it it got what it asked for and only giving it memory as needed. Now developers can't be bothered to request any realistic amount of memory, because what's the point, the OS is going to ignore it anyway.
Electron really loves to claim absurd amounts of memory, e.g. slack has claimed just over 1TB of virtual memory, but is only using just north of 200MB.
I’d like to know the memory profile of this. The bottleneck is obviously sort which buffers everything in memory. So if we replace this with awk using a hash map to keep count of unique words, then it’s a much smaller data set in memory:
tr -s '[:space:]' '\n' < file.txt | awk '{c[$0]++} END{for(w in c) print c[w], w}' | sort -rn
I’m guessing this will beat Python and C++?
pjscott 21 hours ago [-]
> I’d like to know the memory profile of this. The bottleneck is obviously sort which buffers everything in memory.
That's not obvious to me. I checked the manuals for sort(1) in GNU and FreeBSD, and neither of them buffer everything in memory by default. Instead they read chunks to an in-memory buffer, sort each chunk, and (if there are multiple chunks) use the filesystem as temporary storage for an external mergesort.
This sorting program was originally developed with memory-starved computers in mind, and the legacy shows.
alok-g 2 hours ago [-]
Isn't using bash effectively saying, I have a bunch of functions already written in say C which I'll use but would not count those towards the lines of code? You could do the same in C and C++ itself too.
In other words, I am not sure if the comparison you are making is a fundamental one.
knome 21 hours ago [-]
>which buffers everything in memory
gnu sort can spill to disk. it has a --buffer-size option if you want to manually control the RAM buffer size, and a --temporary-directory option for instructing it where to spill data to disk during sort if need be.
1vuio0pswjnm7 1 days ago [-]
Been waiting for online commentary about programming to start acknowledging this situation as it pertains to writing programs
Memory and storage are not "cheap" anymore. Power may also rise in cost
Under these conditions, memory usage and binary size are irrefutably relevant^1
To some, this might feel like going backwards in time toward the mainframe era. Another current HN item with over 100 points, "Hold on to your hardware", reflects on how consumer hardware may change as a result
To me, the past was a time of greater software efficiency; arguably this was necessitated by cost. Perhaps higher costs in the present and future could lead to better software quality. But whether today's programmers are up for the challenge is debatable. It's like young people in finance whose only experience is in a world with "zero" interest rates. It's easier to whine about lowering rates than to adapt
With the money and political support available to "AI" companies, the incentive for efficiency of any kind is lacking. Perhaps their "no limits" operations, e.g. their effects on supply, may provide an incentive for others' efficiency
1. As an underpowered computer user that compiles own OS and writes own simple programs, I've always rejected large binary size and excessive memory use, even in times of "abundance"
The issue with retrofitting things to an existing, well-established language is that those new features will likely be underutilized, especially in other existing parts of the standard library, since changing those would break backwards compatibility. std::optional is another example of this: it is not used much in the C++ standard library, but would be much more useful if used across the board.
Contrast this with Rust, which had the benefit of being developed several decades later. Here Option and str (string views) were in the standard library from the beginning, and every library and application uses them as fundamental vocabulary types. Combined with good support for chaining and working with these types (e.g. Option has map() to replace the content if it exists and just pass it along if None).
Retrofitting is hard, and I have no doubt there will be new ideas that can't really be retrofitted well into Rust in another decade or two as well. Hopefully at that point something new will come along that learned from the mistakes of the past.
menaerus 1 days ago [-]
Retrofitting new patterns or ideas is underutilized only when it is not worth the change. string_view example is trivial and anyone who cared enough about the extra allocations that could have happened already (no copy-elision taking place) rolled their own version of string_view or simply used char+len pattern. Those folks do not wait for the new standard to come along when they can already have the solution now.
std::optional example OTOH is also a bad example because it is heavily opinionated, and having it baked into the API across the standard library would be a really wrong choice to do.
VorpalWay 1 days ago [-]
Existing APIs for file IO in STL don't return string views into the file buffer of the library (when using buffered IO). That is something you could do, as an example.
Optional being opinionated I don't think I agree with. It is better to have an optional of something that can't be null (such as a reference) than have everything be implicitly nullable (such as raw pointers). This means you have to care about the nullable case when it can happen, and only when it can happen.
There is a caveat for C++ though: optional<T&> is larger in memory than a raw pointer. Rust optimises this case to be the same size (one pointer) by noting that the zero value can never be valid, so it is a "niche" that can be used for something else, such as the None variant of the Option. Such niche optimisation applies widely across the language, to user-defined types as well. That would be impossible to retrofit on C++ without at the very least breaking ABI, and probably impossible even at a language level. Maybe it could be done on a type-by-type basis with an attribute to opt in.
menaerus 1 days ago [-]
I work on a codebase heavily influenced by the same sentiment you share wrt optional, and I can tell you it's a nightmare. Has the number of bugs somehow magically decreased? No, it has not. As a matter of fact, the complexity it introduces, coupled with the monadic programming patterns that are normally enforced in such environments, just made it more probable to introduce buggy code, at no obvious advantage and at great cost: ergonomics, reasoning about the code, and performance. So, yeah, I will keep the position that it is heavily opinionated and not solving any real problem until I see otherwise -- evidence from really complex C++ production code. I have worked with many traditional C and C++ codebases, so that is my baseline here. I prefer working with the latter.
jandrewrogers 1 days ago [-]
Niche optimizations are trivial to automate in modern C++ if you wish. Many code bases automagically generate them.
The caveat is that niche optimizations are not perfectly portable, they can have edge cases. Strict portability is likely why the C++ standard makes niche optimization optional.
TuxSH 20 hours ago [-]
> optional<T&>
This is a C++26 feature which will have pointer-like semantics, aren't you confusing it with optional<reference_wrapper<T>> ?
pjc50 1 days ago [-]
C# gained similar benefits with Span<>/ReadOnlySpan<>. Essential for any kind of fast parser.
cdcarter 1 days ago [-]
Swift too, in 6.3!
groundzeros2015 1 days ago [-]
In C you have char*
rcxdude 1 days ago [-]
Which isn't very good for substrings due to the null-termination requirement.
guenthert 7 hours ago [-]
strtok() happily punches those holes in. Now you could argue that the resulting strings, while null-terminated, aren't true substrings (since the original string is now corrupted), but in the context of parsing (particularly here, using whitespace as the delimiter), that wouldn't be much of an issue.
groundzeros2015 20 hours ago [-]
struct Substring { char *start, *end; };
My point is ownership being transferred implicitly in a struct assignment is a complexity introduced by C++.
In C the concern of allocating memory and using it is separate.
String_view is attempt to add more separation. But C programmers were already there.
rcxdude 11 hours ago [-]
Well yeah, and you could always do the same thing in C++, but having a standard structure for it makes interoperability a lot easier.
1718627440 9 hours ago [-]
An object of type char * isn't necessarily null-terminated.
rcxdude 6 hours ago [-]
No, but that makes it no longer a string as far as most C functions are concerned.
kccqzy 1 days ago [-]
And the type system does not tell you if you need to call free on this char* when you’re done with it.
groundzeros2015 21 hours ago [-]
Correct. Haphazardly passing ownership of individual nodes around is a C++ and OOP anti-pattern.
pjc50 1 days ago [-]
In C you only have char*.
groundzeros2015 20 hours ago [-]
You can compose char* in a struct.
tosti 1 days ago [-]
wchar_t exists.
(And the possibility to implement whatever you want, ofc.)
gwbas1c 1 days ago [-]
A lot of frameworks that use variants of "mark and sweep" garbage collection instead of automatic reference counting are built with the assumption that RAM is cheap and CPU cycles aren't, so they are highly optimized CPU-wise, but otherwise are RAM inefficient.
I wonder if frameworks like dotnet or JVM will introduce reference counting as a way to lower the RAM footprint?
pjc50 1 days ago [-]
Reference counting in multithreaded systems is much more expensive than it sounds because of the synchronization overhead. I don't see it coming back. I don't think it saves massive amounts of memory, either, especially given my observation with vmmap upthread that in many cases the code itself is a dominant part of the (virtual) memory usage.
zozbot234 1 days ago [-]
If you use an ownership/lifetime system under the hood you only pay that synchronization overhead when ownership truly changes, i.e. when a reference is added or removed that might actually impact the object's lifecycle. That's a rare case with most uses of reference counting; most of the time you're creating a "sub"-reference and its lifetime is strictly bounded by some existing owning reference.
cogman10 1 days ago [-]
There are 2 unavoidable atomic updates for RC, the allocation and the free event. That alone will significantly increase the amount of traffic per thread back to main memory.
A lifetime system could possibly eliminate those, but it'd be hard to add to the JVM at this point. The JVM sort of has it in terms of escape analysis, but that's notoriously easy to defeat with pretty typical java code.
ridiculous_fish 23 hours ago [-]
Why would an allocation require an atomic write for a reference count?
Swift routinely optimizes out reference count traffic.
cogman10 21 hours ago [-]
> Why would an allocation require an atomic write for a reference count?
It won't always require it, but it usually will because you have to ensure the memory containing the reference count is correctly set before handing off a pointer to the item. This has to be done almost first thing in the construction of the item.
It's not impossible that a smart compiler could see and remove that initialization and destruction if it can determine that the item never escapes the current scope. But if it does escape it by, for example, being added to a list or returned from a function, then those two atomic writes are required.
adrian_b 1 days ago [-]
Incrementing or decrementing a shared counter is done with an atomic instruction, not with a locked critical section.
This has negligible overhead in most cases. For instance, if the shared counter is already in some cache memory the overhead is smaller than a normal non-atomic access to the main memory. The intrinsic overhead of an atomic instruction is typically about the same as that of a simple memory access to data that is stored in the L3 cache memory, e.g. of the order of 10 nanoseconds at most.
Moreover, many memory allocators use separate per-core memory heaps, so they avoid any accesses to shared memory that need atomic instructions or locking, except in the rare occasions when they interact with the operating system.
usrnm 1 days ago [-]
Atomic operations, especially RMW operations, are still quite expensive, though. Not as expensive as a syscall, of course, but a lot more expensive than non-atomic ones, precisely because they interfere with things like caches.
cogman10 1 days ago [-]
Not only that, they write back to main memory. There's limited bandwidth between the CPU and main memory and with multithreading you are looking at pretty significantly increasing the amount of data transferred between the CPU and memory.
This is such a problem that the JVM gives threads their own allocation pools to write to before flushing back to the main heap. All to reduce the number of atomic writes to the pointer tracking memory in the heap.
gwbas1c 1 days ago [-]
That's why Rust has Rc<> for single-threaded structs, and Arc<> for thread-safe structs.
vaylian 1 days ago [-]
Unlikely. Maybe I'm overly optimistic, but I think it's fairly likely that the RAM situation will have sorted itself out in a few years. Adding reference counting to the JVM and .NET would also take considerable time.
It makes more sense for application developers to think about the unnecessary complexity that they add to software.
xyzzy_plugh 1 days ago [-]
That's not strictly true. Mark and sweep is tunable in ways ARC is not. You can increase frequency, reducing memory at the cost of increased compute, for example.
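CPython illustrates the same knob: its cyclic collector exposes tunable thresholds, where lower values trigger collections more often (more CPU, garbage freed sooner). A small sketch; the default values vary by Python version, so they aren't asserted here:

```python
import gc

# Thresholds control how often each GC generation is collected; lower = more often.
original = gc.get_threshold()

gc.set_threshold(100, 5, 5)   # trade CPU time for promptly-freed garbage
assert gc.get_threshold() == (100, 5, 5)

gc.set_threshold(*original)   # restore the interpreter's defaults
```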
cogman10 1 days ago [-]
M&S also doesn't necessitate having a moving and compacting GC. That's the thing that actually makes the JVM's heap greedy.
Go also does M&S and yet uses less memory. Why? Because go isn't compacting, it's instead calling malloc and free based on the results of each GC. This means that go has slower allocation and a bigger risk of memory fragmentation, but also it keeps the go memory usage reduced compared to the JVM.
griffindor 1 days ago [-]
Nice!
> Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.
I wish I knew the input size when attempting to estimate, but I suppose part of the challenge is also estimating the runtime's startup memory usage too.
> Compute the result into a hash table whose keys are string views, not strings
If the file is mmap'd, and the string view points into that, presumably decent performance depends on the page cache having those strings in RAM. Is that included in the memory usage figures?
Nonetheless, it's a nice optimization that the kernel chooses which hash table keys to keep hot.
The other perspective on this is that we sought out languages like Python/Ruby because development cost was high relative to the hardware. Hardware is now more expensive, but development costs are lower too.
The takeaway: expect more of a push towards efficiency!
pjc50 1 days ago [-]
>> Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.
At this point I'd make two observations:
- how big is the text file? I bet it's a megabyte, isn't it? Because the "naive" way to do it is to read the whole thing into memory.
- all these numbers are way too small to make meaningful distinctions. Come back when you have a gigabyte. It gets more interesting when the file doesn't fit into RAM at all.
> all these numbers are way too small to make meaningful distinctions. Come back when you have a gigabyte.
I have to disagree. Bad performance is often a result of death by a thousand cuts. This function might be one among countless similarly inefficient library calls, programs and so on.
rcxdude 1 days ago [-]
If you're not putting a representative amount of data through the test, you have no idea if the resource usage you're seeing scales with the amount of data or is just a fixed overhead of the runtime.
kloop 1 days ago [-]
> how big is the text file? I bet it's a megabyte, isn't it?
The edit in the article says ~1.5kb
pjc50 1 days ago [-]
Single page on many systems, which makes using mmap() for it even funnier.
Filligree 1 days ago [-]
Not to mention inefficient in memory use. I would have expected a mention of interning; using string-views is fine, but making it a view of 4kB cache pages is not really.
Though I believe the “naive” streaming read could very well be superior here.
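In Python, interning is available explicitly via `sys.intern`; a minimal sketch of how repeated words can be collapsed to a single shared object instead of each occurrence holding its own copy:

```python
import sys

# Runtime-built strings are normally distinct objects even when equal;
# sys.intern collapses equal strings into one shared object.
words = [sys.intern(w) for w in "the cat and the dog and the bird".split()]

assert words[0] is words[3]   # both "the" share one interned object
assert words[2] is words[5]   # both "and" do too
```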
hrmtst93837 12 hours ago [-]
"It only matters at scale" is how you end up with code that hoses mobile apps and embedded boards the moment somebody swaps in a file that is merely bigger than the toy input. These bugs start small. If the code slurps the whole input, staying under 1 GB is not proof of anything, it just means you have not hit the edge yet.
zozbot234 1 days ago [-]
> If the file is mmap'd, and the string view points into that, presumably decent performance depends on the page cache having those strings in RAM.
Not so much, because you only need some fraction of that memory when the program is actually running; the OS is free to evict it as soon as it needs the RAM for something else. Non-file-backed memory can only be evicted by swapping it out, and that's way more expensive.
veunes 1 days ago [-]
I suspect it'll be selective
tzot 1 days ago [-]
Well, we can use memoryview for the dict generation, avoiding the creation of string objects until it's time for output:

  import re, operator

  def count_words(filename):
      with open(filename, 'rb') as fp:
          data = memoryview(fp.read())
      word_counts = {}
      for match in re.finditer(br'\S+', data):
          word = data[match.start():match.end()]
          try:
              word_counts[word] += 1
          except KeyError:
              word_counts[word] = 1
      word_counts = sorted(word_counts.items(), key=operator.itemgetter(1), reverse=True)
      for word, count in word_counts:
          print(word.tobytes().decode(), count)
We could also use `mmap.mmap`.
akx 1 days ago [-]
This doesn't do the same thing though, since it's not Unicode aware.
OP's .split_ascii() doesn't handle U+2009 as well.
edit: OP's fully native C++ version using Pystd
zahlman 1 days ago [-]
Hmm? Which code are you looking at?
contravariant 1 days ago [-]
There's bound to be a way to turn a stream of bytes into a stream of unicode code points (at least I think that's what python is doing for strings). Though I'm explicitly not volunteering to write the code for it.
est 1 days ago [-]
  import mmap, codecs
  from collections import Counter

  def word_count(filepath):
      freq = Counter()
      decode = codecs.getincrementaldecoder('utf-8')().decode
      tail = ''
      with open(filepath, 'rb') as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
          for chunk in iter(lambda: mm.read(65536), b''):
              text = tail + decode(chunk)
              words = text.split()
              # hold back a possibly half-read trailing word for the next chunk
              tail = words.pop() if words and not text[-1:].isspace() else ''
              freq.update(words)
      text = tail + decode(b'', final=True)
      freq.update(text.split())
      return freq
zahlman 1 days ago [-]
Sure, but making one string from the file contents is surely much better than having a separate string per word in the original data.
... Ah, but I suppose the existing code hasn't avoided that anyway. (It's also creating regex match objects, but those get disposed each time through the loop.) I don't know that there's really a way around that. Given the file is barely a KB, I rather doubt that the illustrated techniques are going to move the needle.
In fact, it looks as though the entire data structure (whether a dict, Counter, etc.) should be a relatively small part of the total reported memory usage. The rest seems to be internal Python stuff.
contravariant 1 days ago [-]
For reasons I never quite understood, Python has collections.Counter for the purpose of counting things. It's a bit cleaner.
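The manual dict tally and Counter are equivalent; Counter just packages the idiom (plus niceties like `most_common`). A quick sketch:

```python
from collections import Counter

text = "a b a c a b".split()

# the manual tally...
manual = {}
for w in text:
    manual[w] = manual.get(w, 0) + 1

# ...is what Counter does for you, with sorting helpers on top
counts = Counter(text)
assert counts == manual
assert counts.most_common(1) == [("a", 3)]
```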
zahlman 13 hours ago [-]
> It's a bit cleaner.
That's pretty much the reason why. Raymond Hettinger explains the philosophy well while discussing the `random` standard library module: https://www.youtube.com/watch?v=Uwuv05aZ6ug
I feel like much of this has been forgotten of late, though. From what I've seen, it's really quite hard to get anything added to the standard library unless you're a core dev who's sufficiently well liked among other core devs, in which case you can pretty much just do it. Everyone else will (understandably) be put through a PhD thesis defense, then asked to try the idea out as a PyPI package first (and somehow also popularize the package), and then if it somehow catches on that way, get declined anyway because it's easy for everyone to just get it from PyPI (see e.g. Requests).
I personally was directed to PyPI once when I was proposing new methods for the builtin `str`. Where the entire point was not to have to import or instantiate anything.
fix4fun 1 days ago [-]
Digression: Nowadays, when RAM is expensive, good old zram is gaining popularity ;) Check trends.google.com: since 2025-09, searches for it have doubled ;)
bcjdjsndon 1 days ago [-]
A few things
- since GC languages became prevalent, and maybe high-level programming in general, coders aren't as economical with their designs. Memory isn't something a coder should worry about, apparently.
- far more people code apps in web languages because they don't know anything else. These are anywhere from 5-10 levels of abstraction away from the metal, naturally inefficient.
- increasing scope... I can only describe this one by example: web browsers must implement all manner of standards etc., and it's become a mammoth task, especially compared to the 90s. Same for compilers, OSes; heck, even computers themselves were all one-man jobs at some point, because things were simpler and we knew less.
dgb23 1 days ago [-]
Not a C++ programmer and I think the solution is neat.
But it's not necessarily an apples to apples comparison. It's not unfair to python because of the runtime overhead. It's unfair because it's a different algorithm with fundamentally different memory characteristics.
A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count. For most people that would be the first/naive approach as well when they programmed something like this I think. And it would showcase what the actual overhead of the python version is.
VorpalWay 1 days ago [-]
> A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.
Wouldn't memory mapping the data in Python be the more fair comparison? If the language doesn't support that, then this seems to absolutely be a fair comparison.
> For most people that would be the first/naive approach as well when they programmed something like this I think.
I disagree; my mind immediately goes to mmap when I have to deal with a single file that I have to read in its entirety. I think the non-obvious solution here is rather io_uring (which I would expect to be faster when dealing with lots of small files, as you can load them async concurrently from the file system).
dgb23 1 days ago [-]
I'd make the bet that "most people" (who can program) would not think of mmap, but either about streaming or would even just load the whole thing into memory.
Ask a bunch of coding agents and they will give you these two versions, which means it's likely that the LLMs have seen these way more often than the mmap version. Both Opus and GPT even pushed back when I asked for mmap, both said it would "add complexity".
Filligree 1 days ago [-]
It does add complexity, and the optimal solution is probably not to use it. Consider what happens if a 4kB page has only a single unique word in it—you’d still need to load it to memory to read the string, it just isn’t accounted against your process (maybe).
I would have expected something like this:
- Scan the file serially.
- For each word, find and increment a hash table entry.
- Sort and print.
In theory, technically, this does require slightly more memory—but it’s a tiny amount more; just a copy of each unique word, and if this is natural language then there aren’t very many. Meanwhile, OOP’s approach massively pressures the page cache once you get to the “print” step, which is going to be the bulk of the runtime.
It’s not even a full copy of each unique word, actually, because you’re trading it off against the size of the string pointers. That’s… sixteen bytes minimum. A lot of words are smaller than that.
VorpalWay 1 days ago [-]
That is a valid solution, but what IO block size should you use for the best performance? What if you end up reading half a word at the end of a chunk?
Handling that is, in my opinion, way more complex than letting the kernel figure it out via mmap. The kernel knows way more than you about the underlying block devices, and you can use madvise with MADV_SEQUENTIAL to indicate that you will read the whole file sequentially. (That might free pages prematurely if you keep references into the data rather than copy the first occurrence of each word though, so perhaps not ideal in this scenario.)
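For the curious, the carry-over bookkeeping that chunked reads require (and that mmap spares you) can be sketched in Python roughly like this; `count_words_streaming` and the 64 KiB chunk size are illustrative choices, not anything from the article:

```python
from collections import Counter

def count_words_streaming(path, chunk_size=65536):
    """Count words from fixed-size reads, carrying any word cut in half
    at a chunk boundary over into the next chunk."""
    freq = Counter()
    tail = b""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            buf = tail + chunk
            words = buf.split()
            # if the buffer ends mid-word, hold the last token back
            tail = words.pop() if words and not buf[-1:].isspace() else b""
            freq.update(words)
    if tail:
        freq[tail] += 1   # whatever is left at EOF is a complete word
    return freq
```

The result matches a whole-file `Counter(data.split())` regardless of where the chunk boundaries fall.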
zahlman 1 days ago [-]
> It's unfair because it's a different algorithm with fundamentally different memory characteristics. A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count.
The C++ code is still building a tally by incrementing keys of a hash map one at a time, and then dumping (reversed) key/value pairs out into a list and sorting. The file is small and the Python code is GCing the `line` each time through the outer loop. At any rate it seems like a big chunk of the Python memory usage is just constant (sort of; stuff also gets lazily loaded) overhead of the Python runtime, so.
zahlman 1 days ago [-]
> This sounds like a job for Python. Indeed, an implementation takes fewer than 30 lines of code.
I don't know if the implementation is written in a "low-level" way to be more accessible to users of other programming languages, but it can certainly be done more simply leveraging the standard library:
  from collections import Counter
  import sys

  with open(sys.argv[1]) as f:
      words = Counter(word for line in f for word in line.split())
  for word, count in words.most_common():
      print(count, word)
At the very least, manually creating a (count, word) list from the dict items and then sorting and reversing it in-place is ignoring common idioms. `sorted` creates a copy already, and it can be passed a sort key and an option to sort in reverse order. A pure dict version could be:
  import sys

  with open(sys.argv[1]) as f:
      counts = {}
      for line in f:
          for word in line.split():
              counts[word] = counts.get(word, 0) + 1
  stats = sorted(counts.items(), key=lambda item: item[1], reverse=True)
  for word, count in stats:
      print(count, word)
(No, of course none of this is going to improve memory consumption meaningfully; maybe it's even worse, although intuitively I expect it to make very little difference either way. But I really feel like if you're going to pay the price for Python, you should get this kind of convenience out of it.)
Anyway, none of this is exactly revelatory. I was hoping we'd see some deeper investigation of what is actually being allocated. (Although I guess really the author's goal is to promote this Pystd project. It does look pretty neat.)
veunes 1 days ago [-]
Not "C++ everywhere again" but maybe "understanding memory again"
kristianp 22 hours ago [-]
How much memory does the C++ compiler use when compiling the program? I wonder how that compares to the python program? Not a completely unrelated metric.
Would the rust compiler use much more memory compiling a comparable program to the C++ version?
xantronix 17 hours ago [-]
The point is a bit moot. The compiler is the one place where it's okay to be a little spendy, because you're not compiling each time you need to execute the program.
tombert 1 days ago [-]
I've been rewriting a lot of my stuff in Rust to save memory.
Rust is high-level enough to still be fun for me (tokio gives me most of the concurrency goodies I like), but the memory usage is often like 1/10th or less compared to what I would write in Clojure.
Even though I love me some lisp, pretty much all my Clojure utilities are in Rust land now.
perching_aix 19 hours ago [-]
Since we're doing this, I do wonder then: is going from a 1.3K input file to 21K peak memory usage (16x) really optimal?
It's certainly a lot better than 1000x, sure, but still surprised me.
wbsun 1 days ago [-]
It’s less about 'old vs. new' and more about the evolving trade-offs dictated by the constraints of the era. There have always been engineers trying to squeeze every last drop of performance out of the bits available to them.
yakkomajuri 1 days ago [-]
The abrupt ending was funny and then I realized the author is Finnish and it all made sense.
Nice post.
(P.S. I'm also Finnish)
90d 1 days ago [-]
Speaking about optimization, is Windows just too far gone at this point? It is comical the amount of resources it uses at "idle".
callamdelaney 1 days ago [-]
I shove everything in memory, it's a design decision. Memory is still cheap, relatively.
est 1 days ago [-]
I think py version can be shortened as:
  import sys
  from collections import Counter

  stats = Counter(x for l in open(sys.argv[1]) for x in l.split())
Do you mean something specific, because that sounds like a criticism but with some blanks that need to be filled in.
If you just mean they come across as annoyed by AI, that's true, but that's also way too wide of a category to infer basically anything else about them.
muskstinks 1 days ago [-]
The criticism is valid. The problem is how you weigh this criticism.
I agree they are stealing it but I also see the benefit of it for society and for myself.
Suckerberg downloaded terabytes of books for training, while people around me got sued to hell 20 years ago for downloading one mp3 file.
yieldcrv 1 days ago [-]
they got sued for uploading actually
and Zuck isn’t sued for downloading either, he is sued for reproduction by the AI not being derivative enough, but so far all branches of government support that
anthk 1 days ago [-]
Anna's Archive. Aaron Swartz.
FB and so are CIA fronts and they can do anything they please. Until they hit against Disney and lobbying giants and if a CIA idiot tries to sue/bribe/blackmail them they can order Hollywood to rot their images into pieces with all the wars they promoted in Middle East and Latin America just to fill the wallets of CEO's. That among some social critique movie on FB about getting illegal user data all over the world to deny insurances and whatnot. And OFC with a clear mention of the Epstein case with related people, just in case the Americans forgot about it.
Then the US industry and military complex would collapse in months with brainwashed kids running away from the army. Not to mention to the Call of Duty franchise and the like. It would be the end of Boeing and several more, of course. To hell to profit driven wars for nothing.
Ah, yes, AIPAC lobbies and the like. Good luck taming right wing wackos hating the MAGA cult more than the 'woke' people themselves. These will be the first ones against you after sinking the US image for decades, even more than the illegal Iraq war with no WMD's and the Bush/Cheney mafia.
The outcome of this? Proper and serious engineering a la Airbus. Profit-driven MBA and war sickos instantly kicked out of the spot. OFC the AI snake-oil sellers too, except for classical AI/NN applied to concrete cases (image detection and the like); those will survive fine, even better, because those kinds of jobs are highly specific and they are not statistical text parrots. They can provide guaranteed results, unlike LLMs, which are prone to degrade because the human-generated content feeding them needs to be continuous, while for tumour detection a big enough sample can cover 99% of the cases.
R&D on electric vehicles/energy and nuclear power like nowhere else. And, for sure, the EV equivalent of a Ford T for Americans. A cheap and reliable one, good enough for the common Joe/Mary without being a luxury item. A new Golden Age would rise, for sure.
But the oil mafia will try to fight them like crazy.
MrBuddyCasino 1 days ago [-]
I don't know how anyone can call the most amazing invention in computer science of the last 20 years "copyright infringement factories". We went from the ST:TNG ship computer being futuristic tech to "we kinda have this now". It's like calling cars "air pollution factories", as if that was their only purpose and use.
A fundamentally anti-civilisational mindset.
muskstinks 1 days ago [-]
You can see both sides: criticize how it's done and still want the result of it.
It's a little hypocritical, which often enough ends in realism, aka "okay, we clearly can't fight their copyright infringements because they are too powerful and too rich, but at least we can use the good side of it".
Nothing, btw, forces all of this to happen THAT fast besides capitalism. We could slow down; we could do it better, or more right.
vor_ 1 days ago [-]
I'm sorry, but you're acting obtuse if you pretend you don't know why they're being called that.
saintfire 1 days ago [-]
The people pushing this technology, that accelerates climate change, have lobbied the government to circumvent typical roadblocks created by society to limit sensationalist development. Incidentally, the same people who talk about how dangerous AI will be for society, but don't worry, they're going to be the one to deliver it safely.
Now, I don't believe AI will ever amount to enough to be a critical threat to human life, you know, beyond the immense amounts of wasted energy they propose to convert into something more useful, like a market crash or heat and noise, or both.
Not sure how you can call someone opposed to any of that "anti-civilisational" matter-of-factly.
ElectronCharge 1 days ago [-]
LLMs are amazing technology. It's crazy to interact with something that knows a lot about effectively everything that's ever been written, as well as mimicking human cognition to a large degree.
What LLMs are NOT is intelligent in the same way as a human, which is to say they are not "AGI". They may be loosely AGI-equivalent for certain tasks, software development being the poster child. LLMs have no equivalent of "judgement", and they lie ("hallucinate") with impunity if they don't know the answer. Even with coding, they'll often do the wrong thing, such as writing tests that don't test anything.
It seems likely that LLMs will be one component of a truly conscious AI (AGI+), in the same way our subconscious facility to form sentences is part of our intelligence. We'll see how quickly the other pieces arrive, if ever.
amelius 1 days ago [-]
> AI sociopaths have purchased all the world's RAM in order to run their copyright infringement factories at full blast
The ultimate bittersweet revenge would be to run our algorithms inside the RAM owned by these cloud companies. Should be possible using free accounts.
gostsamo 1 days ago [-]
> how much memory a native code version of the same functionality would use.
Native to what? How is C++ more native than Python?
VorpalWay 1 days ago [-]
Native code usually refers to code which is compiled to machine code (for the CPU it will run on) ahead of time, as opposed to code running in a byte code VM (possibly with JIT).
I would consider all of C, C++, Zig, Rust, Fortran etc to produce native binaries. While things like Cython exist, that wasn't what was used here (and for various reasons would likely still have more overhead than those I mentioned).
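The distinction is easy to see from CPython itself: functions are compiled to bytecode for a stack-based VM rather than to machine code, and the `dis` module shows those VM instructions. A small sketch (exact instruction names vary by Python version):

```python
import dis
import io

def add(a, b):
    return a + b

# dis prints the VM instructions CPython interprets at runtime; a native
# compiler would instead emit machine code for this function ahead of time.
buf = io.StringIO()
dis.dis(add, file=buf)
listing = buf.getvalue()

assert "LOAD_FAST" in listing   # local variables are fetched via VM instructions
```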
fluoridation 1 days ago [-]
Native to the hardware platform.
yieldcrv 1 days ago [-]
as long as you know what architecture questions to ask, agentic coding can help with this next phase of optimization really quickly
delaying comp sci differentiation for a few months
I wonder if assembly based solutions will become in vogue
I look at memory profiles of normal apps and often think "what is burning that memory?".
Modern compression works so well, so what's happening? Open your task manager and look through apps and you might ask yourself this.
For example (let's ignore Chrome, MS Teams and all the other bloat): Sublime consumes 200 MB. I have 4 text files open. What is it doing?
It took Chrome YEARS just to implement tab suspend, despite everyone being aware of the issue, and add-ons existed which were able to do this.
I bought more RAM just for Chrome...
Why would anyone buy a locker 5x the size of their needs ?
Run vi/nano/micro/joe - they're optimizing for memory to some extent. vi clocks in at under 8 MB. You're giving up a lot of "nice" things to get there.
The portions that are allocated but not yet used might just be page table entries with no backing memory, making them free. Except for the memory tracking the page table entries. Almost free....
A lot of "image" will be mmapped and clean. Anything you don't actually use from that will be similarly freeish. Anything that's constantly needed will use memory. Except if it's mapped into multiple processes, then it's needed but responsibility is spread out. How do you count an app's memory usage when there's a big chunk of code that needs to sit in RAM as long as any of a dozen processes are running? How do you count code that might be used sometime in the next few minutes or might not be depending on what the user does?
ASLR is not an obstacle -- the same exact code can be mapped into different base addresses in different processes, so they can be backed by the same actual memory.
Global variables in PIC shared libraries are really weird: the shared library's variable is placed into the main program image data segment and the relocation is happening in the shared library, which means that there is an indirection generated in the library's machine code.
This optimization is essential. A typical process maps in hundreds of megabytes of code from the OS. There are hundreds of processes running at any given time. Eyeballing the numbers on an older Mac I have here (a newer one would surely be worse) I'd need maybe 50GB of RAM just to hold the code of all the running processes if the pages couldn't be shared.
Ten years before that I worked on a bespoke commercial game engine that had its own memory tracker. First thing we did with it was fire up a demo program, attach the memory analyzer to it, then attach a second instance of the memory analyzer to the first one and found a memory error in the memory analyzer.
Now that I'm out of gamedev, I feel like I'm working completely blind. People barely acknowledge the existence of debuggers. I don't know how y'all get anything to work.
A quick google for open-source C++ solutions turns up https://github.com/RudjiGames/MTuner which happens to have been updated today. From a game developer, of course XD
As a corrolary to this: I look at CPU utilization graphs. Programs are completely idle. "What is burning all that CPU?!"
I remember using a computer with RAM measured in two-digit amounts of MiB. CPU measured in low hundreds of MHz. It felt just as fast -- sometimes faster -- as modern computers. Where is all of that extra RAM being used?! Where is all of that extra performance going?! There's no need for it!
If you have to buy extra RAM or pay unnecessary electrical or cooling expenses because the code is bad; it's not their problem. There is no software equivalent to MPG measurements for cars where efficient engine designs are rewarded at the time of purchase.
Likewise, engineers are only going to care about doing their job. If the business doesn't reward them for taking on optimization work, why would they do it?
This is not true of all engineers and all businesses. Some businesses really do 'get it' and will allow engineers to work on things that don't directly help stated goals. Some engineers are intrinsically motivated and will choose to work on things despite that work not helping their career.
What I'm really getting is, yes, engineers choose "slower" technologies (e.g. electron, React) because there are other benefits, e.g. being able to get work done faster. This is a completely rational choice even if it does lead to "waste" and poor performance.
Yes, so do I. It was limited to 800x600x16 color mode or 320x200x256. A significant amount of memory gets consumed by graphical assets, especially in web browsers which tend to keep uncompressed copies of images around so they can blit them into position.
But a lot is wasted, often by routing things through single bottlenecks in the whole system. Antivirus programs. Global locks. Syncing to the filesystem at the wrong granularity. And so on.
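A back-of-envelope sketch of why decoded bitmaps dominate: the RAM cost is width × height × bytes per pixel, independent of the compressed file size. The image dimensions below are illustrative, not taken from any measurement:

```python
# A decoded bitmap costs width * height * bytes-per-pixel in RAM, no matter
# how small the compressed JPEG/PNG was on disk.
width, height, bytes_per_pixel = 4000, 3000, 4   # a 12-megapixel photo, RGBA
decoded_bytes = width * height * bytes_per_pixel
assert decoded_bytes == 48_000_000   # ~48 MB for a single decoded photo
```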
Of course, some software other than desktop environments has seen important innovation, such as LSPs in IDEs, which avoid every IDE implementing support for every language. And SSDs were truly revolutionary hardware, in making computers feel faster. Modern GPUs can push a lot more advanced graphics as well in games. And so on. My point above was just about your basic desktop environment. Unless you use a tiling window manager (which I tried but never liked) nothing much has happened for a very long time. So just leave it alone please.
Add to that: unicode handling, support for bigger displays, mixed-DPI, networking and device discovery is much less of a faff, sound mixing is better, power management and sleep modes much improved. And some other things I'm forgetting.
I was primarily thinking about enhancements in user interactions: things you see on a day-to-day basis. You really don't notice whether a system uses Unicode, ASCII, ISO-somenumbers, ShiftJIS etc. (except when transferring information between systems).
I.e., in a JVM (Java) or .NET (C#) process, the garbage collector allocates some memory from the operating system and keeps reusing it, reclaiming free memory as the program needs it.
These systems are built with the assumption that RAM is cheap and CPU cycles aren't, so they are highly optimized CPU-wise, but otherwise are RAM inefficient.
> sublime consumes 200mb. I have 4 text files open. What is it doing?
To add to what others have said: depending on the platform, a good amount will be the system itself, various buffers and caches. If you have a folder open in the side bar, Sublime Text will track and index all the files in there. There's also the undo history, which is kept in RAM without any limit.
There's also the possibility that that 200MB includes the subprocesses, meaning the two python plugin hosts and any processes your plugins spawn - which can include heavy LSP servers.
https://waspdev.com/articles/2025-11-04/some-software-bloat-...
Visual Studio runs the memory profiler in debug mode right from the start; it's the default configuration, so you need to disable it.
https://learn.microsoft.com/en-us/visualstudio/profiling/mem...
>>I look at memory profiles of normal apps and often think "what is burning that memory".
Because companies starting with Microsoft approach it as an infinite resource, and have done so literally for generations of programmers — it is now ancient tradition.
Back in the x86 days when both memory and memory handles were constrained (64k of them, iirc) I went to a MS developer conference. One problem starting to plague everyone was users' computers running out of memory when actual memory in use was less than half, and the problem was not that memory was used, but all available handles were consumed.
I randomly ended up talking to the (at the time) leader of the Excel team, so I thought I'd ask him about good practices, asking "Does it make sense to have the software look at the task and make an estimate of the full amount of RAM required and allocate it off one handle and track our usage ourselves within that block?" I was speechless when he answered: "Sure, if you wanted to optimize the snot out of it — we just allocate another handle."
That two-line answer just blew my mind and instantly explained so much about problems I saw at the time, and since.
It also made sense in the context of another talk they gave at a previous conference, where the message was that they anticipate the increased power of the next generation of hardware and write their new version for that hardware, not the then-current hardware. It makes sense, but in the new light, it seems almost like a cousin of planned obsolescence: "How can we squander all the new power Intel is giving us?". And the result was that, decades after word processing and spreadsheets had usable performance on 640K DOS machines, new machines with orders of magnitude more power and RAM actually ran slower from a user perspective.
I'm hoping this memory crunch (having postponed a memory upgrade for my daily driver and now noticing it is 10x the price) will at least have the benefit of driving developers to maybe get back some craft of designing in optimization.
Personally I am fine with programmers not spending tons of time optimising every last bit, because we do have so much more RAM and compute relative to the old days. My bigger issue is that things are also a laggy mess even when there are plenty of resources available. I understand these things go hand in hand, but I would much rather see more optimisation for the things users will actually notice than just going for metrics. A nice combo of the two would be ideal.
That being said, what's probably most appalling is how often some modern programs hard crash even when they have plenty of resources.
Huh? Sublime Text? I have like 100 files open and it uses 12mb. Sublime is extremely lean.
Do you have plugins installed?
Memory statistics say 200MB, with a peak of 750MB in the past (for whatever reason)
Edit: From what I can tell, Sublime is allocated 100mb of virtual memory even if it's only using about 10mb in practice.
Electron really loves to claim absurd amounts of memory; e.g. Slack has claimed just over 1TB of virtual memory while only using just north of 200MB.
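The virtual-vs-resident distinction is easy to see for yourself. A minimal, Linux-specific sketch (it parses /proc/self/status, so it won't work on Windows or macOS):

```python
# Compare this process's virtual size (VmSize) with its resident
# set (VmRSS) by parsing /proc/self/status. Values are in kB.
def mem_stats():
    stats = {}
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(("VmSize:", "VmRSS:")):
                key, value, _unit = line.split()
                stats[key.rstrip(":")] = int(value)
    return stats

if __name__ == "__main__":
    s = mem_stats()
    print(f"virtual: {s['VmSize']} kB, resident: {s['VmRSS']} kB")
```

Reserving address space (virtual) is nearly free; only the resident pages actually occupy RAM, which is how a process can "claim" 1TB while touching a couple of hundred MB.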
tr -s '[:space:]' '\n' < file.txt | sort | uniq -c | sort -rn
I’d like to know the memory profile of this. The bottleneck is obviously sort which buffers everything in memory. So if we replace this with awk using a hash map to keep count of unique words, then it’s a much smaller data set in memory:
tr -s '[:space:]' '\n' < file.txt | awk '{c[$0]++} END{for(w in c) print c[w], w}' | sort -rn
I’m guessing this will beat Python and C++?
That's not obvious to me. I checked the manuals for sort(1) in GNU and FreeBSD, and neither of them buffer everything in memory by default. Instead they read chunks to an in-memory buffer, sort each chunk, and (if there are multiple chunks) use the filesystem as temporary storage for an external mergesort.
This sorting program was originally developed with memory-starved computers in mind, and the legacy shows.
In other words, I am not sure if the comparison you are making is a fundamental one.
gnu sort can spill to disk. it has a --buffer-size option if you want to manually control the RAM buffer size, and a --temporary-directory option for instructing it where to spill data to disk during sort if need be.
Memory and storage are not "cheap" anymore. Power may also rise in cost
Under these conditions, memory usage and binary size are irrefutably relevant^1
To some, this might feel like going backwards in time toward the mainframe era. Another current HN item with over 100 points, "Hold on to your hardware", reflects on how consumer hardware may change as a result
To me, the past was a time of greater software efficiency; arguably this was necessitated by cost. Perhaps higher costs in the present and future could lead to better software quality. But whether today's programmers are up for the challenge is debatable. It's like young people in finance whose only experience is in a world with "zero" interest rates. It's easier to whine about lowering rates than to adapt
With the money and political support available to "AI" companies, the incentive for efficiency of any kind is lacking. Perhaps their "no limits" operations, e.g., their effects on supply, may provide an incentive for others' efficiency
1. As an underpowered computer user that compiles own OS and writes own simple programs, I've always rejected large binary size and excessive memory use, even in times of "abundance"
Contrast this with Rust, which had the benefit of being developed several decades later. Here Option and str (string views) were in the standard library from the beginning, and every library and application uses them as fundamental vocabulary types. Combined with good support for chaining and working with these types (e.g. Option has map() to replace the content if it exists and just pass it along if None).
Retrofitting is hard, and I have no doubt there will be new ideas that can't really be retrofitted well into Rust in another decade or two as well. Hopefully at that point something new will come along that learned from the mistakes of the past.
The std::optional example OTOH is also a bad one because it is heavily opinionated, and baking it into the API across the standard library would have been a really wrong choice.
Optional being opinionated I don't think I agree with. It is better to have an optional of something that can't be null (such as a reference) than have everything be implicitly nullable (such as raw pointers). This means you have to care about the nullable case when it can happen, and only when it can happen.
There is a caveat for C++ though: optional<T&> is larger in memory than a raw pointer. Rust optimises this case to be the same size (one pointer) by noting that the zero value can never be valid, so it is a "niche" that can be used for something else, such as the None variant of the Option. Such niche optimisation applies widely across the language, to user-defined types as well. That would be impossible to retrofit onto C++ without at the very least breaking ABI, and probably impossible even at the language level. Maybe it could be done on a type-by-type basis with an attribute to opt in.
The caveat is that niche optimizations are not perfectly portable, they can have edge cases. Strict portability is likely why the C++ standard makes niche optimization optional.
This is a C++26 feature which will have pointer-like semantics; aren't you confusing it with optional<reference_wrapper<T>>?
My point is ownership being transferred implicitly in a struct assignment is a complexity introduced by C++.
In C the concern of allocating memory and using it is separate.
String_view is an attempt to add more separation. But C programmers were already there.
(And the possibility to implement whatever you want, ofc.)
I wonder if frameworks like dotnet or JVM will introduce reference counting as a way to lower the RAM footprint?
A lifetime system could possibly eliminate those, but it'd be hard to add to the JVM at this point. The JVM sort of has it in terms of escape analysis, but that's notoriously easy to defeat with pretty typical java code.
Swift routinely optimizes out reference count traffic.
It won't always require it, but it usually will because you have to ensure the memory containing the reference count is correctly set before handing off a pointer to the item. This has to be done almost first thing in the construction of the item.
It's not impossible that a smart compiler could see and remove that initialization and destruction if it can determine that the item never escapes the current scope. But if it does escape it by, for example, being added to a list or returned from a function, then those two atomic writes are required.
This has negligible overhead in most cases. For instance, if the shared counter is already in some cache memory the overhead is smaller than a normal non-atomic access to the main memory. The intrinsic overhead of an atomic instruction is typically about the same as that of a simple memory access to data that is stored in the L3 cache memory, e.g. of the order of 10 nanoseconds at most.
Moreover, many memory allocators use separate per-core memory heaps, so they avoid any accesses to shared memory that need atomic instructions or locking, except in the rare occasions when they interact with the operating system.
This is such a problem that the JVM gives threads their own allocation pools to write to before flushing back to the main heap. All to reduce the number of atomic writes to the pointer tracking memory in the heap.
It makes more sense for application developers to think about the unnecessary complexity that they add to software.
Go also does M&S and yet uses less memory. Why? Because go isn't compacting, it's instead calling malloc and free based on the results of each GC. This means that go has slower allocation and a bigger risk of memory fragmentation, but also it keeps the go memory usage reduced compared to the JVM.
> Peak memory consumption is 1.3 MB. At this point you might want to stop reading and make a guess on how much memory a native code version of the same functionality would use.
I wish I knew the input size when attempting to estimate, but I suppose part of the challenge is also estimating the runtime's startup memory usage too.
> Compute the result into a hash table whose keys are string views, not strings
If the file is mmap'd, and the string view points into that, presumably decent performance depends on the page cache having those strings in RAM. Is that included in the memory usage figures?
Nonetheless, it's a nice optimization that the kernel chooses which hash table keys to keep hot.
The other perspective on this is that we sought out languages like Python/Ruby because the development cost was high, relative to the hardware. Hardware is now more expensive, but development costs are cheaper too.
The takeaway: expect more push towards efficiency!
At this point I'd make two observations:
- how big is the text file? I bet it's a megabyte, isn't it? Because the "naive" way to do it is to read the whole thing into memory.
- all these numbers are way too small to make meaningful distinctions. Come back when you have a gigabyte. It gets more interesting when the file doesn't fit into RAM at all.
The state of the art here is : https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times... , wherein our hero finds the terrible combination of putting the whole file in a single string and then running strlen() on it for every character.
I have to disagree. Bad performance is often a result of a death of a thousand cuts. This function might be one among countless similarly inefficient library calls, programs and so on.
The edit in the article says ~1.5kb
Though I believe the “naive” streaming read could very well be superior here.
Not so much, because you only need some fraction of that memory when the program is actually running; the OS is free to evict it as soon as it needs the RAM for something else. Non-file-backed memory can only be evicted by swapping it out, and that's way more expensive.
edit: OP's fully native C++ version using Pystd
... Ah, but I suppose the existing code hasn't avoided that anyway. (It's also creating regex match objects, but those get disposed each time through the loop.) I don't know that there's really a way around that. Given the file is barely a KB, I rather doubt that the illustrated techniques are going to move the needle.
In fact, it looks as though the entire data structure (whether a dict, Counter etc.) should be a relatively small part of the total reported memory usage. The rest seems to be internal Python stuff.
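If you want to separate the data-structure memory from the interpreter baseline yourself, the stdlib tracemalloc module attributes Python-level allocations to source lines (it only counts allocations made after tracing starts, so interpreter startup overhead is excluded). A rough sketch:

```python
# Trace Python-level allocations while building a word tally, then
# ask which source lines the traced memory is attributed to.
import tracemalloc
from collections import Counter

tracemalloc.start()
counts = Counter("the quick brown fox jumps over the lazy dog the".split())
current, peak = tracemalloc.get_traced_memory()  # bytes
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current} B, peak: {peak} B")
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```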
That's pretty much the reason why. Raymond Hettinger explains the philosophy well while discussing the `random` standard library module: https://www.youtube.com/watch?v=Uwuv05aZ6ug
I feel like much of this has been forgotten of late, though. From what I've seen, it's really quite hard to get anything added to the standard library unless you're a core dev who's sufficiently well liked among other core devs, in which case you can pretty much just do it. Everyone else will (understandably) be put through a PhD thesis defense, then asked to try the idea out as a PyPI package first (and somehow also popularize the package), and then if it somehow catches on that way, get declined anyway because it's easy for everyone to just get it from PyPI (see e.g. Requests).
I personally was directed to PyPI once when I was proposing new methods for the builtin `str`. Where the entire point was not to have to import or instantiate anything.
- since GC languages became prevalent, and maybe high-level programming in general, coders aren't as economical with their designs. Memory isn't something a coder should worry about, apparently.
- far more people code apps in web languages because they don't know anything else. These are anywhere from 5-10 levels of abstraction away from the metal, naturally inefficient.
- increasing scope... I can only describe this one by example: web browsers must implement all manner of standards etc., so it's become a mammoth task, especially compared to the 90s. Same for compilers and OSes; heck, even computers themselves were all one-man jobs at some point because things were simpler, because we knew less.
But it's not necessarily an apples to apples comparison. It's not unfair to python because of the runtime overhead. It's unfair because it's a different algorithm with fundamentally different memory characteristics.
A fairer comparison would be to stream the file in C++ as well and maintain internal state for the count. For most people that would be the first/naive approach as well when they programmed something like this I think. And it would showcase what the actual overhead of the python version is.
Wouldn't memory mapping the data in Python be the more fair comparison? If the language doesn't support that, then this seems to absolutely be a fair comparison.
> For most people that would be the first/naive approach as well when they programmed something like this I think.
I disagree; my mind immediately goes to mmap when I have to deal with a single file that I have to read in its entirety. I think the non-obvious solution here is rather io_uring (which I would expect to be faster when dealing with lots of small files, as you can load them async concurrently from the file system).
Ask a bunch of coding agents and they will give you these two versions, which means it's likely that the LLMs have seen these way more often than the mmap version. Both Opus and GPT even pushed back when I asked for mmap, both said it would "add complexity".
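For what it's worth, Python does support memory-mapping via the stdlib mmap module. A hedged sketch (note the caveat: unlike C++ string views, splitting still copies each word out of the mapping into its own bytes object):

```python
import mmap
from collections import Counter

def count_words_mmap(path):
    # Map the file read-only; pages are faulted in by the kernel
    # as they are touched, then tally whitespace-separated words.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return Counter(mm.read().split())
```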
I would have expected something like this:
- Scan the file serially.
- For each word, find and increment a hash table entry.
- Sort and print.
In theory, technically, this does require slightly more memory—but it’s a tiny amount more; just a copy of each unique word, and if this is natural language then there aren’t very many. Meanwhile, OOP’s approach massively pressures the page cache once you get to the “print” step, which is going to be the bulk of the runtime.
It’s not even a full copy of each unique word, actually, because you’re trading it off against the size of the string pointers. That’s… sixteen bytes minimum. A lot of words are smaller than that.
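The scan-tally-sort approach described above is only a few lines in Python (a sketch; `word_frequencies` is an illustrative name, not from the article):

```python
def word_frequencies(path):
    counts = {}
    with open(path) as f:
        for line in f:                  # one line in memory at a time
            for word in line.split():
                # find-or-increment the hash table entry
                counts[word] = counts.get(word, 0) + 1
    # single sort at the end, by descending count
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```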
Handling that is in my opinion way more complex than letting the kernel figure it out via mmap. The kernel knows way more than you about the underlying block devices, and you can use madvise with MADV_SEQUENTIAL to indicate that you will read the whole file sequentially. (That might free pages prematurely if you keep references into the data rather than copying the first occurrence of each word, though, so perhaps not ideal in this scenario.)
The C++ code is still building a tally by incrementing keys of a hash map one at a time, and then dumping (reversed) key/value pairs out into a list and sorting. The file is small and the Python code is GCing the `line` each time through the outer loop. At any rate it seems like a big chunk of the Python memory usage is just constant (sort of; stuff also gets lazily loaded) overhead of the Python runtime, so.
I don't know if the implementation is written in a "low-level" way to be more accessible to users of other programming languages, but it can certainly be done more simply leveraging the standard library:
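A sketch of such a standard-library version (illustrative names, assuming the article's word-count task):

```python
import sys
from collections import Counter

def count_words(path):
    # Counter builds the tally; the generator streams line by line.
    with open(path) as f:
        return Counter(word for line in f for word in line.split())

if __name__ == "__main__" and len(sys.argv) > 1:
    # most_common() replaces the manual build-list/sort/reverse dance
    for word, count in count_words(sys.argv[1]).most_common():
        print(count, word)
```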
At the very least, manually creating a (count, word) list from the dict items and then sorting and reversing it in-place is ignoring common idioms. `sorted` creates a copy already, and it can be passed a sort key and an option to sort in reverse order, so a pure dict version takes only a few lines. (No, of course none of this is going to improve memory consumption meaningfully; maybe it's even worse, although intuitively I expect it to make very little difference either way. But I really feel like if you're going to pay the price for Python, you should get this kind of convenience out of it.) Anyway, none of this is exactly revelatory. I was hoping we'd see some deeper investigation of what is actually being allocated. (Although I guess really the author's goal is to promote this Pystd project. It does look pretty neat.)
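A sketch of the pure-dict version mentioned above (`top_words` is an illustrative name): tally into a plain dict, then let `sorted` do the copy, the key, and the reverse ordering in one call:

```python
def top_words(text):
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    # sorted() already returns a new list; no manual copy or reverse
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
```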
Would the rust compiler use much more memory compiling a comparable program to the C++ version?
Rust is high-level enough to still be fun for me (tokio gives me most of the concurrency goodies I like), but the memory usage is often like 1/10th or less compared to what I would write in Clojure.
Even though I love me some lisp, pretty much all my Clojure utilities are in Rust land now.
It's certainly a lot better than 1000x, sure, but still surprised me.
Nice post.
(P.S. I'm also Finnish)
import sys
from collections import Counter
stats = Counter(x for l in open(sys.argv[1]) for x in l.split())
If you just mean they come across as annoyed by AI, that's true, but that's also way too wide of a category to infer basically anything else about them.
I agree they are stealing it but I also see the benefit of it for society and for myself.
Suckerberg downloaded terabytes of books for training, while people around me got sued to hell 20 years ago for downloading one mp3 file.
And Zuck isn't sued for downloading either; he is sued for reproduction by the AI not being derivative enough, but so far all branches of government support that
FB and so are CIA fronts and they can do anything they please. Until they hit against Disney and lobbying giants and if a CIA idiot tries to sue/bribe/blackmail them they can order Hollywood to rot their images into pieces with all the wars they promoted in Middle East and Latin America just to fill the wallets of CEO's. That among some social critique movie on FB about getting illegal user data all over the world to deny insurances and whatnot. And OFC with a clear mention of the Epstein case with related people, just in case the Americans forgot about it.
Then the US industry and military complex would collapse in months, with brainwashed kids running away from the army. Not to mention the Call of Duty franchise and the like. It would be the end of Boeing and several more, of course. To hell with profit-driven wars for nothing.
Ah, yes, AIPAC lobbies and the like. Good luck taming right wing wackos hating the MAGA cult more than the 'woke' people themselves. These will be the first ones against you after sinking the US image for decades, even more than the illegal Iraq war with no WMD's and the Bush/Cheney mafia.
The outcome of this? Proper and serious engineering a la Airbus. Instant profit-driven MBA and war sickos being kicked out on the spot. OFC the AI snakeoil sellers too, except for classical AI/NN applied to concrete cases (image detection and the like); those will survive fine, even better, because those jobs are highly specific and they are not statistical text parrots. They can provide guaranteed results, unlike LLMs, which are prone to degrade because the human-generated content feeding them needs to be continuous, while for tumour detection a big enough sample can cover 99% of the cases.
R&D on electric vehicles/energy and nuclear power like nowhere else. And, for sure, the EV equivalent of a Ford T for Americans. A cheap and reliable one, good enough for the common Joe/Mary without being a luxury item. A new Golden Age would rise, for sure. But the oil mafia will try to fight them like crazy.
A fundamentally anti-civilisational mindset.
It's a little bit hypocritical, which often enough ends in realism, aka "okay, we clearly can't fight their copyright infringements because they are too powerful and too rich, but at least we can use the good side of it".
Nothing, btw, forces all of this to happen THAT fast besides capitalism. We could slow down; we could do it better or more right.
Now, I don't believe AI will ever amount to enough to be a critical threat to human life, you know, beyond the immense amounts of wasted energy they propose to convert into something more useful, like a market crash or heat and noise, or both.
Not sure how you can call someone opposed to any of that "anti-civilisational" matter-of-factly.
What LLMs are NOT is intelligent in the same way as a human, which is to say they are not "AGI". They may be loosely AGI-equivalent for certain tasks, software development being the poster child. LLMs have no equivalent of "judgement", and they lie ("hallucinate") with impunity if they don't know the answer. Even with coding, they'll often do the wrong thing, such as writing tests that don't test anything.
It seems likely that LLMs will be one component of a truly conscious AI (AGI+), in the same way our subconscious facility to form sentences is part of our intelligence. We'll see how quickly the other pieces arrive, if ever.
The ultimate bittersweet revenge would be to run our algorithms inside the RAM owned by these cloud companies. Should be possible using free accounts.
native to what? how is c++ more native than python?
I would consider all of C, C++, Zig, Rust, Fortran etc to produce native binaries. While things like Cython exist, that wasn't what was used here (and for various reasons would likely still have more overhead than those I mentioned).
delaying comp sci differentiation for a few months
I wonder if assembly-based solutions will come back into vogue