
digitalmars.D - Why Bloat Is Still Software’s Biggest Vulnerability

reply tim <not_valid_email gmail.com> writes:
I thought I would get a discussion started on software bloat.

Maybe D can be part of the solution to this problem?
Feb 12
parent reply tim <not_valid_email gmail.com> writes:
On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
 I thought I would get a discussion started on software bloat.

 Maybe D can be part of the solution to this problem?
oops forgot link to article, https://spectrum.ieee.org/lean-software-development
Feb 12
next sibling parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
 On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
 I thought I would get a discussion started on software bloat.

 Maybe D can be part of the solution to this problem?
oops forgot link to article, https://spectrum.ieee.org/lean-software-development
Agreed .. two days ago I needed to pull a 13GB docker image from the Nvidia repository ... a totally out of control mess. /P
Feb 12
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via Digitalmars-d
wrote:
 On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
 On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
 I thought I would get a discussion started on software bloat.
 
 Maybe D can be part of the solution to this problem?
No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development.

Which is utterly crazy, if you think about it. Unless you pin every dependency to exact versions (who even does that?!), every time you build your code you're potentially getting a (subtly) different set of dependencies. That means the program you were trying to debug 5 mins ago may not even be the same program you're debugging now. Now of course it's possible to turn off this behaviour while debugging, but still, the fact that that's the default behaviour is just nuts.

Over the long term, this means that you cannot reliably reproduce older versions of your software -- because the versions of dependencies that version 1.0 depended on may not even exist anymore, now that your program is at version 2.0. If your customer reports a problem, you have no way of debugging it; you can't even reproduce the exact image your customer is running anymore, let alone make any fixes to it. The only thing left to do is to tell them "just upgrade to the latest version". Which is the kind of insanity that's familiar to every one of us these days.

Never mind the fallacy that "newer == better". Especially not in the current atmosphere of software development, where so-called "patch" releases are not patch releases at all, but full-featured new releases complete with full-fledged new, untested features (because why waste resources making a patch release plus a separate new-feature release, when you can just bundle the two together, save development costs, and give Marketing all the more excuse to push new features onto customers and thereby make more money). The number of bugs introduced with each "patch" release may well exceed the number of bugs fixed.

All this not even to mention the insanity that sometimes specifying just *one* dependency will pull in tens or even hundreds of recursive dependencies. A hello world program depends on a standard I/O package, which in turn depends on a date-formatting package, which in turn depends on the locales package, which in turn depends on the internet timeserver client package, which depends on the cryptography package, ad nauseam. And so it takes a totally insane amount of packages just to print Hello World on the screen.

Not to mention that the whole concept of depending on some 3rd party code that exists on some remote server somewhere out there on the wild wild west (www) of the 'net is just crazy. The article linked below alludes to obsolete NPM / Node packages being taken over by malicious actors in order to inject malicious code into unwitting software. There's also the problem that your code is not compilable if for whatever reason you lose network connectivity. Which means if you suddenly find yourself in an emergency and have to make a small fix to your program, you won't be able to recompile it. Good luck.
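(Coming back to version pinning for a second: for those who do want it, dub records the exact resolved versions in a dub.selections.json file next to the project, and committing that file is what makes a build reproducible. Roughly what such a file looks like -- the package names here are just placeholders:)

    {
        "fileVersion": 1,
        "versions": {
            "somelib": "1.2.3",
            "otherlib": "0.9.6"
        }
    }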
 https://spectrum.ieee.org/lean-software-development
Agreed .. two days ago I needed to pull a 13GB docker image from the Nvidia repository ... a totally out of control mess.
[...]

Reducing code size is, to paraphrase Walter, to plug one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.

Today's web app scene is exemplary of the insanity in software development. It takes GBs of memory and multicore GHz CPUs to run a ridiculously complex web browser in order to be able to run some bloated web app with tons of external dependencies at the *same speed* as an equivalent lean native program in the 80's used to run on 64 KB of memory and a 16 MHz single-core CPU. What's wrong with the picture here?

And don't even get me started on the IoT scene, which is a mind-bogglingly insane concept in and of itself. Why does my toaster need to run a million-LoC operating system sporting an *internet connection*?! Or indeed, the *stuffed animal toy* that some well-meaning parent gave my son as a "gift", which has a built-in internet interface that can be used for downloading audio clips (it's cute, it downloaded a clip of my son's name so that the toy could address him by name -- WHY OH WHY... argh). I betcha the OS running on this thing hasn't been updated (and isn't ever going to be) for at least 5 years, and carries who knows how many unpatched security vulnerabilities. I wouldn't be surprised if a good chunk of today's botnets consist of exploited household appliances running far more software than they actually require for their primary operations. Perhaps this internet-"enabled" stuffed animal is among the esteemed members of such a botnet. (Thankfully the battery has since run out -- and I'm not planning to replace it, ever. Sorry, botnet.)

These are just milder examples of the IoT madness. Don't get me started on internet-enabled webcams that can be (and have been) used for far more nefarious purposes than running some script kiddie's botnet. Years ago, if somebody had told me that some random car driving by the house could hack into my babycam and make it emit a scary noise to scare the baby, I'd have laughed them out of the house as some delusional paranoid. Unfortunately, today this is actual reality, no thanks to insecure, misconfigured WiFi routers whose OSes haven't been updated in eons, and household appliances having internet access that they have no business having.

In principle, the same thing applies to Docker images that contain far more stuff than they rightly should. No thanks to these non-solutions to security issues, nowadays it's no longer enough to keep up with your OS's security patches, because patching the host OS does not patch the OSes bundled with each Docker image. And for many applications, nobody's gonna patch their Docker images (the whole reason they went the Docker route is that they can't be bothered with actual, proper integration with their host OS; they just want to target a static, known OS that works for their broken code, and therefore have zero incentive to make any changes at all now that their code works). So your host OS may very well be completely patched, but thanks to these needlessly bloated Docker images your PC still has as many security holes as a cheese grater.

And then there's the totally insane concept of running arbitrary code from unknown, untrusted online sources: Javascript, ActiveX, scripting in emails, in documents, etc. Eye-candy for the customer, completely unnecessary functionally speaking, and an absolute catastrophe security-wise.
The entire concept is flawed to begin with, and things like sandboxing, etc., are merely afterthoughts, bandages that don't actually fix the festering wound underneath. Sooner or later something will give. And the past 20 or so years of internet history proves this over and over again, to this very day. But in spite of the countless arbitrary-code-execution vulnerabilities, nobody is ready to tackle the root of the problem: 3rd party code from unknown, untrusted online sources has NO BUSINESS running on my PC. But almost every major application these days is literally dying in its eagerness to run such code -- by default. Your browser, your email reader, your word processor, your spreadsheet app, just about everything, really, just can't wait to get its hands on some fresh unknown 3rd party code in order to run it at the user's expense. And the usual anemic response when a major exploit happens shows that what the security community is doing -- all they can do given the circumstances, really -- is, to quote Walter again, merely plugging individual holes in a cheese grater.

The underlying problem is that the incentives in software development are all wrong these days. Instead of incentivising code quality, security, and conservation of resources, the primary incentive is money. I.e., ship software as early as possible in order to beat your competitors, which in practice means do as little work as you can possibly get away with in order to get the product out the door. Code quality is a secondary concern (we're gonna throw it all out by next release anyway), conservation of resources is a non-issue (resources are cheap, just tell the customer to buy the latest and greatest hardware, our hardware partners will give us a kick-back for the free promotion), and security isn't even on the list. Developing software the "right" way is not profitable; questionable practices like importing millions of LoC from dynamic remote dependencies get the job done faster and lead to more profit, therefore that's what people will do.

And of course, this state of incentives is good for the big companies that are making huge profits off it, so they're not going to let things change for the better as long as they have a say in it. And they're the ones employing and paying programmers to produce this trash, so anyone who doesn't agree with them won't last very long in this career. Therefore guess what kind of code the majority of programmers are producing every day. Definitely not lean, security-conscious code.

As someone once joked, the most profitable software venture is a business of two departments: virus writers and anti-virus development. Welcome to software development hell.


T

-- 
Life is complex. It consists of real and imaginary parts. -- YHL
Feb 12
next sibling parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
 On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via 
 Digitalmars-d wrote:
 On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
 On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
<snips>
 https://spectrum.ieee.org/lean-software-development
Agreed .. two days ago I needed to pull a 13GB docker image from the Nvidia repository ... a totally out of control mess.
[...] Reducing code size is, to paraphrase Walter, to plug one hole in a cheese grater. There are so many other things wrong with the present state of software that code size doesn't even begin to address.
Hey, in the end the title of the post is: "Why Bloat Is Still Software’s __Biggest__ Vulnerability". Let's start by plugging the biggest! :-P

Long story short, the docker image was the last resort after having lost a three-hour battle against pip and conflicting dependencies, trying to run 2-year-old code (Python ML environments are sometimes just crazy). Note that using pip also involved GBs of downloads: tensorflow, keras, etc.
Feb 12
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Feb 12, 2024 at 05:48:32PM +0000, Paolo Invernizzi via Digitalmars-d
wrote:
 On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
[...]
 Reducing code size is, to paraphrase Walter, to plug one hole in a
 cheese grater. There are so many other things wrong with the present
 state of software that code size doesn't even begin to address.
Hey, in the end the title of the post is: "Why Bloat Is Still Software’s __Biggest__ Vulnerability". Let's start by plugging the biggest! :-P
I'm skeptical that it's the biggest. There are many holes in a cheese grater; plugging each one individually will always leave you with more holes afterwards. And they are all more-or-less the same size. :-D However, nobody seems willing to entertain the possibility of removing the cheese grater altogether, which would be a much better solution.
 Long story short, the docker image was the last resort after having
 lost a three-hour battle against pip and conflicting dependencies,
 trying to run 2-year-old code (Python ML environments are sometimes
 just crazy). Note that using pip also involved GBs of downloads:
 tensorflow, keras, etc.
Which is why I said that these are all just holes in a cheese grater. Conflicting dependencies and the inability to compile old code are well-known (to me) symptoms of today's model of software development. I won't go so far as to say that anything requiring GBs of downloads is inherently broken -- perhaps for some applications, large amounts of code / data *are* unavoidable. But I can't believe that the *majority* of dependencies would require such incommensurate amounts of resources. At most I'd expect one or two specialised dependencies that might need this, not every other package in your typical online code repo.

When I was in college in the 90's, code reuse was a big topic. Everyone was talking about coding for libraries so that you don't have to reinvent the wheel. Eventually that led to DLL hell in the Windows world and .so hell in the Posix world. After 30 years, people have moved away from OS-level dependencies (DLLs and shared libs) to the likes of cargo, npm, and dub. However, the underlying problem of dependency hell has not been solved. I'm at the point where I'm ready to call BS on the whole concept of code reuse.

So I've gradually come to the conclusion that code reuse, i.e., dependencies, is inherently evil, and should be avoided like the plague unless you absolutely have no other choice. And where it can't be avoided, it should be as shallow as possible. The best dependencies are single-file dependencies like Adam's arsd libs, where you can literally copy the file into your workspace and just compile (see the sketch at the end of this post). The second-best dependency is the single package, where you copy/clone the files into some subdir in your workspace and off you go. The worst kind of dependency is the one that recursively depends on other packages. These should be avoided as much as possible, because it's here that NP-completeness and dependency hell begin, and it's here that madness like multi-GB docker images is born.

Copy-pasta is oft-maligned, and I agree that it's evil when it happens within a project. But I'm at the point where I'm almost ready to declare that copy-pasta is actually good and beneficial when it happens across projects. Much better to just copy the darned code into your local repo and modify it to whatever you need it to do, than to declare a dreaded dependency that's the beginning of the slippery slope into dependency hell and the inclusion of millions of lines of code bloat into your project.
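To illustrate the single-file ideal, here's a hypothetical toy (it assumes you've copied arsd's terminal.d next to your own code; the API is per Adam's docs, so treat the details as a sketch):

    // app.d -- build with: dmd app.d terminal.d
    // The entire dependency is one file sitting in your workspace:
    // no package manager, no lockfile, no network access at build time.
    import arsd.terminal;

    void main()
    {
        auto term = Terminal(ConsoleOutputType.linear);
        term.writeln("Hello from a single-file dependency!");
    }


T

-- 
What are you when you run out of Monet? Baroque.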
Feb 12
prev sibling next sibling parent M. M. <matus email.cz> writes:
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
 On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via 
 Digitalmars-d wrote:
 [...]
No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development. [...]
I enjoyed reading this. I largely agree with what you said. I also agree with your later post about ideal dependencies (like single files from arsd or single packages).
Feb 12
prev sibling next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
 All this not even to mention the insanity that sometimes 
 specifying just *one* dependency will pull in tens or even 
 hundreds of recursive dependencies. A hello world program 
 depends on a standard I/O package, which in turn depends on a 
 date-formatting package, which in turn depends on the locales 
 package, which in turn depends on the internet timeserver 
 client package, which depends on the cryptography package, ad 
 nauseam.  And so it takes a totally insane amount of packages 
 just to print Hello World on the screen.
"Funny" example of that. I wanted to learn of to do a react project from scratch. Not using a framework or anything, just pieces the stuff together to make it work myself. So babel, webpack, react, jest for testing and stylex for CSS. That's it. Arguably a lot by some standard, but by no means something wild, the JS equivalent of a build system and a test framework. The project currently has 1103 dependencies. Voila. Pure madness.
Feb 12
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Feb 12, 2024 at 11:49:31PM +0000, deadalnix via Digitalmars-d wrote:
 On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
 All this not even to mention the insanity that sometimes specifying
 just *one* dependency will pull in tens or even hundreds of
 recursive dependencies. A hello world program depends on a standard
 I/O package, which in turn depends on a date-formatting package,
 which in turn depends on the locales package, which in turn depends
 on the internet timeserver client package, which depends on the
 cryptography package, ad nauseam.  And so it takes a totally insane
 amount of packages just to print Hello World on the screen.
 
"Funny" example of that. I wanted to learn of to do a react project from scratch. Not using a framework or anything, just pieces the stuff together to make it work myself. So babel, webpack, react, jest for testing and stylex for CSS. That's it. Arguably a lot by some standard, but by no means something wild, the JS equivalent of a build system and a test framework. The project currently has 1103 dependencies. Voila. Pure madness.
Recently, while working on my minimal druntime for wasm (one primary motivation for which is to let me write D code, instead of having to deal with the madness that is the JS ecosystem, when I absolutely can't escape the tyranny of the browser), I noticed a lot of cruft in druntime and Phobos, stuff that got piled on because we added this or that new feature / type modifier / etc.. Code that used to be straightforward acquired layers of complexity because now we have to deal with this or that case that we didn't need to worry about before. There are also past mistakes that we're still paying for, like the ubiquity of TypeInfos in internal APIs, dating from when D didn't have templates.

The recent push to templatize druntime has actually been a great saver, though: I got things like array operations working without incurring the bloat of TypeInfos, thanks to the current compiler emitting template calls instead of TypeInfo-dependent static calls for said operations. I think this is a very important step in moving Phobos/druntime toward a pay-as-you-use model instead of the upfront cost of TypeInfos.

If only more projects were built with the pay-as-you-use model instead of the blanket "I need this dependency, let's pull in the whole hairball of recursive dependencies too". In an ideal world, things like std.stdio would import floating-point formatting code only if you actually use %f and pass a float/double to format(). Otherwise it wouldn't even import the module, and you wouldn't pull in anything that you don't actually need. (I actually have an incomplete replica of std.format written according to this philosophy -- it doesn't even instantiate floating-point formatting code unless you actually passed a float to format() at some point. In an ideal world the whole of druntime / Phobos would be built this way, tiny standalone pieces that only get pulled in with actual use. Not like the last time I checked, when a Hello World program for some inscrutable reason pulled BigInt code into the executable.)
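Here's a tiny sketch of the idea (a hypothetical miniFormat, not my actual replica): because each branch lives behind a static if inside a template, the float-formatting code is only instantiated -- and only linked into the binary -- if some call site actually passes a floating-point value.

    string miniFormat(Args...)(Args args)
    {
        string result;
        foreach (arg; args)
        {
            static if (__traits(isFloating, typeof(arg)))
            {
                // This branch (and its local import) only exists in the
                // binary if some caller actually passes a float or double.
                import core.stdc.stdio : snprintf;
                char[64] buf;
                const len = snprintf(buf.ptr, buf.length, "%g",
                                     cast(double) arg);
                result ~= buf[0 .. len].idup;
            }
            else static if (__traits(isIntegral, typeof(arg)))
            {
                import std.conv : to;  // integer path only
                result ~= arg.to!string;
            }
            else
                result ~= arg;  // assume string-like
        }
        return result;
    }

    unittest
    {
        // This instantiation never touches the floating-point branch,
        // so no float-formatting code gets pulled into the executable.
        assert(miniFormat("x = ", 42) == "x = 42");
    }


T

-- 
Democracy: The triumph of popularity over principle. -- C.Bond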
Feb 12
prev sibling next sibling parent monkyyy <crazymonkyyy gmail.com> writes:
On Monday, 12 February 2024 at 17:30:23 UTC, H. S. Teoh wrote:
 On Mon, Feb 12, 2024 at 03:55:50PM +0000, Paolo Invernizzi via 
 Digitalmars-d wrote:
 On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
 On Monday, 12 February 2024 at 14:49:02 UTC, tim wrote:
 I thought I would get a discussion started on software 
 bloat.
 
 Maybe D can be part of the solution to this problem?
No amount of D innovation is going to stop programmers infected with the madness of dynamic remote dependencies that pull in an arbitrary number of external modules. Potentially a different set of them every time you build. Tools like cargo or dub actively encourage this model of software development.
Nothing can stop irresponsibility completely, but you can go a long way toward simplifying and encouraging responsibility and sanity. A C program will have fewer dependencies than a JS one.
 Reducing code size is, to paraphrase Walter, to plug one hole 
 in a cheese grater. There are so many other things wrong with 
 the present state of software that code size doesn't even begin 
 to address.
I don't get this metaphor; surely fewer grates make clogging easier.
Feb 12
prev sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 13/02/2024 6:30 AM, H. S. Teoh wrote:
 No amount of D innovation is going to stop programmers infected with the 
 madness of dynamic remote dependencies that pull in an arbitrary number 
 of external modules. Potentially a different set of them every time you 
 build. Tools like cargo or dub actively encourage this model of software 
 development.
 
 
 Which is utterly crazy, if you think about it. Unless you pin every 
 dependency to exact versions (who even does that?!), every time you 
 build your code you're potentially getting a (subtly) different set of 
 dependencies. That means the program you were trying to debug 5 mins 
 ago may not even be the same program you're debugging now. Now of course 
 it's possible to turn off this behaviour while debugging, but still, the 
 fact that that's the default behaviour is just nuts.
What? Dub doesn't upgrade dependencies for you without you asking for it. A dependency either has to be missing, or you ran ``dub upgrade``.

To prevent that being an issue long-term, you can vendor your cache into your repository: ``dub build --cache=local``. Unfortunately you have to provide that on the CLI every time.

There are solutions here for those who care about it. If you don't care about it, of course it isn't a solved problem.
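For reference, a minimal reproducible-build workflow using just the pieces mentioned above (plus the dub.selections.json file that dub writes when it resolves versions -- commit it, and a plain ``dub build`` will keep using exactly those versions):

    # Explicitly re-resolve dependencies; this rewrites dub.selections.json.
    dub upgrade

    # Vendor the package cache inside the repository (passed on every build).
    dub build --cache=local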
Feb 12
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Feb 13, 2024 at 01:56:08PM +1300, Richard (Rikki) Andrew Cattermole via
Digitalmars-d wrote:
 On 13/02/2024 6:30 AM, H. S. Teoh wrote:
[...]
 Which is utterly crazy, if you think about it. Unless you pin every
 dependency to exact versions (who even does that?!), every time you
 build your code you're potentially getting a (subtly) different set
 of dependencies. That means the program you were trying to debug
 5 mins ago may not even be the same program you're debugging now.
 Now of course it's possible to turn off this behaviour while
 debugging, but still, the fact that that's the default behaviour is
 just nuts.
What? Dub doesn't upgrade dependencies for you without you asking for it. A dependency either has to be missing, or you ran ``dub upgrade``. To prevent that being an issue long-term, you can vendor your cache into your repository: ``dub build --cache=local``. Unfortunately you have to provide that on the CLI every time. There are solutions here for those who care about it. If you don't care about it, of course it isn't a solved problem.
And that's the point: *by default* you get the bad behaviour; you have to work to make it do the right thing. You have to put in the effort to learn to use `--cache=local` (and you have to know enough to even be aware that you might need it in the first place -- most people wouldn't even care 'cos they don't even know this is an issue).

I'm not singling out dub here; I'm talking about the entire philosophy behind dub and similar tools. The defaults are very much designed such that you just pull in hairball dependencies automatically, without needing to give it so much as a thought. There is little incentive, if any at all, to make do with as little as possible to get your job done. On the contrary, the whole idea is very much to "buy the package deal", so to speak: download (and build and link) the entire bundle of stuff which gives you everything, bells and whistles and all, even if you actually only need to use less than 10% of it. A million-LoC OS just to open the garage door, as the linked article puts it.


T

-- 
Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
Feb 12
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
 The European Union has launched three pieces of legislation to this effect
Well, that'll fix it!
Feb 12
parent deadalnix <deadalnix gmail.com> writes:
On Monday, 12 February 2024 at 18:45:26 UTC, Walter Bright wrote:
 The European Union has launched three pieces of legislation to this effect
 Well, that'll fix it!
This software has dependencies, do you agree?
Feb 12
prev sibling parent Kagamin <spam here.lot> writes:
On Monday, 12 February 2024 at 15:03:01 UTC, tim wrote:
 https://spectrum.ieee.org/lean-software-development
I hate bloated software too. Someone here said Phobos is hitting the 64000-symbol limit, so it can't grow larger. But wait, what functionality does Phobos even have? It's mostly templated, on top of that.

I wrote an experimental library; it has allocator, bitmanip, prng, constant-time hex (for lulz), collections, logging, some crypto, hashes, file i/o, json, xml, networking (tcp, ssl, async), processes and threads, runtime, abstract stream, time and stopwatch -- 319KB of text in total, including unittests. I wonder if I'm doing it wrong.

My encrypted shadow file exchange tool compiles to 24KB; I wrote it because I wanted to send files between machines running different OSes. The HTTP client is 400 lines of code; it's small enough that I didn't bother to extract it into a library, I just copy it on demand, mostly because it's not clear what to do with pipelining across several requests.
Feb 13