digitalmars.D - Coding for solid state drives

reply Walter Bright <newshound2 digitalmars.com> writes:
http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/

An interesting article. Anyone want to see if there are any modifications we 
should make to std.stdio to work better with SSDs? (Such as changing the buffer 
sizes.)
Apr 24 2015
next sibling parent "weaselcat" <weaselcat gmail.com> writes:
On Friday, 24 April 2015 at 08:27:06 UTC, Walter Bright wrote:
 http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/

 An interesting article. Anyone want to see if there are any 
 modifications we should make to std.stdio to work better with 
 SSDs? (Such as changing the buffer sizes.)
Part 3 covers read/write optimizations specifically, for anyone interested in reading it.
Apr 24 2015
prev sibling next sibling parent reply "tcak" <tcak gmail.com> writes:
On Friday, 24 April 2015 at 08:27:06 UTC, Walter Bright wrote:
 http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/

 An interesting article. Anyone want to see if there are any 
 modifications we should make to std.stdio to work better with 
 SSDs? (Such as changing the buffer sizes.)
This article is not about "coding", but information about SSDs.

Considering spinning drives and SSDs separately means creating two separate configurations for the software. So you either:

1. Provide two separate builds, one written for the spinning-drive configuration and another for SSDs. This way the performance can be kept high.

2. Make the configuration changeable at run time, so there will be one executable only. But the values won't be constant, so there is no compile-time determination of values (buffer size etc.).

For most end users, the 2nd is suitable.
Apr 24 2015
next sibling parent "Kagamin" <spam here.lot> writes:
Shouldn't system cache apply appropriate sync policy?
Apr 24 2015
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2015 2:18 AM, tcak wrote:
 On Friday, 24 April 2015 at 08:27:06 UTC, Walter Bright wrote:
 http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/


 An interesting article. Anyone want to see if there are any modifications we
 should make to std.stdio to work better with SSDs? (Such as changing the
 buffer sizes.)
This article is not about "coding", but information about SSDs.
The section I linked to is definitely about coding for SSDs.
 Considering spinning drives and SSDs separately means create two separate
 configurations for software. So you either:

 1. Provide two separate code one is written for one configuration, and another
 for SSD. This way the performance can be kept high,

 2. Configurations can be changed on run-time, so there will be one executable
 only. But values won't be constant, so there is no compile time determination
 of values (buffer size etc.)


 For most end user, 2nd is suitable.
Things are configurable in std.stdio. But most people will just use the default settings. The default settings should be optimized for SSDs, not spinning drives.
Apr 24 2015
parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Friday, 24 April 2015 at 19:35:08 UTC, Walter Bright wrote:
 Things are configurable in std.stdio. But most people will just 
 use the default settings. The default settings should be 
 optimized for SSDs, not spinning drives.
That would be unwise - as HDDs are much slower (and still much more common), optimizing for SSDs at the expense of HDD performance will cause overall performance to be much worse until HDDs become rare. I mean, assuming that such optimizations aren't just theoretical.
Apr 24 2015
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2015 10:26 PM, Vladimir Panteleev wrote:
 On Friday, 24 April 2015 at 19:35:08 UTC, Walter Bright wrote:
 Things are configurable in std.stdio. But most people will just use the
 default settings. The default settings should be optimized for SSDs, not
 spinning drives.
That would be unwise - as HDDs are much slower (and still much more common), optimizing for SSDs at the expense of HDD performance will cause overall performance to be much worse until HDDs become rare. I mean, assuming that such optimizations aren't just theoretical.
Hard disks are dead today for anyone who cares about performance. I still use them, but only for secondary storage.
Apr 25 2015
parent reply "Xinok" <xinok live.com> writes:
On Saturday, 25 April 2015 at 20:12:55 UTC, Walter Bright wrote:
 Hard disks are dead today for anyone who cares about 
 performance.

 I still use them, but only for secondary storage.
For anybody who wants to buy 4TB of storage for $100, hard drives are still very much alive. Not to mention USB flash drives and SD cards which don't have the performance characteristics of SSDs. Let's not be so hasty. Until SSDs truly replace all other forms of storage, it's best that we don't optimize D and Phobos for one type of storage only.
Apr 25 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/25/2015 1:42 PM, Xinok wrote:
 On Saturday, 25 April 2015 at 20:12:55 UTC, Walter Bright wrote:
 Hard disks are dead today for anyone who cares about performance.

 I still use them, but only for secondary storage.
For anybody who wants to buy 4TB of storage for $100, hard drives are still very much alive.
I presume what sensible people wanting speed do is what I do - I have a 256GB SSD for my primary drive, and a 4TB drive as secondary.
 Not to mention USB flash drives and SD cards which don't have the
 performance characteristics of SSDs.
They wouldn't behave like spinning disks do, either.
 Let's not be so hasty. Until SSDs truly replace all other forms of storage,
 it's best that we don't optimize D and Phobos for one type of storage only.
Um, it's currently optimized for HDs. But those aren't what people who want fast IO use.
Apr 25 2015
prev sibling next sibling parent reply "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Friday, 24 April 2015 at 08:27:06 UTC, Walter Bright wrote:
 http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/

 An interesting article. Anyone want to see if there are any 
 modifications we should make to std.stdio to work better with 
 SSDs? (Such as changing the buffer sizes.)
This article seems to target operating system authors more than application programmers, as OS caches will invalidate most application-side changes. The HN comments are also mostly dismissive of this article: https://news.ycombinator.com/item?id=9431571
Apr 24 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 4/24/2015 10:24 PM, Vladimir Panteleev wrote:
 On Friday, 24 April 2015 at 08:27:06 UTC, Walter Bright wrote:
 http://codecapsule.com/2014/02/12/coding-for-ssds-part-6-a-summary-what-every-programmer-should-know-about-solid-state-drives/


 An interesting article. Anyone want to see if there are any modifications we
 should make to std.stdio to work better with SSDs? (Such as changing the
 buffer sizes.)
This article seems to target operating system authors more than application programmers, as OS caches will invalidate most application-side changes. The HN comments are also mostly dismissive of this article: https://news.ycombinator.com/item?id=9431571
"The high-level optimizations are important:

* Choose a good SSD
* Read and Write in "page" multiples and "page" aligned
* Use lots of parallel IOs (high queue depth)
* Do not put unrelated data in the same "page"

A page used to be 4KB, SSDs are now switching to 8KB and will switch to 16KB later on. Just pick a reasonable size around that (16KB if you can do it will last you a while). Don't sweat the page multiples too much, the SSDs will most likely have to handle 4KB pages for a long while due to databases and such so they will keep some optimization around that size anyhow, it will make it easier for them if you use a larger size.

I wouldn't heed any of the advice on single-threading, the biggest performance boost comes from parallelism and writes are anyway buffered by the SSD (a good SSD has a super-cap to have a good sized write cache)."
Apr 25 2015
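A minimal sketch of the "page multiples, page aligned" advice quoted above, written in Python for brevity. The 16KB page size is the figure the quote suggests. The helper names (`align_down`, `align_up`, `read_aligned`) are hypothetical and not part of any library discussed in this thread. The OS page cache still sits between this code and the drive, so this only shapes the requests the application issues.

```python
import os

PAGE_SIZE = 16 * 1024  # assumed page size, taken from the quoted advice


def align_down(offset, page_size=PAGE_SIZE):
    """Round a byte offset down to the start of its page."""
    return offset - (offset % page_size)


def align_up(length, page_size=PAGE_SIZE):
    """Round a byte count up to a whole number of pages."""
    return -(-length // page_size) * page_size


def read_aligned(path, offset, length, page_size=PAGE_SIZE):
    """Return `length` bytes at `offset` using one page-aligned request."""
    start = align_down(offset, page_size)
    span = align_up(offset + length - start, page_size)
    # buffering=0 bypasses Python's own buffer, so the request size is ours.
    # A raw read can return fewer bytes than asked for, e.g. near end of file.
    with open(path, "rb", buffering=0) as f:
        f.seek(start)
        data = f.read(span)
    skip = offset - start
    return data[skip:skip + length]
```

For example, `read_aligned("data.bin", 20000, 10)` (with a hypothetical file name) asks the OS for the whole 16KB page starting at offset 16384 and returns only the 10 requested bytes.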
prev sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Fri, 24 Apr 2015 01:27:15 -0700, Walter Bright wrote:

 if there are any
 modifications we should make to std.stdio to work better with SSDs?
 (Such as changing the buffer sizes.)
yes: don't do anything. it's OS task to cope with that.
Apr 25 2015
parent reply "Laeeth Isharc" <nospamlaeeth nospam.laeeth.com> writes:
On Saturday, 25 April 2015 at 11:34:22 UTC, ketmar wrote:
 On Fri, 24 Apr 2015 01:27:15 -0700, Walter Bright wrote:

 if there are any
 modifications we should make to std.stdio to work better with 
 SSDs?
 (Such as changing the buffer sizes.)
yes: don't do anything. it's OS task to cope with that.
Well beyond the area I know, but it seems like, given the relative structure of costs for random seeks on SSDs, you often want to process files in parallel, whereas the opposite is true for spinning platters. The OS can't help you here.

Perhaps not for the standard library, but maybe it would be nice to have a function to detect whether a path is on an SSD or not. I am not sure if there is a standard way to detect this. There is a hacky way here: https://stackoverflow.com/questions/908188/is-there-any-way-of-detecting-if-a-drive-is-a-ssd and some others check the output of smartmontools.

But surely, it would be a start to make it easy for the user to know so she can shape her approach accordingly.
Apr 25 2015
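One way such a check is commonly done on Linux is to read the kernel's `rotational` flag under sysfs. The sketch below assumes Linux only. The function name `is_rotational` and the device name in the usage note are hypothetical. It reports only what the kernel believes about a whole block device, which can be misleading for RAID sets and virtual disks, and it leaves out mapping a file path to its device.

```python
import os


def is_rotational(device, sysfs_root="/sys/block"):
    """Return True for spinning media, False for non-rotational, None if unknown.

    `device` is a whole-disk name such as "sda" (hypothetical here).
    """
    flag_path = os.path.join(sysfs_root, device, "queue", "rotational")
    try:
        with open(flag_path) as f:
            value = f.read().strip()
    except OSError:
        return None  # not Linux, no such device, or no permission
    if value == "1":
        return True
    if value == "0":
        return False
    return None  # unexpected content: treat as unknown
```

A caller would use something like `is_rotational("sda")` and fall back to conservative defaults when the result is `None`.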
next sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Sat, 25 Apr 2015 14:19:30 +0000, Laeeth Isharc wrote:

 But surely, it would be a start to make it easy for the user to know so
 she can shape her approach accordingly.
i believe that this must be controlled with `version` or cli arg, and it belongs to application logic, not standard library.
Apr 25 2015
parent reply "Laeeth Isharc" <laeeth nospamlaeeth.com> writes:
On Saturday, 25 April 2015 at 16:10:11 UTC, ketmar wrote:
 On Sat, 25 Apr 2015 14:19:30 +0000, Laeeth Isharc wrote:

 But surely, it would be a start to make it easy for the user 
 to know so
 she can shape her approach accordingly.
i believe that this must be controlled with `version` or cli arg, and it belongs to application logic, not standard library.
I defer to your greater expertise. But I should have thought that if csv parsing belongs in a standard library (something that is easy for a user to write himself) then detecting whether a path is on an SSD might perhaps too. (Bearing in mind it's more of a system thing not so easy for every user to write himself in a platform independent way). Laeeth.
Apr 25 2015
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Sat, 25 Apr 2015 16:40:51 +0000, Laeeth Isharc wrote:

 On Saturday, 25 April 2015 at 16:10:11 UTC, ketmar wrote:
 On Sat, 25 Apr 2015 14:19:30 +0000, Laeeth Isharc wrote:

 But surely, it would be a start to make it easy for the user to know
 so she can shape her approach accordingly.
i believe that this must be controlled with `version` or cli arg, and it belongs to application logic, not standard library.
I defer to your greater expertise. But I should have thought that if csv parsing belongs in a standard library (something that is easy for a user to write himself) then detecting whether a path is on an SSD might perhaps too. (Bearing in mind it's more of a system thing not so easy for every user to write himself in a platform independent way).
that wasn't me who put csv parser in. along with json and xml parsers, which people are happily replacing anyway. and you want even more crap in standard library.
Apr 25 2015
prev sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Sat, 25 Apr 2015 16:40:51 +0000, Laeeth Isharc wrote:

 On Saturday, 25 April 2015 at 16:10:11 UTC, ketmar wrote:
 On Sat, 25 Apr 2015 14:19:30 +0000, Laeeth Isharc wrote:

 But surely, it would be a start to make it easy for the user to know
 so she can shape her approach accordingly.
i believe that this must be controlled with `version` or cli arg, and it belongs to application logic, not standard library.
I defer to your greater expertise. But I should have thought that if csv parsing belongs in a standard library (something that is easy for a user to write himself) then detecting whether a path is on an SSD might perhaps too. (Bearing in mind it's more of a system thing not so easy for every user to write himself in a platform independent way).
and now something more serious: trying to detect what storage a program is using is completely unreliable. you can't optimise for all cases, and you can't even detect all cases. big raid which can be faster than SSD with "SSD pattern"? ah, ok, nobody cares, we detected it as HDD. virtual drive, which can be anything at all? fuse mount point? i can think up a lot of those.

that's why operational mode should be controlled by cli switch. if user *really* cares about performance, he *will* know what HW he has and how to make program fully utilize it. and in other cases let OS i/o scheduler do its work without trying to needlessly "help" it.
Apr 25 2015
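A minimal sketch of the "controlled by cli switch" approach from the post above, in Python. The `--storage` flag name, the `PROFILES` table and every number in it are hypothetical placeholders, not measured values. The sketch only shows the user picking a profile explicitly, with the program doing no detection of its own.

```python
import argparse

# Hypothetical tuning profiles; the numbers are placeholders, not measurements.
PROFILES = {
    "default": {"buffer_size": 64 * 1024, "parallel_reads": 1},
    "hdd": {"buffer_size": 1024 * 1024, "parallel_reads": 1},
    "ssd": {"buffer_size": 16 * 1024, "parallel_reads": 8},
}


def parse_profile(argv):
    """Return the I/O profile selected on the command line."""
    parser = argparse.ArgumentParser(description="I/O tuning chosen by the user")
    parser.add_argument(
        "--storage",
        choices=sorted(PROFILES),
        default="default",
        help="storage type to tune for; nothing is auto-detected",
    )
    args = parser.parse_args(argv)
    return PROFILES[args.storage]
```

Calling `parse_profile(["--storage", "ssd"])` selects the SSD profile. With no flag, the program keeps the default profile and leaves the rest to the OS I/O scheduler.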
prev sibling parent "Vladimir Panteleev" <vladimir thecybershadow.net> writes:
On Saturday, 25 April 2015 at 14:19:31 UTC, Laeeth Isharc wrote:
 On Saturday, 25 April 2015 at 11:34:22 UTC, ketmar wrote:
 On Fri, 24 Apr 2015 01:27:15 -0700, Walter Bright wrote:

 if there are any
 modifications we should make to std.stdio to work better with 
 SSDs?
 (Such as changing the buffer sizes.)
yes: don't do anything. it's OS task to cope with that.
well beyond the area I know, but it seems like given the relative structure of costs for random seeks for SSDs you often want to process files in parallel, whereas the opposite is true for spinning platters. The OS can't help you here.
Well, actually, it should. In theory, all you need to do is to queue as many reads/writes as you can - using threads, fibers, async I/O calls, etc. This is not the same as sequentially reading/writing random blocks. The OS I/O scheduler should reorder the operations so that the accessed blocks are in order and physically close to each other.
Apr 25 2015
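A minimal sketch of "queue as many reads as you can" using threads, as described in the post above. It is written in Python and assumes a Unix-like system, since `os.pread` is not available everywhere. The function name `read_blocks`, the 16KB block size and the worker count are hypothetical choices. How the OS I/O scheduler reorders the resulting requests is outside the program's control.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_blocks(path, offsets, block_size=16 * 1024, workers=8):
    """Read one block at each offset, with up to `workers` reads in flight.

    Results come back in the same order as `offsets`.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread takes an explicit offset, so the threads can share one
            # descriptor without racing on a common file position.
            return list(
                pool.map(lambda off: os.pread(fd, block_size, off), offsets)
            )
    finally:
        os.close(fd)
```

This differs from looping over the same offsets sequentially: several requests are outstanding at once, which gives the OS and the drive something to reorder or run in parallel.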