digitalmars.D - [Tidbit] making your D code more modular & unittestable

H. S. Teoh via Digitalmars-d (104/104) Mar 08 2017 While writing a tool for dissecting various file formats, I found a

Moritz Maxeiner (4/12) Mar 08 2017 Interesting read, thank you. Would you consider writing a blog
XavierAP (4/7) Mar 08 2017 https://en.wikipedia.org/wiki/Dependency_inversion_principle

H. S. Teoh via Digitalmars-d (15/24) Mar 08 2017 Yes.

Adam D. Ruppe (2/5) Mar 08 2017 Sounds like what the kernel does when you memory-map a file...

H. S. Teoh via Digitalmars-d (8/14) Mar 08 2017 I know... I probably should've just used std.mmfile.

Nick Sabalausky (Abscissa) (2/5) Mar 09 2017 Do tell!

Nick Sabalausky (Abscissa) (3/10) Mar 09 2017 Wishlist for D3: Some brilliant form of sugar for declaring a function

Nick Treleaven (9/20) Mar 10 2017 auto parseFile()(auto input if isRandomAccessRangeOf!ubyte &&

bpr (4/14) Mar 10 2017 That looks like a useful DIP. What has to happen to move it to

Walter Bright (6/7) Mar 10 2017 Bingo.

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

While writing a tool for dissecting various file formats, I found a
useful coding pattern that helps your D code be cleaner, more modular,
and more easily unittestable.

Initially, I wrote a parser module that directly accessed std.stdio.File
to parse file contents.  Pretty standard approach, but it had the
disadvantage of needing to do all sorts of shenanigans around rawRead,
setting up buffers, managing buffers, etc.. It obscured an otherwise
straightforward parser structure and made the code hard to maintain and
hard to unittest.

So I came up with an idea to abstract file contents as a random-access
range of ubyte with lazy loading, so that I can rewrite file parsing
code in the nice, comfortable syntax of array slices without incurring
the cost of actually loading the entire file into a ubyte[] buffer.

So I started a new module for wrapping std.stdio.File with a
random-access range API. My initial attempt looked something like this:

	module fileslice;
	import std.stdio;
	struct FileSlice {
		File f;
		... // range API
	}

Seems straightforward enough, and helps tidy up the code by localizing
buffer management to a single place. But again, the dependency on
std.stdio.File made it cumbersome to write unittests. It's better than
before, because now I don't have to setup buffers and call rawRead
direectly, but I'd still have to create temporary files just to be able
to test various parts of the code. It's ugly that unittests would have
side-effects on the filesystem.  Plus, I couldn't test error handling
directly because I can't insert artificial I/O errors into
std.stdio.File (at least, not without truly revolting system-dependent
hacks).

My first idea was to change the above code to this:

	module fileslice;
	import std.stdio;
	struct FileSliceImpl(F) {
		F f;	// N.B.: concrete File type abstracted away
		... // range API
	}

	// Preserve compatibility with existing code.
	alias FileSlice = FileSliceImpl!File;

This allows me to substitute std.stdio.File with mock file objects
designed to test various aspects of FileSliceImpl, for example:

	unittest {
		struct MockFile {
			ubyte[] data = [1,2,3]; // prebaked data
			void seek(long offset) { ... }
			ubyte[] rawRead(ubyte[] buf) { ... }
			...
		}
		FileSliceImpl!MockFile s;
		// ... do various tests on s
	}

This allowed me to unittest FileSliceImpl without actually writing
anything to the filesystem, and prebaking various test data into the
code as array literals.

Coming back to the parser module, I now had something like this:

	module parser;
	import fileslice;
	...
	auto parseFile(FileSlice input) {
		... // use nice input[x .. y] syntax, yay!
		return result;
	}

This was nice, but the unittests would need to create FileSlice
instances of various sorts, possibly by manually instantiating
FileSliceImpl with various mock files to be able to use prebaked ubyte[]
data as test cases.  It didn't taste right: that's too much coupling
between the parser and fileslice modules. FileSliceImpl really shouldn't
be visible outside the fileslice module.

But then inspiration struck: the only thing parseFile() *really*
depended on was array-slicing syntax in the input!  The whole issue with
lazy loading of data only mattered when the input happens to come from a
File. But there's no reason it couldn't come from an array, where
there's no need for any buffering or lazy loading! So the dependency on
FileSlice is actually redundant. So I rewrote the above code as:

	module parser;
	import std.range.primitives;
	// N.B.: no more import of fileslice needed!!
	...
	auto parseFile(Slice)(Slice input)
		if (isRandomAccessRange!Slice && hasSlicing!Slice &&
		    is(ElementType!Slice : ubyte))
	{
		... // use nice input[x .. y] syntax, yay!
		return result;
	}

Voila! Now my unittests can simply pass ubyte[] data directly into
parseFile, and the parser module is now no longer coupled to the
fileslice module.  The main program would be what puts the two together.
And now parseFile is able to handle data from any source, as long as
slicing and random-access are available.  We don't even need to import
std.stdio in the parser module!!

And to top that off, now that everything has become templatized, the
compiler will do attribute inference for us for free, so we don't even
need to deal with ugly attribute soup in the code. Huzzah!

tl;dr: Whenever you have a data structure or a function that depends on
a concrete type like File that introduces a dependency between modules,
templatize it!  In fact, templatize your code whenever possible -- the
more the better.  D templates make your code cleaner, more modular, and
more unittestable. And you get attribute inference for free.  Learn to
love templates! :-P


T

-- 
I'm still trying to find a pun for "punishment"...

Mar 08 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Wednesday, 8 March 2017 at 21:34:19 UTC, H. S. Teoh wrote:
 [...]

 tl;dr: Whenever you have a data structure or a function that 
 depends on a concrete type like File that introduces a 
 dependency between modules, templatize it!  In fact, templatize 
 your code whenever possible -- the more the better.  D 
 templates make your code cleaner, more modular, and more 
 unittestable. And you get attribute inference for free.  Learn 
 to love templates! :-P

Interesting read, thank you. Would you consider writing a blog 
post about this (you know, for community growth enhancement 
purposes ;p )?

Mar 08 2017

XavierAP <n3minis-git yahoo.es> writes:

On Wednesday, 8 March 2017 at 21:34:19 UTC, H. S. Teoh wrote:
 While writing a tool for dissecting various file formats, I 
 found a useful coding pattern that helps your D code be 
 cleaner, more modular, and more easily unittestable.

https://en.wikipedia.org/wiki/Dependency_inversion_principle

"Abstractions should not depend on details. Details should depend 
on abstractions."

Mar 08 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Wed, Mar 08, 2017 at 10:03:28PM +0000, XavierAP via Digitalmars-d wrote:
 On Wednesday, 8 March 2017 at 21:34:19 UTC, H. S. Teoh wrote:
 While writing a tool for dissecting various file formats, I found a
 useful coding pattern that helps your D code be cleaner, more
 modular, and more easily unittestable.

 
 https://en.wikipedia.org/wiki/Dependency_inversion_principle
 
 "Abstractions should not depend on details. Details should depend on
 abstractions."

Yes.

There's also the pattern of unification + generalization.  I.e., I
started with having code that depends on the API details of
std.stdio.File, then moved to a RA range API (that is, unification with
array slicing syntax), which allowed the generalization of the code to
apply to things that don't have a File interface.

Seen another way, initially multiple modules depended on the details of
the File API, but after unification with the RA range API and
refactoring, only 1 module needs to deal with the specifics of the File
API; everyone else uses general array slicing syntax which has a larger
scope of applicability.


T

-- 
Today's society is one of specialization: as you grow, you learn more and more
about less and less. Eventually, you know everything about nothing.

Mar 08 2017

Adam D. Ruppe <destructionator gmail.com> writes:

On Wednesday, 8 March 2017 at 21:34:19 UTC, H. S. Teoh wrote:
 So I came up with an idea to abstract file contents as a 
 random-access range of ubyte with lazy loading, so that I can 
 rewrite file parsing code in the nice

Sounds like what the kernel does when you memory-map a file...

Mar 08 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Mar 09, 2017 at 12:21:48AM +0000, Adam D. Ruppe via Digitalmars-d wrote:
 On Wednesday, 8 March 2017 at 21:34:19 UTC, H. S. Teoh wrote:
 So I came up with an idea to abstract file contents as a
 random-access range of ubyte with lazy loading, so that I can
 rewrite file parsing code in the nice

 
 Sounds like what the kernel does when you memory-map a file...

I know... I probably should've just used std.mmfile.

But it was fun to write my own RA range with slicing.  I did learn a
thing or two about what std.range considers to satisfy hasSlicing... it
wasn't quite as obvious as I had thought.


T

-- 
"I suspect the best way to deal with procrastination is to put off the
procrastination itself until later. I've been meaning to try this, but haven't
gotten around to it yet. " -- swr

Mar 08 2017

"Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:

On 03/08/2017 07:31 PM, H. S. Teoh via Digitalmars-d wrote:
 I did learn a
 thing or two about what std.range considers to satisfy hasSlicing... it
 wasn't quite as obvious as I had thought.

Do tell!

Mar 09 2017

"Nick Sabalausky (Abscissa)" <SeeWebsiteToContactMe semitwist.com> writes:

On 03/08/2017 04:34 PM, H. S. Teoh via Digitalmars-d wrote:
 	auto parseFile(Slice)(Slice input)
 		if (isRandomAccessRange!Slice && hasSlicing!Slice &&
 		    is(ElementType!Slice : ubyte))
 	{
 		... // use nice input[x .. y] syntax, yay!
 		return result;
 	}

Wishlist for D3: Some brilliant form of sugar for declaring a function 
that takes a range.

Mar 09 2017

Nick Treleaven <nick geany.org> writes:

On Thursday, 9 March 2017 at 20:54:23 UTC, Nick Sabalausky 
(Abscissa) wrote:
 On 03/08/2017 04:34 PM, H. S. Teoh via Digitalmars-d wrote:
 	auto parseFile(Slice)(Slice input)
 		if (isRandomAccessRange!Slice && hasSlicing!Slice &&
 		    is(ElementType!Slice : ubyte))
 	{
 		... // use nice input[x .. y] syntax, yay!
 		return result;
 	}

 Wishlist for D3: Some brilliant form of sugar for declaring a 
 function that takes a range.

auto parseFile()(auto input if isRandomAccessRangeOf!ubyte && 
hasSlicing) {

My spin on an inline parameter constraint idea by Kenji (his 
doesn't use auto and also has more concept-like sugar):

https://wiki.dlang.org/User:9rnsr/DIP:_Template_Parameter_Constraint

As mentioned in the link, inline constraints can help make more 
specific error messages when constraints fail.

Mar 10 2017

bpr <brogoff gmail.com> writes:

On Friday, 10 March 2017 at 14:58:09 UTC, Nick Treleaven wrote:
 On Thursday, 9 March 2017 at 20:54:23 UTC, Nick Sabalausky
 Wishlist for D3: Some brilliant form of sugar for declaring a 
 function that takes a range.

 auto parseFile()(auto input if isRandomAccessRangeOf!ubyte && 
 hasSlicing) {

 My spin on an inline parameter constraint idea by Kenji (his 
 doesn't use auto and also has more concept-like sugar):

 https://wiki.dlang.org/User:9rnsr/DIP:_Template_Parameter_Constraint

 As mentioned in the link, inline constraints can help make more 
 specific error messages when constraints fail.

That looks like a useful DIP. What has to happen to move it to 
the DIP repository 
https://github.com/dlang/DIPs/blob/master/GUIDELINES.md ?

Mar 10 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 3/8/2017 1:34 PM, H. S. Teoh via Digitalmars-d wrote:
 [...]

Bingo.

If your algorithmic code includes code to read / write files, that's a big sign 
things are not properly encapsulated in the code.

Using ranges is a great way to accomplish this, and as you discovered, a bonus 
is that the range can be replaced with an array for easy unit testing.

Mar 10 2017

D Programming

C/C++ Programming

Other

digitalmars.D - [Tidbit] making your D code more modular & unittestable