digitalmars.D.learn - Output range with custom string type


Jacob Carlborg <doob me.com> writes:

I'm working on some code that sanitizes and converts values of different 
types to strings. I thought it would be a good idea to wrap the 
sanitized string in a struct to have some type safety. Ideally it should 
not be possible to create this type without going through the sanitizing 
functions.
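A minimal sketch of such a wrapper type, assuming it lives in its own module. D's `private` is module-scoped, so only code in that module can call the constructor. `SanitizedString` and this `sanitize` are hypothetical names, and real escaping is elided: the `'%s'` quoting simply mirrors the example below.

```d
import std.format : format;

// Hypothetical wrapper: because `private` is module-level in D, only the
// module that defines SanitizedString can construct one with a value.
// Note that default initialization (SanitizedString.init) is still possible
// and yields an empty string.
struct SanitizedString
{
    private string value;

    private this(string value)
    {
        this.value = value;
    }

    string toString() const
    {
        return value;
    }
}

// Hypothetical sanitizer: the only public way to obtain a non-empty
// SanitizedString. Real escaping logic is omitted in this sketch.
SanitizedString sanitize(string value)
{
    return SanitizedString(format!"'%s'"(value));
}
```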

The problem I have is that I would like these functions to push up the 
allocation decision to the caller. Internally these functions use 
formattedWrite. I thought the natural design would be that the sanitize 
functions take an output range and pass that to formattedWrite.

Here's a really simple example:

import std.stdio : writeln;

struct Range
{
     void put(char c)
     {
         writeln(c);
     }
}

void sanitize(OutputRange)(string value, OutputRange range)
{
     import std.format : formattedWrite;
     range.formattedWrite!"'%s'"(value);
}

void main()
{
     Range range;
     sanitize("foo", range);
}

The problem now is that the data is passed to the range one char at a 
time. This means that if the user implements a custom output range, the 
user is in full control of the data: it becomes very easy to make a 
mistake, or to manipulate the data on purpose, which makes the whole idea 
of the sanitized type pointless.
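To illustrate the problem, here is a sketch of a hypothetical user-supplied range (`TamperingRange` is an invented name) that receives every character and silently rewrites it. The range is passed by pointer so that the caller's instance, rather than a copy, collects the output.

```d
import std.format : formattedWrite;

// Hypothetical user-supplied output range: it sees every character the
// sanitizer produces and is free to alter it, here by swapping ' for ".
struct TamperingRange
{
    char[] data;

    void put(char c)
    {
        data ~= (c == '\'' ? '"' : c);
    }
}

// Same shape as the sanitize function in the example above.
void sanitize(OutputRange)(string value, OutputRange range)
{
    range.formattedWrite!"'%s'"(value);
}
```

Calling `sanitize("foo", &r)` with a `TamperingRange r` leaves `"foo"` (double quotes) in `r.data`, so the quoting the sanitizer applied is gone before the result is ever wrapped.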

Any suggestions how to fix this or a better idea?

-- 
/Jacob Carlborg

Aug 28 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
 I'm working on some code that sanitizes and converts values of 
 different types to strings. I thought it would be a good idea 
 to wrap the sanitized string in a struct to have some type 
 safety. Ideally it should not be possible to create this type 
 without going through the sanitizing functions.

 The problem I have is that I would like these functions to push 
 up the allocation decision to the caller. Internally these 
 functions use formattedWrite. I thought the natural design 
 would be that the sanitize functions take an output range and 
 pass that to formattedWrite.

 [...]

 Any suggestions how to fix this or a better idea?

If you just want the caller to be in charge of allocation, that's 
what std.experimental.allocator provides. In this case, I would 
polish up the old "format once to get the length, allocate, 
format a second time into the allocated buffer" method used with 
snprintf for D:

--- test.d ---
import std.stdio;
import std.experimental.allocator;

struct CountingOutputRange
{
private:
	size_t _count;
public:
	size_t count() { return _count; }
	void put(char c) { _count++; }
}

char[] sanitize(string value, IAllocator alloc)
{
	import std.format : formattedWrite, sformat;

	CountingOutputRange r;
	(&r).formattedWrite!"'%s'"(value); // do not copy the range

	auto s = alloc.makeArray!char(r.count);
	scope (failure) alloc.dispose(s);

	// This should only throw if the user-provided allocator returned less
	// memory than was requested
	return s.sformat!"'%s'"(value);
}

void main()
{
	auto s = sanitize("foo", theAllocator);
	scope (exit) theAllocator.dispose(s);
	writeln(s);
}
--------------

Aug 28 2017

Jacob Carlborg <doob me.com> writes:

On 2017-08-28 23:45, Moritz Maxeiner wrote:

 If you want the caller to be just in charge of allocation, that's what 
 std.experimental.allocator provides. In this case, I would polish up the 
 old "format once to get the length, allocate, format second time into 
 allocated buffer" method used with snprintf for D:

 [...]

I guess that would work.

But if I keep the range internal, can't I just do the allocation inside 
the range and only use formattedWrite, instead of using both 
formattedWrite and sformat and going through the data twice? The catch, 
of course, is that the final size is not known before allocating.

-- 
/Jacob Carlborg

Aug 29 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Tuesday, 29 August 2017 at 09:59:30 UTC, Jacob Carlborg wrote:
 [...]

 But if I keep the range internal, can't I just do the 
 allocation inside the range and only use "formattedWrite"? 
 Instead of using both formattedWrite and sformat and go through 
 the data twice. Then of course the final size is not known 
 before allocating.

Certainly, that's what dynamic arrays (aka vectors, e.g. 
std::vector in C++ STL) are for:

---
import core.exception;

import std.stdio;
import std.experimental.allocator;
import std.algorithm;

struct PoorMansVector(T)
{
private:
	T[]        store;
	size_t     length;
	IAllocator alloc;
public:
	@disable this(this);
	this(IAllocator alloc)
	{
		this.alloc = alloc;
	}
	~this()
	{
		if (store)
		{
			alloc.dispose(store);
			store = null;
		}
	}
	void put(T t)
	{
		if (!store)
		{
			// Allocate only once for "small" vectors
			store = alloc.makeArray!T(8);
			if (!store) onOutOfMemoryError();
		}
		else if (length == store.length)
		{
			// Growth factor of 1.5
			auto expanded = alloc.expandArray!T(store, store.length / 2);
			if (!expanded) onOutOfMemoryError();
		}
		assert (length < store.length);
		moveEmplace(t, store[length++]);
	}
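	// Hands the elements to the caller, who must dispose of them. The
	// returned slice can be shorter than the underlying allocation.
	T[] release()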
	{
		auto elements = store[0..length];
		store = null;
		return elements;
	}
}

char[] sanitize(string value, IAllocator alloc)
{
	import std.format : formattedWrite;

	auto r = PoorMansVector!char(alloc);
	(&r).formattedWrite!"'%s'"(value); // do not copy the range
	return r.release();
}

void main()
{
	auto s = sanitize("foo", theAllocator);
	scope (exit) theAllocator.dispose(s);
	writeln(s);
}
---

Do be aware that the above vector is named "poor man's vector" 
for a reason: it's a hasty write-down from memory and is sure 
to contain bugs.
For better vector implementations, have a look at collection 
libraries such as EMSI containers; my own attempt at a DbI (Design 
by Introspection) vector container can be found here [1].

[1] 
https://github.com/Calrama/libds/blob/6a1fc347e1f742b8f67513e25a9fdbf79f007417/src/ds/vector.d
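If GC allocation is acceptable, Phobos's `std.array.Appender` is a ready-made growable output range that avoids writing a vector by hand. This sketch trades the caller's allocator control for the GC; `sanitizeWithAppender` is a hypothetical name.

```d
import std.array : appender;
import std.format : formattedWrite;

// Sketch: format into a GC-backed Appender instead of a custom vector.
// The caller no longer chooses the allocator in this variant.
string sanitizeWithAppender(string value)
{
    auto app = appender!string();
    app.formattedWrite!"'%s'"(value);
    return app.data;
}
```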

Aug 29 2017

Jacob Carlborg <doob me.com> writes:

On 2017-08-29 19:35, Moritz Maxeiner wrote:
 [...]

Thanks.

-- 
/Jacob Carlborg

Aug 29 2017

Jacob Carlborg <doob me.com> writes:

On 2017-08-29 19:35, Moritz Maxeiner wrote:

      void put(T t)
      {
          if (!store)
          {
              // Allocate only once for "small" vectors
              store = alloc.makeArray!T(8);
              if (!store) onOutOfMemoryError();
          }
          else if (length == store.length)
          {
              // Growth factor of 1.5
              auto expanded = alloc.expandArray!char(store, store.length / 2);
              if (!expanded) onOutOfMemoryError();
          }
          assert (length < store.length);
          moveEmplace(t, store[length++]);
      }

What's the reason to use "moveEmplace" instead of just assigning to the 
array: "store[length++] = t" ?

-- 
/Jacob Carlborg

Aug 31 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Thursday, 31 August 2017 at 07:06:26 UTC, Jacob Carlborg wrote:
 On 2017-08-29 19:35, Moritz Maxeiner wrote:

      void put(T t)
      {
          if (!store)
          {
              // Allocate only once for "small" vectors
              store = alloc.makeArray!T(8);
              if (!store) onOutOfMemoryError();
          }
          else if (length == store.length)
          {
              // Growth factor of 1.5
              auto expanded = alloc.expandArray!char(store, store.length / 2);
              if (!expanded) onOutOfMemoryError();
          }
          assert (length < store.length);
          moveEmplace(t, store[length++]);
      }

 What's the reason to use "moveEmplace" instead of just 
 assigning to the array: "store[length++] = t" ?

The `move` part is to support non-copyable types (i.e. T with 
`@disable this(this)`), such as another owning container 
(assigning would generally try to create a copy).
The `emplace` part is because the destination `store[length]` has 
been default initialized either by makeArray or expandArray and 
it doesn't need to be destroyed (a pure move would destroy 
`store[length]` if T has a destructor).
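A small sketch of the `move` half of that answer, assuming a hypothetical non-copyable type `NoCopy` and a hypothetical helper `moveIntoSlot`: plain assignment from an lvalue does not compile, while `moveEmplace` transfers the value into an already default-initialized slot (as `makeArray` or `expandArray` would leave it) without destroying that slot first.

```d
import std.algorithm.mutation : moveEmplace;

// Hypothetical non-copyable type, standing in for e.g. an owning container.
struct NoCopy
{
    int payload;
    @disable this(this);
}

int moveIntoSlot()
{
    // Default-initialized slot, similar to what makeArray produces.
    NoCopy[] store = new NoCopy[](1);
    NoCopy t = NoCopy(42);

    // Assigning an lvalue would require a copy, which is disabled.
    static assert(!__traits(compiles, store[0] = t));

    // moveEmplace moves t into the slot without running a destructor on
    // the slot's previous (default) contents.
    moveEmplace(t, store[0]);
    return store[0].payload;
}
```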

Aug 31 2017

Cecil Ward <d cecilward.com> writes:

On Monday, 28 August 2017 at 14:27:19 UTC, Jacob Carlborg wrote:
 I'm working on some code that sanitizes and converts values of 
 different types to strings. I thought it would be a good idea 
 to wrap the sanitized string in a struct to have some type 
 safety. Ideally it should not be possible to create this type 
 without going through the sanitizing functions.

 The problem I have is that I would like these functions to push 
 up the allocation decision to the caller. Internally these 
 functions use formattedWrite. I thought the natural design 
 would be that the sanitize functions take an output range and 
 pass that to formattedWrite.

 [...]

 Any suggestions how to fix this or a better idea?

Q: is it an option to let the caller provide all the storage in an 
oversized fixed-length buffer? You could add a second helper 
function that computes and returns a safely pessimistic (over-the-top) 
maximum for the required length, which could be called once 
beforehand to establish (or check) the required buffer size. This 
is the technique I am using right now. My sizing function is 
ridiculously fast, as I am lucky in the particular use case.
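A minimal sketch of that fixed-buffer approach, assuming the simple `'%s'` quoting from the original question, so the exact output length is `value.length + 2`. `maxSanitizedLength` and `sanitizeInto` are hypothetical helper names; a sanitizer that escapes characters would need a looser (more pessimistic) bound.

```d
import std.format : sformat;

// Hypothetical sizing helper: an upper bound on the output length.
// For the "'%s'" format it is exact (two quotes plus the input).
size_t maxSanitizedLength(string value)
{
    return value.length + 2;
}

// Hypothetical sanitizer writing into caller-provided storage; returns
// the slice of the buffer that was actually written.
char[] sanitizeInto(char[] buffer, string value)
{
    assert(buffer.length >= maxSanitizedLength(value), "buffer too small");
    return buffer.sformat!"'%s'"(value);
}
```

The caller can keep a stack buffer such as `char[64] buf;` and pass `buf[]`, so no heap allocation happens inside the sanitizer.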

Aug 28 2017
