www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Databases and the D Standard Library

reply Adam Wilson <flyboynw gmail.com> writes:
Hi Everyone,

I've seen a lot of talk on the forums over the past year about the need 
for database support in the D Standard Library and I completely agree. 
At the end of the day the purpose of any programming language and its 
attendant libraries is to allow the developer to solve their problems 
quickly and efficiently; and a large subset of those solutions require 
some form of structured data store. To my mind, this makes some form of 
interface(s) to a data-store an essential component of the D Standard 
Library. And since this is something that my particular problem spaces 
also need, I thought it would be useful to attempt to do something about it.

First, I've seen a couple of promising projects, the most complete, and 
recent of which, dstddb (Github: https://github.com/cruisercoder/dstddb) 
hasn't seen a commit since June. An additional setback came when I tried 
to use it and was greeted with a litany of compiler errors.

This is *not* a problem, it's the natural course of a volunteer 
community such as ours; and I want to thank Erik Smith profusely for his 
work. Priorities and circumstances change and that means that valuable 
projects are inexplicably dropped.

But we still lack a critical component, and to get the conversation 
started, I'd like to break down the issues I've seen brought up in past 
threads on this subject and encourage you to bring your own. I may have 
ideas, but I can't possibly know the entire problem space.

1. Isn't this an enormous amount of work?

My answer: Absolutely, depending on your preferred scope of work.

In general, I've seen two distinct camps on this issue. One says that we 
should implement everything in D from the ground up, including 
re-implementation of the database drivers themselves in D. If this is 
your preferred scope of work then you will invariably become 
disheartened at the truly stupendous amount of work you face and give up.

The other camp says that we should make use of existing drivers and 
include them in the D Standard Library. This is difficult path to follow 
as the vanilla build of the D Standard Library now requires a 
significant number of foreign libraries, all with differing licenses, be 
built and distributed to everyone; regardless of whether or not they use 
them in their project. This is more-or-less than path the dstddb is/was on.

My idea: Focus on defining the interface, not the individual driver 
implementations.

If instead we focused on defining an interface that a "conforming 
implementation" had to follow, we would allow developers to only pull in 
the library they need or build a from-scratch library if they so desire.

Indeed this is the model that both Java (JDO) and .NET (ADO.NET) follow 
and I think we would be well advised to follow their lead here. Not only 
is the methodology battle-proven, it is also well understood by a 
significant portion of D's potential user-base. By way of example, 
Npgsql is the ADO.NET implementation of a driver for PostgreSQL.

2. There are so many different types of data storage systems, how do you 
design a system generic enough for all of them?

My answer: You don't. Nobody else has bothered trying, and I believe 
that our worry over that question is a large part of why we don't have 
anything substantive today.

My idea: Split the data storage systems out by category of data-store.
For example:
	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
	- Document: std.database.document (Mongo, CouchDB, etc.)
	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)

If you want something that doesn't fit into a category above, you're own 
your own, but you were also on your own in other languages.

3. We need to provide a single interface for all data-stores in the 
SQL/Document/Key-Value category.

My answer: Are you sure? The problem is that each underlying data-store 
has it's own dialect. For example, PostgreSQL and MSSQL are both 
ostensibly ANSI-SQL, except where they aren't. Re-targeting data-stores, 
even in the same category, is never going to be as simple as changing a 
connection string. And additionally, you will have to implement a 
super-set of features in the interface to support all the variations and 
throw exceptions where the chosen implementation does not support a 
specific feature.

My idea: Each data store has it's own implementation with it's own 
naming convention. For example (ADO.NET):
	- SqlConnection (MSSQL)
	- NpgsqlConnection (Npgsql)

Yes, this means that you have to change names in your code if you switch 
data-stores, but since you are already changing your queries, which is a 
much more difficult change, this isn't a significant additional cost. 
Also, the code becomes clearer to those who take over maintenance duties 
from the original author, especially when you are mixing data-stores. 
But in all honest, most developers will pick on technology and stick 
with it for the entirety of the software's lifespan.

4. We should hide querying from the developer because they are bad at 
it, security flaws, etc.

My answer: While agree in principal, especially with the security 
concerns, in reality what you are asking for is an ORM. In my opinion, 
that is a separate concern from a database interface, and is typically 
implemented as layer over the DB interface.

My idea: Don't do it. Save it for a different project.

5. D has so many useful features for data access, we should use as many 
as possible!

My answer: D absolutely has many useful features for data access and 
manipulation. But that doesn't mean that a good interface has to use any 
of them. The first job of a Database Interface, and indeed any library, 
is to get the job done with a minimum of overhead. Let's worry about 
that before going crazy adding in all the D goodness. Ranges have been a 
particular target for abuse here, and while I love ranges, I think the 
mechanics of data-store manipulation don't lend themselves well to 
working with ranges. I'd love to hear your ideas on this though.

My idea: Focus on a more conservative implementation in the style of JDO 
or ADO.NET. This will allow us to ship something that works in a 
reasonable time frame. I'm not saying that we can't use any of D's 
unique talents, but using those talents should be subordinate to 
designing an interface that works efficiently.

That is all I have for now. I am looking forward to hear your thoughts 
on this topic! Until then, I am going to go close out 2016 (PST) with 
family and friends and I wish you all a Happy New Year!

-- 
Adam Wilson
IRC: LightBender
import quiet.dlang.dev;
Dec 31
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
We do indeed need a good database abstraction.
But a core requirement for any implementation has yet to be met.

There has to be a standard way for asynchronous sockets. To implement 
this we need to take into consideration the event loop that it uses and 
more importantly allow it to be integrated for e.g. windowing.

So here is a dependency before we get a database abstraction into 
Phobos, a nice fast event loop manager that is generic.
Dec 31
parent reply Adam Wilson <flyboynw gmail.com> writes:
On 12/31/16 7:31 PM, rikki cattermole wrote:
 We do indeed need a good database abstraction.
 But a core requirement for any implementation has yet to be met.

 There has to be a standard way for asynchronous sockets. To implement
 this we need to take into consideration the event loop that it uses and
 more importantly allow it to be integrated for e.g. windowing.

 So here is a dependency before we get a database abstraction into
 Phobos, a nice fast event loop manager that is generic.
Or, alternatively, use existing drivers that have this capability built in? -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Dec 31
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 01/01/2017 4:46 PM, Adam Wilson wrote:
 On 12/31/16 7:31 PM, rikki cattermole wrote:
 We do indeed need a good database abstraction.
 But a core requirement for any implementation has yet to be met.

 There has to be a standard way for asynchronous sockets. To implement
 this we need to take into consideration the event loop that it uses and
 more importantly allow it to be integrated for e.g. windowing.

 So here is a dependency before we get a database abstraction into
 Phobos, a nice fast event loop manager that is generic.
Or, alternatively, use existing drivers that have this capability built in?
Which is fine if all you use is c's sockets or only that database connection for a thread. Which is not very realistic for game or web development.
Dec 31
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that 
 database connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
Dec 31
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).
Dec 31
parent reply Adam Wilson <flyboynw gmail.com> writes:
rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).
Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcore -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 02/01/2017 3:03 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).
Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcore
No, it isn't generic enough. Nor can it handle windowing without a good bit of modifications. Mine in SPEW[0] is however ready for this task. [0] https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop
Jan 01
parent reply Adam Wilson <flyboynw gmail.com> writes:
rikki cattermole wrote:
 On 02/01/2017 3:03 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).
Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcore
No, it isn't generic enough. Nor can it handle windowing without a good bit of modifications. Mine in SPEW[0] is however ready for this task. [0] https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop
How much effort to make the changes? Their plans seem to indicate that they want to support UI integration. I ask because I need to use the library that is going to get the most support over time and that is vibe.d right now. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 02/01/2017 4:44 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 02/01/2017 3:03 PM, Adam Wilson wrote:
 rikki cattermole wrote:
 On 01/01/2017 5:19 PM, Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:51:52 UTC, rikki cattermole wrote:
 Which is fine if all you use is c's sockets or only that database
 connection for a thread.
The C drivers typically offer handles of some sort (Windows HANDLE, *nix file descriptor, that kind of thing) that you can integrate into other event loops.
That's fine and all, but you've still got to deal with it on D's side so you can mix and match libraries that require access to the same event loop (such as Windows).
Vibe.d is working on a native D event loop. We would probably want to integrate with that. EventCore: https://code.dlang.org/packages/eventcore
No, it isn't generic enough. Nor can it handle windowing without a good bit of modifications. Mine in SPEW[0] is however ready for this task. [0] https://github.com/Devisualization/spew/tree/master/src/base/cf/spew/event_loop
How much effort to make the changes? Their plans seem to indicate that they want to support UI integration. I ask because I need to use the library that is going to get the most support over time and that is vibe.d right now.
Read and compare the code. To add anything into the Vibe.d one requires direct modification which is unacceptable for Phobos IMO. My aim for SPEW was to be as close as glib's[0] as reasonably possible. A well tested set of features. I'm sorry but this is just not acceptable[1] for an event loop: interface EventDriver { safe: /* nogc:*/ nothrow: property EventDriverCore core(); property EventDriverTimers timers(); property EventDriverEvents events(); property EventDriverSignals signals(); property EventDriverSockets sockets(); property EventDriverDNS dns(); property EventDriverFiles files(); property EventDriverWatchers watchers(); /// Releases all resources associated with the driver void dispose(); } Its fine for a web framework, aka a specific task but not when its generic. When we're discussing a generic event loop it shouldn't care about the different usage of it. All it knows is that there are events that come from sources and then mapped to a consumer. It isn't the most performant but that is ok. You would use a different implementation depending on your use case e.g. 1:1 is easy enough to do without a event loop manager. Just so you're aware, windowing is extremely hard to get right. Sockets, DNS, signals and timers ext. are easy to implement compared. [0] https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html [1] https://github.com/vibe-d/eventcore/blob/master/source/eventcore/driver.d
Jan 01
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 interface(s) to a data-store an essential component of the D 
 Standard Library.
Eh, I count it as would-be-nice just because it isn't that hard to just use the C ones, or another third party lib; it doesn't have to be Phobos itself. That said though, a basic db interface is quite simple and would be a nice batteries included bit - it is one of the things IMO that PHP did quite successfully (even if its interface sucked, it still just worked)
Dec 31
parent reply Adam Wilson <flyboynw gmail.com> writes:
Adam D. Ruppe wrote:
 On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 interface(s) to a data-store an essential component of the D Standard
 Library.
Eh, I count it as would-be-nice just because it isn't that hard to just use the C ones, or another third party lib; it doesn't have to be Phobos itself.
On that I beg to differ. The C libraries are not safe, they have wildly different API's, and they have high-complexity, which is a large risk-factor for bugs and/or security flaws. Any place where we can
 That said though, a basic db interface is quite simple and would be a
 nice batteries included bit - it is one of the things IMO that PHP did
 quite successfully (even if its interface sucked, it still just worked)
I'll agree that it isn't hard, and I think including it would help boost D's usability in the web service space. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:
 On that I beg to differ. The C libraries are not  safe, they have wildly
 different API's, and they have high-complexity, which is a large
 risk-factor for bugs and/or security flaws.
If we have the database interface defined, there's no reason we couldn't provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d driver.
Jan 02
parent Adam Wilson <flyboynw gmail.com> writes:
On 1/2/17 8:33 AM, Chris Wright wrote:
 On Sun, 01 Jan 2017 17:55:01 -0800, Adam Wilson wrote:
 On that I beg to differ. The C libraries are not  safe, they have wildly
 different API's, and they have high-complexity, which is a large
 risk-factor for bugs and/or security flaws.
If we have the database interface defined, there's no reason we couldn't provide, for instance, a postgres-c-wrapper driver and a postgres-pure-d driver.
Precisely, I would love to enable this! -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 02
prev sibling next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Split the data storage systems out by category of data-store.
 For example:
 	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
This is doable; SQL is an ANSI and ISO standard, and it strongly constrains what you can do with your data.
 	- Document: std.database.document (Mongo, CouchDB, etc.)
 	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)
I'm not so certain about this. CouchDB has a rather different approach to things than MongoDB -- possibly not as divergent as Mongo from MySQL, but far more than Postgres from MySQL. Likewise, there are many key/value stores in existence, and they support many different operations. For instance, it looks like etcd2 has a notion of directories, where you can list items in a directory. Redis just lets you list keys with a given prefix. Redis lets you modify values in-place; etcd2 doesn't. We could define a common subset of operations for document databases and key/value stores, but most people probably wouldn't be satisfied with it. There's also a question of where you'd put Cassandra in that, since it's decidedly not a SQL database but tries to pretend it is.
 4. We should hide querying from the developer because they are bad at
 it, security flaws, etc.
 
 My answer: While agree in principal, especially with the security
 concerns, in reality what you are asking for is an ORM. In my opinion,
 that is a separate concern from a database interface, and is typically
 implemented as layer over the DB interface.
We can encourage people to use prepared queries with documentation and naming.
Dec 31
parent Adam Wilson <flyboynw gmail.com> writes:
Chris Wright wrote:
 On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Split the data storage systems out by category of data-store.
 For example:
 	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
This is doable; SQL is an ANSI and ISO standard, and it strongly constrains what you can do with your data.
 	- Document: std.database.document (Mongo, CouchDB, etc.)
 	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)
I'm not so certain about this. CouchDB has a rather different approach to things than MongoDB -- possibly not as divergent as Mongo from MySQL, but far more than Postgres from MySQL. Likewise, there are many key/value stores in existence, and they support many different operations. For instance, it looks like etcd2 has a notion of directories, where you can list items in a directory. Redis just lets you list keys with a given prefix. Redis lets you modify values in-place; etcd2 doesn't. We could define a common subset of operations for document databases and key/value stores, but most people probably wouldn't be satisfied with it. There's also a question of where you'd put Cassandra in that, since it's decidedly not a SQL database but tries to pretend it is.
Given that the Cassandra folks wrote an ADO.NET provider for it, I would suggest that it is easiest to treat it as a SQL database from an interface standpoint.
 4. We should hide querying from the developer because they are bad at
 it, security flaws, etc.

 My answer: While agree in principal, especially with the security
 concerns, in reality what you are asking for is an ORM. In my opinion,
 that is a separate concern from a database interface, and is typically
 implemented as layer over the DB interface.
We can encourage people to use prepared queries with documentation and naming.
Precisely, my focus would be an making the API as easy as possible to use with Parameterized Queries, and if that makes it harder to write non-parameterized queries, oh well. :D -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2017-01-01 04:24, Adam Wilson wrote:

 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
      - SqlConnection (MSSQL)
      - NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you switch
 data-stores, but since you are already changing your queries, which is a
 much more difficult change, this isn't a significant additional cost.
I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason. -- /Jacob Carlborg
Jan 01
parent reply Chris Wright <dhasenan gmail.com> writes:
On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:

 On 2017-01-01 04:24, Adam Wilson wrote:
 
 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
      - SqlConnection (MSSQL)
      - NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you
 switch data-stores, but since you are already changing your queries,
 which is a much more difficult change, this isn't a significant
 additional cost.
I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason.
Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient. I ran across a problem in NHibernate about a decade ago. We had a straightforward HQL query involving joins. It took over a minute to run. We wrote the simple equivalent in SQL and it completed in milliseconds. Fortunately, NHibernate had the ability to run raw SQL queries.
Jan 01
next sibling parent Adam Wilson <flyboynw gmail.com> writes:
Chris Wright wrote:
 On Sun, 01 Jan 2017 10:29:28 +0100, Jacob Carlborg wrote:

 On 2017-01-01 04:24, Adam Wilson wrote:

 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
       - SqlConnection (MSSQL)
       - NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you
 switch data-stores, but since you are already changing your queries,
 which is a much more difficult change, this isn't a significant
 additional cost.
I don't think we should try to make implementations different just because. If you have an SQL builder or an ORM on top of the interface that abstract the differences in the SQL syntax, it's possible to switch driver, within reason.
Those both limit your ability to use the underlying database to its full potential. They offer a chance for queries that seem simple and efficient to become horribly inefficient.
I cannot state my agreement with this paragraph enough. Every ORM I've worked with generates some inexplicably horrific SQL in seemingly simple situations.
 I ran across a problem in NHibernate about a decade ago. We had a
 straightforward HQL query involving joins. It took over a minute to run.
 We wrote the simple equivalent in SQL and it completed in milliseconds.
 Fortunately, NHibernate had the ability to run raw SQL queries.
I've also seen Entity Framework 6/7 do the same thing. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2017-01-01 17:50, Chris Wright wrote:

 Those both limit your ability to use the underlying database to its full
 potential. They offer a chance for queries that seem simple and efficient
 to become horribly inefficient.
I'm perfectly aware of the limitations and capabilities of ORM's. I'm just saying that making the interface/names different just to make it different is not a good idea. It should be up to the user to choose if an ORM is used or not and this interface should try to, as much as possible, to make it possible to use an ORM just as well as not using an ORM. This whole idea seems fail even before it's barely stared. If this idea is going to work then all the layers need to be designed correctly and the lower layers should not know anything about the higher layers. -- /Jacob Carlborg
Jan 02
parent Adam Wilson <flyboynw gmail.com> writes:
On 1/2/17 12:05 AM, Jacob Carlborg wrote:
 On 2017-01-01 17:50, Chris Wright wrote:

 Those both limit your ability to use the underlying database to its full
 potential. They offer a chance for queries that seem simple and efficient
 to become horribly inefficient.
I'm perfectly aware of the limitations and capabilities of ORM's. I'm just saying that making the interface/names different just to make it different is not a good idea. It should be up to the user to choose if an ORM is used or not and this interface should try to, as much as possible, to make it possible to use an ORM just as well as not using an ORM.
Is there a assumption here that there are no classes? Because and ORM could quite easily work with base classes, and indeed both NHibernate and EntityFramework function exactly this way.
 This whole idea seems fail even before it's barely stared. If this idea
 is going to work then all the layers need to be designed correctly and
 the lower layers should not know anything about the higher layers.
I absolutely agree, which, ironically, is why I am having this conversation. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 02
prev sibling next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
 	- SqlConnection (MSSQL)
 	- NpgsqlConnection (Npgsql)
 
 Yes, this means that you have to change names in your code if you switch
 data-stores
You *can* use classes and interfaces and type hierarchies. They do use the GC by default, unlike structs, but they're kind of handy, especially here. Then you have a SqlConnection interface that most people use all the time and all people use most of the time. If you explicitly need some connection properties that are specific to Postgres, you cast to a PostgresConnection.
Jan 01
parent reply Adam Wilson <flyboynw gmail.com> writes:
Chris Wright wrote:
 On Sat, 31 Dec 2016 19:24:31 -0800, Adam Wilson wrote:
 My idea: Each data store has it's own implementation with it's own
 naming convention. For example (ADO.NET):
 	- SqlConnection (MSSQL)
 	- NpgsqlConnection (Npgsql)

 Yes, this means that you have to change names in your code if you switch
 data-stores
You *can* use classes and interfaces and type hierarchies. They do use the GC by default, unlike structs, but they're kind of handy, especially here.
That was my intention, the knee-jerk reaction that class and interfaces get here sometimes strikes me as a bit histrionic sometimes. They are a tool with a use case. :)
 Then you have a SqlConnection interface that most people use all the time
 and all people use most of the time. If you explicitly need some
 connection properties that are specific to Postgres, you cast to a
 PostgresConnection.
That is pretty much how it works in ADO.NET and JDO. And I think it works well. -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 01
parent reply Jacob Carlborg <doob me.com> writes:
On 2017-01-02 02:34, Adam Wilson wrote:

 That was my intention, the knee-jerk reaction that class and interfaces
 get here sometimes strikes me as a bit histrionic sometimes. They are a
 tool with a use case. :)
I think that the design should try to avoid classes as much as possible for things that will be frequently created. It's always possible to wrap a struct in a class, the other way around is a bit more difficult. But when it comes to the connection object I think it's fine to use classes since it will most likely only be created once per thread. -- /Jacob Carlborg
Jan 02
parent reply Adam Wilson <flyboynw gmail.com> writes:
On 1/2/17 12:09 AM, Jacob Carlborg wrote:
 On 2017-01-02 02:34, Adam Wilson wrote:

 That was my intention, the knee-jerk reaction that class and interfaces
 get here sometimes strikes me as a bit histrionic sometimes. They are a
 tool with a use case. :)
I think that the design should try to avoid classes as much as possible for things that will be frequently created. It's always possible to wrap a struct in a class, the other way around is a bit more difficult. But when it comes to the connection object I think it's fine to use classes since it will most likely only be created once per thread.
Ok. How would you design a database API for D? The requirements I am operating under are: 1. Individual data-store driver implementations are not included in the D Standard Library. Driver licensing and implementation details vary. For example libpq5 uses it's own mix of licenses that is not Boost compatible. 2. The D Standard Library provides a common API and implementation of shared components, but leaves the data-store specific implementation up to the implementer. We don't care how the implementation is constructed or licensed, only that the API is followed. 3. In order to support higher level abstractions like ORM's we need a base class model that can be extended by implementers but still consumed by the ORM without knowing implementation specific details. As far as I am aware, the only way to meet those requirements is to use a base-class model. Is there something I am missing? -- Adam Wilson IRC: LightBender import quiet.dlang.dev;
Jan 02
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2017-01-03 06:25, Adam Wilson wrote:

 Ok. How would you design a database API for D?
I don't know. I think it's difficult to design something upfront without trying out different API's to see what's possible to implement in code. Structs and functions, with or without templates. Could something like this work: module db_interface; version (Postgres) public import pg.db_interface; else version (MySQL) public import mysql.db_interface; static assert(isInterfaceImplemented, "The DB interface is not implemented"); -- /Jacob Carlborg
Jan 02
parent reply Chris Wright <dhasenan gmail.com> writes:
On Tue, 03 Jan 2017 08:25:55 +0100, Jacob Carlborg wrote:

 Structs and functions, with or without templates. Could something like
 this work:
 
 module db_interface;
 
 version (Postgres)
      public import pg.db_interface;
 else version (MySQL)
      public import mysql.db_interface;
You are unable to interact with two different databases in the same executable using the same library. For instance, if you're using hibernated, either you compiled it to connect to mysql, or you compiled it to connect to oracle. This means you can't, for instance, use mysql for the CI server (because it's open source and doesn't have licensing fees), then use oracle for production (because it's faster for your workflow), because then you're testing with a different binary. You can't have some data in postgres and some in SQL Server because you're in the middle of a migration. You can still use both if you are using the database interface directly. But if you're connecting via a library, you're SOL. You have to recompile everything whenever you switch databases. That's a barrier to proprietary libraries that interact with databases. They're not impossible, but they have to release separate binaries for every database the maintainer thinks you might want to connect to. Every library that lets you access a database must maintain a list of db drivers that it supports. If you have a new or private driver you want to use, you need to modify any library you use that talks to a database. In exchange, you get...slightly less GC usage. It's not *no* GC usage -- you'll see a bunch of buffers allocated to hold incoming and outgoing messages. You'll just peel back one layer of it. You'd be much better off asking that we encourage the use of std.experimental.allocator in the driver interface.
Jan 03
parent reply Jacob Carlborg <doob me.com> writes:
On 2017-01-03 09:38, Chris Wright wrote:

 You are unable to interact with two different databases in the same
 executable using the same library. For instance, if you're using
 hibernated, either you compiled it to connect to mysql, or you compiled
 it to connect to oracle.
That's true. And that's why I said it's difficult to design an API without trying it in code :)
 In exchange, you get...slightly less GC usage. It's not *no* GC usage --
 you'll see a bunch of buffers allocated to hold incoming and outgoing
 messages. You'll just peel back one layer of it.
1. I hope there won't be that many buffers in the API, at least not in the user facing API 2. Buffers say nothing how they're allocated. With classes on the other hand, you're basically forced to allocate with the GC
 You'd be much better off asking that we encourage the use of
 std.experimental.allocator in the driver interface.
Then I'll ask for that as well :) -- /Jacob Carlborg
Jan 03
parent reply Chris Wright <dhasenan gmail.com> writes:
On Tue, 03 Jan 2017 13:23:55 +0100, Jacob Carlborg wrote:

 On 2017-01-03 09:38, Chris Wright wrote:
 
 You are unable to interact with two different databases in the same
 executable using the same library. For instance, if you're using
 hibernated, either you compiled it to connect to mysql, or you compiled
 it to connect to oracle.
That's true. And that's why I said it's difficult to design an API without trying it in code :)
I didn't try it in code.
 In exchange, you get...slightly less GC usage. It's not *no* GC usage
 --
 you'll see a bunch of buffers allocated to hold incoming and outgoing
 messages. You'll just peel back one layer of it.
1. I hope there won't be that many buffers in the API, at least not in the user facing API
The returned row data is mandatory, and its size can be much larger than the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack limit.) I suppose you could have a streaming API for row data, one that has a stack-allocated buffer and returns slices of that: string fieldName; ubyte[] data; ubyte[][string] fields; db.query("SELECT * FROM USERS") // have to revisit this if a db allows large names .onFieldStart((fieldName) => field = fieldName) .onFieldData((fragment) => data ~= fragment) .onFieldEnd(() { fields[field] = data; data = null; }) .onRowEnd(() => process(fields)) .onResultsEnd!(() => writeln("done")) .exec(); This looks pretty terrible, to be honest. I get this sort of thing from nodejs because it doesn't want to potentially block and also doesn't want to delay letting me process things, but the worst I get there is usually two callbacks. This would also result in more GC use for the majority of people who use the GC.
 2. Buffers say nothing how they're allocated. With classes on the other
 hand, you're basically forced to allocate with the GC
You haven't looked at std.experimental.allocator, have you? http://dpldocs.info/experimental-docs/std.conv.emplace.3.html http://dpldocs.info/experimental-docs/std.experimental.allocator.make.html http://dpldocs.info/experimental-docs/ std.experimental.allocator.dispose.2.html
Jan 03
parent Jacob Carlborg <doob me.com> writes:
On 2017-01-03 18:13, Chris Wright wrote:

 The returned row data is mandatory, and its size can be much larger than
 the stack limit. (A MySQL MEDIUMBLOB field will likely break your stack
 limit.)

 I suppose you could have a streaming API for row data, one that has a
 stack-allocated buffer and returns slices of that:

   string fieldName;
   ubyte[] data;
   ubyte[][string] fields;
   db.query("SELECT * FROM USERS")
     // have to revisit this if a db allows large names
     .onFieldStart((fieldName) => field = fieldName)
     .onFieldData((fragment) => data ~= fragment)
     .onFieldEnd(() { fields[field] = data; data = null; })
     .onRowEnd(() => process(fields))
     .onResultsEnd!(() => writeln("done"))
     .exec();

 This looks pretty terrible, to be honest. I get this sort of thing from
 nodejs because it doesn't want to potentially block and also doesn't want
 to delay letting me process things, but the worst I get there is usually
 two callbacks.

 This would also result in more GC use for the majority of people who use
 the GC.
Look, I didn't say that using the GC should be completely forbidden. I just said we should try to avoid it. For example, I've been using the ddb Postgres driver [1]. It uses classes for most of its types, even if it might not be necessary. Here's one example [2], unless there some intention to have some form of higher level, DB independent, API on top of this, I don't see a reason why that type needs to be a class.
 2. Buffers say nothing how they're allocated. With classes on the other
 hand, you're basically forced to allocate with the GC
You haven't looked at std.experimental.allocator, have you?
I know it's possible to allocate a class without the GC, hence the "basically". I'm not sure how other write their code but at least I make the assumption that all objects are allocated with the GC. [1] https://github.com/pszturmaj/ddb [2] https://github.com/pszturmaj/ddb/blob/master/source/ddb/postgres.d#L904 -- /Jacob Carlborg
Jan 03
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
 As far as I am aware, the only way to meet those requirements is to use
 a base-class model. Is there something I am missing?
Templates. Templates everywhere. Every method in your application that might possibly touch a database, or touch anything that touches a database, and so on, needs to be templated according to what type of database might be used.
Jan 03
parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Tuesday, 3 January 2017 at 08:09:54 UTC, Chris Wright wrote:
 On Mon, 02 Jan 2017 21:25:42 -0800, Adam Wilson wrote:
 As far as I am aware, the only way to meet those requirements 
 is to use a base-class model. Is there something I am missing?
Templates. Templates everywhere. Every method in your application that might possibly touch a database, or touch anything that touches a database, and so on, needs to be templated according to what type of database might be used.
That limits you to one DB per compilation or craploads of template bloat. There are a number variables here: the number of DB backends you wish to support (b), the number of DB backends you actually use at runtime (r), the number of symbols (not quite the word I'm looking for but, oh well) you need to represent an abstract backend API (s),the number of class types you use to abstract the backend (c) and the number of template you use to abstract the back end (t). b is ideally fixed at "all the backends" r is variable and dependent on the application (e.g. I may only care for Postgres, but someone else may wish to support many SQL DBs). If r == 1 then a template approach is acceptable. s is a function of the dissimilarity of the backends you wish to support. Breaking the problem up into SQL like, graph-like and KV-store is a tradeoff somewhere between having "one DB (interface) to rule them all" and one interface for each backend. c + t = s What this represents is a tradeoff between compile time dispatch and runtime dispatch. As s moves from being all classes to more templates + structs (from the "bottom up"), the last layer of dynamic dispatch before the static dispatch of the templates becomes an algebraic type selection (i.e. check the tag, choose the type, and then static dispatch). I believe the sweet spot for this lies at the point where the dissimilarity of similar backends becomes apparent after the start of a logical operation. Or put another way the point where I know the result that I want and no longer care about any implementation details. As an example using a compute API (sorry I don't know much about DBs): launching a kernel represents a single logical operation but is in fact many driver calls. If one wishes to abstract the compute API then this point becomes the point I would choose. Finding those points will probably not be easy and may be different for different people, but it is worth considering. </ramble>
Jan 03
prev sibling next sibling parent reply Mark <smarksc gmail.com> writes:
On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 2. There are so many different types of data storage systems, 
 how do you design a system generic enough for all of them?

 My answer: You don't. Nobody else has bothered trying, and I 
 believe that our worry over that question is a large part of 
 why we don't have anything substantive today.

 My idea: Split the data storage systems out by category of 
 data-store.
 For example:
 	- SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
 	- Document: std.database.document (Mongo, CouchDB, etc.)
 	- Key-Value: std.database.keyvalue (Redis, etcd2, etc.)
I think that these can all be seen as special cases of a hypegraph database. So on the face of it you probably can build some common interface (apparently the fellows in hypergraphdb.org are trying to do something in this spirit). You can then have specialized interfaces inheriting from it. But given how general hypergraph dbs are, I'm not sure if this is a worthwhile abstraction. By the way, what about XML? The documentaion on std.xml says that the module will be replaced at some point in the future. I wonder when and with what...
Jan 01
parent Adam Wilson <flyboynw gmail.com> writes:
Mark wrote:
 On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 2. There are so many different types of data storage systems, how do
 you design a system generic enough for all of them?

 My answer: You don't. Nobody else has bothered trying, and I believe
 that our worry over that question is a large part of why we don't have
 anything substantive today.

 My idea: Split the data storage systems out by category of data-store.
 For example:
     - SQL: std.database.sql (PostgreSQL, MySQL, MSSQL, etc.)
     - Document: std.database.document (Mongo, CouchDB, etc.)
     - Key-Value: std.database.keyvalue (Redis, etcd2, etc.)
I think that these can all be seen as special cases of a hypegraph database. So on the face of it you probably can build some common interface (apparently the fellows in hypergraphdb.org are trying to do something in this spirit). You can then have specialized interfaces inheriting from it. But given how general hypergraph dbs are, I'm not sure if this is a worthwhile abstraction.
My experience with graph DB's is that the reality has never been anywhere close to the hype. I don't think it's a worthwhile abstraction. But that is my opinion. And if it is, we should be able to add it over the top of this layer if we really want too.
 By the way, what about XML? The documentaion on std.xml says that the
 module will be replaced at some point in the future. I wonder when and
 with what...
I have no idea. It's a great question! That said it's a bit outside the scope of this topic. :) -- Adam Wilson IRC: LightBender //quiet.dlang.dev
Jan 01
prev sibling parent Dejan Lekic <dejan.lekic gmail.com> writes:
On Sunday, 1 January 2017 at 03:24:31 UTC, Adam Wilson wrote:
 Hi Everyone,

 I've seen a lot of talk on the forums over the past year about 
 the need for database support in the D Standard Library and I 
 completely agree. At the end of the day the purpose of any 
 programming language and its attendant libraries is to allow 
 the developer to solve their problems quickly and efficiently; 
 and a large subset of those solutions require some form of 
 structured data store. To my mind, this makes some form of 
 interface(s) to a data-store an essential component of the D 
 Standard Library. And since this is something that my 
 particular problem spaces also need, I thought it would be 
 useful to attempt to do something about it.
The only thing I want, database related, in the standard library is the API! - Nothing else! There should be a standard implementation of that API (libd-db.so for an example), but it should be separated from Phobos. In general, Phobos should only contain the APIs in my humble opinion. We should handle XML processing the same way (API in Phobos, libd-xml.so for the reference implementation), Image processing the same way, GUI, etc... Why? Phobos is enormous already!
Jan 04