www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - std.database

reply Erik Smith <erik cruiserhouse.com> writes:
I'm back to actively working on a std.database specification & 
implementation.  It's still unstable, minimally tested, and there 
is plenty of work to do, but I wanted to share an update on my 
progress.

The main focus of this project is to bring a standard interface 
for database clients.    This is similar to the purpose of JDBC 
(java) and DBI (perl).  While there is some existing work in 
place (ddbc, etc.c.odbc.sql, vibe.d and other newer projects), my 
goal, after a lengthly period of community review/revision, is to 
achieve Phobos inclusion for the interface standard and also some 
of the implementations. There is debate on whether the 
implementations belong there, but I have made the implementation 
Phobos compatible (by using templates) while this issue is sorted 
out.

The initial focus of this project is to first cover the 
synchronous interface details and get it to a refined state.  
Asynchronous I/O is also important and it will take some time to 
work that into the model with a number of complicated aspects to 
consider.

Also, this is not an ORM tool, although it does provide a 
foundational layer for building one. This is also aimed at SQL 
databases, although I will consider how no-sql targets might fit 
in.

There are now implementations for 4 initial targets (sqlite, 
mysql, Oracle, and ODBC) and they are in a basic working state 
with a common test suite.

For usage examples and other limited project details, see the 
github page:
https://github.com/cruisercoder/dstddb

Dub link coming soon (dealing with dub zombie name) along with 
more progress updates.

Please feel free to comment or question on the design as evolves.

erik
Mar 01 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
 Please feel free to comment or question on the design as 
 evolves.
Minor typo in the fluent style select section of readme.md. the line createDatabase("file:///demo.sqlite"); should be createDatabase("file:///demo.sqlite")
Mar 01 2016
parent Erik Smith <erik cruiserhouse.com> writes:
Typo fixed - thanks.  Incidentally, I'm not 100% content with 
createDatabase.  With the library being template based, the types 
are no longer as easy to work with directly:

auto database = Database!DefaultPolicy();

alias cant be used because it instantiates.  The template 
argument can be defaulted, but the best I can do is this:

auto database = Database!()();

It would be nice if the compiler invoked the default type without 
the extra !().  I settled on using a no parameter template 
function named createDatabase.   I would have liked just 
database() to match the other member function style in the 
package, but I think that might be injecting a name that might be 
too common as a variable name.

Project minutiae.

erik
Mar 01 2016
prev sibling next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
 I'm back to actively working on a std.database specification & 
 implementation.  It's still unstable, minimally tested, and 
 there is plenty of work to do, but I wanted to share an update 
 on my progress.

 [...]
Hi Erik, As soon as sqlite-d is at the point that it is usable. I will see if I can support the API proposed by you. Have you ever used it in production or semi-production code ? What is the rational for your design ?
Mar 01 2016
parent Erik Smith <erik cruiserhouse.com> writes:
Hi Stefan,

It might be a challenge for CTFE compatibility in the API, but it 
would be interesting to see how much of it is workable.  There 
does need to be some optional API elements in cases where it's 
not supported by the underlying database client(array binding for 
example) and these compile time policies could also be used to 
define a restricted subset if needed for your design.

I don't have any D code in production unfortunately, but I have a 
similar design in C++ that has been in production for years.

As far as the design rationale, the primary aspect of the design 
is to ensure deterministic resource lifetimes for each of the 
resources commonly exposed by clients (connections, statements, 
rowsets, buffer allocations, etc), which is why structs are used. 
  This could be done with with simple non-copyable structs for a 
more limited design.  I went with the value type structs (using 
reference counting) which provides better composibility overall.  
This can be seen in the call chaining examples, which I think 
significantly increase ease-of-use.  It also covers use cases in 
application code where, for example, lazy results are returned 
through code layers and they need to retain their underlying 
connections.

Thanks for the questions and the interest.

erik
Mar 01 2016
prev sibling next sibling parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
Okay I've found a problem.

Here is some code demonstrating it.
http://dpaste.dzfl.pl/022c9e610a18

Now take a look again at Database 
https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37

Do you see the problem?
The solution is simple.

The client database type can be a struct or a class. It doesn't matter 
too much (assuming you're going the way of ranges).
But the intermediary representation must be on the heap and should 
probably use the constructor and not a static create method to get it.
This way people can use other memory management solutions and construct 
it however they like.
Mar 01 2016
next sibling parent reply Erik Smith <erik cruiserhouse.com> writes:
Yes agree that the poly Database is broken - it isn't reference 
counted and I will fix that.

Your sample code had me wondering if I am missing something else, 
but I can't see another issue yet.  I think the use of classes 
would definitely lead to problems with resources being freed out 
of order or too late.

As far as memory management options, my plan is to work 
allocators into the design and that would seem to provide a lot 
of options.  I'm having a problem at the moment with 
MallocAllocator's shared interface.  I'm not sure why it's shared 
since malloc/free are thread safe and I can't seem to cast away 
the shared.  I'm sure there is a reason.

erik


On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole 
wrote:
 Okay I've found a problem.

 Here is some code demonstrating it.
 http://dpaste.dzfl.pl/022c9e610a18

 Now take a look again at Database 
 https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37

 Do you see the problem?
 The solution is simple.

 The client database type can be a struct or a class. It doesn't 
 matter too much (assuming you're going the way of ranges).
 But the intermediary representation must be on the heap and 
 should probably use the constructor and not a static create 
 method to get it.
 This way people can use other memory management solutions and 
 construct it however they like.
Mar 01 2016
parent reply Rikki Cattermole <alphaglosined gmail.com> writes:
On 02/03/16 4:48 PM, Erik Smith wrote:
 Yes agree that the poly Database is broken - it isn't reference counted
 and I will fix that.
My point was, you shouldn't handle that.
 Your sample code had me wondering if I am missing something else, but I
 can't see another issue yet.  I think the use of classes would
 definitely lead to problems with resources being freed out of order or
 too late.
Currently its use after free. E.g. destructor gets called but there is still a copy around. At the very least it should be an explicit call. If classes lead to problems, so will structs.
 As far as memory management options, my plan is to work allocators into
 the design and that would seem to provide a lot of options.  I'm having
 a problem at the moment with MallocAllocator's shared interface.  I'm
 not sure why it's shared since malloc/free are thread safe and I can't
 seem to cast away the shared.  I'm sure there is a reason.
Use IAllocator. Don't touch the structs unless you want pain. Oh and btw final class is your friend. As an FYI here is my managed memory concept https://github.com/rikkimax/alphaPhobos/blob/master/source/std/experimen al/memory/managed.d its not finished but it mostly works. Of course I would want to go the more OOP way, sure thats more allocations but over all I think there are enough wins that its worth it. So if you feel it doesn't fit well with your goal, say so :)
 erik


 On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole wrote:
 Okay I've found a problem.

 Here is some code demonstrating it.
 http://dpaste.dzfl.pl/022c9e610a18

 Now take a look again at Database
 https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37


 Do you see the problem?
 The solution is simple.

 The client database type can be a struct or a class. It doesn't matter
 too much (assuming you're going the way of ranges).
 But the intermediary representation must be on the heap and should
 probably use the constructor and not a static create method to get it.
 This way people can use other memory management solutions and
 construct it however they like.
Mar 01 2016
parent Erik Smith <erik cruiserhouse.com> writes:
I will look at your managed approach to understand it better.  
One drawback I can think of is that destruction might not occur 
immediately.  This can be an issue for shared libraries when the 
GC hasn't run when the library call returns to the host 
application.

Maybe it's a C++ bias, but the RefCounted approach seems like 
such a natural fit for this use case and so far it's working 
well.  There is the race condition issue but I think there is an 
allocator solution for that coming.

erik


On Wednesday, 2 March 2016 at 04:11:10 UTC, Rikki Cattermole 
wrote:
 On 02/03/16 4:48 PM, Erik Smith wrote:
 Yes agree that the poly Database is broken - it isn't 
 reference counted
 and I will fix that.
My point was, you shouldn't handle that.
 Your sample code had me wondering if I am missing something 
 else, but I
 can't see another issue yet.  I think the use of classes would
 definitely lead to problems with resources being freed out of 
 order or
 too late.
Currently its use after free. E.g. destructor gets called but there is still a copy around. At the very least it should be an explicit call. If classes lead to problems, so will structs.
 As far as memory management options, my plan is to work 
 allocators into
 the design and that would seem to provide a lot of options.  
 I'm having
 a problem at the moment with MallocAllocator's shared 
 interface.  I'm
 not sure why it's shared since malloc/free are thread safe and 
 I can't
 seem to cast away the shared.  I'm sure there is a reason.
Use IAllocator. Don't touch the structs unless you want pain. Oh and btw final class is your friend. As an FYI here is my managed memory concept https://github.com/rikkimax/alphaPhobos/blob/master/source/std/experimen al/memory/managed.d its not finished but it mostly works. Of course I would want to go the more OOP way, sure thats more allocations but over all I think there are enough wins that its worth it. So if you feel it doesn't fit well with your goal, say so :)
 erik


 On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole 
 wrote:
 Okay I've found a problem.

 Here is some code demonstrating it.
 http://dpaste.dzfl.pl/022c9e610a18

 Now take a look again at Database
 https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37


 Do you see the problem?
 The solution is simple.

 The client database type can be a struct or a class. It 
 doesn't matter
 too much (assuming you're going the way of ranges).
 But the intermediary representation must be on the heap and 
 should
 probably use the constructor and not a static create method 
 to get it.
 This way people can use other memory management solutions and
 construct it however they like.
Mar 01 2016
prev sibling parent reply landaire <landergriffith+dlang gmail.com> writes:
On Wednesday, 2 March 2016 at 03:07:54 UTC, Rikki Cattermole 
wrote:
 Okay I've found a problem.

 Here is some code demonstrating it.
 http://dpaste.dzfl.pl/022c9e610a18

 Now take a look again at Database 
 https://github.com/cruisercoder/dstddb/blob/master/src/std/database/poly/database.d#L37

 Do you see the problem?
 The solution is simple.

 The client database type can be a struct or a class. It doesn't 
 matter too much (assuming you're going the way of ranges).
 But the intermediary representation must be on the heap and 
 should probably use the constructor and not a static create 
 method to get it.
 This way people can use other memory management solutions and 
 construct it however they like.
How is this a UAF? Isn't the struct copied?
Mar 02 2016
parent reply landaire <landergriffith+dlang gmail.com> writes:
On Wednesday, 2 March 2016 at 18:28:34 UTC, landaire wrote:
 How is this a UAF? Isn't the struct copied?
Ah I think I misunderstood. You mean in the database, not the dpaste?
Mar 02 2016
parent Rikki Cattermole <alphaglosined gmail.com> writes:
On 03/03/16 8:36 AM, landaire wrote:
 On Wednesday, 2 March 2016 at 18:28:34 UTC, landaire wrote:
 How is this a UAF? Isn't the struct copied?
Ah I think I misunderstood. You mean in the database, not the dpaste?
Correct. My code was just to showcase the problem.
Mar 02 2016
prev sibling next sibling parent reply Piotrek <starpit tlen.pl> writes:
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
 The main focus of this project is to bring a standard interface 
 for database clients.    This is similar to the purpose of JDBC 
 (java) and DBI (perl).  While there is some existing work in 
 place (ddbc, etc.c.odbc.sql, vibe.d and other newer projects), 
 my goal, after a lengthly period of community review/revision, 
 is to achieve Phobos inclusion for the interface standard and 
 also some of the implementations. There is debate on whether 
 the implementations belong there, but I have made the 
 implementation Phobos compatible (by using templates) while 
 this issue is sorted out.
My quick comments: 1. In my opinion it should not be called std.database, but let's say "std.dbc". This is because it looks like a wrapper over db clients. Moreover one may say it's almost impossible to make a common and effective interface which would work with all databases. e.g. someone on Wikipedia argues ODBC is obolete (check "ODBC today") 2. I'm not against such a functionality in Phobos. However your project seems to be a duplication of the DDBC project. And it was not proposed for inclusion so often. 3. What I call a D Database API could be described as: - Database object - DbCollection object (equivalent to array) - ranges + std.algorithm And here is my implementation (experimental) of this API: https://github.com/PiotrekDlang/AirLock/tree/master/docs/database/design.md https://github.com/PiotrekDlang/AirLock/tree/master/src Piotrek
Mar 02 2016
next sibling parent reply Erik Smith <erik cruiserhouse.com> writes:
 1. In my opinion it should not be called std.database, but 
 let's say "std.dbc".
 This is because it looks like a wrapper over db clients.  
 Moreover one may say it's almost impossible to make a common 
 and effective interface which would work with all databases. 
 e.g. someone on Wikipedia argues ODBC is obolete (check "ODBC 
 today")
I agree that std.dbc would be more accurate, although I like std.database better because it is more recognizable. Another option could be std.database.client. ODBC is less important now as there are fewer desktop applications that connect directly to databases. There is at least one example related to D, which is the ODBC driver for presto, which is used in Excel. Server side code is generally written using the underlying native client directly. However ODBC support is useful in that it represents a catch all that covers all databases for which a native client is not yet supported. At least for the SQL databases, a standard interface is definitely achievable and I believe that it having it would allow D to get more traction as an application platform.
 2. I'm not against such a functionality in Phobos. However your 
 project seems to be a duplication of the DDBC project. And it 
 was not proposed for inclusion so often.
There are a number of areas where this design is an improvement over DDBC: ease-of-use, better resource management (no scope, no GC), phobos compatibility, to name a few. There is a lot more that needs to be added to make it standards grade.
 3. What I call a D Database API could be described as:
  - Database object
  - DbCollection object (equivalent to array)
  - ranges + std.algorithm
My design is indeed range based: stmt.range() returns a forward range proxy for the query results (the accessor name might change however). Your engine project is interesting. I think there is common ground in the interfaces in the two projects, particularly in how the interface for the results might work. I will look more closely at the details to see what might be workable. erik
Mar 02 2016
parent reply Piotrek <starpit tlen.pl> writes:
On Wednesday, 2 March 2016 at 17:13:32 UTC, Erik Smith wrote:
 There are a number of areas where this design is an improvement 
 over DDBC: ease-of-use, better resource management (no scope, 
 no GC), phobos compatibility, to name a few.  There is a lot 
 more that needs to be added to make it standards grade.
I agree with you we need database manipulation in Phobos. However modules like db, gui, xml or similar are too much work for a one developer. And as you can see from time to time there apears someone with its own vision. That's why, long time ago, I suggested DIP73 (http://wiki.dlang.org/DIP73) so the collaborative work would be controlled by the D community (or the D foundation). But I am aware that there is no agreement nor resources for that.
 Your engine project is interesting.  I think there is common 
 ground in the interfaces in the two projects, particularly in 
 how the interface for the results might work. I will look more 
 closely at the details to see what might be workable.

 erik
I agree that we (as a community) should work on common and effective APIs. Maybe when D foundation is big enough... Piotrek
Mar 03 2016
parent reply Erik Smith <erik cruiserhouse.com> writes:
On Thursday, 3 March 2016 at 16:08:03 UTC, Piotrek wrote:
 I agree with you we need database manipulation in Phobos. 
 However modules like db, gui, xml or similar are too much work 
 for a one developer. And as you can see from time to time there 
 apears someone with its own vision.

 That's why, long time ago, I suggested DIP73 
 (http://wiki.dlang.org/DIP73) so the collaborative work would 
 be controlled by the D community (or the D foundation). But I 
 am aware that there is no agreement nor resources for that.

 I agree that we (as a community) should work on common and 
 effective APIs. Maybe when D foundation is big enough...
Your process proposal (DIP73) was helpful and gives me a better perspective on the standardization process. Thanks for referencing that Piotrek. You are right that areas like this are too much work for one developer. The only leverage I might have is a lot of familiarity working with many of the native C client interfaces and experience implementing higher level interfaces on top. That and is also a lot of existing work to draw on that can inform the design. Also, my sense is that while there is less process for standardization in D at present, the rate at which progress can occur should be much higher (compared to ISOCCP, for example). The last thing I want, however, is to get something accepted into std.experimental that is highly contentious or of subpar quality.
Mar 03 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 3 March 2016 at 17:46:02 UTC, Erik Smith wrote:
 The only leverage I might have is a lot of  familiarity working 
 with many of the native C client interfaces and experience 
 implementing higher level interfaces on top.  That and is also 
 a lot of existing work to draw on that can inform the design.
BTW in the oracle driver you use that ODBC idiom of passing strings as pointer+length pairs. Why don't you use it in ODBC driver?
Mar 04 2016
parent Erik Smith <erik cruiserhouse.com> writes:
On Friday, 4 March 2016 at 16:54:33 UTC, Kagamin wrote:
 On Thursday, 3 March 2016 at 17:46:02 UTC, Erik Smith wrote:
 BTW in the oracle driver you use that ODBC idiom of passing 
 strings as pointer+length pairs. Why don't you use it in ODBC 
 driver?
That will be fixed in the next push. erik
Mar 04 2016
prev sibling next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 2 March 2016 at 15:41:56 UTC, Piotrek wrote:
 On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
[...]
My quick comments: 1. In my opinion it should not be called std.database, but let's say "std.dbc". This is because it looks like a wrapper over db clients. Moreover one may say it's almost impossible to make a common and effective interface which would work with all databases. e.g. someone on Wikipedia argues ODBC is obolete (check "ODBC today") [...]
Looks good. At least at a first glace it seems pretty similar to SQLite. I'd like to take this api for sqlite-d. Because it seems to fit quite naturally.
Mar 02 2016
prev sibling next sibling parent Kagamin <spam here.lot> writes:
On Wednesday, 2 March 2016 at 15:41:56 UTC, Piotrek wrote:
 Moreover one may say it's almost impossible to make a common 
 and effective interface which would work with all databases. 
 e.g. someone on Wikipedia argues ODBC is obolete (check "ODBC 
 today")
I believe that section is about PHP forums being locked to mysql, and performance is not the reason for that. Mostly because mysql support is builtin, works for most things and is provided by hosters.
Mar 02 2016
prev sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 02 Mar 2016 15:41:56 +0000, Piotrek wrote:
 3. What I call a D Database API could be described as:
   - Database object
   - DbCollection object (equivalent to array)
   - ranges + std.algorithm
It looks like you're trying to write a LevelDB analogue that implements an array rather than a key/value collection -- and that's a decent start. If you're trying to connect to a SQL database or a document database, as I'd expect for something called "std.database", it's a pretty terrible API: * No index scans, lookups, or range queries. * No support for complex queries. * No support for joins. * No support for projections. * No support for transactions. * If you add support for transactions, you'll crash all the time because the transactions got too large, thanks to the full table scan mentality. * In your implementation, updates must bring every affected row over the wire, then send back the modified row. * Updates affect an entire row. If one process updates one field in a row and another one updates a different field, one of those writes gets clobbered. * The API assumes a total ordering for each DbCollection. This is not valid. * If there are multiple rows that compare as equals, there's no way to update only one of them in your implementation. * In your implementation, updating one row is a ϴ(N) operation. It still costs ϴ(N) when the row you want to update is the first one in the collection.
Mar 02 2016
parent reply Piotrek <starpit tlen.pl> writes:
On Thursday, 3 March 2016 at 01:49:22 UTC, Chris Wright wrote:
 If you're trying to connect to a SQL database or a document 
 database, as I'd expect for something called "std.database"
The thing is I strongly encourage to not reserve std.database for external database clients and even what is more limiting to SQL ones only.
 , it's a pretty terrible API:
  * No index scans, lookups, or range queries.
Indexes can be supported by strings and CTFE, can't they? e.g. filter!q{item.elements.length < 10 && item.model == "Sport"}
  * No support for complex queries.
Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.
  * No support for joins.
Can be done by attributes or other linking functionality between DbCollections.
  * No support for projections.
You mean something like referring to part of the item's fields? I see no problem here.
  * No support for transactions.
  * If you add support for transactions, you'll crash all the 
 time because
 the transactions got too large, thanks to the full table scan 
 mentality.
Isn't it just the "index support" case?
  * In your implementation, updates must bring every affected 
 row over the
 wire, then send back the modified row.
In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.
  * Updates affect an entire row. If one process updates one 
 field in a
 row and another one updates a different field, one of those 
 writes gets
 clobbered.
I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly. When I say DbCollection should behave similar to an ordinal array I don't mean it should be an ordinal array.
  * The API assumes a total ordering for each DbCollection. This 
 is not valid.
I don't know what you mean here. Example would be good.
  * If there are multiple rows that compare as equals, there's 
 no way to
 update only one of them in your implementation.
  * In your implementation, updating one row is a ϴ(N) 
 operation. It still
 costs ϴ(N) when the row you want to update is the first one in 
 the collection.
I'm still not sure if you are referring to my implementation or hypothetical API. To be clear: my current implementation is still proof of concept and surly *unfinished*. And in case you refer to my implementation I plan to support O(1), O(log n) and O(n) access patterns with its "rights and duties". Cheers, Piotrek
Mar 03 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Thu, 03 Mar 2016 15:50:04 +0000, Piotrek wrote:

 On Thursday, 3 March 2016 at 01:49:22 UTC, Chris Wright wrote:
 If you're trying to connect to a SQL database or a document database,
 as I'd expect for something called "std.database"
The thing is I strongly encourage to not reserve std.database for external database clients and even what is more limiting to SQL ones only.
 , it's a pretty terrible API:
  * No index scans, lookups, or range queries.
Indexes can be supported by strings and CTFE, can't they? e.g. filter!q{item.elements.length < 10 && item.model == "Sport"}
You were a bit vague before. I interpreted you as saying "just offer a range and an array-like API, and then you can use it with std.algorithm". But if you meant to offer an API that is similar to std.algorithm and also array-like, that's more feasible. You're still left with the task of transpiling D to SQL. This model does not work with CouchDB. You must avoid using std.algorithm and std.range functions assiduously because they would offer terrible performance.
  * No support for complex queries.
Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.
Aggregates, especially with joins. Computed fields.
  * No support for joins.
Can be done by attributes or other linking functionality between DbCollections.
With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory. At a previous job I maintained an internal BI site that exposed 50-100 different queries, each with their own set of result fields. We didn't want to use ORM there; it would have been cumbersome and inappropriate. Also, that assumes that you will always want a join when querying a table. I maintained an application once, using ORM, in which we sometimes wanted an eager join and sometimes wanted a lazy one. This posed a nontrivial performance impact. I'm not sure ORM would be a candidate for phobos.
  * No support for projections.
You mean something like referring to part of the item's fields? I see no problem here.
Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL. I'm not splitting those off into a separate table to port my legacy database to your API. I'm not dragging in multiple megabytes of data in every query. If you're going full ORM, you can add lazy fields. That adds complexity. It's also inefficient when I know in advance that I need those fields.
  * No support for transactions.
  * If you add support for transactions, you'll crash all the
 time because the transactions got too large, thanks to the full table
 scan mentality.
Isn't it just the "index support" case?
You didn't mention transactions at all in the initial outline. After that, yes, in large portion index support addresses this. DB-side aggregation also helps.
  * In your implementation, updates must bring every affected
 row over the wire, then send back the modified row.
In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.
I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write: UPDATE users SET sent_join_email = FALSE; -- ok; 1,377,212 rows affected Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server. Depending on the implementation, it's using a database cursor or issuing a new query for every K results. If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems. If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.
  * Updates affect an entire row. If one process updates one
 field in a row and another one updates a different field, one of those
 writes gets clobbered.
I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.
Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other. That's handling it properly.
 When I say DbCollection should behave similar to an ordinal array I
 don't mean it should be an ordinal array.
 
  * The API assumes a total ordering for each DbCollection. This
 is not valid.
I don't know what you mean here. Example would be good.
opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows. SQLite, for one, does not guarantee query result ordering if the query does not include an ORDER BY clause. So offering on opIndex(size_t offset) operation -- either you load the entire table into memory in advance, or you might get the same row returned for every index. Alternatively, you must examine the table and generate a sufficiently unique ordering for it. Any mutation to the table's contents can change the indices for all items in the table. This isn't invalid -- C# collection iterators throw an exception if you modify the collection during iteration. But, to avoid silent errors you can't defend against, you have to dump the whole table into a transaction or have the database somehow tell you when someone else has modified it. Speaking of modifications, let's say I write a simple loop like: for (size_t i = 0; i < dbTable.length; i++) { writeln(dbTable[i].id); } This can go wrong in about four ways: * Someone inserts a row whose index is less than i. This prints out the same id twice. * Someone deletes a row whose index is less than i. This skips a different row. * Someone deletes a row when i == dbTable.length - 1, just before opIndex executes. I get an index out of bounds error. * This is issuing two queries per iteration. It's going to take probably fifty times longer than using a cursor. This is a terrible usage pattern, but by offering opIndex and length operations, you are recommending it.
  * If there are multiple rows that compare as equals, there's
 no way to update only one of them in your implementation.
  * In your implementation, updating one row is a ϴ(N)
 operation. It still costs ϴ(N) when the row you want to update is the
 first one in the collection.
I'm still not sure if you are referring to my implementation or hypothetical API. To be clear: my current implementation is still proof of concept and surly *unfinished*. And in case you refer to my implementation I plan to support O(1), O(log n) and O(n) access patterns with its "rights and duties".
I specifically said "in your implementation" for these two because I was referring to your implementation rather than your proposal in general. The rest refers to the basic idea.
Mar 03 2016
parent reply Piotrek <starpit tlen.pl> writes:
On Thursday, 3 March 2016 at 18:48:08 UTC, Chris Wright wrote:
 You were a bit vague before. I interpreted you as saying "just 
 offer a range and an array-like API, and then you can use it 
 with std.algorithm". But if you meant to offer an API that is 
 similar to std.algorithm and also array-like, that's more 
 feasible.
I agree I could be better in describing the concept. But I just sketched the idea.
 You're still left with the task of transpiling D to SQL.
If someone wants to use SQL in its *full power* no D API nor any other language will suffice. Mainly because it will be always a traslation layer . The only we can to is to provide an aid like things suggested by Andrei (sql parser, value binding, etc).
 This model does not work with CouchDB.
I don't know CouchDB so I can't comment.
 You must avoid using std.algorithm and std.range functions 
 assiduously because they would offer terrible performance.
For big data in db, plain vanilla std.algorithm won't be insufficient. I agree.
  * No support for complex queries.
Not sure what you mean by complex queries. Also I think the API allows arbitrary complex queries.
Aggregates, especially with joins. Computed fields.
Regarding computed fields and other database vendor specific features you are right. But on the other hand aggregations and joins can be represented as objects and proxies of objects.
  * No support for joins.
Can be done by attributes or other linking functionality between DbCollections.
With attributes, you need users to define aggregate types instead of just using Row and the like. That's ORM territory.
I don't like ORM with respect to SQL. But quasi object database which can look similar to ORM is not a problem for me.
 At a previous job I maintained an internal BI site that exposed 
 50-100 different queries, each with their own set of result 
 fields. We didn't want to use ORM there; it would have been 
 cumbersome and inappropriate.
I can see your point. But the problem can be solved by not using SQL.
 Also, that assumes that you will always want a join when 
 querying a table. I maintained an application once, using ORM, 
 in which we sometimes wanted an eager join and sometimes wanted 
 a lazy one. This posed a nontrivial performance impact.
Something like DbProxy would handle lazy "joins".
 I'm not sure ORM would be a candidate for phobos.
As I don't plan to use an (traditional) ORM I'm not involved. However if other people would find it worthy I don't object.
  * No support for projections.
You mean something like referring to part of the item's fields? I see no problem here.
Let me point you to the existence of the TEXT and BLOB datatypes. They can each hold 2**32 bytes of data in MySQL.
This is something a DbProxy would handle. Eventually: struct OrginalObject { int id; string bigString; } struct StrippedObject { int id; } then auto collA = db.collection!OrginalObject("Big"); auto collA = db.collection!StrippedObject("Big"); In the second line the string is not fetched.
 I'm not splitting those off into a separate table to port my 
 legacy database to your API. I'm not dragging in multiple 
 megabytes of data in every query.

 If you're going full ORM, you can add lazy fields. That adds 
 complexity. It's also inefficient when I know in advance that I 
 need those fields.


  * In your implementation, updates must bring every affected
 row over the wire, then send back the modified row.
In my implementation there is no wire (that's why I call it embedded). However I thought we talk about API and not particular implementation. I don't see how this API excludes RPC. Query strings (e.g. SQL) can be provided in old fashioned way.
I'm running a website and decide that, with the latest changes, existing users need to get the new user email. So I write: UPDATE users SET sent_join_email = FALSE; -- ok; 1,377,212 rows affected Or I'm using your database system. If it uses std.algorithm, I have to iterate through the users list, pulling each row into my process's memory from the database server, and then I have to write everything back to the database server. Depending on the implementation, it's using a database cursor or issuing a new query for every K results. If it's using a database cursor, those might not be valid across transaction boundaries. I'm not sure. If they aren't, you get a large transaction, which causes problems. If your database system instead offers a string-based API similar to std.algorithm, you might be able to turn this into a single query, but it's going to be a lot of work for you.
For client-server approach I agree with the above. For embedded design (as in my project) this is not a case.
  * Updates affect an entire row. If one process updates one
 field in a row and another one updates a different field, one 
 of those
 writes gets clobbered.
I think this is just a "must have" for any db engine. I don't see how it applies to the proposed API other than any implementation of db engine has to handle it properly.
Without transactions, MySQL supports writing to two different columns in two different queries without those writes clobbering each other. That's handling it properly.
Designing a good locking mechanism will be a challenging task, that is what I'm sure :)
 When I say DbCollection should behave similar to an ordinal 
 array I don't mean it should be an ordinal array.
 
  * The API assumes a total ordering for each DbCollection. 
 This
 is not valid.
I don't know what you mean here. Example would be good.
opIndex(size_t offset) assumes the database supports a one-to-one mapping between offsets and rows.
[...]
 This is a terrible usage pattern, but by offering opIndex and 
 length operations, you are recommending it.
I don't recommend it. I just added it for evaluation. I'm aware it only works when the collection is not mutated. I think the same goes for all shared collections (also those in memory) Finally, IMO any DB API will be biased toward one solution. Cheers Piotrek
Mar 03 2016
parent reply Chris Wright <dhasenan gmail.com> writes:
On Fri, 04 Mar 2016 06:16:59 +0000, Piotrek wrote:

 For client-server approach I agree with the above. For embedded design
 (as in my project) this is not a case.
Which is all I'm saying. Something in std.database sounds like it should allow you to interact with databases, like JDBC. With embedded databases, there's a lot of variety out there, probably a decent selection of tradeoffs, so I'm not sure any one would be appropriate to phobos.
Mar 04 2016
parent Piotrek <starpit tlen.pl> writes:
On Friday, 4 March 2016 at 16:41:35 UTC, Chris Wright wrote:
 With embedded databases, there's a lot of variety out there, 
 probably a decent selection of tradeoffs, so I'm not sure any 
 one would be appropriate to phobos.
The one written from scratch specially for D (I'm talking in general, nothing particular in mind). This is my idea of the optimal solution. And I can be wrong of course. And there is a need for a module with interface to other databases, something like Erik's project. Piotrek
Mar 04 2016
prev sibling next sibling parent reply Dejan Lekic <dejan.lekic gmail.com> writes:
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
 I'm back to actively working on a std.database specification & 
 implementation.  It's still unstable, minimally tested, and 
 there is plenty of work to do, but I wanted to share an update 
 on my progress.
I suggest you call the package stdx.db - it is not (and may not become) a standard package, so `std` is out of question. If it is supposed to be *proposed* as standard package, then `stdx` is good because that is what some people have used in the past (while others used the ugly std.experimental. for the same purpose). I humbly believe that this effort **must** be collaborative as such package is doomed to fail if done wrong.
Mar 03 2016
next sibling parent reply Erik Smith <erik cruiserhouse.com> writes:
 I suggest you call the package stdx.db - it is not (and may not 
 become) a standard package, so `std` is out of question. If it 
 is supposed to be *proposed* as standard package, then `stdx` 
 is good because that is what some people have used in the past 
 (while others used the ugly std.experimental. for the same 
 purpose).

 I humbly believe that this effort **must** be collaborative as 
 such package is doomed to fail if done wrong.
I totally agree that it must be collaborative and community driven. I failed to add the proposed qualifier in this thread - sorry about that. Right now I'm just trying to put together enough of a substantive design to be worthy of discussion. While I'm presenting a design the way I think it should work, I'm definitely asking for feedback and especially opposition to any aspect of it. My package name choice is just an example, the package name would have to be changed to whatever the "D committee" thinks is appropriate. erik
Mar 03 2016
parent reply Kagamin <spam here.lot> writes:
Also member names: methods are named after verbs, you use nouns. 
Method `next` is ambiguous: is the first row the next row? 
`fetch` or `fetchRow` would be better.
Mar 03 2016
parent reply Erik Smith <erik cruiserhouse.com> writes:
On Thursday, 3 March 2016 at 15:07:43 UTC, Kagamin wrote:
 Also member names: methods are named after verbs, you use 
 nouns. Method `next` is ambiguous: is the first row the next 
 row? `fetch` or `fetchRow` would be better.
Those are actually internal methods not intended for the interface user. They might get exposed in some way to the implementer so I'll have to consider this issue. However, If I apply what you are suggesting to the the interface methods, it would look like this: auto r = db.connection().statement("select from t").range(); // nouns auto r = db.getConnection().getStatement("select from t").getRange(); // verbs Maybe there is a more creative way to change the names rather than prepending with get (let me know if you have suggestions), but the nouns seem cleaner to me. I'll have to look at more phobos code to get a better sense of the conventions. Also range will be replaced by an opSlice operator. erik
Mar 03 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 3 March 2016 at 15:53:28 UTC, Erik Smith wrote:
 auto r = db.connection().statement("select from t").range();  
 // nouns
db.execute("select from t").range(); `range` is probably ok. Or auto connection = db.createConnection(); connection.execute("select from t").range();
 auto r = db.getConnection().getStatement("select from 
 t").getRange();  // verbs
Getters must be actually nouns, but these are not getters, but factory methods: createConnection, createStatement, probably low level and/or private.
Mar 03 2016
next sibling parent reply Kagamin <spam here.lot> writes:
Other options:
db.execute("select from t").reader; //row range
db.execute("select from t").get!long; //scalar
db.execute("select from t"); //non-query
Mar 03 2016
parent Kagamin <spam here.lot> writes:
On the other hand execute can simply return the reader with extra 
getter for scalar result. Just don't do stuff until it's 
iterated. Is it possible?

auto rows = db.execute("select * from t");
Mar 03 2016
prev sibling parent reply Erik Smith <erik cruiserhouse.com> writes:
On Thursday, 3 March 2016 at 17:03:58 UTC, Kagamin wrote:
 On Thursday, 3 March 2016 at 15:53:28 UTC, Erik Smith wrote:
 auto r = db.connection().statement("select from t").range();  
 // nouns
db.execute("select from t").range(); `range` is probably ok. Or auto connection = db.createConnection(); connection.execute("select from t").range();
 auto r = db.getConnection().getStatement("select from 
 t").getRange();  // verbs
Getters must be actually nouns, but these are not getters, but factory methods: createConnection, createStatement, probably low level and/or private.
Good point. I will track this as a design option to debate.
 db.execute("select from t").reader; //row range
 db.execute("select from t").get!long; //scalar
 db.execute("select from t"); //non-query
More good options (the 3rd one is there). Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.
 On the other hand execute can simply return the reader with 
 extra getter for scalar
 result. Just don't do stuff until it's iterated. Is it possible?
 auto rows = db.execute("select * from t");
I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem. Another issue is that this might conflict with the notion of a container and the use of opSlice. Great feedback though and I'm tracking it. erik
Mar 03 2016
parent reply Kagamin <spam here.lot> writes:
On Thursday, 3 March 2016 at 18:08:26 UTC, Erik Smith wrote:
 db.execute("select from t").reader; //row range
 db.execute("select from t").get!long; //scalar
 db.execute("select from t"); //non-query
More good options (the 3rd one is there). Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.
to!long looks ok.
 On the other hand execute can simply return the reader with 
 extra getter for scalar
 result. Just don't do stuff until it's iterated. Is it 
 possible?
 auto rows = db.execute("select * from t");
I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.
Can you elaborate? On the other hand these helper methods are built on top of abstraction API and you have to duplicate them in all drivers. Maybe better have them as extension methods in a single module that will work with all drivers? They are effectively a sort of minimal frontend already.
 Another issue is that this might conflict with the notion of a 
 container and the use of opSlice.  Great feedback though and 
 I'm tracking it.
The result set is even iterator/stream, i.e. conceptually has even less container properties than ranges themselves. Why would you think of it as a container? I think it's ok as input range. Anyway, this `execute` method is a frontend, i.e. it's replaceable without touching the driver.
Mar 04 2016
parent reply Erik Smith <erik cruiserhouse.com> writes:
On Friday, 4 March 2016 at 11:57:49 UTC, Kagamin wrote:
 On Thursday, 3 March 2016 at 18:08:26 UTC, Erik Smith wrote:
 db.execute("select from t").reader; //row range
 db.execute("select from t").get!long; //scalar
 db.execute("select from t"); //non-query
More good options (the 3rd one is there). Also at the value access level there are several options to consider: v.get!long, v.as!long, v.to!long, etc.
to!long looks ok.
 On the other hand execute can simply return the reader with 
 extra getter for scalar
 result. Just don't do stuff until it's iterated. Is it 
 possible?
 auto rows = db.execute("select * from t");
I'm hedging a bit on this because there are other capabilities that need to be introduced that might present a problem.
Can you elaborate?
Actually I like this and I think it can work. I'm trying to keep a single execute function name for both row/no-row queries. I can still return the range proxy for no-row queries that is either empty or throws on access.
 On the other hand these helper methods are built on top of 
 abstraction API and you have to duplicate them in all drivers. 
 Maybe better have them as extension methods in a single module 
 that will work with all drivers? They are effectively a sort of 
 minimal frontend already.
I'm just waiting for the code to settle a bit in a basic working state before I refactor into a two layer design.
 The result set is even iterator/stream, i.e. conceptually has 
 even less container properties than ranges themselves. Why 
 would you think of it as a container? I think it's ok as input 
 range. Anyway, this `execute` method is a frontend, i.e. it's 
 replaceable without touching the driver.
The range itself is an InputRange. The statement is acting as a container only in that it is the source of the range and holds the data. I agree that it is confusing to use the term container although it seems to fit the definition of one.
Mar 04 2016
parent reply Kagamin <spam here.lot> writes:
On Friday, 4 March 2016 at 14:44:48 UTC, Erik Smith wrote:
 Actually I like this and I think it can work.  I'm trying to 
 keep a single execute function name for both row/no-row 
 queries. I can still return the range proxy for no-row queries 
 that is either empty or throws on access.
Yes, that's the idea.
 I'm just waiting for the code to settle a bit in a basic 
 working state before I refactor into a two layer design.
Another idea from previous attempts: rename `execute` to `query`. foreach(row; db.query("select * from t")) { ... } And name for package: std.sql
 The result set is even iterator/stream, i.e. conceptually has 
 even less container properties than ranges themselves. Why 
 would you think of it as a container? I think it's ok as input 
 range. Anyway, this `execute` method is a frontend, i.e. it's 
 replaceable without touching the driver.
The range itself is an InputRange. The statement is acting as a container only in that it is the source of the range and holds the data. I agree that it is confusing to use the term container although it seems to fit the definition of one.
`execute` should not return a statement, but a result set reader, that would be a range. Yeah, the result set is indeed a range conceptually: it has a definite number of ordered rows, a beginning and an end.
Mar 04 2016
parent reply Erik Smith <erik cruiserhouse.com> writes:
On Friday, 4 March 2016 at 16:43:00 UTC, Kagamin wrote:
 On Friday, 4 March 2016 at 14:44:48 UTC, Erik Smith wrote:
 Actually I like this and I think it can work.  I'm trying to 
 keep a single execute function name for both row/no-row 
 queries. I can still return the range proxy for no-row queries 
 that is either empty or throws on access.
Yes, that's the idea.
 I'm just waiting for the code to settle a bit in a basic 
 working state before I refactor into a two layer design.
Another idea from previous attempts: rename `execute` to `query`. foreach(row; db.query("select * from t")) { ... } And name for package: std.sql
 The result set is even iterator/stream, i.e. conceptually has 
 even less container properties than ranges themselves. Why 
 would you think of it as a container? I think it's ok as 
 input range. Anyway, this `execute` method is a frontend, 
 i.e. it's replaceable without touching the driver.
The range itself is an InputRange. The statement is acting as a container only in that it is the source of the range and holds the data. I agree that it is confusing to use the term container although it seems to fit the definition of one.
`execute` should not return a statement, but a result set reader, that would be a range. Yeah, the result set is indeed a range conceptually: it has a definite number of ordered rows, a beginning and an end.
On further thought, execute() should definitely return something but I think it needs to return a Result (alternative name RowSet), the actual container, rather than the range. This more cleanly separates out the post execute state management into a separate object from Statement. The user will often want other things from the execute then just the range, such as meta-data (# columns, columns names/types) describing the returned table and RowSet provides convenient access to it. This also suggests that execute should be explicit & once/only, which I think is better. Here is an updated example to illustrate: auto db = createDatabase("file:///testdb"); auto rowSet = db.connection().statement("select name,score from score").execute; foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int); I'll track query as an alternative name for execute() and std.sql as alternative for std.database.
Mar 04 2016
next sibling parent reply Bubbasaur <bubba gmail.com> writes:
On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:
     auto db = createDatabase("file:///testdb");
     auto rowSet = db.connection().statement("select name,score 
 from score").execute;
     foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);

 I'll track query as an alternative name for execute() and 
 std.sql as alternative for std.database.
Why not only: auto rowSet = db.connection().query("select name,scorefrom score"); Or instead of query maybe "returnRows" or just "rows", I think the other way is too verbose. Bubba.
Mar 04 2016
parent Erik Smith <erik cruiserhouse.com> writes:
On Friday, 4 March 2016 at 19:27:47 UTC, Bubbasaur wrote:
 On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:
     auto db = createDatabase("file:///testdb");
     auto rowSet = db.connection().statement("select name,score 
 from score").execute;
     foreach (r; rowSet) 
 writeln(r[0].as!string,",",r[1].as!int);

 I'll track query as an alternative name for execute() and 
 std.sql as alternative for std.database.
Why not only: auto rowSet = db.connection().query("select name,scorefrom score"); Or instead of query maybe "returnRows" or just "rows", I think the other way is too verbose. Bubba.
Agree connection execute is simpler and it is supported, although I need to add the bind version to connection as well. The statement should only get involved when you need more specific control over the input binding. I add returnRows/rows to the execute/query alternative name list. erik
Mar 04 2016
prev sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:
     auto db = createDatabase("file:///testdb");
     auto rowSet = db.connection().statement("select name,score 
 from score").execute;
     foreach (r; rowSet) writeln(r[0].as!string,",",r[1].as!int);
You'll want to have some types in there. As in struct S { string name; int age; } auto rowSet = db.connection().statement("select name,score from score").executeAs!S; foreach (r; rowSet) writeln(r.name,",",r.age);
Mar 04 2016
parent reply Erik Smith <erik cruiserhouse.com> writes:
On Friday, 4 March 2016 at 22:44:24 UTC, Sebastiaan Koppe wrote:
 On Friday, 4 March 2016 at 18:42:45 UTC, Erik Smith wrote:
     auto db = createDatabase("file:///testdb");
     auto rowSet = db.connection().statement("select name,score 
 from score").execute;
     foreach (r; rowSet) 
 writeln(r[0].as!string,",",r[1].as!int);
You'll want to have some types in there. As in struct S { string name; int age; } auto rowSet = db.connection().statement("select name,score from score").executeAs!S; foreach (r; rowSet) writeln(r.name,",",r.age);
I think some basic object serialization capabilities would be great although I'm not sure how the bare names can be accessed like that through the rowSet How would that work? I can see this: S s; auto rowSet = db.connection().statement("select name,score from score").into(s); writeln(s.name,",",s.age); Two other options (not really serialization): // 1 string name; int age; auto rowSet = db.connection().execute("select name,age from score").into(s.name,age); foreach (r; rowSet) writeln(name,",",age); // 2 foreach (r; rowSet) writeln(r["name"],",",r["age"]);
Mar 04 2016
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Friday, 4 March 2016 at 23:55:55 UTC, Erik Smith wrote:
 I think some basic object serialization capabilities would be 
 great although I'm not sure how the bare names can be accessed 
 like that through the rowSet How would that work?
I did a project a while back using mysql-native (see code.dlang.org) and it has a `toStruct` function on its Row type. Maybe have a look there. To access the bare named RowSet would have to be templated on the struct. Follows code from said project: template executeAs(Type) { struct TypedSQLSequence { ResultSequence rs; alias rs this; property Type front() { Row r = rs.front; Type temp; r.toStruct!Type(temp); return temp; } } auto executeAs(ref mysql.connection.ResultSequence rs) { return TypedSQLSequence(rs); } }
Mar 05 2016
prev sibling parent John Colvin <john.loughran.colvin gmail.com> writes:
On Thursday, 3 March 2016 at 11:16:03 UTC, Dejan Lekic wrote:
 On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
 I'm back to actively working on a std.database specification & 
 implementation.  It's still unstable, minimally tested, and 
 there is plenty of work to do, but I wanted to share an update 
 on my progress.
I suggest you call the package stdx.db - it is not (and may not become) a standard package, so `std` is out of question. If it is supposed to be *proposed* as standard package, then `stdx` is good because that is what some people have used in the past (while others used the ugly std.experimental. for the same purpose). I humbly believe that this effort **must** be collaborative as such package is doomed to fail if done wrong.
std.experimental, ugly or not, is what is in phobos. See std.experimental.allocator, std.experimental.logger and std.experimental.ndslice
Mar 03 2016
prev sibling next sibling parent reply Saurabh Das <saurabh.das gmail.com> writes:
On Tuesday, 1 March 2016 at 21:00:30 UTC, Erik Smith wrote:
 I'm back to actively working on a std.database specification & 
 implementation.  It's still unstable, minimally tested, and 
 there is plenty of work to do, but I wanted to share an update 
 on my progress.

 [...]
A little late to the party, nevertheless: Thanks for doing this, it will be super-helpful! My only feature request will be: please make it work with minimal effort with Vibe.D! :) ~ sd
Mar 05 2016
parent reply Jacob Carlborg <doob me.com> writes:
On 2016-03-05 11:23, Saurabh Das wrote:

 A little late to the party, nevertheless: Thanks for doing this, it will
 be super-helpful!

 My only feature request will be: please make it work with minimal effort
 with Vibe.D! :)
Yeah, that's really important. Unfortunately it looks like a synchronous interface is prioritized :(. -- /Jacob Carlborg
Mar 05 2016
parent reply Erik Smith <erik cruiserhouse.com> writes:
On Saturday, 5 March 2016 at 13:13:09 UTC, Jacob Carlborg wrote:
 On 2016-03-05 11:23, Saurabh Das wrote:

 A little late to the party, nevertheless: Thanks for doing 
 this, it will
 be super-helpful!

 My only feature request will be: please make it work with 
 minimal effort
 with Vibe.D! :)
Yeah, that's really important. Unfortunately it looks like a synchronous interface is prioritized :(.
I'm definitely going to start working in async capability (or more accurately, non-blocking) into the interface. Both models are essential and there are strong use cases for both, but I know there is a lot of interest in NBIO for vibe.d compatibility. This is challenging and, as many are aware, the C clients for SQL databases have traditionally been synchronous only. One suggested approach is to fork the driver source and change the I/O calls to be non-blocking. Another is to implement a non-blocking driver directly against the wire protocol (a "native" driver). These are limited options, but to the extent this approach is viable in some cases (mysql-native for example), they could be adapted to a standard interface. The good news is that some databases do have (or are working on) non-blocking support in their C clients. The webscalesql mysql fork, for example, has a non-blocking client that is in production and their client works with regular mysql databases. That fork should eventually be merged back into mysql (or may have been already). Postgres also provides non-blocking support. These are two cases that I'm targeting for initial reference implementations.
Mar 05 2016
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2016-03-05 19:00, Erik Smith wrote:

 I'm definitely going to start working in async capability (or more
 accurately, non-blocking) into the interface.   Both models are
 essential and there are strong use cases for both, but I know there is a
 lot of interest in NBIO for vibe.d compatibility. This is challenging
 and, as many are aware, the C clients for SQL databases have
 traditionally been synchronous only.  One suggested approach is to fork
 the driver source and change the I/O calls to be non-blocking. Another
 is to implement a non-blocking driver directly against the wire protocol
 (a "native" driver).  These are limited options, but to the extent this
 approach is viable in some cases (mysql-native for example), they could
 be adapted to a standard interface.  The good news is that some
 databases do have (or are working on) non-blocking support in their C
 clients. The webscalesql mysql fork, for example, has a non-blocking
 client that is in production and their client works with regular mysql
 databases.  That fork should eventually be merged back into mysql (or
 may have been already).  Postgres also provides non-blocking support.
 These are two cases that I'm targeting for initial reference
 implementations.
Just for the record, there's already a native Postgres client that is non-blocking and vibe.d compatible. https://github.com/pszturmaj/ddb -- /Jacob Carlborg
Mar 05 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Saturday, 5 March 2016 at 18:00:56 UTC, Erik Smith wrote:
 I'm definitely going to start working in async capability (or 
 more accurately, non-blocking) into the interface.
Huh? Non-blocking operation is not reflected in interface. Only asynchronous requires special interface. Vibe interfaces are synchronous.
 Postgres also provides non-blocking support.  These are two 
 cases that I'm targeting for initial reference implementations.
I've looked into OCI headers, they say something about asynchronous operations, do you know something about that?
Mar 06 2016
parent Erik Smith <erik cruiserhouse.com> writes:
On Sunday, 6 March 2016 at 08:53:05 UTC, Kagamin wrote:
 On Saturday, 5 March 2016 at 18:00:56 UTC, Erik Smith wrote:
 I'm definitely going to start working in async capability (or 
 more accurately, non-blocking) into the interface.
Huh? Non-blocking operation is not reflected in interface. Only asynchronous requires special interface. Vibe interfaces are synchronous.
I think a basic non-blocking interface should be similar to the synchronous one, but also exposes a file/socket descriptor to pass to the event system and also provide status method for whether the R/W operation is waiting.
 I've looked into OCI headers, they say something about 
 asynchronous operations, do you know something about that?
I need to look into this more, but the OCI non-blocking model is problematic (a missing file descriptor maybe). It can probably be made to work since Node now has a driver directly from Oracle, but I'm not sure if it has the same scaling profile as other non-blocking clients.
Mar 06 2016
prev sibling parent Erik Smith <erik cruiserhouse.com> writes:
I've made a few updates based on some of the feedback in this 
thread.

- execute() renamed to query()
- query with input binds directly from connection
- query() returns result for chaining

The design is still early stage.  I've got a growing list of 
design options which I'll bring into discussion once the 
interface becomes more stable.

Other updates:

- postgres reference implementation added (synchronous only at 
moment)
- DUB package now available as dstddb
- DMD "cyclic structs" bug fix may land soon (PR appeared) with 
design improvements to follow.

erik
Mar 10 2016