www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - [OT] Idea to get all columns from multiple tables in DB

reply Steven Schveighoffer <schveiguy gmail.com> writes:
I have a database serialization package that I use to fetch data from a 
database. The column names are all identified or tagged appropriately, 
so I just use:

connection.query("SELECT * FROM user").byItem!User;

and it serializes the data from the resulting rows into the struct User 
type for consumption as a range.

However, if there's a join, it poses a problem. Let's say I have an 
Equipment item, which is owned by a user. I can select both the 
equipment and the owner via:

SELECT * FROM equipment left join user on (user.id = equipment.owner_id)

However, this causes problems because I get duplicate columns (e.g. name 
might be in both, or id). Which one goes with which?

I have a mechanism to deal with this. I can prefix each column name with 
a prefix, and then specify that on the byItem function with a uda-like 
directive:

connection.query("SELECT e.*, u.name AS user_name, u.id AS user_id, 
u.email AS user_email FROM ...").byItem!(Equipment, 
DB.colPrefix!"user_", User);

used like so:

foreach(equip, user; dbSequence) ...

But this requires you spell out every single column name in an extremely 
verbose manner, and you can't use that cool '*' selector which is 
somewhat future-proof.

But I realized I can put a "divider" in between the two table .* 
selectors, and then even though I have duplicate names, I can detect the 
divider and, and know which duplicate-named columns apply to which:

connection.query("SELECT e.*, 'user_' as __prefix, u.* FROM 
...").byItem!(Equipment, DB.colPrefix!"user_", User)

In my byItem function, when I'm mapping column names to actual fields at 
the beginning of iteration, I see a column named __prefix, and I use the 
value to prepend to all subsequent column names. And I think this will 
work beautifully!

Is this a technique that is used by anyone regularly? If so, what is it 
called? I'm very excited about this, because I want to minimize DB round 
trips, and also fetch related items as they come in, instead of 
serializing all of one type to an array and then fetching the rest with 
individual queries, but I didn't want to type out a lot of SQL (or 
generate it based on possibly stale struct field names), and have to 
maintain it any time I change something. This solves all the problems in 
a neat way.

-Steve
Jan 07
next sibling parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Tuesday, 7 January 2020 at 21:01:46 UTC, Steven Schveighoffer 
wrote:

 connection.query("SELECT e.*, 'user_' as __prefix, u.* FROM 
 ...").byItem!(Equipment, DB.colPrefix!"user_", User)

 [...]

 Is this a technique that is used by anyone regularly? If so, 
 what is it called? I'm very excited about this, because I want 
 to minimize DB round trips, and also fetch related items as 
 they come in, instead of serializing all of one type to an 
 array and then fetching the rest with individual queries, but I 
 didn't want to type out a lot of SQL (or generate it based on 
 possibly stale struct field names), and have to maintain it any 
 time I change something. This solves all the problems in a neat 
 way.

 -Steve
I don't believe I've seen this pattern. What I realized after watching your talk is that I've generally married the structure to the query rather then the tables or data. Select user.name, equip.product... struct Ownership { string usersname; string product; } I this situation the assignment comes from position and not by column name. You lose the * as you mentioned. What I have not done is thought about how our application differ for dealing with the data.
Jan 07
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/20 9:28 PM, Jesse Phillips wrote:
 What I realized after watching your talk is that I've generally married 
 the structure to the query rather then the tables or data.
 
 Select user.name, equip.product...
 
 struct Ownership {
     string usersname;
     string product;
 }
 
 I this situation the assignment comes from position and not by column 
 name. You lose the * as you mentioned.
In my project, I only do generation of SQL for updating or inserting rows. For fetching elements, I write the SQL (by hand or through a sql building system I wrote) and then use the byItem serializer to get the records. Upon the first record, I use the column names in the list to map the field order to the column order. So I don't have to worry about columns being removed/added, etc. And then for each row fetched, everything is extremely fast because the column order/names don't change. And yeah, I've done stuff like the above when I have for instance to fetch statistical data. But for the most part, I have one struct for each table row that I fetch. In my code, if you don't fetch all the columns, it just doesn't fill those in. But I hadn't thought of a good way until now to rename the columns fetched so they can be serialized correctly when there are conflicting column names without spelling all the columns out (which I hate). In fact, I might change this later to avoid having to fetch the column prefix for every row (I only need it for the first row), but this technique is definitely more efficient than fetching all of one type first (probably in a single query), and then fetching each related item one at a time. -Steve
Jan 08
parent reply Paolo Invernizzi <paolo.invernizzi gmail.com> writes:
On Wednesday, 8 January 2020 at 23:01:14 UTC, Steven 
Schveighoffer wrote:

 In my project, I only do generation of SQL for updating or 
 inserting rows. For fetching elements, I write the SQL (by hand 
 or through a sql building system I wrote) and then use the 
 byItem serializer to get the records.
We also are writing SQL statements by hand, but I've found that's really nice to have a compile time check against the current db schemas. For example, actually I can do: --- string stat = public!`select recording_id from recordings where wrongcolname < 10`; ~/dlang/dmd-2.089.0/osx/bin/dmd -i -Isrc -g -unittest -debug -c -o- src/test.d /Users/pinver/dlang/dmd-2.089.0/osx/bin/../../src/phobos/std/exception.d(515): Error: uncaught CTFE exception object.Error("column 'wrongcolname' does not exist: 'select recording_id from recordings where wrongcolname < 10'") src/bkn/sql/semantic.d(75): called from here: <bigsnip> --- The struct representing the schema is created reading and processing a SQL file with the schema dump, dumping it once to a D module, but can be done also at compile time. That's pretty nice also using an IDE, for example I've configured VSCode to compile-on-save and I've cool red underlines under the wrong sql statements. The semantic pass is really basic right now, but I've plan to expand it.
Jan 09
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/9/20 10:36 AM, Paolo Invernizzi wrote:
 We also are writing SQL statements by hand, but I've found that's really 
 nice to have a compile time check against the current db schemas. For 
 example, actually I can do:
 
 ---
 string stat = public!`select recording_id from recordings where 
 wrongcolname < 10`;
 
   ~/dlang/dmd-2.089.0/osx/bin/dmd -i -Isrc -g -unittest -debug -c -o- 
 src/test.d
 
 /Users/pinver/dlang/dmd-2.089.0/osx/bin/../../src/phobos/std/exception.d(515): 
 Error: uncaught CTFE exception object.Error("column 'wrongcolname' does 
 not exist: 'select recording_id from recordings where wrongcolname < 10'")
 src/bkn/sql/semantic.d(75):        called from here: <bigsnip>
 ---
This is pretty cool. But of course, you need to have a SQL parser etc. in your code base to do this. I've considered making the SQL building code more robust. Right now, it just captures joins and conditionals (the joins are only included if they are used), but I'd like to identify columns for tables and use less strings and more compile-time checking. I'm really not too interested in parsing SQL in my application :) I like this library from Swift a lot (used in all my iOS projects), and have considered how D could do something similar: https://github.com/groue/GRDB.swift
 The struct representing the schema is created reading and processing a 
 SQL file with the schema dump, dumping it once to a D module, but can be 
 done also at compile time.
Haha, that's really cool. So basically if there's a chicken or an egg, no matter, you can deal with it ;)
 That's pretty nice also using an IDE, for example I've configured VSCode 
 to compile-on-save and I've cool red underlines under the wrong sql 
 statements.
 
 The semantic pass is really basic right now, but I've plan to expand it.
Excellent. I think there's really a lot of home-grown SQL-based libraries out there. We should have a showcase of all of them and figure out the best features. Dealing with databases is really in D's wheelhouse, and there are so many different ways to solve the problems. -Steve
Jan 09
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2020-01-07 22:01, Steven Schveighoffer wrote:
 I have a database serialization package that I use to fetch data from a 
 database. The column names are all identified or tagged appropriately, 
 so I just use:
 
 connection.query("SELECT * FROM user").byItem!User;
 
 and it serializes the data from the resulting rows into the struct User 
 type for consumption as a range.
 
 However, if there's a join, it poses a problem. Let's say I have an 
 Equipment item, which is owned by a user. I can select both the 
 equipment and the owner via:
 
 SELECT * FROM equipment left join user on (user.id = equipment.owner_id)
 
 However, this causes problems because I get duplicate columns (e.g. name 
 might be in both, or id). Which one goes with which?
IIRC Rails is using the fully qualified column name for everything. Instead of using "*" it enumerates all columns of, in your case, User. It will inspect User to figure out all the column names. You can easily do the same thing in D by inspecting the fields on User. -- /Jacob Carlborg
Jan 07
next sibling parent Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:
On Wednesday, 8 January 2020 at 07:19:07 UTC, Jacob Carlborg 
wrote:
 Instead of using "*" it enumerates all columns of, in your 
 case, User. It will inspect User to figure out all the column 
 names. You can easily do the same thing in D by inspecting the 
 fields on User.
This is also true for php doctrine orm and java hibernate orm, in basic cases. Both of them also add custom sql language that is working directly with objects in your application. Closest analogy in D is hibernated library. Maybe it is also worth trying it to get ideas on how best to map db results into an object graph. Best regards, Alexandru.
Jan 08
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/8/20 2:19 AM, Jacob Carlborg wrote:
 On 2020-01-07 22:01, Steven Schveighoffer wrote:
 I have a database serialization package that I use to fetch data from 
 a database. The column names are all identified or tagged 
 appropriately, so I just use:

 connection.query("SELECT * FROM user").byItem!User;

 and it serializes the data from the resulting rows into the struct 
 User type for consumption as a range.

 However, if there's a join, it poses a problem. Let's say I have an 
 Equipment item, which is owned by a user. I can select both the 
 equipment and the owner via:

 SELECT * FROM equipment left join user on (user.id = equipment.owner_id)

 However, this causes problems because I get duplicate columns (e.g. 
 name might be in both, or id). Which one goes with which?
IIRC Rails is using the fully qualified column name for everything. Instead of using "*" it enumerates all columns of, in your case, User. It will inspect User to figure out all the column names. You can easily do the same thing in D by inspecting the fields on User.
Yeah, this is generally the case when generating SQL from types. I like to write queries to make them the most efficient I can, without having to deal with quirks between ORM requirements and the data in the table (I'm dealing with an existing data layout, which does not lend itself always to ORM usage). -Steve
Jan 08
parent reply Jacob Carlborg <doob me.com> writes:
On 2020-01-09 00:03, Steven Schveighoffer wrote:

 Yeah, this is generally the case when generating SQL from types. I like 
 to write queries to make them the most efficient I can, without having 
 to deal with quirks between ORM requirements and the data in the table 
 (I'm dealing with an existing data layout, which does not lend itself 
 always to ORM usage).
Sure, but you don't need an ORM to do what I suggested. Something like this: connection.query("SELECT " ~ fieldsToSQL!User ~ " FROM user").byItem!User; Where "fieldsToSQL!User" would expand to: "user.first_name, user.last_name, user.age" Or whatever naming convention you have in the table. I'm sure you can come up with a better API. -- /Jacob Carlborg
Jan 10
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/10/20 3:32 PM, Jacob Carlborg wrote:
 On 2020-01-09 00:03, Steven Schveighoffer wrote:
 
 Yeah, this is generally the case when generating SQL from types. I 
 like to write queries to make them the most efficient I can, without 
 having to deal with quirks between ORM requirements and the data in 
 the table (I'm dealing with an existing data layout, which does not 
 lend itself always to ORM usage).
Sure, but you don't need an ORM to do what I suggested. Something like this: connection.query("SELECT " ~ fieldsToSQL!User ~ " FROM user").byItem!User; Where "fieldsToSQL!User" would expand to: "user.first_name, user.last_name, user.age" Or whatever naming convention you have in the table. I'm sure you can come up with a better API.
That's not the issue. The issue is they will come back as "first_name", "last_name", etc. If you have multiple tables with common column names, then you have an ambiguity, and cannot determine where each column goes. -Steve
Jan 10
parent Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
 On 1/10/20 3:32 PM, Jacob Carlborg wrote:
 On 2020-01-09 00:03, Steven Schveighoffer wrote:
 
 connection.query("SELECT " ~ fieldsToSQL!User ~ " FROM 
 user").byItem!User;
 
 Where "fieldsToSQL!User" would expand to:
 
 "user.first_name, user.last_name, user.age"
 
That's not the issue. The issue is they will come back as "first_name", "last_name", etc. If you have multiple tables with common column names, then you have an ambiguity, and cannot determine where each column goes. -Steve
What about : connection.query("SELECT " ~ fieldsToSQL!(User, Equipment) ~ " FROM
 user").byItem!User...;
Now all the ambiguity could be coded out by using 'as' where necessary or always. Seems like interesting thoughts to explore.
Jan 11