digitalmars.D - [OT] Idea to get all columns from multiple tables in DB

Steven Schveighoffer (41/41) Jan 07 2020 I have a database serialization package that I use to fetch data from a

Jesse Phillips (14/27) Jan 07 2020 I don't believe I've seen this pattern.

Steven Schveighoffer (22/34) Jan 08 2020 In my project, I only do generation of SQL for updating or inserting

Paolo Invernizzi (21/25) Jan 09 2020 We also are writing SQL statements by hand, but I've found that's

Steven Schveighoffer (18/42) Jan 09 2020 This is pretty cool. But of course, you need to have a SQL parser etc.

Jacob Carlborg (7/24) Jan 07 2020 IIRC Rails is using the fully qualified column name for everything.

Alexandru Ermicioi (10/14) Jan 08 2020 This is also true for php doctrine orm and java hibernate orm, in
Steven Schveighoffer (7/31) Jan 08 2020 Yeah, this is generally the case when generating SQL from types. I like

Jacob Carlborg (9/14) Jan 10 2020 Sure, but you don't need an ORM to do what I suggested. Something like t...

Steven Schveighoffer (5/26) Jan 10 2020 That's not the issue. The issue is they will come back as "first_name",

Jesse Phillips (6/22) Jan 11 2020 What about :

Steven Schveighoffer <schveiguy gmail.com> writes:

I have a database serialization package that I use to fetch data from a 
database. The column names are all identified or tagged appropriately, 
so I just use:

connection.query("SELECT * FROM user").byItem!User;

and it serializes the data from the resulting rows into the struct User 
type for consumption as a range.

However, if there's a join, it poses a problem. Let's say I have an 
Equipment item, which is owned by a user. I can select both the 
equipment and the owner via:

SELECT * FROM equipment left join user on (user.id = equipment.owner_id)

However, this causes problems because I get duplicate columns (e.g. name 
might be in both, or id). Which one goes with which?

I have a mechanism to deal with this. I can prefix each column name with 
a prefix, and then specify that on the byItem function with a uda-like 
directive:

connection.query("SELECT e.*, u.name AS user_name, u.id AS user_id, 
u.email AS user_email FROM ...").byItem!(Equipment, 
DB.colPrefix!"user_", User);

used like so:

foreach(equip, user; dbSequence) ...

But this requires you spell out every single column name in an extremely 
verbose manner, and you can't use that cool '*' selector which is 
somewhat future-proof.

But I realized I can put a "divider" in between the two table .* 
selectors, and then even though I have duplicate names, I can detect the 
divider and, and know which duplicate-named columns apply to which:

connection.query("SELECT e.*, 'user_' as __prefix, u.* FROM 
...").byItem!(Equipment, DB.colPrefix!"user_", User)

In my byItem function, when I'm mapping column names to actual fields at 
the beginning of iteration, I see a column named __prefix, and I use the 
value to prepend to all subsequent column names. And I think this will 
work beautifully!

Is this a technique that is used by anyone regularly? If so, what is it 
called? I'm very excited about this, because I want to minimize DB round 
trips, and also fetch related items as they come in, instead of 
serializing all of one type to an array and then fetching the rest with 
individual queries, but I didn't want to type out a lot of SQL (or 
generate it based on possibly stale struct field names), and have to 
maintain it any time I change something. This solves all the problems in 
a neat way.

-Steve

Jan 07 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Tuesday, 7 January 2020 at 21:01:46 UTC, Steven Schveighoffer 
wrote:

 connection.query("SELECT e.*, 'user_' as __prefix, u.* FROM 
 ...").byItem!(Equipment, DB.colPrefix!"user_", User)

 [...]

 Is this a technique that is used by anyone regularly? If so, 
 what is it called? I'm very excited about this, because I want 
 to minimize DB round trips, and also fetch related items as 
 they come in, instead of serializing all of one type to an 
 array and then fetching the rest with individual queries, but I 
 didn't want to type out a lot of SQL (or generate it based on 
 possibly stale struct field names), and have to maintain it any 
 time I change something. This solves all the problems in a neat 
 way.

 -Steve

I don't believe I've seen this pattern.

What I realized after watching your talk is that I've generally 
married the structure to the query rather then the tables or data.

Select user.name, equip.product...

struct Ownership {
    string usersname;
    string product;
}

I this situation the assignment comes from position and not by 
column name. You lose the * as you mentioned.

What I have not done is thought about how our application differ 
for dealing with the data.

Jan 07 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/7/20 9:28 PM, Jesse Phillips wrote:
 What I realized after watching your talk is that I've generally married 
 the structure to the query rather then the tables or data.
 
 Select user.name, equip.product...
 
 struct Ownership {
     string usersname;
     string product;
 }
 
 I this situation the assignment comes from position and not by column 
 name. You lose the * as you mentioned.

In my project, I only do generation of SQL for updating or inserting 
rows. For fetching elements, I write the SQL (by hand or through a sql 
building system I wrote) and then use the byItem serializer to get the 
records.

Upon the first record, I use the column names in the list to map the 
field order to the column order. So I don't have to worry about columns 
being removed/added, etc. And then for each row fetched, everything is 
extremely fast because the column order/names don't change.

And yeah, I've done stuff like the above when I have for instance to 
fetch statistical data. But for the most part, I have one struct for 
each table row that I fetch. In my code, if you don't fetch all the 
columns, it just doesn't fill those in.

But I hadn't thought of a good way until now to rename the columns 
fetched so they can be serialized correctly when there are conflicting 
column names without spelling all the columns out (which I hate). In 
fact, I might change this later to avoid having to fetch the column 
prefix for every row (I only need it for the first row), but this 
technique is definitely more efficient than fetching all of one type 
first (probably in a single query), and then fetching each related item 
one at a time.

-Steve

Jan 08 2020

Paolo Invernizzi <paolo.invernizzi gmail.com> writes:

On Wednesday, 8 January 2020 at 23:01:14 UTC, Steven 
Schveighoffer wrote:

 In my project, I only do generation of SQL for updating or 
 inserting rows. For fetching elements, I write the SQL (by hand 
 or through a sql building system I wrote) and then use the 
 byItem serializer to get the records.

We also are writing SQL statements by hand, but I've found that's 
really nice to have a compile time check against the current db 
schemas. For example, actually I can do:

---
string stat = public!`select recording_id from recordings where 
wrongcolname < 10`;

  ~/dlang/dmd-2.089.0/osx/bin/dmd -i -Isrc -g -unittest -debug -c 
-o- src/test.d

/Users/pinver/dlang/dmd-2.089.0/osx/bin/../../src/phobos/std/exception.d(515):
Error: uncaught CTFE exception object.Error("column 'wrongcolname' does not
exist: 'select recording_id from recordings where wrongcolname < 10'")
src/bkn/sql/semantic.d(75):        called from here: <bigsnip>
---

The struct representing the schema is created reading and 
processing a SQL file with the schema dump, dumping it once to a 
D module, but can be done also at compile time.

That's pretty nice also using an IDE, for example I've configured 
VSCode to compile-on-save and I've cool red underlines under the 
wrong sql statements.

The semantic pass is really basic right now, but I've plan to 
expand it.

Jan 09 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/9/20 10:36 AM, Paolo Invernizzi wrote:
 We also are writing SQL statements by hand, but I've found that's really 
 nice to have a compile time check against the current db schemas. For 
 example, actually I can do:
 
 ---
 string stat = public!`select recording_id from recordings where 
 wrongcolname < 10`;
 
   ~/dlang/dmd-2.089.0/osx/bin/dmd -i -Isrc -g -unittest -debug -c -o- 
 src/test.d
 
 /Users/pinver/dlang/dmd-2.089.0/osx/bin/../../src/phobos/std/exception.d(515): 
 Error: uncaught CTFE exception object.Error("column 'wrongcolname' does 
 not exist: 'select recording_id from recordings where wrongcolname < 10'")
 src/bkn/sql/semantic.d(75):        called from here: <bigsnip>
 ---

This is pretty cool. But of course, you need to have a SQL parser etc. 
in your code base to do this.

I've considered making the SQL building code more robust. Right now, it 
just captures joins and conditionals (the joins are only included if 
they are used), but I'd like to identify columns for tables and use less 
strings and more compile-time checking. I'm really not too interested in 
parsing SQL in my application :)

I like this library from Swift a lot (used in all my iOS projects), and 
have considered how D could do something similar: 
https://github.com/groue/GRDB.swift

 The struct representing the schema is created reading and processing a 
 SQL file with the schema dump, dumping it once to a D module, but can be 
 done also at compile time.

Haha, that's really cool. So basically if there's a chicken or an egg, 
no matter, you can deal with it ;)

 That's pretty nice also using an IDE, for example I've configured VSCode 
 to compile-on-save and I've cool red underlines under the wrong sql 
 statements.
 
 The semantic pass is really basic right now, but I've plan to expand it.

Excellent. I think there's really a lot of home-grown SQL-based 
libraries out there. We should have a showcase of all of them and figure 
out the best features. Dealing with databases is really in D's 
wheelhouse, and there are so many different ways to solve the problems.

-Steve

Jan 09 2020

Jacob Carlborg <doob me.com> writes:

On 2020-01-07 22:01, Steven Schveighoffer wrote:
 I have a database serialization package that I use to fetch data from a 
 database. The column names are all identified or tagged appropriately, 
 so I just use:
 
 connection.query("SELECT * FROM user").byItem!User;
 
 and it serializes the data from the resulting rows into the struct User 
 type for consumption as a range.
 
 However, if there's a join, it poses a problem. Let's say I have an 
 Equipment item, which is owned by a user. I can select both the 
 equipment and the owner via:
 
 SELECT * FROM equipment left join user on (user.id = equipment.owner_id)
 
 However, this causes problems because I get duplicate columns (e.g. name 
 might be in both, or id). Which one goes with which?

IIRC Rails is using the fully qualified column name for everything. 
Instead of using "*" it enumerates all columns of, in your case, User. 
It will inspect User to figure out all the column names. You can easily 
do the same thing in D by inspecting the fields on User.

-- 
/Jacob Carlborg

Jan 07 2020

Alexandru Ermicioi <alexandru.ermicioi gmail.com> writes:

On Wednesday, 8 January 2020 at 07:19:07 UTC, Jacob Carlborg 
wrote:
 Instead of using "*" it enumerates all columns of, in your 
 case, User. It will inspect User to figure out all the column 
 names. You can easily do the same thing in D by inspecting the 
 fields on User.

This is also true for php doctrine orm and java hibernate orm, in 
basic cases. Both of them also add custom sql language that is 
working directly with objects in your application. Closest 
analogy in D is hibernated library. Maybe it is also worth trying 
it to get ideas on how best to map db results into an object 
graph.

Best regards,
Alexandru.

Jan 08 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/8/20 2:19 AM, Jacob Carlborg wrote:
 On 2020-01-07 22:01, Steven Schveighoffer wrote:
 I have a database serialization package that I use to fetch data from 
 a database. The column names are all identified or tagged 
 appropriately, so I just use:

 connection.query("SELECT * FROM user").byItem!User;

 and it serializes the data from the resulting rows into the struct 
 User type for consumption as a range.

 However, if there's a join, it poses a problem. Let's say I have an 
 Equipment item, which is owned by a user. I can select both the 
 equipment and the owner via:

 SELECT * FROM equipment left join user on (user.id = equipment.owner_id)

 However, this causes problems because I get duplicate columns (e.g. 
 name might be in both, or id). Which one goes with which?

 
 IIRC Rails is using the fully qualified column name for everything. 
 Instead of using "*" it enumerates all columns of, in your case, User. 
 It will inspect User to figure out all the column names. You can easily 
 do the same thing in D by inspecting the fields on User.
 

Yeah, this is generally the case when generating SQL from types. I like 
to write queries to make them the most efficient I can, without having 
to deal with quirks between ORM requirements and the data in the table 
(I'm dealing with an existing data layout, which does not lend itself 
always to ORM usage).

-Steve

Jan 08 2020

Jacob Carlborg <doob me.com> writes:

On 2020-01-09 00:03, Steven Schveighoffer wrote:

 Yeah, this is generally the case when generating SQL from types. I like 
 to write queries to make them the most efficient I can, without having 
 to deal with quirks between ORM requirements and the data in the table 
 (I'm dealing with an existing data layout, which does not lend itself 
 always to ORM usage).

Sure, but you don't need an ORM to do what I suggested. Something like this:

connection.query("SELECT " ~ fieldsToSQL!User ~ " FROM user").byItem!User;

Where "fieldsToSQL!User" would expand to:

"user.first_name, user.last_name, user.age"

Or whatever naming convention you have in the table.

I'm sure you can come up with a better API.

-- 
/Jacob Carlborg

Jan 10 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 1/10/20 3:32 PM, Jacob Carlborg wrote:
 On 2020-01-09 00:03, Steven Schveighoffer wrote:
 
 Yeah, this is generally the case when generating SQL from types. I 
 like to write queries to make them the most efficient I can, without 
 having to deal with quirks between ORM requirements and the data in 
 the table (I'm dealing with an existing data layout, which does not 
 lend itself always to ORM usage).

 
 Sure, but you don't need an ORM to do what I suggested. Something like 
 this:
 
 connection.query("SELECT " ~ fieldsToSQL!User ~ " FROM user").byItem!User;
 
 Where "fieldsToSQL!User" would expand to:
 
 "user.first_name, user.last_name, user.age"
 
 Or whatever naming convention you have in the table.
 
 I'm sure you can come up with a better API.
 

That's not the issue. The issue is they will come back as "first_name", 
"last_name", etc. If you have multiple tables with common column names, 
then you have an ambiguity, and cannot determine where each column goes.

-Steve

Jan 10 2020

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

 On 1/10/20 3:32 PM, Jacob Carlborg wrote:
 On 2020-01-09 00:03, Steven Schveighoffer wrote:
 
 connection.query("SELECT " ~ fieldsToSQL!User ~ " FROM 
 user").byItem!User;
 
 Where "fieldsToSQL!User" would expand to:
 
 "user.first_name, user.last_name, user.age"
 

 That's not the issue. The issue is they will come back as 
 "first_name", "last_name", etc. If you have multiple tables 
 with common column names, then you have an ambiguity, and 
 cannot determine where each column goes.

 -Steve

What about :

connection.query("SELECT " ~ fieldsToSQL!(User, Equipment) ~ " 
FROM
 user").byItem!User...;


Now all the ambiguity could be coded out by using 'as' where 
necessary or always.

Seems like interesting thoughts to explore.

Jan 11 2020

D Programming

C/C++ Programming

Other

digitalmars.D - [OT] Idea to get all columns from multiple tables in DB