digitalmars.D.learn - Tips on TCP socket to postgresql middleware

Chris Piker (21/21) Feb 19 2022 Hi D

eugene (23/24) Feb 20 2022 Most people will probably say this is crazy,

Chris Piker (11/23) Feb 20 2022 Very interesting. I need to stand-up this program and two others

eugene (16/29) Feb 20 2022 I thougt about implementing my engine using fibers but...

Chris Piker (7/10) Feb 20 2022 The code is terse and clean, thanks for sharing :) I'm adverse

eugene (7/17) Feb 20 2022 Nice to hear, thnx! Actually I wrote D variant just to demonstrate

Chris Piker (7/12) Feb 22 2022 If you want to add this file (or similar) to your sources,

eugene (4/18) Feb 23 2022 ok
eugene (30/31) Feb 23 2022 As you might have been already noted,

Tejas (4/11) Feb 23 2022 I remember you once mentioned a book that you used to learn this

eugene (5/19) Feb 23 2022 Wagner F. et al

eugene (4/8) Feb 23 2022 Ah! I also have EDSM for bare metal (AVR8 to be exact)
Tejas (2/23) Feb 24 2022 Thank you so much!

eugene (13/20) Feb 24 2022 Also there is [very nice

=?UTF-8?Q?Ali_=c3=87ehreli?= (18/20) Feb 20 2022 I use the exact scenario that you describe: Multiple threads process

Chris Piker (7/11) Feb 20 2022 Message sizes and rates are relatively well know so it will be

eugene (20/23) Feb 20 2022 If I get it right you want to restore connection

Chris Piker (4/8) Feb 20 2022 Ah, a very handy tip. It would be convoluted to multiplex

eugene (5/13) Feb 20 2022 I am remembering psql client behavior - it sees notifications

eugene (50/51) Feb 24 2022 I've remembered one not so obvious feature of TCP sockets

eugene (11/19) Feb 24 2022 https://stackoverflow.com/questions/26130010/avoiding-dataloss-in-go-whe...

eugene (5/14) Feb 24 2022 errhh... usual **blocking** sockets, of course.

Chris Piker <chris hoopjump.com> writes:

Hi D

I'm about to start a small program to whose job is:

1. Connect to a server over a TCP socket
2. Read a packetized real-time data stream
3. Update/insert to a postgresql database as data arrive.

In general it should buffer data in RAM to avoid exerting back 
pressure on the input socket and to allow for dropped connections 
to the PG database.  Since the data stream is at most 1.5 
megabits/sec (not bytes) I can buffer for quite some time before 
running out of space.

So far, I've never written a multi-threaded D program where one 
thread writes a FIFO and the other reads it so I'm reading the 
last few chapters of Ali Cehreli's book as background.  On top of 
that preparation, I'm looking for:
   * general tips on which libraries to examine
   * gotchas you've run into in your multi-threaded (or just 
concurrent) programs,
   * other resources to consult
etc.

Thanks for any advice you want to share.

Best,

Feb 19 2022

eugene <dee0xeed gmail.com> writes:

On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:
   * general tips on which libraries to examine

Most people will probably say this is crazy,
but as to PG, one can do without libraries.

I am doing so during years (in C, not D) and
did not expierienced extremely complex troubles.
I mean I do not use libpq  - instead I implement some
subset of the protocol, which is needed for particular program.

What I do not like in all these libs for working
with widely used services (postgres, redis etc) is
the fact that they all hide inside them i/o stuff,
including TCP-connect.

Why have connect() in each library?
It is universal thing, as well as read() and write().
If I want several connection to DBMS in a program,
libraries like libpq compel me to use multithreading.

But what if I want to do many-many-many things concurrently
in a single thread?

Usually I design more or less complex (network) programs using
event-driven paradigm (reactor pattern) plus state machines.
In other words programs designed this way are, so to say,
hierarchical team of state machines, interacting with
each other as well as with outer world (signals,
timers, events from sockets etc)

Feb 20 2022

Chris Piker <chris hoopjump.com> writes:

On Sunday, 20 February 2022 at 15:20:17 UTC, eugene wrote:
 Most people will probably say this is crazy,
 but as to PG, one can do without libraries.

 I am doing so during years (in C, not D) and
 did not expierienced extremely complex troubles.
 I mean I do not use libpq  - instead I implement some
 subset of the protocol, which is needed for particular program.

Very interesting.  I need to stand-up this program and two others 
in one week, so it looks like dpq2 and message passing is the 
good short term solution to reduce implementation effort.  But I 
would like to return to your idea in a couple months so that I 
can try a fiber based implementation instead.

 Usually I design more or less complex (network) programs using
 event-driven paradigm (reactor pattern) plus state machines.
 In other words programs designed this way are, so to say,
 hierarchical team of state machines, interacting with
 each other as well as with outer world (signals,
 timers, events from sockets etc)

It sounds like you might have a rigorous way of defining and 
keeping track of your state machines.  I could probably learn 
quite a bit from reading your source code, or the source for 
similarly implemented programs.  Are there examples you would 
recommend?

Feb 20 2022

eugene <dee0xeed gmail.com> writes:

On Sunday, 20 February 2022 at 16:55:44 UTC, Chris Piker wrote:
 But I would like to return to your idea in a couple months so 
 that I can try a fiber based implementation instead.

I thougt about implementing my engine using fibers but...
it seemed to me they are not very convinient because
coroutines yield returns to the caller, but I want
to return to a single event loop (after processing an event).

 Usually I design more or less complex (network) programs using
 event-driven paradigm (reactor pattern) plus state machines.
 In other words programs designed this way are, so to say,
 hierarchical team of state machines, interacting with
 each other as well as with outer world (signals,
 timers, events from sockets etc)

 It sounds like you might have a rigorous way of defining and 
 keeping track of your state machines.  I could probably learn 
 quite a bit from reading your source code, or the source for 
 similarly implemented programs.  Are there examples you would 
 recommend?

Yes, here is my engine with example (echo client/server pair):

- [In C (for 
Linux)](http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz)
- [In D (for Linux & 
FreeBSD)](http://zed.karelia.ru/0/e/edsm-2022-02-20.tar.gz)

edsm = 'event driven state machines'

As to the program you are writing - I wrote a couple of dozens of 
programs
more or less similar to what you are going to do (data 
acqusition) using the engine above (C) for production systems and 
they all serve very well.

Feb 20 2022

Chris Piker <chris hoopjump.com> writes:

On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:

 Yes, here is my engine with example (echo client/server pair):

 - [In D (for Linux & 
 FreeBSD)](http://zed.karelia.ru/0/e/edsm-2022-02-20.tar.gz)

The code is terse and clean, thanks for sharing :)  I'm adverse 
to reading it closely since there was no license file and don't 
want to accidentally violate copyright.

I noticed there were no dub files in the package.  Not surprised. 
  Dub is such a restrictive tool compared to say, setup.py/.cfg in 
python.

Feb 20 2022

eugene <dee0xeed gmail.com> writes:

On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:

 Yes, here is my engine with example (echo client/server pair):

 - [In D (for Linux & FreeBSD)]

 The code is terse and clean, thanks for sharing :)

Nice to hear, thnx! Actually I wrote D variant just to demonstrate
the idea to my collegue who is OOP/py guy and for whom it was hard
to understand C code.

 I'm adverse to reading it closely since there was no license 
 file and don't want to accidentally violate copyright.

:) I think WTFPL will do :)

 I noticed there were no dub files in the package.  Not 
 surprised.  Dub is such a restrictive tool compared to say, 
 setup.py/.cfg in python.

I have very little experience in D and did not think about dub at 
all.

Feb 20 2022

Chris Piker <chris hoopjump.com> writes:

On Monday, 21 February 2022 at 07:00:52 UTC, eugene wrote:
 On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:
 I'm adverse to reading it closely since there was no license 
 file and don't want to accidentally violate copyright.

 :) I think WTFPL will do :)

If you want to add this file (or similar) to your sources, 
http://www.wtfpl.net/txt/copying/, but with your name in the 
copyright line I'd be happy to put the D version through it's 
paces and credit you for the basic ideas.  I would not have 
thought of this formalism on my own (at least not right away) and 
want to give credit where it's due.

Feb 22 2022

eugene <dee0xeed gmail.com> writes:

On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:
 On Monday, 21 February 2022 at 07:00:52 UTC, eugene wrote:
 On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:
 I'm adverse to reading it closely since there was no license 
 file and don't want to accidentally violate copyright.

 :) I think WTFPL will do :)

 If you want to add this file (or similar) to your sources, 
 http://www.wtfpl.net/txt/copying/, but with your name in the 
 copyright line I'd be happy to put the D version through it's 
 paces and credit you for the basic ideas.  I would not have 
 thought of this formalism on my own (at least not right away) 
 and want to give credit where it's due.

ok
http://zed.karelia.ru/0/e/edsm-2022-02-23.tar.gz
added 2 files, AUTHOR & LICENSE

Feb 23 2022

eugene <dee0xeed gmail.com> writes:

On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:
 credit you for the basic ideas

As you might have been already noted,
the key idea is to implement SM explicitly,
i.e we have states, messages, actions, transitions
and extremely simple engine (reactTo() method)

Switch-based implementation of SMs is probably normal
for text/token driven SMs (parsers),
but is not good for event/message driven SMs (i/o and alike).

Remember main OOP principle, as it is understood by Alan Key?
It is message exchanging, not class hierarchy.
In this sense this craft is OOP twice, both classes and
message exchanging.

Another important point is SM decomposition & hierarchy
(I mean master/slave (or customer/provider) relation here,
not 'A is B' relation) - instead of having one huge SM
I decompose the task into subtasks and construct various SMs.
See for example that echo-server - it has 3 level hierarchy
of states machines:

LISTENER <-> {WORKERS} <-> {RX,TX}

When I wrote the very first version of EDSM engine
(more than 5 years ago, may be 7 or so), I...
I was just stuck - how to design machines themselves?!?
Well, the engine is simple, but what next? How do I choose states?
After a while I came to a strong rule - as long as I
want to send/recv some "atomic" portion of data,
it is a state (called stage in D version)!
Then as a result of elimination of code duplcation
I "invented" kinda universal RX and TX machine and so on...

In general I've found SM very nice way of designing
(networking) software. Enjoy! :)

Feb 23 2022

Tejas <notrealemail gmail.com> writes:

On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:
 On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:
 [...]

 As you might have been already noted,
 the key idea is to implement SM explicitly,
 i.e we have states, messages, actions, transitions
 and extremely simple engine (reactTo() method)

 [...]

I remember you once mentioned a book that you used to learn this 
particular software design technique

Could you please name it again? Thank you in advance

Feb 23 2022

eugene <dee0xeed gmail.com> writes:

On Thursday, 24 February 2022 at 06:30:51 UTC, Tejas wrote:
 On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:
 On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker 
 wrote:
 [...]

 As you might have been already noted,
 the key idea is to implement SM explicitly,
 i.e we have states, messages, actions, transitions
 and extremely simple engine (reactTo() method)

 [...]

 I remember you once mentioned a book that you used to learn 
 this particular software design technique
 Could you please name it again? Thank you in advance

Wagner F. et al
Modeling Software with Finite State Machines: A Practical Approach

I've adopted some ideas from this book to POSIX/Linux API.
see also http://www.stateworks.com/

Feb 23 2022

eugene <dee0xeed gmail.com> writes:

On Thursday, 24 February 2022 at 06:54:07 UTC, eugene wrote:
 Wagner F. et al
 Modeling Software with Finite State Machines: A Practical 
 Approach

 I've adopted some ideas from this book to POSIX/Linux API.

Ah! I also have EDSM for bare metal (AVR8 to be exact)
There is [some 
description](http://zed.karelia.ru/go.to/for.all/software/avr8-edsm) (in
Russian), but you can look to the
[C-source](http://zed.karelia.ru/mmedia/bin/avr8-edsm-r0.tar.gz)

Feb 23 2022

Tejas <notrealemail gmail.com> writes:

On Thursday, 24 February 2022 at 06:54:07 UTC, eugene wrote:
 On Thursday, 24 February 2022 at 06:30:51 UTC, Tejas wrote:
 On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:
 On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker 
 wrote:
 [...]

 As you might have been already noted,
 the key idea is to implement SM explicitly,
 i.e we have states, messages, actions, transitions
 and extremely simple engine (reactTo() method)

 [...]

 I remember you once mentioned a book that you used to learn 
 this particular software design technique
 Could you please name it again? Thank you in advance

 Wagner F. et al
 Modeling Software with Finite State Machines: A Practical 
 Approach

 I've adopted some ideas from this book to POSIX/Linux API.
 see also http://www.stateworks.com/

Thank you so much!

Feb 24 2022

eugene <dee0xeed gmail.com> writes:

On Thursday, 24 February 2022 at 11:27:56 UTC, Tejas wrote:
 Wagner F. et al
 Modeling Software with Finite State Machines: A Practical 
 Approach

 I've adopted some ideas from this book to POSIX/Linux API.
 see also http://www.stateworks.com/

 Thank you so much!

Also there is [very nice 
discussion](https://embeddedgurus.com/state-space/2011/06/protothreads-versus-state-machines/)

I'll just cite:

"
Pure event-driven programming (without blocking) naturally 
partitions the code into small chunks that handle events. State 
machines partition the code even finer, because you have small 
chunks that are called only for a specific state-event 
combination.
"

This is exactly about what I've mentioned:
event-driven + state-machines = fine-graned-concurrency

Feb 24 2022

=?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:

On 2/19/22 12:13, Chris Piker wrote:

    * gotchas you've run into in your multi-threaded (or just concurrent)
 programs,

I use the exact scenario that you describe: Multiple threads process 
data and pass the results to a "writer" thread that persist it in a file.

The main gotcha is your thread disappearing without a trace. The most 
common reason is it throws an exception and dies.

Another one is to set the message box sizes to throttle. Otherwise, 
producers could produce more than the available memory before the 
consumer could consume it. Unlike the main thread, there is nobody to 
catch an report this "uncaught" exception.

   https://dlang.org/phobos/std_concurrency.html#.setMaxMailboxSize

You need to experiment with different number of threads, the buffer size 
that you mention, different lengths of message boxes, etc. For example, 
I could not gain more benefit in my program beyond 3 threads (but still 
set the number to 4 :p Humans are crazy.).

In case you haven't seen yet, the recipe for std.concurrency that works 
for me is summarized here:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s

Ali

Feb 20 2022

Chris Piker <chris hoopjump.com> writes:

On Sunday, 20 February 2022 at 17:58:41 UTC, Ali Çehreli wrote:

 Another one is to set the message box sizes to throttle.

Message sizes and rates are relatively well know so it will be
easy to pick a throttle point that's unlikely to backup the
source yet provide for some quick DB maintenance in the middle
of a testing session.

 In case you haven't seen yet, the recipe for std.concurrency 
 that works for me is summarized here:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s

Thanks!  I like your simple exception wrapping pattern, will use 
that.

Feb 20 2022

eugene <dee0xeed gmail.com> writes:

On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:
 In general it should buffer data in RAM to avoid exerting back 
 pressure on the input socket and to allow for dropped 
 connections to the PG database.

If I get it right you want to restore connection
if it was closed by server for some reason.

I use special SM for that purpose, see [this 
picture](http://zed.karelia.ru/0/e/db-link.jpg)

In each state where this SM has to send/recv data, it takes
sending/receiving SM from a pool and commands them to
perform the task. Upon reaching IDLE state this machine
send some messsge to the user (another SM) of the connection
and seats in this state until the user detects connection lost
(in which case it sends M2 to DB-LINK SM). Then DB-LINK
goes to WAIT state, where it starts a timer and when it expires,
it goes to CONN state, where it tries to reconnect (using
sending SM - when connection is ready we get POLLOUT on socket).

You can have as many such connectors as you want,
so you have multiple connections within single thread.

I often use two connections, one for perform main task
(upload some data and alike) and the second for getting
notifications from PG, 'cause it very incovinient to
do both in a single connection.

Feb 20 2022

Chris Piker <chris hoopjump.com> writes:

On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:

 I often use two connections, one for perform main task
 (upload some data and alike) and the second for getting
 notifications from PG, 'cause it very incovinient to
 do both in a single connection.

Ah, a very handy tip.  It would be convoluted to multiplex 
notifications
on the data connection.

Feb 20 2022

eugene <dee0xeed gmail.com> writes:

On Monday, 21 February 2022 at 04:48:56 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:

 I often use two connections, one for perform main task
 (upload some data and alike) and the second for getting
 notifications from PG, 'cause it very incovinient to
 do both in a single connection.

 Ah, a very handy tip.  It would be convoluted to multiplex 
 notifications
 on the data connection.

I am remembering psql client behavior - it sees notifications 
only after some request. It is really inconvinient to perform 
regular tasks and be ready to peek up notifications at any moment 
in one connection.

Feb 20 2022

eugene <dee0xeed gmail.com> writes:

On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:
 3. Update/insert to a postgresql database as data arrive.

I've remembered one not so obvious feature of TCP sockets 
behavour.
If the connection is closed on the server side
(i.e. on the client side the socket is in CLOSE_WAIT state),
**write() will not fail**, and data will go to no nowhere.

For this reason I have this (commented) code:

```d
     bool connOk() {
         /*
         tcp_info tcpi; // linux specific, see 
/usr/include/linux/tcp.h
         socklen_t tcpi_len = tcpi.sizeof;

         getsockopt(id, SOL_TCP, TCP_INFO, &tcpi, &tcpi_len);
         if (tcpi.tcpi_state != TCP_ESTABLISHED)
             return false;
         */
         return true; // TODO
     }
```

I've dig up this in one of my progs (uploader to pg db):

```c
/* check conn */
int dbu_psgr_ckcn_enter(struct edsm *me)
{
         struct mexprxtx_data *ctx = me->data;
         struct databuffer *ob = &ctx->obuf;
         struct databuffer *sb = &ctx->sbuf;
         int est = 0;

         if (ctx->sock > 0)
                 est = csock_check_state(ctx->sock);
         if (!est) {
                 /* save request to spare buffer */
                 __dbu_make_request_copy(sb, ob);
                 if (ctx->sock > 0)
                         log_inf_msg(
                          "%s()/DBU_%.3d - server has closed 
connection (fd %d)\n",
                          __func__, me->number, ctx->sock
                         );
                 edsm_put_event(me, ECM1_CONN);
         } else {
                 edsm_put_event(me, ECM0_SEND);
         }
         return 0;
}
```

I definitely did not want to loose data and so I am checking 
socket state before doing an INSERT request. I do not think it is 
100% reliable, but I could not invent anything else.

Feb 24 2022

eugene <dee0xeed gmail.com> writes:

On Thursday, 24 February 2022 at 08:46:35 UTC, eugene wrote:
 On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker 
 wrote:
 3. Update/insert to a postgresql database as data arrive.

 I've remembered one not so obvious feature of TCP sockets 
 behavour.
 If the connection is closed on the server side
 (i.e. on the client side the socket is in CLOSE_WAIT state),
 **write() will not fail**, and data will go to no nowhere.

https://stackoverflow.com/questions/26130010/avoiding-dataloss-in-go-when-writing-with-close-wait-socket

In my case (I was working with REDIS KVS at the moment)
exact scenario was as follows:

* prog gets EPOLLOUT (write() won't block)
* prog writes()'s data to REDIS ("successfully")
* prog gets EPOLLERR|EPOLLHUP

After this I see that I need to reconnect, but I've already send
some data which had gone to nowhere.

People sometimes recommend using usual non-blocking sockets,
but it is not the case.

Feb 24 2022

eugene <dee0xeed gmail.com> writes:

On Thursday, 24 February 2022 at 09:11:01 UTC, eugene wrote:
 In my case (I was working with REDIS KVS at the moment)
 exact scenario was as follows:

 * prog gets EPOLLOUT (write() won't block)
 * prog writes()'s data to REDIS ("successfully")
 * prog gets EPOLLERR|EPOLLHUP

 After this I see that I need to reconnect, but I've already send
 some data which had gone to nowhere.

 People sometimes recommend using usual non-blocking sockets,
 but it is not the case.

errhh... usual **blocking** sockets, of course.
however then I have to deal with multiprocess/multithread
architechture, but my goal was *fine-graned* concurrency
within single process.

Feb 24 2022

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Tips on TCP socket to postgresql middleware