www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Tips on TCP socket to postgresql middleware

reply Chris Piker <chris hoopjump.com> writes:
Hi D

I'm about to start a small program to whose job is:

1. Connect to a server over a TCP socket
2. Read a packetized real-time data stream
3. Update/insert to a postgresql database as data arrive.

In general it should buffer data in RAM to avoid exerting back 
pressure on the input socket and to allow for dropped connections 
to the PG database.  Since the data stream is at most 1.5 
megabits/sec (not bytes) I can buffer for quite some time before 
running out of space.

So far, I've never written a multi-threaded D program where one 
thread writes a FIFO and the other reads it so I'm reading the 
last few chapters of Ali Cehreli's book as background.  On top of 
that preparation, I'm looking for:
   * general tips on which libraries to examine
   * gotchas you've run into in your multi-threaded (or just 
concurrent) programs,
   * other resources to consult
etc.

Thanks for any advice you want to share.

Best,
Feb 19 2022
next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:
   * general tips on which libraries to examine
Most people will probably say this is crazy, but as to PG, one can do without libraries. I am doing so during years (in C, not D) and did not expierienced extremely complex troubles. I mean I do not use libpq - instead I implement some subset of the protocol, which is needed for particular program. What I do not like in all these libs for working with widely used services (postgres, redis etc) is the fact that they all hide inside them i/o stuff, including TCP-connect. Why have connect() in each library? It is universal thing, as well as read() and write(). If I want several connection to DBMS in a program, libraries like libpq compel me to use multithreading. But what if I want to do many-many-many things concurrently in a single thread? Usually I design more or less complex (network) programs using event-driven paradigm (reactor pattern) plus state machines. In other words programs designed this way are, so to say, hierarchical team of state machines, interacting with each other as well as with outer world (signals, timers, events from sockets etc)
Feb 20 2022
parent reply Chris Piker <chris hoopjump.com> writes:
On Sunday, 20 February 2022 at 15:20:17 UTC, eugene wrote:
 Most people will probably say this is crazy,
 but as to PG, one can do without libraries.

 I am doing so during years (in C, not D) and
 did not expierienced extremely complex troubles.
 I mean I do not use libpq  - instead I implement some
 subset of the protocol, which is needed for particular program.
Very interesting. I need to stand-up this program and two others in one week, so it looks like dpq2 and message passing is the good short term solution to reduce implementation effort. But I would like to return to your idea in a couple months so that I can try a fiber based implementation instead.
 Usually I design more or less complex (network) programs using
 event-driven paradigm (reactor pattern) plus state machines.
 In other words programs designed this way are, so to say,
 hierarchical team of state machines, interacting with
 each other as well as with outer world (signals,
 timers, events from sockets etc)
It sounds like you might have a rigorous way of defining and keeping track of your state machines. I could probably learn quite a bit from reading your source code, or the source for similarly implemented programs. Are there examples you would recommend?
Feb 20 2022
parent reply eugene <dee0xeed gmail.com> writes:
On Sunday, 20 February 2022 at 16:55:44 UTC, Chris Piker wrote:
 But I would like to return to your idea in a couple months so 
 that I can try a fiber based implementation instead.
I thougt about implementing my engine using fibers but... it seemed to me they are not very convinient because coroutines yield returns to the caller, but I want to return to a single event loop (after processing an event).
 Usually I design more or less complex (network) programs using
 event-driven paradigm (reactor pattern) plus state machines.
 In other words programs designed this way are, so to say,
 hierarchical team of state machines, interacting with
 each other as well as with outer world (signals,
 timers, events from sockets etc)
It sounds like you might have a rigorous way of defining and keeping track of your state machines. I could probably learn quite a bit from reading your source code, or the source for similarly implemented programs. Are there examples you would recommend?
Yes, here is my engine with example (echo client/server pair): - [In C (for Linux)](http://zed.karelia.ru/mmedia/bin/edsm-g2-rev-h.tar.gz) - [In D (for Linux & FreeBSD)](http://zed.karelia.ru/0/e/edsm-2022-02-20.tar.gz) edsm = 'event driven state machines' As to the program you are writing - I wrote a couple of dozens of programs more or less similar to what you are going to do (data acqusition) using the engine above (C) for production systems and they all serve very well.
Feb 20 2022
parent reply Chris Piker <chris hoopjump.com> writes:
On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:

 Yes, here is my engine with example (echo client/server pair):

 - [In D (for Linux & 
 FreeBSD)](http://zed.karelia.ru/0/e/edsm-2022-02-20.tar.gz)
The code is terse and clean, thanks for sharing :) I'm adverse to reading it closely since there was no license file and don't want to accidentally violate copyright. I noticed there were no dub files in the package. Not surprised. Dub is such a restrictive tool compared to say, setup.py/.cfg in python.
Feb 20 2022
parent reply eugene <dee0xeed gmail.com> writes:
On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:

 Yes, here is my engine with example (echo client/server pair):

 - [In D (for Linux & FreeBSD)]
The code is terse and clean, thanks for sharing :)
Nice to hear, thnx! Actually I wrote D variant just to demonstrate the idea to my collegue who is OOP/py guy and for whom it was hard to understand C code.
 I'm adverse to reading it closely since there was no license 
 file and don't want to accidentally violate copyright.
:) I think WTFPL will do :)
 I noticed there were no dub files in the package.  Not 
 surprised.  Dub is such a restrictive tool compared to say, 
 setup.py/.cfg in python.
I have very little experience in D and did not think about dub at all.
Feb 20 2022
parent reply Chris Piker <chris hoopjump.com> writes:
On Monday, 21 February 2022 at 07:00:52 UTC, eugene wrote:
 On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:
 I'm adverse to reading it closely since there was no license 
 file and don't want to accidentally violate copyright.
:) I think WTFPL will do :)
If you want to add this file (or similar) to your sources, http://www.wtfpl.net/txt/copying/, but with your name in the copyright line I'd be happy to put the D version through it's paces and credit you for the basic ideas. I would not have thought of this formalism on my own (at least not right away) and want to give credit where it's due.
Feb 22 2022
next sibling parent eugene <dee0xeed gmail.com> writes:
On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:
 On Monday, 21 February 2022 at 07:00:52 UTC, eugene wrote:
 On Monday, 21 February 2022 at 04:46:53 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:00:26 UTC, eugene wrote:
 I'm adverse to reading it closely since there was no license 
 file and don't want to accidentally violate copyright.
:) I think WTFPL will do :)
If you want to add this file (or similar) to your sources, http://www.wtfpl.net/txt/copying/, but with your name in the copyright line I'd be happy to put the D version through it's paces and credit you for the basic ideas. I would not have thought of this formalism on my own (at least not right away) and want to give credit where it's due.
ok http://zed.karelia.ru/0/e/edsm-2022-02-23.tar.gz added 2 files, AUTHOR & LICENSE
Feb 23 2022
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:
 credit you for the basic ideas
As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) Switch-based implementation of SMs is probably normal for text/token driven SMs (parsers), but is not good for event/message driven SMs (i/o and alike). Remember main OOP principle, as it is understood by Alan Key? It is message exchanging, not class hierarchy. In this sense this craft is OOP twice, both classes and message exchanging. Another important point is SM decomposition & hierarchy (I mean master/slave (or customer/provider) relation here, not 'A is B' relation) - instead of having one huge SM I decompose the task into subtasks and construct various SMs. See for example that echo-server - it has 3 level hierarchy of states machines: LISTENER <-> {WORKERS} <-> {RX,TX} When I wrote the very first version of EDSM engine (more than 5 years ago, may be 7 or so), I... I was just stuck - how to design machines themselves?!? Well, the engine is simple, but what next? How do I choose states? After a while I came to a strong rule - as long as I want to send/recv some "atomic" portion of data, it is a state (called stage in D version)! Then as a result of elimination of code duplcation I "invented" kinda universal RX and TX machine and so on... In general I've found SM very nice way of designing (networking) software. Enjoy! :)
Feb 23 2022
parent reply Tejas <notrealemail gmail.com> writes:
On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:
 On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker wrote:
 [...]
As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) [...]
I remember you once mentioned a book that you used to learn this particular software design technique Could you please name it again? Thank you in advance
Feb 23 2022
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 24 February 2022 at 06:30:51 UTC, Tejas wrote:
 On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:
 On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker 
 wrote:
 [...]
As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) [...]
I remember you once mentioned a book that you used to learn this particular software design technique Could you please name it again? Thank you in advance
Wagner F. et al Modeling Software with Finite State Machines: A Practical Approach I've adopted some ideas from this book to POSIX/Linux API. see also http://www.stateworks.com/
Feb 23 2022
next sibling parent eugene <dee0xeed gmail.com> writes:
On Thursday, 24 February 2022 at 06:54:07 UTC, eugene wrote:
 Wagner F. et al
 Modeling Software with Finite State Machines: A Practical 
 Approach

 I've adopted some ideas from this book to POSIX/Linux API.
Ah! I also have EDSM for bare metal (AVR8 to be exact) There is [some description](http://zed.karelia.ru/go.to/for.all/software/avr8-edsm) (in Russian), but you can look to the [C-source](http://zed.karelia.ru/mmedia/bin/avr8-edsm-r0.tar.gz)
Feb 23 2022
prev sibling parent reply Tejas <notrealemail gmail.com> writes:
On Thursday, 24 February 2022 at 06:54:07 UTC, eugene wrote:
 On Thursday, 24 February 2022 at 06:30:51 UTC, Tejas wrote:
 On Wednesday, 23 February 2022 at 09:34:56 UTC, eugene wrote:
 On Tuesday, 22 February 2022 at 20:19:39 UTC, Chris Piker 
 wrote:
 [...]
As you might have been already noted, the key idea is to implement SM explicitly, i.e we have states, messages, actions, transitions and extremely simple engine (reactTo() method) [...]
I remember you once mentioned a book that you used to learn this particular software design technique Could you please name it again? Thank you in advance
Wagner F. et al Modeling Software with Finite State Machines: A Practical Approach I've adopted some ideas from this book to POSIX/Linux API. see also http://www.stateworks.com/
Thank you so much!
Feb 24 2022
parent eugene <dee0xeed gmail.com> writes:
On Thursday, 24 February 2022 at 11:27:56 UTC, Tejas wrote:
 Wagner F. et al
 Modeling Software with Finite State Machines: A Practical 
 Approach

 I've adopted some ideas from this book to POSIX/Linux API.
 see also http://www.stateworks.com/
Thank you so much!
Also there is [very nice discussion](https://embeddedgurus.com/state-space/2011/06/protothreads-versus-state-machines/) I'll just cite: " Pure event-driven programming (without blocking) naturally partitions the code into small chunks that handle events. State machines partition the code even finer, because you have small chunks that are called only for a specific state-event combination. " This is exactly about what I've mentioned: event-driven + state-machines = fine-graned-concurrency
Feb 24 2022
prev sibling next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 2/19/22 12:13, Chris Piker wrote:

    * gotchas you've run into in your multi-threaded (or just concurrent)
 programs,
I use the exact scenario that you describe: Multiple threads process data and pass the results to a "writer" thread that persist it in a file. The main gotcha is your thread disappearing without a trace. The most common reason is it throws an exception and dies. Another one is to set the message box sizes to throttle. Otherwise, producers could produce more than the available memory before the consumer could consume it. Unlike the main thread, there is nobody to catch an report this "uncaught" exception. https://dlang.org/phobos/std_concurrency.html#.setMaxMailboxSize You need to experiment with different number of threads, the buffer size that you mention, different lengths of message boxes, etc. For example, I could not gain more benefit in my program beyond 3 threads (but still set the number to 4 :p Humans are crazy.). In case you haven't seen yet, the recipe for std.concurrency that works for me is summarized here: https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s Ali
Feb 20 2022
parent Chris Piker <chris hoopjump.com> writes:
On Sunday, 20 February 2022 at 17:58:41 UTC, Ali Çehreli wrote:

 Another one is to set the message box sizes to throttle.
Message sizes and rates are relatively well know so it will be easy to pick a throttle point that's unlikely to backup the source yet provide for some quick DB maintenance in the middle of a testing session.
 In case you haven't seen yet, the recipe for std.concurrency 
 that works for me is summarized here:

   https://www.youtube.com/watch?v=dRORNQIB2wA&t=1735s
Thanks! I like your simple exception wrapping pattern, will use that.
Feb 20 2022
prev sibling next sibling parent reply eugene <dee0xeed gmail.com> writes:
On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:
 In general it should buffer data in RAM to avoid exerting back 
 pressure on the input socket and to allow for dropped 
 connections to the PG database.
If I get it right you want to restore connection if it was closed by server for some reason. I use special SM for that purpose, see [this picture](http://zed.karelia.ru/0/e/db-link.jpg) In each state where this SM has to send/recv data, it takes sending/receiving SM from a pool and commands them to perform the task. Upon reaching IDLE state this machine send some messsge to the user (another SM) of the connection and seats in this state until the user detects connection lost (in which case it sends M2 to DB-LINK SM). Then DB-LINK goes to WAIT state, where it starts a timer and when it expires, it goes to CONN state, where it tries to reconnect (using sending SM - when connection is ready we get POLLOUT on socket). You can have as many such connectors as you want, so you have multiple connections within single thread. I often use two connections, one for perform main task (upload some data and alike) and the second for getting notifications from PG, 'cause it very incovinient to do both in a single connection.
Feb 20 2022
parent reply Chris Piker <chris hoopjump.com> writes:
On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:

 I often use two connections, one for perform main task
 (upload some data and alike) and the second for getting
 notifications from PG, 'cause it very incovinient to
 do both in a single connection.
Ah, a very handy tip. It would be convoluted to multiplex notifications on the data connection.
Feb 20 2022
parent eugene <dee0xeed gmail.com> writes:
On Monday, 21 February 2022 at 04:48:56 UTC, Chris Piker wrote:
 On Sunday, 20 February 2022 at 18:36:21 UTC, eugene wrote:

 I often use two connections, one for perform main task
 (upload some data and alike) and the second for getting
 notifications from PG, 'cause it very incovinient to
 do both in a single connection.
Ah, a very handy tip. It would be convoluted to multiplex notifications on the data connection.
I am remembering psql client behavior - it sees notifications only after some request. It is really inconvinient to perform regular tasks and be ready to peek up notifications at any moment in one connection.
Feb 20 2022
prev sibling parent reply eugene <dee0xeed gmail.com> writes:
On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker wrote:
 3. Update/insert to a postgresql database as data arrive.
I've remembered one not so obvious feature of TCP sockets behavour. If the connection is closed on the server side (i.e. on the client side the socket is in CLOSE_WAIT state), **write() will not fail**, and data will go to no nowhere. For this reason I have this (commented) code: ```d bool connOk() { /* tcp_info tcpi; // linux specific, see /usr/include/linux/tcp.h socklen_t tcpi_len = tcpi.sizeof; getsockopt(id, SOL_TCP, TCP_INFO, &tcpi, &tcpi_len); if (tcpi.tcpi_state != TCP_ESTABLISHED) return false; */ return true; // TODO } ``` I've dig up this in one of my progs (uploader to pg db): ```c /* check conn */ int dbu_psgr_ckcn_enter(struct edsm *me) { struct mexprxtx_data *ctx = me->data; struct databuffer *ob = &ctx->obuf; struct databuffer *sb = &ctx->sbuf; int est = 0; if (ctx->sock > 0) est = csock_check_state(ctx->sock); if (!est) { /* save request to spare buffer */ __dbu_make_request_copy(sb, ob); if (ctx->sock > 0) log_inf_msg( "%s()/DBU_%.3d - server has closed connection (fd %d)\n", __func__, me->number, ctx->sock ); edsm_put_event(me, ECM1_CONN); } else { edsm_put_event(me, ECM0_SEND); } return 0; } ``` I definitely did not want to loose data and so I am checking socket state before doing an INSERT request. I do not think it is 100% reliable, but I could not invent anything else.
Feb 24 2022
parent reply eugene <dee0xeed gmail.com> writes:
On Thursday, 24 February 2022 at 08:46:35 UTC, eugene wrote:
 On Saturday, 19 February 2022 at 20:13:01 UTC, Chris Piker 
 wrote:
 3. Update/insert to a postgresql database as data arrive.
I've remembered one not so obvious feature of TCP sockets behavour. If the connection is closed on the server side (i.e. on the client side the socket is in CLOSE_WAIT state), **write() will not fail**, and data will go to no nowhere.
https://stackoverflow.com/questions/26130010/avoiding-dataloss-in-go-when-writing-with-close-wait-socket In my case (I was working with REDIS KVS at the moment) exact scenario was as follows: * prog gets EPOLLOUT (write() won't block) * prog writes()'s data to REDIS ("successfully") * prog gets EPOLLERR|EPOLLHUP After this I see that I need to reconnect, but I've already send some data which had gone to nowhere. People sometimes recommend using usual non-blocking sockets, but it is not the case.
Feb 24 2022
parent eugene <dee0xeed gmail.com> writes:
On Thursday, 24 February 2022 at 09:11:01 UTC, eugene wrote:
 In my case (I was working with REDIS KVS at the moment)
 exact scenario was as follows:

 * prog gets EPOLLOUT (write() won't block)
 * prog writes()'s data to REDIS ("successfully")
 * prog gets EPOLLERR|EPOLLHUP

 After this I see that I need to reconnect, but I've already send
 some data which had gone to nowhere.

 People sometimes recommend using usual non-blocking sockets,
 but it is not the case.
errhh... usual **blocking** sockets, of course. however then I have to deal with multiprocess/multithread architechture, but my goal was *fine-graned* concurrency within single process.
Feb 24 2022