digitalmars.D.learn - Issues with std.net.curl on Win 10 x64

cptgrok <sanden.shelton gmail.com> writes:
I need to review syslogs for over 160 systems monthly, and I am 
trying to write a utility to automate bulk downloads from a 
custom web service where they are hosted. I need to calculate a 
date range for the prior month, add start and end date and a 
serial number to the query string for each system, which is easy, 
and in a foreach(system; systems) loop in main() I call a 
function passing the string url in to download and write a log to 
file. For a small number of systems, it works.
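
The date-range and query-string step described above could be sketched roughly as follows. This is only an illustration: the base URL and the parameter names `serial`, `start` and `end` are made up, since the post does not show what the real web service expects.

```d
import core.time : days;
import std.datetime.date : Date;
import std.format : format;
import std.typecons : Tuple;
import std.uri : encodeComponent;

/// First and last day of the calendar month before `today`.
Tuple!(Date, "start", Date, "end") priorMonthRange(Date today)
{
    // Go to the first day of the current month, then step back one day
    // to land on the last day of the previous month.
    auto firstOfThisMonth = Date(today.year, today.month, 1);
    auto end = firstOfThisMonth - 1.days;
    auto start = Date(end.year, end.month, 1);
    return typeof(return)(start, end);
}

/// Builds the download URL for one system.
/// The parameter names are hypothetical; adjust them to the real service.
string buildLogUrl(string baseUrl, string serial, Date start, Date end)
{
    return format("%s?serial=%s&start=%s&end=%s",
        baseUrl, encodeComponent(serial),
        start.toISOExtString(), end.toISOExtString());
}

void main()
{
    import std.stdio : writeln;

    // A fixed date keeps the example reproducible; a real tool would
    // probably start from the current date instead.
    auto range = priorMonthRange(Date(2019, 3, 25));
    writeln(buildLogUrl("https://logs.example.com/syslog", "SN-0001",
        range.start, range.end));
}
```

Stepping back one day from the first of the current month avoids special-casing month lengths, leap years and the January to December rollover.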

My trouble is, using std.net.curl, if I use get(URL) to get the 
entire text in a single call and write to file, memory usage 
spirals out of control immediately and within 20 or so calls, 
gets to about 1.3 GB and the program crashes. If I use 
byLineAsync(URL), then foreach(line; range) write the lines to 
file one at a time the memory usage never gets above 5MB but it 
just hangs always at the 51st call in the loop regardless of what 
parameters are in the query string, or how much data I have 
downloaded. The program never terminates, even after hours, but I 
can't see ANY activity on the process, CPU, mem or network. I can 
break my download jobs into <=50 systems (and it seems to work), 
but that seems like sweeping something under the rug, probably 
leading to future issues.

I'm using the 32bit binary from 
libcurl-7.64.0-WinSSL-zlib-x86-x64.zip on the release archive, 
and DMD 2.085.0. I've tried curl 7.63 and 7.57 but the behavior 
is the same.

Am I doing something wrong or is there some issue with curl or 
something else? I'm pretty new to D and I'm not sure if I need to 
go right down to raw sockets and re-invent the wheel or if there 
is some other library that can help. If I get this working, it 
could potentially save myself and many others hours per week.
Mar 25
Andre Pany <andre s-e-a-p.de> writes:
On Monday, 25 March 2019 at 16:25:37 UTC, cptgrok wrote:
 I need to review syslogs for over 160 systems monthly, and I am 
 trying to write a utility to automate bulk downloads from a 
 custom web service where they are hosted. I need to calculate a 
 date range for the prior month, add start and end date and a 
 serial number to the query string for each system, which is 
 easy, and in a foreach(system; systems) loop in main() I call a 
 function passing the string url in to download and write a log 
 to file. For a small number of systems, it works.

 [...]
First idea, please switch to x86_64 if possible. This will also
be the default of Dub in the next dmd release or the release
after.

Kind regards
Andrew
Mar 25
cptgrok <sanden.shelton gmail.com> writes:
On Monday, 25 March 2019 at 16:44:12 UTC, Andre Pany wrote:
 First idea, please switch to x86_64 if possible. This will also 
 be the default of Dub in the next dmd release or the release 
 after.

 Kind regards
 Andrew
Figured out --arch=x86_64, thanks! Sadly I don't see any change. I'm not having luck finding known curl issues similar to what I am experiencing. I have a sneaking suspicion that the web service I am using is doing some nonsense in the background. Might try a packet capture to better see what's up.
Mar 25
Seb <seb wilzba.ch> writes:
On Monday, 25 March 2019 at 19:02:18 UTC, cptgrok wrote:
 On Monday, 25 March 2019 at 16:44:12 UTC, Andre Pany wrote:
 First idea, please switch to x86_64 if possible. This will 
 also be the default of Dub in the next dmd release or the 
 release after.

 Kind regards
 Andrew
 Figured out --arch=x86_64, thanks! Sadly I don't see any change.
 I'm not having luck finding known curl issues similar to what I
 am experiencing. I have a sneaking suspicion that the web
 service I am using is doing some nonsense in the background.
 Might try a packet capture to better see what's up.
Alternatively, you could always give requests a shot:

https://code.dlang.org/packages/requests

It's the unofficial successor of std.net.curl.
Mar 25
Boris Carvajal <boris2.9 gmail.com> writes:
On Monday, 25 March 2019 at 16:25:37 UTC, cptgrok wrote:
 Am I doing something wrong or is there some issue with curl or 
 something else? I'm pretty new to D and I'm not sure if I need 
 to go right down to raw sockets and re-invent the wheel or if 
 there is some other library that can help. If I get this 
 working, it could potentially save myself and many others hours 
 per week.
There is a limit of 50 concurrent messages per thread [1] in
byLineAsync, and the transmitBuffers argument also plays a part in
it. So using multiple byLineAsync ranges at the same time in one
thread is going to block the process. I'm not sure if this is a bug
or by design.

You could use download() in a parallel foreach, something like this:

    import std.stdio;
    import std.parallelism;
    import std.net.curl;
    import std.typecons;

    void main()
    {
        auto connections = 3; // 3 parallel downloads
        // The main thread also takes part in a parallel foreach,
        // so the pool only needs connections - 1 worker threads.
        defaultPoolThreads(connections - 1);
        auto retries = 4; // try up to 4 times if it fails

        // (URL, output file) pairs
        auto logList = [
            tuple("dlang.org", "log1.txt"),
            tuple("dlang.org", "log2.txt"),
            tuple("dlang.org", "log3.txt"),
            tuple("dlang.org", "log4.txt"),
            tuple("dlang.org", "log5.txt"),
            tuple("dlang.org", "log6.txt")];

        // Work unit size 1: each download is handed out on its own
        foreach (log; parallel(logList, 1))
        {
            HTTP conn = HTTP();

            foreach (i; 0 .. retries)
            {
                try
                {
                    writeln("Downloading ", log[0]);
                    download(log[0], log[1], conn);
                    if (conn.statusLine.code == 200)
                    {
                        writeln("File ", log[1], " created.");
                        break;
                    }
                }
                catch (CurlException e)
                {
                    writeln("Retrying ", log[0]);
                }
            }
        }
    }

[1] https://github.com/dlang/phobos/blob/master/std/net/curl.d#L1679
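
To show what a 50-message limit of this kind looks like in isolation, here is a small std.concurrency sketch. It assumes the limit mentioned in the reply above is a bounded thread mailbox, which is one reading of the linked line and is not confirmed in this thread. It is not the code std.net.curl uses: the helper name `acceptedBeforeFull` is made up, and the sketch asks for an exception where a blocking mailbox would simply hang, so the effect is visible.

```d
import std.concurrency : MailboxFull, OnCrowding, receiveOnly, send,
    setMaxMailboxSize, thisTid;
import std.stdio : writeln;

/// Sends ints to the current thread's own mailbox until it is full and
/// returns how many messages were accepted. (Hypothetical helper.)
size_t acceptedBeforeFull(size_t limit)
{
    // Throw instead of blocking, so a full mailbox shows up as an
    // exception rather than as a silent hang.
    setMaxMailboxSize(thisTid, limit, OnCrowding.throwException);

    size_t accepted = 0;
    try
    {
        // Try to send one message more than the mailbox may hold.
        foreach (i; 0 .. limit + 1)
        {
            send(thisTid, 1);
            ++accepted;
        }
    }
    catch (MailboxFull e)
    {
        // The mailbox already holds `limit` unread messages.
    }

    // Drain the mailbox, then lift the limit again (0 means no limit).
    foreach (i; 0 .. accepted)
        receiveOnly!int();
    setMaxMailboxSize(thisTid, 0, OnCrowding.block);
    return accepted;
}

void main()
{
    writeln(acceptedBeforeFull(50));
}
```

With OnCrowding.block in place of OnCrowding.throwException, the send that exceeds the limit would wait until the receiver reads something. If nothing ever reads, that wait never ends, which would match a program that hangs with no CPU or network activity.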
Mar 25