digitalmars.D - code databases for ai
- monkyyy (30/30) Dec 13 I started the process of extracting all the code from the forums
- jmh530 (3/9) Dec 16 I talked a little about what I had tried here
- monkyyy (3/13) Dec 16 Im not entirely sure what that markup format is that snar
- jmh530 (19/33) Dec 16 Honestly, my first attempt at it, I basically just asked ChatGPT
- Basile B. (4/7) Dec 16 That reminds me an old idea... make/update a symbol database when
- monkyyy (3/11) Dec 16 I have been anti these tools for myself and know nothing, whats
I started the process of extracting all the code from the forums 3 weeks ago: https://github.com/crazymonkyyy/dlangforums (I know of some flaws here; AI slop, but semi-functional).

adr's style of code, giant files with example programs in comments, needs some amount of processing (qwen doesn't read adr's files without being explicitly told to; I don't know if any of them have the "attention" to handle "simple display").

Extracting links from dub webpages likely isn't that hard.

My own code is a horrible mess. I never got around to actually cleaning up my repos when I planned on doing that last year, or the year before, or the year before that. To say nothing of my unnamed gists, etc.

---

It's a big project to try to collect as much trusted code as possible into one organization system. "RAG" is a bit of a meme, but seeding a code base with known good code (compared to AI hallucinations, anyway), for a degree of taste and something that actually compiles, is a real technique.

(Don't any of y'all tell me "I told you so" about dub; it will still require processing.)

If anyone else is working on pieces, I'd like to know about it. I have some theories about how to metaprogram to detect if a struct is a container, if a function is a range algorithm, if a file is a program, etc.

Has anyone done anything on this subject? Is anyone interested in it?

It may need a real hosting solution. GitHub has file size caps that I ran into with just the forums; if I start extracting from dub and then try to host that, GitHub may get quite upset.
Dec 13
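As a rough illustration of the "example programs in comments" processing mentioned above, here is a minimal Python sketch that pulls example code out of D doc comments. It assumes the examples are fenced by lines of three or more hyphens inside `/++ ... +/` or `/** ... */` comments (the Ddoc convention). The function name `extract_doc_examples` is hypothetical, nested comments are not handled, and nothing here is taken from the linked repo.

```python
import re
import textwrap

# Matches D doc comments: /** ... */ and /++ ... +/ (nesting is not handled).
DOC_COMMENT = re.compile(r"/\*\*(?!/)(.*?)\*/|/\+\+(?!/)(.*?)\+/", re.DOTALL)
# Assumed Ddoc code delimiter: a line holding only three or more hyphens,
# optionally preceded by the comment's leading '*' or '+' decoration.
DELIMITER = re.compile(r"^\s*[*+]?\s*-{3,}\s*$")


def extract_doc_examples(source: str) -> list[str]:
    """Return the code examples found inside doc comments of a D source file."""
    examples = []
    for match in DOC_COMMENT.finditer(source):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        current = None  # None while we are outside an example
        for line in body.splitlines():
            if DELIMITER.match(line):
                if current is None:
                    current = []  # opening delimiter
                else:
                    # closing delimiter: store the example without its indentation
                    examples.append(textwrap.dedent("\n".join(current)))
                    current = None
            elif current is not None:
                current.append(line)
    return examples
```

Each returned string could then be written to its own file and handed to a compiler to check whether it still builds. Leading `*` decoration on example lines is not stripped in this sketch.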
On Saturday, 13 December 2025 at 23:27:40 UTC, monkyyy wrote:
> [snip] Has anyone done anything on this subject? Is anyone interested in it? It may need a real hosting solution. GitHub has file size caps that I ran into with just the forums; if I start extracting from dub and then try to host that, GitHub may get quite upset.

I talked a little about what I had tried here: https://forum.dlang.org/post/sfuxoiwthnqacwmfwxxs forum.dlang.org
Dec 16
On Tuesday, 16 December 2025 at 11:59:49 UTC, jmh530 wrote:
> On Saturday, 13 December 2025 at 23:27:40 UTC, monkyyy wrote:
>> [snip] Has anyone done anything on this subject? Is anyone interested in it? It may need a real hosting solution. GitHub has file size caps that I ran into with just the forums; if I start extracting from dub and then try to host that, GitHub may get quite upset.
> I talked a little about what I had tried here: https://forum.dlang.org/post/sfuxoiwthnqacwmfwxxs forum.dlang.org

I'm not entirely sure what that markup format is that snar found/made, but I wonder if it's trivially convertible.
Dec 16
On Tuesday, 16 December 2025 at 16:07:27 UTC, monkyyy wrote:
> On Tuesday, 16 December 2025 at 11:59:49 UTC, jmh530 wrote:
>> On Saturday, 13 December 2025 at 23:27:40 UTC, monkyyy wrote:
>>> [snip] Has anyone done anything on this subject? Is anyone interested in it? It may need a real hosting solution. GitHub has file size caps that I ran into with just the forums; if I start extracting from dub and then try to host that, GitHub may get quite upset.
>> I talked a little about what I had tried here: https://forum.dlang.org/post/sfuxoiwthnqacwmfwxxs forum.dlang.org
> I'm not entirely sure what that markup format is that snar found/made, but I wonder if it's trivially convertible.

Honestly, on my first attempt at it, I basically just asked ChatGPT what I should do to make a Dlang RAG, and it recommended basically what I describe in that post. So I put together the single combined file without thinking of Har. Basically looks like

[File A contents] [File B contents] etc.

Har [1] is used for run.dlang.io as a format for handling multiple files. The examples on that page are very similar to what I have above. But it's nothing special, per se. As it stands, that repo just handles reading Har, not writing them. So that doesn't do much good for this application (writing it is an issue). I just figured rather than writing my own thing, I would contribute to that. But then I got distracted by other things and haven't come back to it.

[1] https://github.com/marler8997/har
Dec 16
On Tuesday, 16 December 2025 at 18:59:54 UTC, jmh530 wrote:
> [snip]

Didn't display correctly due to markdown being checked. Should look like this:

[File A contents] [File B contents] etc.
Dec 16
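Since writing the combined file is the open issue in the posts above, here is a minimal Python sketch of a writer and reader for a Har-like archive. It assumes each file is introduced by a delimiter line of the form `--- <path>`. That layout is an assumption to verify against the Har repo, and the helper names `write_archive` and `read_archive` are hypothetical, not part of Har.

```python
DELIM = "--- "  # assumed delimiter prefix; verify against the real Har format


def write_archive(files: dict[str, str]) -> str:
    """Join several files into one text blob, one delimiter line per file."""
    parts = []
    for path, content in files.items():
        for line in content.splitlines():
            if line.startswith(DELIM):
                # Such a line would be misread as a new file on the way back in.
                raise ValueError(f"{path}: a content line starts with {DELIM!r}")
        parts.append(f"{DELIM}{path}\n")
        # Make sure the next delimiter starts on its own line.
        parts.append(content if content.endswith("\n") else content + "\n")
    return "".join(parts)


def read_archive(text: str) -> dict[str, str]:
    """Split a blob produced by write_archive back into {path: content}."""
    files = {}
    current = None  # path of the file currently being read
    for line in text.splitlines(keepends=True):
        if line.startswith(DELIM):
            current = line[len(DELIM):].strip()
            files[current] = ""
        elif current is not None:
            files[current] += line
    return files
```

A round trip is exact only for files that already end in a newline, because the writer adds one where it is missing. Rejecting colliding content lines outright keeps the sketch simple; a real tool would more likely pick a different delimiter for that archive.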
On Tuesday, 16 December 2025 at 18:59:54 UTC, jmh530 wrote:
> I basically just asked ChatGPT what I should do to make a Dlang RAG and it recommended basically what I describe in that post.

I would strongly suggest the first 10 lines of every project should be a human-written rant. Taste is extremely important and possibly can be lost.
Dec 16
On Wednesday, 17 December 2025 at 01:17:25 UTC, monkyyy wrote:
> On Tuesday, 16 December 2025 at 18:59:54 UTC, jmh530 wrote:
>> I basically just asked ChatGPT what I should do to make a Dlang RAG and it recommended basically what I describe in that post.
> I would strongly suggest the first 10 lines of every project should be a human-written rant. Taste is extremely important and possibly can be lost.

I still don't trust the code generated by the latest versions of ChatGPT without reviewing it.
Dec 17
On Wednesday, 17 December 2025 at 18:01:50 UTC, jmh530 wrote:
> On Wednesday, 17 December 2025 at 01:17:25 UTC, monkyyy wrote:
>> On Tuesday, 16 December 2025 at 18:59:54 UTC, jmh530 wrote:
>>> I basically just asked ChatGPT what I should do to make a Dlang RAG and it recommended basically what I describe in that post.
>> I would strongly suggest the first 10 lines of every project should be a human-written rant. Taste is extremely important and possibly can be lost.
> I still don't trust the code generated by the latest versions of ChatGPT without reviewing it.

Worst of both worlds. It's the first move of a sudoku puzzle that's the hardest and requires the entire structure to be understood. It's the title that decides the conclusion of an essay. They just fill in details. A compiler may prompt an edit, but if there's a logic error in the way the butterfly flaps its wing, nothing the compiler does will detect it.

https://youtu.be/dcolM6W5Odc?si=EVFgJzFH9jmape7A&t=510

Children with 10 AIs, or 1 AI writing plans for essays, show lower brain activity. Children writing a plan first, before being given the AI, show more brain activity than AI-less children.
Dec 17
On Saturday, 13 December 2025 at 23:27:40 UTC, monkyyy wrote:
> I started the process of extracting all the code from the forums 3 weeks ago: [...]

That reminds me of an old idea... make/update a symbol database when you compile a project. Then your _next-gen_ completion daemon can use it.
Dec 16
On Tuesday, 16 December 2025 at 18:19:46 UTC, Basile B. wrote:
> On Saturday, 13 December 2025 at 23:27:40 UTC, monkyyy wrote:
>> I started the process of extracting all the code from the forums 3 weeks ago: [...]
> That reminds me of an old idea... make/update a symbol database when you compile a project. Then your _next-gen_ completion daemon can use it.

I have been anti these tools for myself and know nothing. What's the easiest way to grab data?
Dec 16
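One possible shape for the symbol database idea, as a minimal Python sketch using SQLite. It assumes the compiler can emit a JSON description of each module (dmd has an `-X` switch for JSON output) and that every entry carries `name`, `kind`, `file`, `line`, and nested `members` fields. Those field names are assumptions to check against real output, and `index_symbols` and `find_symbols` are hypothetical helper names.

```python
import sqlite3


def index_symbols(db: sqlite3.Connection, modules: list[dict]) -> None:
    """Insert or refresh symbol rows from a parsed JSON module description."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS symbols "
        "(module TEXT, name TEXT, kind TEXT, file TEXT, line INTEGER)"
    )

    def walk(module_name, file, members):
        for member in members:
            if member.get("name") is not None:  # skip unnamed entries
                db.execute(
                    "INSERT INTO symbols VALUES (?, ?, ?, ?, ?)",
                    (module_name, member["name"], member.get("kind"),
                     file, member.get("line")),
                )
            # Assumed layout: structs, classes, etc. nest their own members.
            walk(module_name, file, member.get("members", []))

    for module in modules:
        # "Update" step: drop stale rows for this file before re-inserting.
        db.execute("DELETE FROM symbols WHERE file = ?", (module.get("file"),))
        walk(module.get("name"), module.get("file"), module.get("members", []))
    db.commit()


def find_symbols(db: sqlite3.Connection, prefix: str) -> list[tuple]:
    """Return (name, kind, file, line) for symbols whose name starts with prefix."""
    rows = db.execute(
        "SELECT name, kind, file, line FROM symbols "
        "WHERE substr(name, 1, ?) = ? ORDER BY name",
        (len(prefix), prefix),
    )
    return rows.fetchall()
```

Re-running `index_symbols` after each build keeps the table in step with the sources, and a completion tool would only need `find_symbols` or a similar query.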









jmh530 <john.michael.hall gmail.com> 