digitalmars.D - Ddoc to PDF

Walter Bright <newshound2 digitalmars.com> writes:
Apparently, it is fairly simple to convert plain text files to PDF.

http://re-factor.blogspot.com/2010/10/text-to-pdf.html

Which suggests to me it should be equally simple to create a Ddoc macro file to 
allow Ddoc to emit pdf files directly.

Anyone want a nice weekend project to produce this?
Oct 17 2010
Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
On 17/10/2010 18:45, Walter Bright wrote:
 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc macro
 file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I read the PDF spec once*. I can see in my mind what a PDF generated by DDoc could look like, and I'm quite confident in saying that it is nothing like as pretty or simple to produce as the current, most basic HTML output. The "Hello World" for PDF (found in appendix H of the spec) makes DNA look simple and concise**.

When generating a PDF, one has to do all the layout, calculating where to place line breaks and begin new pages. When generating HTML, all this work is left to the web browser instead, which is why PDFs always look the same, but web pages are rendered 11 different ways by 7 different browsers.

PDFs do have a tree-like structure, but they are not laid out like an HTML file. Instead, there is a stream of cross-referenced objects, each with a unique reference number, a reference to its parent and a list of its children. This means that paragraphs which span pages need to be broken up into pieces contained within different objects.

Doing a layout for an unstructured stream of text in a fixed-width typeface (such as in the link you posted) is quite simple, but - as far as I can fathom - is still beyond the current DDoc. Using variable-width typefaces, indentation, borders, emphasis, etc. to try and produce a PDF with the same visual style as that which can be easily achieved using the current HTML macros would be very difficult (though I'm not going to go so far as saying it's impossible).

I think something quite pleasing could be generated with minimal post-processing, but not by using DDoc alone. After all, there is post-processing for DDoc right now, every time its HTML output is loaded into a browser.

So, what enhancements do I think DDoc needs to be able to support the generation of PDFs?

After a lot of thought, I have come to the conclusion that giving DDoc the power required to calculate layout in a way that is general enough to be used not only by PDF but by any other layout technology, and the ability to work with a flattened tree, is a non-starter.

Alternatively, I can't help wondering if it would be possible to use D's compile-time abilities to perform the necessary post-processing. Well, I know it's powerful enough, but there are a few issues with letting code from another source play in your sandbox when all one wanted to do was read the instructions... But if the DDoc macro file specified on the command line could contain D code for post-processing that is run by the CTFE engine and passed the expanded DDoc, then the output could be flattened, parsed to calculate line lengths, given all its cross-references, split into pages and spat out as a PDF.

I still think it would be more than 1 weekend's work though***.

CAVEAT LECTOR! I'm not an expert at PDFs or DDoc, so I'd be very happy to be proven wrong, the wronger**** the better ^^

A...

* Not as crazy as reading it twice would be.
** I will admit that this is possibly a slight exaggeration.
*** I, however, code slower than the average bear.
**** I know that is not a real word, so don't complain ><
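
As a rough illustration of the bookkeeping described in this post (line breaking, page breaking, numbered objects and a cross-reference table of byte offsets), here is a minimal Python sketch. It is not taken from the linked text-to-PDF program; the function name `build_pdf`, the 80-column by 60-row page and the fixed-width Courier font are assumptions chosen to keep the layout trivial.

```python
import textwrap


def build_pdf(text, cols=80, rows=60, font_size=10, leading=12):
    """Lay out plain text in Courier and return the bytes of a minimal PDF."""
    # Layout is the generator's job: wrap to a fixed width, then cut into pages.
    lines = []
    for para in text.splitlines():
        lines.extend(textwrap.wrap(para, cols) or [""])
    pages = [lines[i:i + rows] for i in range(0, len(lines), rows)] or [[""]]

    # Object numbers (an assumption of this sketch): 1 = catalog,
    # 2 = page tree, 3 = font, then a (page, content stream) pair per page.
    objects = {}
    kids = []
    for n, page_lines in enumerate(pages):
        page_id, content_id = 4 + 2 * n, 5 + 2 * n
        kids.append("%d 0 R" % page_id)
        ops = ["BT", "/F1 %d Tf" % font_size, "%d TL" % leading, "36 756 Td"]
        for line in page_lines:
            # Backslashes and parentheses must be escaped inside PDF strings.
            esc = (line.replace("\\", "\\\\")
                       .replace("(", "\\(").replace(")", "\\)"))
            ops.append("(%s) Tj T*" % esc)
        ops.append("ET")
        stream = "\n".join(ops).encode("latin-1", "replace")
        objects[content_id] = (b"<< /Length %d >>\nstream\n" % len(stream)
                               + stream + b"\nendstream")
        objects[page_id] = (b"<< /Type /Page /Parent 2 0 R "
                            b"/MediaBox [0 0 612 792] "
                            b"/Resources << /Font << /F1 3 0 R >> >> "
                            b"/Contents %d 0 R >>" % content_id)
    objects[1] = b"<< /Type /Catalog /Pages 2 0 R >>"
    objects[2] = ("<< /Type /Pages /Kids [%s] /Count %d >>"
                  % (" ".join(kids), len(pages))).encode("ascii")
    objects[3] = b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>"

    # Serialize the objects, remembering each byte offset for the xref table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)
        out += b"%d 0 obj\n" % num + objects[num] + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for num in sorted(objects):
        out += b"%010d 00000 n \n" % offsets[num]
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)
```

Writing the result with `open("out.pdf", "wb").write(build_pdf(some_text))` should give a file that a PDF viewer can open. Even this fixed-width case needs a second pass to compute offsets, which is the kind of work a pure macro expansion cannot do.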
Oct 18 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Mon, 18 Oct 2010 12:54:13 +0100, Alix Pexton wrote:

 On 17/10/2010 18:45, Walter Bright wrote:
 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc macro
 file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I read the PDF spec once*, I can see in my mind what a PDF generated by DDoc could look like, and I'm quite confident in saying that it is nothing like as pretty or simple to produce as the current, most basic HTML output. The "Hello World" for PDF (found in appendix H of the spec) makes DNA look simple and concise**. [...] I still think it would be more than 1 weekend's work though***.

I, too, suspect it would be quite a lot of work. The path of least resistance would probably be to go via LaTeX. Making DDOC macros for LaTeX output sounds more like a one-weekend mini project.

-Lars
Oct 18 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 10/18/2010 06:54 AM, Alix Pexton wrote:
 On 17/10/2010 18:45, Walter Bright wrote:
 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc macro
 file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I read the PDF spec once*, I can see in my mind what a PDF generated by DDoc could look like, and I'm quite confident in saying that it is nothing like as pretty or simple to produce as the current, most basic HTML output.

There's no need for all that. It took me a short time to produce a set of macros that would generate LaTeX files from ddoc. I'm sure I have it somewhere, or I could rewrite it. From there you get to produce high quality PDFs.

Andrei
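
The macro set mentioned here is not shown in the thread, so the following is only a toy Python sketch of the underlying idea: the same `$(NAME text)` markup produces HTML or LaTeX depending on which macro definitions are loaded. The `expand` function and both macro tables are illustrative assumptions, not Ddoc's implementation, and the sketch ignores the escaping of LaTeX special characters that a real macro file would have to deal with.

```python
def expand(text, macros):
    """Expand $(NAME args) using `macros`; $0 in a definition is the argument."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("$(", i):
            # Find the matching close paren, allowing nested parentheses.
            depth, j = 1, i + 2
            while j < len(text) and depth:
                if text[j] == "(":
                    depth += 1
                elif text[j] == ")":
                    depth -= 1
                j += 1
            if depth:  # unbalanced: emit the remainder verbatim
                out.append(text[i:])
                break
            name, _, arg = text[i + 2:j - 1].partition(" ")
            body = macros.get(name, "")  # unknown macros expand to nothing here
            # Substitute the raw argument, then rescan the result for macros.
            out.append(expand(body.replace("$0", arg), macros))
            i = j
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Two hypothetical macro tables for the same three macro names.
HTML = {"B": "<b>$0</b>", "I": "<i>$0</i>", "P": "<p>$0</p>"}
LATEX = {"B": "\\textbf{$0}", "I": "\\textit{$0}", "P": "$0\n\n"}

SRC = "$(P Returns $(B true) if $(I x) is valid.)"
print(expand(SRC, HTML))
print(expand(SRC, LATEX))
```

The point of the design is that the documentation comments stay untouched; only the definitions change, and LaTeX then does the layout in the same way a browser does for HTML.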
Oct 18 2010
Gianluigi Rubino <gianluigi grsoft.org> writes:
 There's no need for all that. It took me a short time to produce a set of
 macros that would generate LaTeX files from ddoc. I'm sure I have it
 somewhere, or I could rewrite it. From there you get to produce high quality
 PDFs.
In my modest opinion, the optimal way would be LaTeX: you postpone rendering and processing, just like you already do for HTML, as Alix noted.
Oct 18 2010
Walter Bright <newshound2 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 There's no need for all that. It took me a short time to produce a set 
 of macros that would generate LaTeX files from ddoc. I'm sure I have it 
 somewhere, or I could rewrite it. From there you get to produce high 
 quality PDFs.
I think this would be great to include in the D distribution, with a web page with step-by-step instructions.
Oct 18 2010
"Gour D." <gour atmarama.net> writes:
On Mon, 18 Oct 2010 11:14:54 -0700
 "Walter" == Walter Bright <newshound2 digitalmars.com> wrote:
Walter> I think this would be great to include in the D distribution,
Walter> with a web page with step-by-step instructions.

May I ask here whether Ddoc is the recommended way to document D code over e.g. Doxygen etc.? I'm starting and would like to adopt proper tools...

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
Oct 18 2010
"Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 18 Oct 2010 14:46:13 -0400, Gour D. <gour atmarama.net> wrote:

 On Mon, 18 Oct 2010 11:14:54 -0700
 "Walter" == Walter Bright <newshound2 digitalmars.com> wrote:
Walter> I think this would be great to include in the D distribution,
Walter> with a web page with step-by-step instructions.
May I ask here whether Ddoc is the recommended way to document D code over e.g. Doxygen etc.? I'm starting and would like to adopt proper tools...

Doxygen had minimal support for D1, but it has not been updated in a long, long time. Besides ddoc, there are a couple of D-based doc generators: descent (which I guess is dead), and dil. However, all the D-based tools parse the same ddoc style, so no matter what tool you use, you should use ddoc to document your code.

-Steve
Oct 18 2010
"Gour D." <gour atmarama.net> writes:
On Mon, 18 Oct 2010 14:57:13 -0400
 "Steven" == "Steven Schveighoffer" <schveiguy yahoo.com> wrote:
Steven> However, all the d-based tools parse the same ddoc style, so no
Steven> matter what tool you use, you should use ddoc to document your
Steven> code.

Thanks a lot. That's what I needed to hear. ;)

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
Oct 18 2010
Lutger <lutger.blijdestijn gmail.com> writes:
Andrei Alexandrescu wrote:

 On 10/18/2010 06:54 AM, Alix Pexton wrote:
 On 17/10/2010 18:45, Walter Bright wrote:
 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc macro
 file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I read the PDF spec once*, I can see in my mind what a PDF generated by DDoc could look like, and I'm quite confident in saying that it is nothing like as pretty or simple to produce as the current, most basic HTML output.
There's no need for all that. It took me a short time to produce a set of macros that would generate LaTeX files from ddoc. I'm sure I have it somewhere, or I could rewrite it. From there you get to produce high quality PDFs. Andrei
Cool, it would be nice if you could find it. I have a D class for program listings (for a presentation with beamer), needs a little work but could be useful too.
Oct 18 2010
Walter Bright <newshound2 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 There's no need for all that. It took me a short time to produce a set 
 of macros that would generate LaTeX files from ddoc. I'm sure I have it 
 somewhere, or I could rewrite it. From there you get to produce high 
 quality PDFs.
I think you're right and that's the way we should do it.

On the other hand, just for fun I wanted to see if I could read a paperback I got from the thrift store on my ipod. I sliced the back off, and ran it through a scanner to create an OCR'd pdf.

Loading the pdf into my ipod didn't work, as it was 25 megs and so far, the only way I've figured out how to get pdf's to the ipod is via email. Trying a small sample did work, but the page image on the ipod was just too small to read. I had to use a magnifying glass.

So I loaded the pdf, did a select all, and wrote the text out to a simple text file. Emailing the text file worked, but imail has a crappy text file reader. No good.

Next I downloaded and compiled the text2pdf.c file. Trying it crashed. Buffer overflow!! Bumped all the buffer sizes way up, and it worked. Emailed it to the ipod, saved it as an "ibook", and I could read it. The pages were still too big, though, and the ibook reader made zooming the pages a miserable experience. (Apple didn't get everything right.)

I downloaded the pdf spec and figured out how to set the page margins to zilch (don't need page margins on the ipod), about 11 lines by 60 characters seems to work fine. Reset the font to Times-Roman. Now it works perfectly!

The only problem is the book itself sux. Oh well!
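
The small-screen layout described in this post (about 11 lines of 60 characters per page) can be sketched in Python as follows. This is not the modified text2pdf.c; `paginate` is a hypothetical helper, and counting characters is only an approximation for a proportional font such as Times-Roman, where real line widths depend on glyph metrics.

```python
import textwrap


def paginate(text, cols=60, rows=11):
    """Wrap text to `cols` characters and group the lines into pages of `rows`."""
    lines = []
    for para in text.split("\n\n"):
        # Collapse the hard line breaks left over from OCR, then re-wrap.
        lines.extend(textwrap.wrap(" ".join(para.split()), cols))
        lines.append("")  # blank line between paragraphs
    if lines:
        lines.pop()  # no blank line after the last paragraph
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]


if __name__ == "__main__":
    for number, page in enumerate(paginate("word " * 200), start=1):
        print("--- page %d ---" % number)
        print("\n".join(page))
```

Each page returned here would become one page object in the generated PDF, with the media box sized to fit the text and no margins.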
Oct 18 2010
"Nick Sabalausky" <a a.a> writes:
"Walter Bright" <newshound2 digitalmars.com> wrote in message 
news:i9j7c7$209t$1 digitalmars.com...
 Andrei Alexandrescu wrote:
 There's no need for all that. It took me a short time to produce a set of 
 macros that would generate LaTeX files from ddoc. I'm sure I have it 
 somewhere, or I could rewrite it. From there you get to produce high 
 quality PDFs.
I think you're right and that's the way we should do it. On the other hand, just for fun I wanted to see if I could read a paperback I got from the thrift store on my ipod. I sliced the back off, and ran it through a scanner to create an OCR'd pdf. Loading the pdf into my ipod didn't work, as it was 25 megs and so far, the only way I've figured out how to get pdf's to the ipod is via email.
My portable music player is a Toshiba Gigabeat F with the Rockbox firmware. I can put any files I want onto it by connecting the USB cord and treating it like the USB HDD that it actually is. (Doesn't have a pdf reader though. Although it can display text files.)
 (Apple didn't get everything right.)
Clearly not!
Oct 18 2010
David Gileadi <gileadis NSPMgmail.com> writes:
On 10/18/10 9:37 PM, Walter Bright wrote:
 On the other hand, just for fun I wanted to see if I could read a
 paperback I got from the thrift store on my ipod. I sliced the back off,
 and ran it through a scanner to create an OCR'd pdf. Loading the pdf
 into my ipod didn't work, as it was 25 megs and so far, the only way
 I've figured out how to get pdf's to the ipod is via email.
Apple has been notorious for making it hard to load files onto their iOS devices. Recently in iOS 4 they've provided for it through iTunes, but it's still very clunky. For PDFs I use GoodReader which makes it somewhat easy to load files wirelessly and whose claim to fame is that it supports very large PDFs without crashing.
Oct 19 2010
Walter Bright <newshound2 digitalmars.com> writes:
David Gileadi wrote:
 Apple has been notorious for making it hard to load files onto their iOS 
 devices.  Recently in iOS 4 they've provided for it through iTunes, but 
 it's still very clunky.  For PDFs I use GoodReader which makes it 
 somewhat easy to load files wirelessly and whose claim to fame is that 
 it supports very large PDFs without crashing.
I haven't had problems with Apple's ipod pdf reader crashing. Just getting files to it! It's very strange that Safari on the ipod cannot look at files on shares on my LAN, whereas Safari on OSX can (and every other browser can, too).
Oct 19 2010
Walter Bright <newshound2 digitalmars.com> writes:
Alix Pexton wrote:
 After a lot of thought, I have come to the conclusion that giving DDoc 
 the power required to calculate layout in a way that is general enough 
 to be used not only by PDF but by any other layout technology, and the 
 ability to work with a flattened tree, is a non starter.
Perhaps you're right. But it strikes me as possible that a Ddoc template that emits the text in some custom format, which is then read by a custom pdf generator program, could work. In other words, a two step process.
Oct 18 2010
David Gileadi <gileadis NSPMgmail.com> writes:
On 10/18/10 11:13 AM, Walter Bright wrote:
 Alix Pexton wrote:
 After a lot of thought, I have come to the conclusion that giving DDoc
 the power required to calculate layout in a way that is general enough
 to be used not only by PDF but by any other layout technology, and the
 ability to work with a flattened tree, is a non starter.
Perhaps you're right. But it strikes me as possible that a Ddoc template that emits the text in some custom format, which is then read by a custom pdf generator program, could work. In other words, a two step process.
Presuming that custom format is HTML then this is close to actuality. During the look 'n' feel changes for d-programming-language.org I experimented with using wkhtmltopdf (http://code.google.com/p/wkhtmltopdf/) to generate PDF versions of the documentation using the print CSS styles. Besides a couple of defects to do with page numbering the results looked very good.
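
A hedged sketch of how such an HTML-to-PDF step might be scripted from Python. The file names are hypothetical, the wkhtmltopdf binary is assumed to be installed and on the PATH, and `--print-media-type` is used on the assumption that the installed version supports it for picking up print CSS styles.

```python
import subprocess


def wkhtmltopdf_command(html_files, pdf_path):
    """Build the command line: options first, input pages next, output last."""
    return ["wkhtmltopdf", "--print-media-type", *html_files, pdf_path]


def html_to_pdf(html_files, pdf_path):
    """Run wkhtmltopdf; raises CalledProcessError if the tool reports failure."""
    subprocess.run(wkhtmltopdf_command(html_files, pdf_path), check=True)


if __name__ == "__main__":
    # Hypothetical Ddoc-generated pages combined into one PDF.
    print(wkhtmltopdf_command(["index.html", "lex.html"], "docs.pdf"))
```

Passing several HTML files in one invocation yields a single combined PDF, which is what a whole project's documentation would need.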
Oct 18 2010
"Gour D." <gour atmarama.net> writes:
On Mon, 18 Oct 2010 11:19:59 -0700
 "David" == David Gileadi <gileadis NSPMgmail.com> wrote:
David> Presuming that custom format is HTML then this is close to
David> actuality. During the look 'n' feel changes for
David> d-programming-language.org I experimented with using wkhtmltopdf
David> (http://code.google.com/p/wkhtmltopdf/) to generate PDF versions
David> of the documentation using the print CSS styles. Besides a
David> couple of defects to do with page numbering the results looked
David> very good.

+1

wkhtmltopdf looks very good, although I had a problem with it, or let's say with the php-wkhtmltox extension.

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
Oct 18 2010
Lutger <lutger.blijdestijn gmail.com> writes:
Walter Bright wrote:

 Alix Pexton wrote:
 After a lot of thought, I have come to the conclusion that giving DDoc
 the power required to calculate layout in a way that is general enough
 to be used not only by PDF but by any other layout technology, and the
 ability to work with a flattened tree, is a non starter.
Perhaps you're right. But it strikes me as possible that a Ddoc template that emits the text in some custom format, which is then read by a custom pdf generator program, could work. In other words, a two step process.
ddoc -> LaTeX -> profit

LaTeX produces most beautiful output, including (amongst others) well rendered pdf. No need to write a custom generator.
Oct 18 2010
Gianluigi <gianluigi grsoft.org> writes:
  On 18/10/10 20:13, Walter Bright wrote:
 Perhaps you're right. But it strikes me as possible that a Ddoc 
 template that emits the text in some custom format, which is then read 
 by a custom pdf generator program, could work. In other words, a two 
 step process.
IMHO Custom text format >> LaTeX
Oct 19 2010
BCS <none anon.com> writes:
Hello Alix,

 Doing a layout for an unstructured stream of text in a fixed width
 typeface (such as in the link you posted) is quite simple, but - as
 far as I can fathom - is still beyond the current DDoc. Using variable
 width typefaces, indentation, borders, emphasis, etc. to try and
 produce a PDF with the same visual style as that which can be easily
 achieved using the current HTML macros would be very difficult (though
 I'm not going to go so far as saying its impossible).
IIRC pdf is built on PS, and PS is Turing complete, so you might be able to do all the processing in PS and just slap the DDoc content in as data.

--
... <IXOYE><
Oct 18 2010
Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
On 19/10/2010 04:12, BCS wrote:
 Hello Alix,

 Doing a layout for an unstructured stream of text in a fixed width
 typeface (such as in the link you posted) is quite simple, but - as
 far as I can fathom - is still beyond the current DDoc. Using variable
 width typefaces, indentation, borders, emphasis, etc. to try and
 produce a PDF with the same visual style as that which can be easily
 achieved using the current HTML macros would be very difficult (though
 I'm not going to go so far as saying its impossible).
IIRC pdf is built on PS, and PS is Turing complete, so you might be able to do all the processing in PS and just slap the DDoc content in as data.

Hmm, I hadn't thought of that. I looked it up, and it seems that the subset of PS in the PDF spec is crippled: it has no "if" or "loop" constructs... But you reminded me that PDFs can embed ActionScript! I'm not sure how practical it would be, particularly for large documents, as the whole stream would need to be processed in order to view any page, or even to calculate the number of pages. PDF was designed so that any page could be reached and rendered quickly without having to process all the pages that came before.

Overall, I think the LaTeX-in-the-middle solution is a far better option, especially as it is already working ^^

A...
Oct 19 2010
Kagamin <spam here.lot> writes:
Walter Bright Wrote:

 Apparently, it is fairly simple to convert plain text files to PDF.
 
 http://re-factor.blogspot.com/2010/10/text-to-pdf.html
 
 Which suggests to me it should be equally simple to create a Ddoc macro file to
 allow Ddoc to emit pdf files directly.
You need to produce a single PDF for the entire project.
Oct 18 2010
Gerrit Wichert <gwichert yahoo.com> writes:
On 17.10.2010 19:45, Walter Bright wrote:

 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc
 macro file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I don't think that it is the best idea to produce a PDF in one step. First, PDF is really complicated (and also evolves over time). Second, this would require dmd to determine the layout of the generated documentation.

We could easily avoid the first point. If we just make ddoc generate XSL-FO, a tool like Apache FOP can be used to generate PDF or HTML from it. This is what XSL-FO is designed for. It's not rocket science to create an XSL-FO layout.

But the second problem remains. If I were a company or community writing libraries in D, I would like to have some corporate identity in it. This means that I want to decide on the layout. So I would really prefer it if ddoc were *additionally* able to generate a purely semantic version of the document data that is easy to process with an external tool. This can be a simple XML file which I can feed into my own transformation pipeline. This way ddoc does the part it can really shine at, extracting the information, and delegates the rest to something that knows more about the wishes of the actual user.

This shouldn't mean that ddoc should stop generating unified standard documentation. But I think it is worth a thought to generate semantic data files on request.

Gerrit
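
To make the "semantic data file" proposal concrete, here is a small Python sketch of the consumer side. The XML vocabulary (`module`, `function`, `summary`, `param`) is invented for illustration; no such ddoc output format is described in the thread. The `render` function stands in for a user's own transformation pipeline, which could equally emit LaTeX, XSL-FO or branded HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic output for one module; the schema is an assumption.
SAMPLE = """\
<module name="std.example">
  <function name="parse" returns="int">
    <summary>Parses a decimal integer.</summary>
    <param name="s">Input string.</param>
  </function>
</module>
"""


def render(xml_text):
    """Turn the semantic XML into a plain-text reference page."""
    root = ET.fromstring(xml_text)
    out = ["Module " + root.get("name"), ""]
    for fn in root.iter("function"):
        params = fn.findall("param")
        names = ", ".join(p.get("name") for p in params)
        out.append("%s %s(%s)" % (fn.get("returns"), fn.get("name"), names))
        out.append("    " + fn.findtext("summary", default="").strip())
        for p in params:
            out.append("    %s: %s" % (p.get("name"), (p.text or "").strip()))
    return "\n".join(out)


if __name__ == "__main__":
    print(render(SAMPLE))
```

Because the input carries only meaning and no layout, every presentation decision lives in `render`, which is the separation the post argues for.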
Oct 19 2010
Stephan Soller <stephan.soller helionweb.de> writes:
On 19.10.2010 15:14, Gerrit Wichert wrote:
 On 17.10.2010 19:45, Walter Bright wrote:

 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc
 macro file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I don't think that it is the best idea to produce a PDF in one step. First, PDF is really complicated (and also evolves over time). Second, this would require dmd to determine the layout of the generated documentation. We could easily avoid the first point. If we just make ddoc generate XSL-FO, a tool like Apache FOP can be used to generate PDF or HTML from it. This is what XSL-FO is designed for. It's not rocket science to create an XSL-FO layout. But the second problem remains. If I were a company or community writing libraries in D, I would like to have some corporate identity in it. This means that I want to decide on the layout. So I would really prefer it if ddoc were *additionally* able to generate a purely semantic version of the document data that is easy to process with an external tool. This can be a simple XML file which I can feed into my own transformation pipeline. This way ddoc does the part it can really shine at, extracting the information, and delegates the rest to something that knows more about the wishes of the actual user. This shouldn't mean that ddoc should stop generating unified standard documentation. But I think it is worth a thought to generate semantic data files on request. Gerrit

I agree, but maybe DDoc as it is now is already enough. I haven't looked closely at the HTML DDoc generates, but from what I've seen on the D home page it looks expressive enough. This HTML code can be converted to PDF with [PrinceXML][1] and you have all the flexibility of CSS at your disposal (I know not everyone likes CSS…).

Letting the compiler do the layout of the PDF generation might be overkill. It's a compiler and not a layout engine.

[1]: http://www.princexml.com/

Happy programming
Stephan
Oct 19 2010
Gour <gour atmarama.net> writes:
On Tue, 19 Oct 2010 15:14:13 +0200
 "Gerrit" == Gerrit Wichert <gwichert yahoo.com> wrote:
Gerrit> When we just make ddoc generating xsl-fo a tool like apache
Gerrit> fop can be used to generate pdf or html from it. This is what
Gerrit> xsl-fo is designed for. It's not rocket science to create a
Gerrit> xsl-fo layout.

Uhh... XSL & FOP... I quickly run away from those tools. Why not just create some standard markup (Markdown, reST...) which can later be post-processed into HTML/PDF/man/...?

Otoh, I'm quite satisfied having some macros to produce LaTeX, although having e.g. reST markup and processing it with Sphinx gives nice output (HTML, including Windows HTML Help, LaTeX, manual pages, plain text) if there is no NIH problem.

Sincerely,
Gour

--
Gour | Hlapicina, Croatia | GPG key: CDBF17CA
Oct 19 2010
Emil Madsen <sovende gmail.com> writes:
Having two modes? - the basic standard one, and a flag-switched one?

On 19 October 2010 15:14, Gerrit Wichert <gwichert yahoo.com> wrote:

 On 17.10.2010 19:45, Walter Bright wrote:

 Apparently, it is fairly simple to convert plain text files to PDF.

 http://re-factor.blogspot.com/2010/10/text-to-pdf.html

 Which suggests to me it should be equally simple to create a Ddoc
 macro file to allow Ddoc to emit pdf files directly.

 Anyone want a nice weekend project to produce this?
I don't think that it is the best idea to produce a PDF in one step. First, PDF is really complicated (and also evolves over time). Second, this would require dmd to determine the layout of the generated documentation. We could easily avoid the first point. If we just make ddoc generate XSL-FO, a tool like Apache FOP can be used to generate PDF or HTML from it. This is what XSL-FO is designed for. It's not rocket science to create an XSL-FO layout. But the second problem remains. If I were a company or community writing libraries in D, I would like to have some corporate identity in it. This means that I want to decide on the layout. So I would really prefer it if ddoc were *additionally* able to generate a purely semantic version of the document data that is easy to process with an external tool. This can be a simple XML file which I can feed into my own transformation pipeline. This way ddoc does the part it can really shine at, extracting the information, and delegates the rest to something that knows more about the wishes of the actual user. This shouldn't mean that ddoc should stop generating unified standard documentation. But I think it is worth a thought to generate semantic data files on request. Gerrit
-- // Yours sincerely // Emil 'Skeen' Madsen
Oct 19 2010