
digitalmars.D - Mobipocket to EPUB

"Borden" <2013 bordenrhodes.com> writes:
Good evening, all,

I appreciate the work of the people who've made the D 
Documentation, and I've wanted to download an eBook of the 
language specification to read offline. I see that the downloads 
page has the spec in Mobipocket format, but, according to 
Wikipedia, it's been superseded by the more widely adopted, and 
more open, EPUB format.

To this end, I've downloaded the website source code from GitHub 
and I've been fiddling with posix.mak and its supporting files to 
compile the language spec into EPUB. I've been able (sorta) to 
get it to work, but I want to co-ordinate my effort with anyone 
else who's doing the migration.

Further, after I've done my changes, what's the procedure to get 
my changes merged with GitHub?

With thanks,
May 11 2013
Walter Bright <newshound2 digitalmars.com> writes:
On 5/11/2013 5:49 PM, Borden wrote:
 Further, after I've done my changes, what's the procedure to get my changes
 merged with GitHub?

Generate a "pull request". There are plenty of tutorials on it if you google the phrase.
May 11 2013
"Borden" <2013 bordenrhodes.com> writes:
Thank you very much, Walter, for your reply. I hope I've done it 
correctly. I may use this thread to post various questions about 
what I find in the repo as I stagger through it. As I consider 
how to approach this problem, here are some questions:

1) On the documentation page for std.xml, it states "Warning: 
This module is considered out-dated and not up to Phobos' current 
standards. It will remain until we have a suitable replacement, 
but be aware that it will not remain long term." Not just for the 
job I'm doing but for other XML manipulation I may do in the 
future, what is the recommended way to manipulate XML in D? Is 
there a task force (to the extent that such things exist in open 
source software) to address the shortcomings?

2) I'm trying to get a sense of the tools repo on GitHub. Am I 
correct in inferring that it consists of utilities to help the 
makefiles in the other repos? Is this project going to blossom 
into something else?

With thanks,
May 14 2013
Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
On 15/05/2013 06:22, Borden wrote:
 Thank you for the feedback, all.

 The XML parsing bit is a little important for me because, to generate
 the required boilerplate for an EPUB, one necessarily needs to
 manipulate XML. The approach I'm thinking for the EPUB is to write a
 'generate EPUB from an OPF' program in D and shove it into the tools
 directory (hence my earlier question about what 'belongs' in there).

 Of course, I'd be happy to set straight to work in writing such a
 program (I have some XML experience using PHP and D could not possibly
 be any more awkward to use) but I don't want to be calling in libraries
 that are likely to be superseded before my patches get reviewed.

 As a hobbyist trained in accounting, I'm the last person anyone would
 want to work on a programming language. I suppose, though, that I'm
 pretty good with standards...

fwiw... I recently started implementing my own XML lib for D out of frustration (I want to write addons for inkscape in something that isn't Python). I had originally planned to "port" my own old Ada XML lib, but upon reacquainting myself with the official spec I realised that it would just not cut it (no Unicode support, etc.).

I'm pretty sure anyone with any real experience of writing lexers/parsers would look at the spec and think it pretty simple, but I keep finding corner cases and subtleties that I hadn't allowed for in my adaptation of the grammar.

I've also not really thought that much about the API yet, I have a plan for the features that I want, but recall from previous discussions that SAX-like callbacks were desired by some, and full DOM-like interfaces by others.

Anyone who wants a great XML lib in double-quick time should probably ignore this though, I'm a very slow and rusty coder ><

A...
May 15 2013
Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
On 15/05/2013 11:39, Russel Winder wrote:
 On Wed, 2013-05-15 at 09:52 +0100, Alix Pexton wrote:
 […]
 I recently started implementing my own XML lib for D out of frustration
 (I want to write addons for inkscape in something that isn't Python).

It might be worth noting here that SAX and DOM have not been distinguished so far in the thread. I would have thought SAX would be really easy in D. Also, whilst Python has its own W3C compliant DOM parser (minidom), and it has a pure Python (ElementTree) and C implemented simplified (cElementTree), the push is now to use lxml which is a wrapper around libxml2 and libxslt. Although it might be nice to have D implementations, perhaps D might follow Python and wrap a rather good system so as to get something with full validation and XPath sooner rather than later?

libxml2 and its kin should definitely get the deimos treatment! I don't know how hard it might be to create a wrapper with an idiomatically D-ish interface though. A...
May 15 2013
Alix Pexton <alix.DOT.pexton gmail.DOT.com> writes:
On 15/05/2013 12:45, Russel Winder wrote:
 On Wed, 2013-05-15 at 12:27 +0100, Alix Pexton wrote:
 […]
 libxml2 and its kin should definitely get the deimos treatment!

Indeed. Sadly, I doubt I will have time to even look at it for at least a couple of months.

I took a look... Why is it that I find the prospect of moulding a wrapper around a c-lib more daunting than writing my own d-lib from scratch? Mike Parker's GameDev.net has made a timely arrival, but although reading it makes me feel more knowledgeable about the process, I feel no more confident ><
 I don't know how hard it might be to create a wrapper with an
 idiomatically D-ish interface though.

I guess the trick will be to look at how lxml provides the ElementTree API (very Pythonic) over libxml2 and take a leaf from that book.

There is a section on the lxml website that goes into very brief detail about how the binding works: it uses Cython to wrap the C structs into proxy objects. I should probably say that while I have written code in Python, I have never written any Pythonic code and I'm not sure what it should look like, so I don't think looking at lxml would give me any insight into how to wrap libxml2. After all, getting away from Python is my motivation for putting better XML support into D.
 I am assuming SWIG would not be a good idea here.

That would be a question to ask a SWIG guru (or even a htod guru). A...
May 16 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, May 15, 2013 05:15:24 Borden wrote:
 1) On the documentation page for std.xml, it states "Warning:
 This module is considered out-dated and not up to Phobos' current
 standards. It will remain until we have a suitable replacement,
 but be aware that it will not remain long term." Not just for the
 job I'm doing but for other XML manipulation I may do in the
 future, what is the recommended way to manipulate XML in D? Is
 there a task force (to the extent that such things exist in open
 source software) to address the shortcomings?

I believe that several 3rd party XML parsers exist, but none have attempted to become the new std.xml yet. In order for std.xml to be replaced, someone needs to champion that cause and take the time to either implement it and push it through the review process or take an existing implementation and push that through the review process (which would likely involve further development on it on some level). No one has done that yet. I'm actually tempted to, since I really like parsing, but I already have too much on my plate to get it done anytime soon. I'm having a hard enough time getting much D-related stuff done as it is.

Probably one of the bigger hurdles in getting the new std.xml implemented is the fact that it needs to be range-based, and I'm not sure that any existing XML parsers are.

The note was added (by me IIRC) in order to warn people that the current implementation will not be around long term, but unfortunately, we don't have a replacement yet. The way stuff like that typically gets done around here is that someone decides that it's important enough to them to do it themselves (and they have the time and expertise to do it), and they do it. One of the regulars around here may do it at some point (though most of them tend to be quite busy with other stuff), or someone new may step up and do it, but until someone steps up and does it, we're stuck.

- Jonathan M Davis
May 14 2013
"Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 15 May 2013 at 03:15:46 UTC, Borden wrote:
 Thank you very much, Walter, for your reply. I hope I've done 
 it correctly. I may use this thread to post various questions 
 about what I find in the repo as I stagger through it. As I 
 consider how to approach this problem, here are some questions:

 1) what is the recommended way to manipulate XML in D? Is there 
 a task force (to the extent that such things exist in open 
 source software) to address the shortcomings?

No "task force"; someone needs to take on the task by themselves or with willing participants. We just know the current std.xml is not acceptable for the long term.

There is an XML library written by Michael, who has shown interest in having it included in Phobos, though I can't say it is on par with where we'd like std.xml to be. http://wiki.dlang.org/Review_Queue
 2) I'm trying to get a sense of the tools repo on GitHub. Am I 
 correct in inferring that it consists of utilities to help the 
 makefiles in the other repos? Is this project going to blossom 
 into something else?

My understanding is that it is for extra things that support D, but I cannot speak to its direction.
May 14 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, May 15, 2013 05:45:41 Jesse Phillips wrote:
 2) I'm trying to get a sense of the tools repo on GitHub. Am I
 correct in inferring that it consists of utilities to help the
 makefiles in the other repos? Is this project going to blossom
 into something else?

My understanding is that it is for extra things that support D, but I cannot speak to its direction.

It's basically where stray stuff goes. It's not really trying to go anywhere AFAIK. It's simply where a program goes when we want to add something to be distributed with the compiler (e.g. rdmd) or which Walter finds useful for development (e.g. detab). It could just as easily be called miscellaneous as tools. - Jonathan M Davis
May 14 2013
"Borden" <2013 bordenrhodes.com> writes:
Thank you for the feedback, all.

The XML parsing bit is a little important for me because, to 
generate the required boilerplate for an EPUB, one necessarily 
needs to manipulate XML. The approach I'm thinking for the EPUB 
is to write a 'generate EPUB from an OPF' program in D and shove 
it into the tools directory (hence my earlier question about what 
'belongs' in there).
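
For illustration only, the XML boilerplate in question is small. The sketch below is in Python rather than D (so it sidesteps the std.xml question entirely) and shows the container side of an EPUB: the `mimetype` entry, which has to come first in the zip and be stored uncompressed, and `META-INF/container.xml`, which points the reader at the OPF. The function names `container_xml` and `build_epub`, and the `OEBPS/content.opf` path used afterwards, are hypothetical; a real tool would also have to read the OPF manifest to decide which files to pack.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of META-INF/container.xml in an EPUB container.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def container_xml(opf_name):
    """Return container.xml (as bytes) pointing at the OPF package file."""
    # Serialise the container namespace as the default namespace.
    ET.register_namespace("", CONTAINER_NS)
    root = ET.Element("{%s}container" % CONTAINER_NS, {"version": "1.0"})
    rootfiles = ET.SubElement(root, "{%s}rootfiles" % CONTAINER_NS)
    ET.SubElement(rootfiles, "{%s}rootfile" % CONTAINER_NS, {
        "full-path": opf_name,
        "media-type": "application/oebps-package+xml",
    })
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="utf-8")


def build_epub(opf_name, files):
    """Pack `files` (archive name -> bytes) into an EPUB and return its bytes.

    `opf_name` is the archive path of the OPF and must be a key of `files`.
    """
    if opf_name not in files:
        raise ValueError("OPF missing from payload: %r" % opf_name)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", container_xml(opf_name),
                         compress_type=zipfile.ZIP_DEFLATED)
        for name, data in sorted(files.items()):
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Calling `build_epub("OEBPS/content.opf", {"OEBPS/content.opf": opf_bytes})` returns the bytes of the archive; the OPF itself, and the XHTML it lists, would still have to come out of the documentation build.
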

Of course, I'd be happy to set straight to work in writing such a 
program (I have some XML experience using PHP and D could not 
possibly be any more awkward to use) but I don't want to be 
calling in libraries that are likely to be superseded before my 
patches get reviewed.

As a hobbyist trained in accounting, I'm the last person anyone 
would want to work on a programming language. I suppose, though, 
that I'm pretty good with standards...
May 14 2013
Russel Winder <russel winder.org.uk> writes:

On Wed, 2013-05-15 at 09:52 +0100, Alix Pexton wrote:
[…]
 I recently started implementing my own XML lib for D out of frustration
 (I want to write addons for inkscape in something that isn't Python).

It might be worth noting here that SAX and DOM have not been distinguished so far in the thread. I would have thought SAX would be really easy in D. Also, whilst Python has its own W3C compliant DOM parser (minidom), and it has a pure Python (ElementTree) and C implemented simplified (cElementTree), the push is now to use lxml which is a wrapper around libxml2 and libxslt. Although it might be nice to have D implementations, perhaps D might follow Python and wrap a rather good system so as to get something with full validation and XPath sooner rather than later?

--
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
May 15 2013
Russel Winder <russel winder.org.uk> writes:

On Wed, 2013-05-15 at 12:27 +0100, Alix Pexton wrote:
[…]
 libxml2 and its kin should definitely get the deimos treatment!

Indeed. Sadly, I doubt I will have time to even look at it for at least a couple of months.
 I don't know how hard it might be to create a wrapper with an
 idiomatically D-ish interface though.

I guess the trick will be to look at how lxml provides the ElementTree API (very Pythonic) over libxml2 and take a leaf from that book. I am assuming SWIG would not be a good idea here.

--
Russel.
May 15 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, May 15, 2013 09:52:09 Alix Pexton wrote:
 I've also not really thought that much about the API yet, I have a plan
 for the features that I want, but recall from previous discussions that
 SAX-like callbacks were desired by some, and full DOM-like interfaces by
 others.

Ideally, I would expect multiple levels to it. The lowest one probably wouldn't even be a SAX parser (something closer to StAX probably), and then SAX and DOM parsers could be built on top of that. Then you can get the absolute best speed possible out of it depending on what your requirements are, and we give people the choice between SAX and DOM (as well as the lower-level API if they really want raw speed and don't need the higher level features of either SAX or DOM).

- Jonathan M Davis
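
To make the layering above concrete, here is a rough sketch in Python rather than D. It is only an assumption about how such a design could look, not anyone's actual proposal: `pull_events` plays the StAX-like bottom layer (it borrows the standard library's `iterparse` as a stand-in for a hand-written pull parser), and the hypothetical `sax_parse` and `dom_parse` are each a few lines built on top of it. Text handling is deliberately simplified, so mixed content is lossy.

```python
import io
import xml.etree.ElementTree as ET


def pull_events(source):
    """Lowest layer: lazily yield (kind, name, payload) tuples on demand.

    For "start" the payload is the attribute dict; for "end" it is the
    element's leading text, stripped (tail text is ignored in this sketch).
    """
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield ("start", elem.tag, dict(elem.attrib))
        else:
            yield ("end", elem.tag, (elem.text or "").strip())


def sax_parse(source, on_start, on_end):
    """SAX-like layer: push each pulled event into caller-supplied callbacks."""
    for kind, name, payload in pull_events(source):
        if kind == "start":
            on_start(name, payload)
        else:
            on_end(name, payload)


def dom_parse(source):
    """DOM-like layer: assemble a tree of nested dicts from the same events."""
    holder = {"name": None, "attrs": {}, "text": "", "children": []}
    stack = [holder]
    for kind, name, payload in pull_events(source):
        if kind == "start":
            node = {"name": name, "attrs": payload, "text": "", "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # The element is complete: record its text and close it.
            stack.pop()["text"] = payload
    return holder["children"][0]


DOC = b"<spec><chapter id='1'>Lexical</chapter><chapter id='2'>Modules</chapter></spec>"

if __name__ == "__main__":
    tree = dom_parse(io.BytesIO(DOC))
    for chapter in tree["children"]:
        print(chapter["attrs"]["id"], chapter["text"])
```

A caller who only needs to scan for one element can stop iterating `pull_events` early and never pay for callbacks or a tree, which is the point of putting the pull layer at the bottom.
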
May 15 2013
"Borden" <2013 bordenrhodes.com> writes:
On Wednesday, 15 May 2013 at 10:39:41 UTC, Russel Winder wrote:
 It might be worth noting here that SAX and DOM have not been
 distinguished so far in the thread.  I would have thought SAX 
 would be really easy in D.

[…] generation in D code, not necessarily reading (although any half-decent DOM library needs to be able to read files as well as write them). I'm not very familiar with SAX, but Wikipedia says that it's not standardised and so, if I understand correctly, there would be varying implementations of it.
 Although it might be nice to have D implementations, perhaps D might
 follow Python and wrap a rather good system so as to get something with
 full validation and XPath sooner rather than later?

I think that this is a better approach to take. Surely Windows has an XML library, and all that would have to be done is to write a single header file that would link to the XML library on the operating system? I know, probably far more difficult to implement than it sounds...
May 15 2013
"Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, May 15, 2013 20:08:13 Borden wrote:
 I think that this is a better approach to take. Surely Windows
 has an XML library and all that would have to be done is to write a
 single header file that would link to the XML library on the
 operating system? I know, probably far more difficult to
 implement than it sounds...

Long term, that is very much the wrong way to go. Thanks primarily to slicing, parsing is one area where D really shines and is likely to outstrip the competition if the code is well written. And it does even better if any aspects of it can be generated at compile time (as happens with std.regex - the compile-time stuff that it has is the fastest regex library on the planet). Also, as far as the standard library goes, we want a range-based XML solution, which is likely to be much harder if you're trying to wrap a C API.

So, in this particular case, wrapping a C library would be a stop-gap solution at best. In other cases, it may be the best way to go, but parsing is one area where D stands out. Tango's XML parser is lightning fast and was a great highlight for D1 (and it should be usable in D2 now that Tango has been ported to D2, though given the difference in license, it could never be put into Phobos). It's just that std.xml is a particularly poor implementation. We can do far, far better.

- Jonathan M Davis
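
For readers who don't know D, the slicing and range ideas above can be loosely imitated in Python, with the loud caveat that this is only an analogy and not how a D parser would be written. In the sketch below, `memoryview` slices stand in for D array slices (each token is a window onto the original buffer, not a copy) and a generator stands in for a lazy input range. The `tokens` function is hypothetical and is not a conforming XML tokenizer: it ignores comments, CDATA sections and `>` characters inside attribute values.

```python
def tokens(buf):
    """Lazily yield (kind, view) pairs, where kind is "tag" or "text".

    Each `view` is a memoryview slice of `buf`, so no token bytes are
    copied until the caller explicitly asks for them with bytes(view).
    """
    view = memoryview(buf)
    pos = 0
    while pos < len(buf):
        if buf.startswith(b"<", pos):
            end = buf.find(b">", pos)
            if end == -1:
                raise ValueError("unterminated tag at offset %d" % pos)
            end += 1  # include the closing '>'
            kind = "tag"
        else:
            end = buf.find(b"<", pos)
            if end == -1:
                end = len(buf)  # trailing text runs to the end of input
            kind = "text"
        yield kind, view[pos:end]
        pos = end


if __name__ == "__main__":
    for kind, chunk in tokens(b"<a>hi<b/>there</a>"):
        print(kind, bytes(chunk))
```

Because the generator is lazy, a consumer that stops after the first few tokens never scans the rest of the buffer, which is roughly the property a range-based std.xml would be after.
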
May 16 2013
"Borden" <2013 bordenrhodes.com> writes:
On Thursday, 16 May 2013 at 19:23:37 UTC, Jonathan M Davis wrote:
 So, in this particular case, wrapping a C library would be a 
 stop-gap solution at best. In other cases, it may be the best 
 way to go, but parsing is one area where D stands out.

Thank you for the insight, Jonathan. You have the advantage on me as you know far more about this and I know next to nothing. It's amusing, as a general observation, how object-oriented programming was supposed to make code reusable and yet it seems that these essential libraries constantly need rewriting in new languages because they're 'programmed wrong.'

As I've mentioned in this thread http://forum.dlang.org/thread/bsbdpjyjubfxvmecwhjl forum.dlang.org my EPUB conversion has hit something of a brick wall, as I'm finding DDoc's macros to be very awkward to use and, in order to do what I need them to do, I'd have to rewrite much of the documentation! I'd be interested in feedback on that thread...
May 16 2013