
digitalmars.D - D source code revision system idea

reply jdunne4 bradley.edu writes:
I'm not sure if this is the right place to throw up an idea like this, but there
seem to be an astonishing number of competent developers here to offer
insightful feedback, so I'll go ahead and toss it up ;).  Feel free to respond
and bounce ideas back off me!

What would you think of a source code revision system that works not on
line-by-line code differences, but on semantic differences?  This would be
mainly targeted at D source code, since D lends itself well to this type of
revision system.  The lack of a pre-processor combined with the concept of
modules makes this language an ideal target.

Pros:
1)  ***More robust patching ability***  (Not line-based, so no "fuzz" needed)
2)  Easy merging of codebases (trunks)
3)  Easy conflict detection during merges (check function call parameters, etc.)
4)  Could spot possible compile errors
5)  Code can be regenerated to conform to a formatting standard
6)  Accepts only correct code (possible con...)

Cons:
1)  Maintaining comments and their positions in the code becomes difficult,
since they are not compilable elements
2)  Somewhat difficult implementation

A new patch/diff toolset would need to be created to accommodate this new
semantic revision control system as well.

Please, let me know what you think!

James Dunne
Aug 19 2004
parent reply pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:
Not a bad idea.  Would this be a stand-alone project, or something added to an
existing product, like Subversion or CVS?

The only thing that comes to mind is: how would you even attempt to define
semantic merging and versioning in any language?  Are you talking about making
sure that merged sources compile okay, or is it something deeper than a
unittest?

- Pragma

Aug 19 2004
parent reply Jaymz <jdunne4 bradley.edu> writes:
Let's see...

Upon first design, this could just be a simple stand-alone project implemented
for the D language, consisting of a defined patch-format and a patch/diff-like
toolset.  After all, we've got the front-end source to D already!  That could
*possibly* make this simpler to implement, as it contains all the data
structures necessary to parse, analyze, and possibly re-create the code with.

How I see the "diff" tool working:
1)  Lex & parse the source files
2)  Create semantic tree representation of the original & new code
3)  Compare new code's semantic tree with original code's semantic tree
4)  Output a series of simple, defined operations to transform the original
code's semantic tree into the new code's semantic tree.

And the "patch" tool would do basically the inverse of the diff tool:
1)  Lex & parse the target source file
2)  Create semantic tree representation of the target source code
3)  Apply defined operations on the semantic tree
4)  Rebuild the target code from the modified semantic tree, possibly conforming
to a given formatting standard, or using hints provided by the diff tool to
recreate the formatting of the original file.

This type of patch/diff toolset could handle the creation of an entire module,
simply defined by "create" operations on an "empty" semantic tree.
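
A rough sketch of the diff and patch steps above, written in Python purely for illustration (the real tool would sit on top of the D front-end).  Everything here is an assumption: the "tree" is flattened to a list of top-level declaration nodes, and the names diff_nodes, apply_ops, ADD and REMOVE are hypothetical, not a defined patch format.

```python
from difflib import SequenceMatcher

def diff_nodes(old, new):
    """Return ADD/REMOVE operations that turn the node list `old` into `new`.

    Indices in the operations always refer to positions in `old`.
    """
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # ADD is recorded before REMOVE so that back-to-front application
        # removes the old nodes first and then inserts the new ones.
        if tag in ("insert", "replace"):
            ops.append(("ADD", i1, new[j1:j2]))
        if tag in ("delete", "replace"):
            ops.append(("REMOVE", i1, i2 - i1))
    return ops

def apply_ops(old, ops):
    """Apply operations from diff_nodes to a copy of `old`."""
    result = list(old)
    # Work from the end so that earlier indices stay valid.
    for op in reversed(ops):
        if op[0] == "REMOVE":
            _, index, count = op
            del result[index:index + count]
        else:
            _, index, nodes = op
            result[index:index] = nodes
    return result

old_module = [("module", "addition"), ("import", "std.c.stdio"), ("function", "add")]
new_module = [("module", "addition"), ("import", "std.c.stdio"),
              ("alias", "myInt"), ("function", "add")]

ops = diff_nodes(old_module, new_module)
patched = apply_ops(old_module, ops)

# Creating a whole module is just ADD operations against an empty tree.
created = apply_ops([], diff_nodes([], new_module))
```

Applying the operations back to front is only one possible design choice; a real patch format would also have to address nested nodes rather than a flat list.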

Let me know what you all think of this.  Thanks for your input, Pragma!


Aug 19 2004
next sibling parent reply Berin Loritsch <bloritsch d-haven.org> writes:
If you start by getting the diff/patch utilities working properly, with a format
compatible with the unix diff/patch utilities, then you could specify it as the
diff/patch util for the CVS or SVN repos.  That would be the only real level of
integration you need.

I will say this: (ir)Rational ClearCase tries to use this technique as much as
possible with abysmal results.  From what I understand, the PowerBuilder
integration works decently, but the XML diff tool is worse than their line diff
tool (which still randomizes things).

If you get the diff/patch utility right, I will be very impressed.  Just be
careful to focus only on diff/patch and not try to have a tool that does a whole
bunch of stuff.  KISS
Aug 19 2004
parent reply Jaymz <jdunne4 bradley.edu> writes:
Unfortunately, I don't see how I could create a format compatible with the unix
diff/patch utilities, which are line-based, using a semantic tree-based
modification scheme.  The format would have to be entirely different.  I could,
however, make my toolset support the command-line arguments of the original
diff/patch utilities, ignoring now senseless ones, which would be the best way
to go.

This would not necessarily be a bad thing for SVN use ... if you make the
decision to use my diff/patch utilities from the start, as the new patch format
wouldn't be compatible with the unix diff/patch utilities' patch format.  SVN
really doesn't care what diff/patch format it stores in its database, AFAIK.
It simply relies on correct operation from diff/patch to do its work.

And sorry, I haven't used any of the products you mentioned: ClearCase or
PowerBuilder.  Could you post an example of "abysmal results" so we can see what
NOT to produce?  :-)  I do like to develop tools that produce quality results --
this is probably due to my delusion that I have unlimited project development
time, and that a project is never quite "done" ;).

BTW, what's w/ the KISS?  Thanks for your comments!

James Dunne
Aug 19 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
KISS == Keep It Simple Stupid.

And before you take any offense, none was intended (I assume); it's a somewhat
common acronym meaning simply that you should try not to *over*complicate
things.

Regan

p.s. I think your idea is great.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Aug 19 2004
parent reply Berin Loritsch <bloritsch d-haven.org> writes:
That was its intention (how did you get this message and I didn't?).

Anyway, for an example of bad integration, ClearCase's XML merge is perfect.  If
the XML is not properly formatted, the tool will choke beyond reason (this
applies to original or new XML documents).  By that I mean the tool will attempt
to treat the whole block as one element.  For example:

OLD:
<!-- parse error here -->
<element
  <embedded type="element"/>
</element>

NEW:
<!-- parse error fixed -->
<element>
  <embedded type="element"/>
</element>

CONFLICT:
The element "element" conflicts with "element <embedded type="element"/> "
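
One way a tool could avoid this failure mode is to refuse to diff input that does not parse, instead of guessing at its structure.  A minimal sketch in Python follows; it uses the standard xml.etree.ElementTree parser, the helper names is_well_formed and safe_to_diff are hypothetical, and nothing here reflects how ClearCase actually works.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a single well-formed XML element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def safe_to_diff(old_text, new_text):
    """Only attempt a structural diff when both sides parse cleanly."""
    return is_well_formed(old_text) and is_well_formed(new_text)

# The OLD document from the example above: the opening tag is never closed.
OLD = '<element\n  <embedded type="element"/>\n</element>'
# The NEW document, with the parse error fixed.
NEW = '<element>\n  <embedded type="element"/>\n</element>'

can_diff = safe_to_diff(OLD, NEW)
```

When the check fails, falling back to a plain line-based diff would be one reasonable choice.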
Aug 19 2004
parent reply Jaymz <jdunne4 bradley.edu> writes:
That was odd, the posts got out of order and I didn't see Regan's post
initially... Anyways...

Yeah, I'm definitely a fan of KISS... Heh, punny punny... But seriously, I'd
like to keep this revision control system on the ground:  simple and reliable,
yet very powerful.  It seems as though after a nice evening of playing Doom 3, I
have no will to be near a computer until at least tomorrow morning... Jesus
Christ, that game ... wow ...  Then, this weekend is gonna be crazy, moving back
into house at skool.

If anyone, in the down-time here, would like to poke thru the D front-end
parser/analyzer code and possibly produce some nice D code to achieve the same
effect, that'd be sweet.  If not, that's cool too, I'll just do it once I'm at
skool.

On the train ride home from work today I was jottin' down some ideas on how to
do a tree-diff operation.  I started writing out D code in an XML-like format
just to see how I could process a given module as a syntactic tree and
rearrange, add, and remove parts of it.  I came up with a quickie example
XML-like tree: (some declarations are useless, but exist for example's sake)

D source module:

module addition;

import std.c.stdio;

alias int myInt;

int add(int a, int b)
out (result) {
    assert(a + b == result);
}
body {
    return (a + b);
}

Corresponding XML tree definition:

<module name="addition">
  <import name="std.c.stdio"/>
  <alias type="d:int" name="myInt"/>
  <function name="add" return="d:int">
    <param type="d:int" modifier="in" name="a"/>
    <param type="d:int" modifier="in" name="b"/>
    <out>
      <assert>
        <opEquals>
          <left>
            <opAdd>
              <left><ref-param name="a"/></left>
              <right><ref-param name="b"/></right>
            </opAdd>
          </left>
          <right>
            <return-value/>
          </right>
        </opEquals>
      </assert>
    </out>
    <body>
      <return>
        <paren>
          <opAdd>
            <left><ref-param name="a"/></left>
            <right><ref-param name="b"/></right>
          </opAdd>
        </paren>
      </return>
    </body>
  </function>
</module>

As you can see, it's pretty much a syntactic representation of the D module.  It
looks similar to a CodeDOM structure, if you've ever used that from .NET.  Of
course, a real schema would need additional tags
like <linecomment>, <blockcomment>, <nestcomment>, <blankline>, etc. to preserve
spacing and comments.  All of the D operators as tags should be defined by their
corresponding op* names.  Feel absolutely free to rip on my definition schema
here, I just made it up without *much* thought.  Admittedly, there was *some*
thought.

Now to the real meat...

Defining the operations to ADD and REMOVE sections is easy enough: just walk
the tree in document order and linearly add/remove tags (start and end tags
must be matched, of course).  Process the two trees just as diff processes two
files, trying to match them up tag-by-tag wherever possible.  This is a huge
benefit for simple changes that have a major impact on the formatting of a
document.

For example, if you wrap a block of code in an if-statement and indent it, unix
diff/patch includes all the affected lines in the diff.  But when using /my/
utility, only the start and end tags of the if-statement are created and the
internal code block is left completely alone, making the diff much more compact.
Here, we win against the unix diff utility, whereas the worst case would be a
draw with the unix diff utility.
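
To put rough numbers on that, here is a small Python sketch (for illustration only) that counts changed elements when the same edit is compared as lines and as a tag stream.  The tag spellings such as "<call foo>" and the helper count_changes are made up for this example and are not part of any proposed schema.

```python
from difflib import SequenceMatcher

def count_changes(old, new):
    """Count elements removed from `old` plus elements added from `new`."""
    changes = 0
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes += (i2 - i1) + (j2 - j1)
    return changes

# The same edit seen as lines of text: every body line changes
# because its indentation changed.
old_lines = ["foo();", "bar();", "baz();"]
new_lines = ["if (ok) {", "    foo();", "    bar();", "    baz();", "}"]

# The same edit seen as a stream of tags: the body tags are untouched.
old_tags = ["<call foo>", "<call bar>", "<call baz>"]
new_tags = ["<if ok>", "<call foo>", "<call bar>", "<call baz>", "</if>"]

line_changes = count_changes(old_lines, new_lines)  # 3 removed + 5 added = 8
tag_changes = count_changes(old_tags, new_tags)     # start tag + end tag = 2
```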

To try to complicate things, defining a MOVE operation without falling back to
an ADD and REMOVE operation should be considered.  Of course, in the initial
implementation it could just very well be not defined, and we could rely on
ADD/REMOVE, just as the unix diff/patch utilities do.  However, in the future
this could be a major source of improvement.
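
One possible way to get MOVE later without changing the basic diff is a post-processing pass that pairs a REMOVE and an ADD carrying the same node.  The Python sketch below is only an illustration under that assumption; the operation tuples and the name fold_moves are hypothetical, and the meaning of the indices is deliberately left loose.

```python
def fold_moves(ops):
    """Fold matching REMOVE/ADD pairs into MOVE operations.

    Each input operation is ("REMOVE", index, node) or ("ADD", index, node).
    A MOVE is emitted as ("MOVE", from_index, to_index, node).
    Unmatched REMOVEs come first in the result, then ADDs and MOVEs in order.
    """
    removes = [op for op in ops if op[0] == "REMOVE"]
    adds = [op for op in ops if op[0] == "ADD"]
    folded = []
    for add in adds:
        # Look for a removed node identical to the one being added.
        match = next((r for r in removes if r[2] == add[2]), None)
        if match is None:
            folded.append(add)
        else:
            removes.remove(match)
            folded.append(("MOVE", match[1], add[1], add[2]))
    return removes + folded

ops = [
    ("REMOVE", 0, "function add"),
    ("ADD", 3, "function add"),
    ("ADD", 1, "alias myInt"),
]
with_moves = fold_moves(ops)
```

A patch tool that does not understand MOVE could still expand it back into a REMOVE plus an ADD.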

Let me know what you all think!

James Dunne
Aug 19 2004
parent reply J C Calvarese <jcc7 cox.net> writes:
<snip>

This discussion reminds me of the DML idea that was mentioned a while back (I
think it was brought up 2 or 3 years ago):
http://jdanielsmith.org/DML/

I don't know how similar this is to what you're thinking, but it is XML-based.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/
Aug 19 2004
parent Jaymz <jdunne4 bradley.edu> writes:
Thanks for your comment, but I was just trying to convey the idea of the
syntactic tree using an XML-like form.  I'm not going to *actually use* any form
of XML or DML in the syntactic tree's definition.  I'll be keeping that all in
memory in a DOM structure.

Which reminds me, I did do a little poring over the DMD front-end code last
night, and it looks very clean and easy to port over to D for just the syntactic
analysis.  The data structures used are pretty clear and can easily be suited to
this project.

Really, now, the only design problem that I can see is how to define the patch
format, possibly in a human-readable way.  I'm leaning towards an extensible
binary format using chunks (like EBML does), since a text-based patch format
would get rather lengthy.

Does anyone know if SVN needs the patch data to be in ASCII text format, or does
it not care?

James Dunne
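
For a feel of what a chunked patch format could look like, here is a deliberately simplified Python sketch.  It is not real EBML (which uses variable-length IDs and sizes); the layout of a 4-byte tag, a 4-byte big-endian length and then the payload, as well as the tags "ADD ", "REMV" and "XTRA", are assumptions made up for this example.

```python
import struct

def write_chunk(tag, payload):
    """Encode one chunk: 4-byte tag, 4-byte big-endian length, payload bytes."""
    if len(tag) != 4:
        raise ValueError("chunk tag must be exactly 4 bytes")
    return tag + struct.pack(">I", len(payload)) + payload

def read_chunks(data):
    """Decode a byte string into a list of (tag, payload) pairs."""
    chunks = []
    offset = 0
    while offset < len(data):
        tag = data[offset:offset + 4]
        (length,) = struct.unpack_from(">I", data, offset + 4)
        start = offset + 8
        chunks.append((tag, data[start:start + length]))
        offset = start + length
    return chunks

patch = (
    write_chunk(b"ADD ", b"alias int myInt;")
    + write_chunk(b"XTRA", b"")          # a chunk type an older reader may not know
    + write_chunk(b"REMV", b"\x00\x02")
)

# Because every chunk carries its own length, a reader can skip unknown tags.
known = [(tag, body) for tag, body in read_chunks(patch) if tag in (b"ADD ", b"REMV")]
```

The length prefix is what makes such a format extensible: new chunk types can be added later without breaking older readers.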
Aug 20 2004
prev sibling next sibling parent reply Ilya Minkov <minkov cs.tum.edu> writes:
If it is defined over a tree, i imagine it fairly unstable. Not that it 
couldn't be done, but i'm somewhat sceptical, also considering the 
extensible syntax which might come in 2.x. Are there any good tree 
diff/merge tools already? Any open-source ones? If there are such tools 
for XML, one could define some mapping between D and XML.

If you define it over a stream of lexemes, it will be wonderfully 
robust, but i don't imagine it being too useful. It will at most take 
care of formatting issues (different contributors prefer different 
formatting), but projects now use some kind of an auto-formatter with 
certain settings, which also provides (an admittedly much cruder) solution.
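
A small Python sketch of the lexeme-stream idea, for illustration only: two sources that differ just in formatting produce the same token list, so a diff over tokens sees no change.  The regular expression below is a toy tokenizer made up for this example and is nowhere near a full D lexer (no comments, strings or multi-character operators other than ==).

```python
import re

# Toy token pattern: identifiers, integer literals, "==", or any single
# non-space punctuation character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|[^\s\w]")

def lex(source):
    """Split source text into a list of tokens, discarding whitespace."""
    return TOKEN.findall(source)

compact = "int add(int a,int b){return a+b;}"
spread_out = """int add(int a, int b)
{
    return a + b;
}"""
changed = "int add(int a, int b) { return a - b; }"

same_tokens = lex(compact) == lex(spread_out)   # formatting-only difference
real_change = lex(compact) != lex(changed)      # "+" became "-"
```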

What i would think of being more valuable for now, would be a 
documentation system and code formatter written completely in D.

-eye
Aug 19 2004
parent reply Jaymz <jdunne4 bradley.edu> writes:
Well, it would have to be defined with something a bit more complex than just a
tree structure.  A tree-based structure, like a DOM, would be ideal.  I don't
see how that'd be unstable.  It should be defined over a stream of lexemes, of
course.  That's what the DOM will hold.

I'm not too keen on having this be another implementation of a source code
re-formatter.  It's merely just a different way of patching source code using
the assumption that we're reading SOURCE CODE, not just arbitrary lines of text.
The code re-formatting comes out of the need to reproduce the code from the DOM.

A documentation system for D written entirely in D?  Just a few simple changes
to Doxygen it sounds like, minus the initial work of porting to D ;).

This could be a whole different pile of monkeys if class meta-data support was
in D *WINK WINK*.  I saw a few threads of discussion on meta-data, but it didn't
seem to end up anywhere.  Gr.  I don't see what the big issue is, the symbol
table doesn't take up *that* much room.  I personally would like a bit more
flexibility at the cost of the executable size being bumped up a few KB.

BTW, could you elaborate a bit on your skepticism?  I'm a bit confused here.
Thanks!

James Dunne
Aug 19 2004
parent Ilya Minkov <minkov cs.tum.edu> writes:
Jaymz wrote:

 Well, it would have to be defined with something a bit more complex than just a
 tree structure.  A tree-based structure, like a DOM, would be ideal.  I don't
 see how that'd be unstable.  It should be defined over a stream of lexemes, of
 course.  That's what the DOM will hold.

That might work... Although i'd like it to be somehow independent from most
language constructs, and able to handle new syntax constructs gracefully... more
or less like a highlighting editor with "levels" recognition does. Perhaps even
some extensibility?

 I'm not too keen on having this be another implementation of a source code
 re-formatter.  It's merely just a different way of patching source code using
 the assumption that we're reading SOURCE CODE, not just arbitrary lines of
 text.  The code re-formatting comes out of the need to reproduce the code from
 the DOM.
On the other hand the DIFF will not be very human-readable.
 A documentation system for D written entirely in D?  Just a few simple changes
 to Doxygen it sounds like, minus the initial work of porting to D ;).
Hr hr. :)
 This could be a whole different pile of monkeys if class meta-data support was
 in D *WINK WINK*.  I saw a few threads of discussion on meta-data, but it
 didn't seem to end up anywhere.  Gr.  I don't see what the big issue is, the
 symbol table doesn't take up *that* much room.  I personally would like a bit
 more flexibility at the cost of the executable size being bumped up a few KB.

The metadata was already there in DLI, but it was incomplete, and only the DLI
version of Phobos ever used it. The topic must be raised again in the post-1.0
era. For now, the consensus was that a parser and some custom code generators
would have to do the work for the others, and relieve Walter from something
unnecessary to do right now. Besides, the metadata was only intended to be used
in a program itself.

I wonder whether i find some time to bake a D version of my favorite parser gen
(COCO/R) and a corresponding D grammar... It would be a great help in creating
tools. I started to port the Java version, but after seeing the C version i
have come to dislike the Java one and will probably first hack up a C version
which outputs D code, then someone else could finish porting it. I am sure that
the tool can cope perfectly with D syntax, and the generated code is efficient.

 BTW, could you elaborate a bit on your skepticism?  I'm a bit confused here.
 Thanks!

I don't know, i'm totally new to the matter... That means i'm confused and
skeptical. Still, are there any tree diffs out there?

One point to consider is that Bill Cox and i have raised the question of an
extensible language, where libraries could introduce new syntax, like in OpenC++
and similar. Walter promised to consider this again in the post-1.0 era.

-eye
Aug 19 2004
prev sibling parent reply pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:
I can see the merit in a stand-alone server, but this may be the wrong way to start out. Honestly, I think an add-on module to an existing source control tool might prove much more useful than an outright replacement. Take dsource.org for example: an entire website dedicated to D programming that is backed on Subversion. IMO an extension to Subversion would be far more useful (and easier to implement) to the D community as a whole.

All the same, please look at Mango over on dsource if you're going to write a stand-alone server. The I/O and socket portions of that library may give you a good head-start.

That aside, I like where you're going with this, especially with 'creating a semantic tree' of the code. I gather this would be some form of pseudocode or XML? I can see this becoming useful for increasing performance if you keep the current semantic tree version on hand at all times. That way one can compare the tree against their own local source to make sure they're not altering other portions of the application too badly (i.e. trying not to violate contracts across a whole project).

Another thing, a lot of the spirit of what you're proposing here is captured in D's in/out/body and unittest contracting system. Have you considered incorporating these statements in particular to deepen the semantic meaning of code when you assess it? :)

- Pragma
Aug 19 2004
parent reply Jaymz <jdunne4 bradley.edu> writes:
In article <cg2vh6$2c5t$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
<<snip>>
I can see the merit in a stand-alone server, but this may be the wrong way to
start out.  Honestly, I think an add-on module to an existing source control
tool might prove much more useful than an outright replacement.  Take
dsource.org for example: an entire website dedicated to D programming that is
backed on Subversion.  IMO an extension to Subversion would be far more useful
(and easier to implement) to the D community as a whole. 

All the same, please look at Mango over on dsource if you're going to write a
stand-alone server.  The I/O and socket portions of that library may give you a
good head-start.

That aside, I like where you're going with this, especially with 'creating a
semantic tree' of the code.  I gather this would be some form of pseudocode or
XML?  I can see this becoming useful for increasing performance if you keep the
current semantic tree version on hand at all times.  That way one can compare
the tree against their own local source to make sure they're not altering other
portions of the application too badly (i.e. trying not to violate contracts
across a whole project)

Another thing, a lot of the spirit of what you're proposing here is captured in
D's in/out/body and unittest contracting system.  Have you considered
incorporating these statements in particular to deepen the semantic meaning of
code when you assess it?  :)

- Pragma
Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy of the code on hand. I do like the system and am using it personally, just never cared to see its code ;). But if you say it is easier to extend, then I will believe you. An SVN extension is definitely a possible direction in the future for this project, assuming the proof-of-concept diff/patch toolset works. After all, I didn't really have any ambition to create a new stand-alone server in the first place.

You're saying I could gather contract information from the in, out, invariant, etc. constructs that D provides and make sure the coder isn't going to violate them with the code commit? Wow, that takes balls.

Actually I don't think that's possible. How are you to know at compile time if the coder is violating any contracts? Where do you get your values to test against the contracts? And finally, HOW do you represent a contract in an evaluative way, assuming you magically have values provided by the committed code to test against the contracts? I don't think the contract information would be too useful in a revision control system, and it wouldn't be very language-independent either. But it's a cool idea, nonetheless.

Er, anyway... The real intent behind building the semantic tree of the module is to have a uniform way of accessing functions, structures, classes in order to compare them and change them easily. I could foresee this being defined by a relatively large inter-related class hierarchy of things like expressions, statements, etc...

.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all. My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it? Oh God, my head... Someone clarify myself for me.

James Dunne
Aug 19 2004
parent reply pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:
In article <cg31on$2ega$1 digitaldaemon.com>, Jaymz says...
Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy
of the code on hand.  I do like the system and am using it personally, just
never cared to see its code ;).  But if you say it is easier to extend, then I
will believe you.  An SVN extension is definitely a possible direction in the
future for this project, assuming the proof-of-concept diff/patch toolset works.
Well, I haven't done it personally, but word has it that it has an event model of some kind that was written with extensibility in mind. :)
You're saying I could gather contract information from the in, out, invariant,
etc. constructs that D provides and make sure the coder isn't going to violate
them with the code commit?  Wow, that takes balls.
Um, thank you? I wasn't aware of that statement being all that out there, but in retrospect it's pretty bogus. It's probably all this going back and forth between ColdFusion (work) and D (here in the NG). But I am on the same page now and will restrain from making any future "ballsy" comments. ;)
Actually I don't think that's possible.  How are you to know at compile time if
the coder is violating any contracts?  Where do you get your values to test
against the contracts?  And finally, HOW do you represent a contract in an
evaluative way, assuming you magically have values provided by the committed
code to test against the contracts?  I don't think the contract information
would be too useful in a revision control system, and it wouldn't be very
language-independent either.  But it's a cool idea, nonetheless.
Okay, I see where you're coming from now. I was thinking more at the compilation and unittest level, where testing DBC *really* comes into play. You're right: you can't use that kind of information when you're just looking at how the code is put together.

Of course there's no reason why you couldn't get a nightly or on-demand build to do some analysis when an assert or static assert fires in a unittest. After all that processing, wouldn't it be pretty easy to correlate a line number and error message with a particular change ... especially since your semantic pass will know how everything is interrelated?
Er, anyway... The real intent behind building the semantic tree of the module is
to have a uniform way of accessing functions, structures, classes in order to
compare them and change them easily.  I could foresee this being defined by a
relatively large inter-related class hierarchy of things like expressions,
statements, etc...
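
To make the "class hierarchy of expressions, statements, etc." a little more concrete, here is a tiny sketch in Python. All class names (`Node`, `Name`, `BinaryOp`, `Return`, `Function`) are hypothetical and far smaller than what a real D front end would need; the point is only that nodes with structural equality make "compare them easily" nearly free.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    """Base class for every tree node."""

@dataclass(frozen=True)
class Name(Node):
    ident: str

@dataclass(frozen=True)
class BinaryOp(Node):
    op: str
    left: Node
    right: Node

@dataclass(frozen=True)
class Return(Node):
    value: Node

@dataclass(frozen=True)
class Function(Node):
    name: str
    params: Tuple[str, ...]
    body: Tuple[Node, ...]

# Two revisions of the same function, built as trees rather than text.
rev1 = Function("area", ("w", "h"),
                (Return(BinaryOp("*", Name("w"), Name("h"))),))
rev2 = Function("area", ("w", "h"),
                (Return(BinaryOp("*", Name("w"), Name("h"))),))
rev3 = Function("area", ("w", "h"),
                (Return(BinaryOp("+", Name("w"), Name("h"))),))

# Dataclasses generate field-by-field equality, so comparing two
# revisions is a plain == on the root nodes.
print(rev1 == rev2, rev1 == rev3)
```

Using immutable nodes and tuples keeps the comparison purely structural; a real tool would also need source positions and comments attached somewhere outside the equality check.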
Gotcha. So if the revision system can actually "understand" the code it's processing, then it'll be less prone to screwups and possibly catch developer mistakes as well...
.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh
God, my head...  Someone clarify myself for me.
I'll take a stab at that one. I've always understood the semantics of a program to be derived from the syntax used. Yes, it's almost 1-for-1 between meaning and the syntax used, especially in D. The difference lies in how one can do some things in more than one way, like using "?" instead of "if()" and so on: both have the same semantic meaning, but the syntax is totally different.

- Pragma
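
The "?" versus "if()" point can be illustrated with a small sketch. It is written in Python rather than D (an assumption made so the standard `ast` module can stand in for a compiler front end), with Python's conditional expression playing the role of D's "?:". The helper names `same_syntax` and `same_behaviour` are invented for this example.

```python
import ast

IF_STATEMENT = """\
def sign(x):
    if x < 0:
        return -1
    else:
        return 1
"""

CONDITIONAL = """\
def sign(x):
    return -1 if x < 0 else 1
"""

def same_syntax(a, b):
    """True when both sources parse to identical syntax trees."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

def same_behaviour(a, b, inputs):
    """True when both versions of sign() agree on every sample input."""
    def load(source):
        namespace = {}
        exec(source, namespace)
        return namespace["sign"]
    f, g = load(a), load(b)
    return all(f(x) == g(x) for x in inputs)

print(same_syntax(IF_STATEMENT, CONDITIONAL))
print(same_behaviour(IF_STATEMENT, CONDITIONAL, [-2, 0, 3]))
```

A syntactic tree diff would report the function as changed, while the behaviour is the same. Note that agreeing on a few sample inputs is only evidence, not proof, of equal semantics, which hints at why a truly semantic comparison is the harder problem.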
Aug 19 2004
parent Jaymz <jdunne4 bradley.edu> writes:
In article <cg358p$2hho$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
In article <cg31on$2ega$1 digitaldaemon.com>, Jaymz says...
Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy
of the code on hand.  I do like the system and am using it personally, just
never cared to see its code ;).  But if you say it is easier to extend, then I
will believe you.  An SVN extension is definitely a possible direction in the
future for this project, assuming the proof-of-concept diff/patch toolset works.
Well, I haven't done it personally, but word has it that it has an event model of some kind that was written with extensibility in mind. :)
You're saying I could gather contract information from the in, out, invariant,
etc. constructs that D provides and make sure the coder isn't going to violate
them with the code commit?  Wow, that takes balls.
Um, thank you? I wasn't aware of that statement being all that out there, but in retrospect it's pretty bogus. It's probably all this going back and forth between ColdFusion (work) and D (here in the NG). But I am on the same page now and will restrain from making any future "ballsy" comments. ;)
Actually I don't think that's possible.  How are you to know at compile time if
the coder is violating any contracts?  Where do you get your values to test
against the contracts?  And finally, HOW do you represent a contract in an
evaluative way, assuming you magically have values provided by the committed
code to test against the contracts?  I don't think the contract information
would be too useful in a revision control system, and it wouldn't be very
language-independent either.  But it's a cool idea, nonetheless.
Okay, I see where you're coming from now. I was thinking more at the compilation and unittest level, where testing DBC *really* comes into play. You're right: you can't use that kind of information when you're just looking at how the code is put together.

Of course there's no reason why you couldn't get a nightly or on-demand build to do some analysis when an assert or static assert fires in a unittest. After all that processing, wouldn't it be pretty easy to correlate a line number and error message with a particular change ... especially since your semantic pass will know how everything is interrelated?
I thought you were gonna *restrain* from making future "ballsy" comments? ;). What type of algorithm could be developed based on a commit and a static assert firing to lead to a possible collection of offending line numbers? Now THAT would be really interesting, and is technically feasible!
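
One possible shape for such an algorithm, as a rough sketch only: find the definition that contains the failing line, collect the names it refers to, and intersect them with the definitions touched by the commit. The code below assumes Python 3.8+ and uses Python's `ast` module in place of the D front end; the `suspects` function, the sample source, and the list of changed names are all hypothetical.

```python
import ast

def suspects(source, changed, failing_line):
    """Return changed definitions that the failing function is or refers to."""
    tree = ast.parse(source)
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        if node.lineno <= failing_line <= node.end_lineno:
            # Every identifier used inside the failing function, plus
            # the function itself in case it was the thing that changed.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            used.add(node.name)
            return sorted(used & set(changed))
    return []

SOURCE = (
    "def scale(x):\n"               # line 1
    "    return x * 3\n"            # line 2
    "\n"                            # line 3
    "def helper(x):\n"              # line 4
    "    return x + 1\n"            # line 5
    "\n"                            # line 6
    "def test_scale():\n"           # line 7
    "    assert scale(2) == 4\n"    # line 8, the assertion that fires
)

# Suppose the commit touched both scale and helper, and the nightly
# build reports a failed assertion on line 8.
print(suspects(SOURCE, ["scale", "helper"], 8))
```

This only follows direct references; a fuller version would walk the call graph transitively, so that a change two calls away from the failing assertion is still reported.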
Er, anyway... The real intent behind building the semantic tree of the module is
to have a uniform way of accessing functions, structures, classes in order to
compare them and change them easily.  I could foresee this being defined by a
relatively large inter-related class hierarchy of things like expressions,
statements, etc...
Gotcha. So if the revision system can actually "understand" the code it's processing, then it'll be less prone to screwups and possibly catch developer mistakes as well...
Well, the project's scope certainly has escalated from a simple syntactic tree based revision control system to an intelligent learning machine that'll automagically fix your mistakes and know what you *really* want to do. LOL. Not to pick on you, Pragma. ;)

Hey! Why don't we just build a neural network of a few billion nodes and train it on D grammar and semantics 'til it's sick? Oh wait, we already got a couple hundred of 'em walkin around.. Dammit. lol.
.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh
God, my head...  Someone clarify myself for me.
I'll take a stab at that one. I've always understood the semantics of a program to be derived from the syntax used. Yes, it's almost 1-for-1 between meaning and the syntax used, especially in D. The difference lies in how one can do some things in more than one way, like using "?" instead of "if()" and so on: both have the same semantic meaning, but the syntax is totally different.

- Pragma
<not snobby>I do realize the difference between syntax and semantics</not snobby>. And in general, a language's syntax will strongly reflect its semantics (unless you complain of such silly things as READABILITY ... damn VB coders).

Regardless, should the revision control system be based on a /syntactic/ or /semantic/ tree representation of the code? To contradict myself, as I always do, I don't see much benefit now in a /semantic/ tree for a simple revision control system. :)

I do hope I've successfully confused everyone now. The master of deception and contradiction will be back tomorrow morning.

James Dunne
Aug 19 2004