digitalmars.D - D source code revision system idea

jdunne4 bradley.edu (24/24) Aug 19 2004 I'm not sure if this is the right place to throw up an idea like this, b...

pragma (8/32) Aug 19 2004 Not a bad idea. Would this be a stand-alone project, or something added...

Jaymz (24/63) Aug 19 2004 Let's see...

Berin Loritsch (12/40) Aug 19 2004 If you start by getting the diff/patch utilities working properly, with

Jaymz (20/60) Aug 19 2004 Unfortunately, I don't see how I could create a format compatible with t...

Regan Heath (10/92) Aug 19 2004 KISS == Keep It Simple Stupid.

Berin Loritsch (21/45) Aug 19 2004 That was its intention (how did you get this message and I didn't?).

Jaymz (89/89) Aug 19 2004 That was odd, the posts got out of order and I didn't see Regan's post

J C Calvarese (10/40) Aug 19 2004

Jaymz (16/56) Aug 20 2004 Thanks for your comment, but I was just trying to convey the idea of the

Ilya Minkov (13/13) Aug 19 2004 If it is defined over a tree, i imagine it fairly unstable. Not that it

Jaymz (19/32) Aug 19 2004 Well, it would have to be defined with something a bit more complex than...

Ilya Minkov (27/44) Aug 19 2004 That might work... Although i'd like it somehow independant from most

pragma (22/44) Aug 19 2004 I can see the merit in a stand-alone server, but this may be the wrong w...

Jaymz (29/50) Aug 19 2004 In article , pragma

pragma (27/50) Aug 19 2004 Well, I haven't done it personally, but word has it that it has an event...

Jaymz (24/79) Aug 19 2004 I thought you were gonna *restrain* from making future "ballsy" comments...

jdunne4 bradley.edu writes:

I'm not sure if this is the right place to throw up an idea like this, but there
seem to be an astonishing number of competent developers here to offer
insightful feedback, so I'll go ahead and toss it up ;).  Feel free to respond
and bounce ideas back off me!

What would you think of a source code revision system that does not work on
line-by-line code differences, but rather on semantic differences?  This would be
mainly targeted at D source code, since D lends itself well to this type of
revision system.  The lack of a pre-processor combined with the concept of
modules makes this language an ideal target.

Pros:
1)  ***More robust patching ability***  (Not line-based, so no "fuzz" needed)
2)  Easy merging of codebases (trunks)
3)  Easy conflict detection during merges (check function call parameters, etc.)
4)  Could spot possible compile errors
5)  Code can be regenerated to conform to a formatting standard
6)  Accepts only correct code (possible con...)

Cons:
1)  Maintaining comments and their positions in the code becomes difficult,
since they are not compilable elements
2)  Somewhat difficult implementation

A new patch/diff toolset would need to be created to accommodate this new
semantic revision control system as well.
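
Pro 3 above (conflict detection by checking function call parameters) can be sketched roughly as follows. This is only an illustration under loud assumptions: it uses Python and Python's own `ast` module as a stand-in for a D front-end, and the helper names `arities` and `call_conflicts` are made up for the example. The two branches touch different lines, so a line-based merge would accept them silently.

```python
import ast

def arities(source):
    """Map each top-level function name to its positional parameter count."""
    tree = ast.parse(source)
    return {node.name: len(node.args.args)
            for node in tree.body if isinstance(node, ast.FunctionDef)}

def call_conflicts(definitions, callers):
    """List calls in `callers` whose argument count disagrees with `definitions`."""
    expected = arities(definitions)
    problems = []
    for node in ast.walk(ast.parse(callers)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            want = expected.get(node.func.id)
            if want is not None and want != len(node.args):
                problems.append((node.func.id, want, len(node.args)))
    return problems

# Branch A changes add() to take three parameters...
branch_a = "def add(a, b, c):\n    return a + b + c\n"
# ...while branch B, written against the old signature, adds a new call.
branch_b = "total = add(1, 2)\n"

conflicts = call_conflicts(branch_a, branch_b)
print(conflicts)  # [('add', 3, 2)]
```

Each tuple reads as (function, parameters expected, arguments passed), which is the kind of report a semantic merge could produce before the code ever reaches a compiler.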

Please, let me know what you think!

James Dunne

Aug 19 2004

pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:

Not a bad idea.  Would this be a stand-alone project, or something added to an
existing product, like Subversion or CVS?

The only thing that comes to mind is: how would you even attempt to define
semantic merging and versioning in any language?  Are you talking about making
sure that merged sources compile okay, or is it something deeper than a
unittest?

- Pragma

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

Let's see...

Upon first design, this could just be a simple stand-alone project implemented
for the D language, consisting of a defined patch-format and a patch/diff-like
toolset.  After all, we've got the front-end source to D already!  That could
*possibly* make this simpler to implement, as it contains all the data
structures necessary to parse, analyze, and possibly re-create the code with.

How I see the "diff" tool working:
1)  Lex & parse the source files
2)  Create semantic tree representation of the original & new code
3)  Compare new code's semantic tree with original code's semantic tree
4)  Output a series of simple, defined operations to transform the original
code's semantic tree into the new code's semantic tree.

And the "patch" tool would do basically the inverse of the diff tool:
1)  Lex & parse the target source file
2)  Create semantic tree representation of the target source code
3)  Apply defined operations on the semantic tree
4)  Rebuild the target code from the modified semantic tree, possibly conforming
to a given formatting standard, or using hints provided by the diff tool to
recreate the formatting of the original file.

This type of patch/diff toolset could handle the creation of an entire module,
simply defined by "create" operations on an "empty" semantic tree.
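
The diff and patch steps above can be sketched on a toy model. Everything here is an assumption made for illustration: Python stands in for D, the "semantic tree" is reduced to a single level (a module mapping declaration names to their text), and the operation names `create`, `replace`, and `remove` are hypothetical rather than a defined patch format.

```python
def tree_diff(old, new):
    """Emit operations that transform the `old` module into the `new` one."""
    ops = []
    for name in old:
        if name not in new:
            ops.append(("remove", name))
        elif old[name] != new[name]:
            ops.append(("replace", name, new[name]))
    for name in new:
        if name not in old:
            ops.append(("create", name, new[name]))
    return ops

def tree_patch(target, ops):
    """Apply the operations to a copy of `target` and return the result."""
    result = dict(target)
    for op in ops:
        if op[0] == "remove":
            del result[op[1]]
        else:  # "create" and "replace" both store the new declaration
            result[op[1]] = op[2]
    return result

old_module = {"myInt": "alias int myInt;",
              "add": "int add(int a, int b) { return a + b; }"}
new_module = {"add": "int add(int a, int b) { return (a + b); }",
              "sub": "int sub(int a, int b) { return a - b; }"}

ops = tree_diff(old_module, new_module)
patched = tree_patch(old_module, ops)
# A brand-new module is just a list of "create" operations on an empty tree.
from_scratch = tree_diff({}, new_module)
print(ops)
```

A real implementation would need nested nodes and a way to address them, but the round trip (patching the old tree with the diff yields the new tree) is the property to preserve.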

Let me know what you all think of this.  Thanks for your input, Pragma!


Aug 19 2004

Berin Loritsch <bloritsch d-haven.org> writes:

If you start by getting the diff/patch utilities working properly, with
a format compatible with the unix diff/patch utilities, then you could
specify it as the diff/patch util for the CVS or SVN repos.  That would
be the only real level of integration you need.

I will say this: (ir)Rational ClearCase tries to use this technique as
much as possible with abysmal results.  From what I understand, the
PowerBuilder integration works decently, but the XML diff tool is worse
than their line diff tool (which still randomizes things).

If you get the diff/patch utility right, I will be very impressed.  Just
be careful to focus only on diff/patch and not try to have a tool that
does a whole bunch of stuff.  KISS

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

Unfortunately, I don't see how I could create a format compatible with the unix
diff/patch utilities which are line-based, using a semantic tree-based
modification scheme.  The format would have to be entirely different.  I could,
however, make my toolset support the command-line arguments of the original
diff/patch utilities, ignoring now senseless ones, which would be the best way
to go.
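
Accepting the familiar command-line options while ignoring the ones that no longer mean anything might look like the sketch below. Assumptions: Python's `argparse` stands in for whatever the real tool would use, the tool name `ddiff` is made up, and the flags listed are the common GNU diff options for whitespace handling (`-w`, `-b`, `-B`) and line-based output formats (`-u`, `-c`).

```python
import argparse

# Options borrowed from GNU diff that have no meaning for a tree comparison:
# -w, -b and -B concern whitespace and blank lines, -u and -c pick line formats.
IGNORED_FLAGS = ("-w", "-b", "-B", "-u", "-c")

def build_parser():
    parser = argparse.ArgumentParser(prog="ddiff")  # "ddiff" is a made-up name
    parser.add_argument("old", help="original source file")
    parser.add_argument("new", help="modified source file")
    for flag in IGNORED_FLAGS:
        # Accepted so existing scripts keep working, then never consulted.
        parser.add_argument(flag, action="store_true", help=argparse.SUPPRESS)
    return parser

args = build_parser().parse_args(["-u", "-w", "old.d", "new.d"])
print(args.old, args.new)
```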

This would not necessarily be a bad thing for SVN use ... if you make the
decision to use my diff/patch utilities from the start, as the new patch format
wouldn't be compatible with the unix diff/patch utilities' patch format.  SVN
really doesn't care what diff/patch format it stores in its database,
AFAIK.  It simply relies on correct operation from diff/patch to do its
work.

And sorry, I haven't used either of the products you mentioned:
ClearCase or PowerBuilder.  Could you post an example of "abysmal results" so we
can see what NOT to produce?  :-)  I do like to develop tools that produce
quality results -- this is probably due to my delusion that I have unlimited
project development time, and that a project is never quite "done" ;).

BTW, what's w/ the KISS?  Thanks for your comments!

James Dunne

Aug 19 2004

Regan Heath <regan netwin.co.nz> writes:

KISS == Keep It Simple Stupid.

And before you take any offense, none was intended (I assume).  It's a 
somewhat common acronym meaning simply that you should try not to 
*over* complicate things.

Regan

p.s. I think your idea is great.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/

Aug 19 2004

Berin Loritsch <bloritsch d-haven.org> writes:

That was its intention (how did you get this message and I didn't?).

Anyway for an example of bad integration, ClearCase's XML merge is
perfect.

If the XML is not properly formatted, the tool will choke beyond
reason (this applies to original or new XML documents).  By that
I mean the tool will attempt to treat the whole block as one element.
For example:

OLD:

<!-- parse error here -->
<element
   <embedded type="element"/>
</element>

NEW

<!-- parse error fixed -->
<element>
   <embedded type="element"/>
</element>

CONFLICT:

The element "element" conflicts with "element       <embedded type="element"/>      "
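
One way to avoid this failure mode is to refuse to diff anything that does not parse, instead of guessing at a structure. A minimal sketch, assuming Python's `xml.etree.ElementTree` as the parser and a made-up helper name `parse_or_reject`; the two inputs are the OLD and NEW fragments from the example above.

```python
import xml.etree.ElementTree as ET

def parse_or_reject(label, text):
    """Parse `text`, or fail loudly instead of diffing a broken document."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        raise ValueError(f"{label} input is not well-formed: {err}") from None

old_text = '<element\n   <embedded type="element"/>\n</element>'   # missing ">"
new_text = '<element>\n   <embedded type="element"/>\n</element>'

new_tree = parse_or_reject("new", new_text)
try:
    parse_or_reject("old", old_text)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

A tool that hits the rejected case could fall back to a plain line diff for that one file rather than reporting a nonsensical conflict.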

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

That was odd, the posts got out of order and I didn't see Regan's post
initially... Anyways...

Yeah, I'm definitely a fan of KISS... Heh, punny punny... But seriously, I'd
like to keep this revision control system on the ground:  simple and reliable,
yet very powerful.  It seems as though after a nice evening of playing Doom 3, I
have no will to be near a computer until at least tomorrow morning... Jesus
Christ, that game ... wow ...  Then, this weekend is gonna be crazy, moving back
into house at skool.

If anyone, in the down-time here, would like to poke thru the D front-end
parser/analyzer code and possibly produce some nice D code to achieve the same
effect, that'd be sweet.  If not, that's cool too, I'll just do it once I'm at
skool.

On the train ride home from work today I was jottin' down some ideas on how to
do a tree-diff operation.  I started writing out D code in an XML-like format
just to see how I could process a given module as a syntactic tree and
rearrange, add, and remove parts of it.  I came up with a quickie example
XML-like tree: (some declarations are useless, but exist for example's sake)

D source module:

module addition;

import std.c.stdio;

alias int myInt;

int add(int a, int b)
out (value) {
    assert(a + b == value);
}
body {
    return (a + b);
}

Corresponding XML tree definition:

<module name="addition">
  <import name="std.c.stdio"/>
  <alias type="d:int" name="myInt"/>
  <function name="add" return="d:int">
    <param type="d:int" modifier="in" name="a"/>
    <param type="d:int" modifier="in" name="b"/>
    <out>
      <assert>
        <opEquals>
          <left>
            <opAdd>
              <left><ref-param name="a"/></left>
              <right><ref-param name="b"/></right>
            </opAdd>
          </left>
          <right>
            <return-value/>
          </right>
        </opEquals>
      </assert>
    </out>
    <body>
      <return>
        <paren>
          <opAdd>
            <left><ref-param name="a"/></left>
            <right><ref-param name="b"/></right>
          </opAdd>
        </paren>
      </return>
    </body>
  </function>
</module>

As you can see, it's pretty much a syntactic representation of the D module.  It
looks similar to a CodeDOM structure, if you've ever used that from .NET.  Of
course, it would also need tags
like <linecomment>, <blockcomment>, <nestcomment>, <blankline>, etc to preserve
spacing and comments.  All of the D operators as tags should be defined by their
corresponding op* names.  Feel absolutely free to rip on my definition schema
here, I just made it up without *much* thought.  Admittedly, there was *some*
thought.
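
To show that such a tree holds enough information to rebuild the code, here is a small sketch that turns two fragments of the XML-like tree back into source text. It is an illustration only: Python and `xml.etree.ElementTree` stand in for the real tool, the function `to_source` and the `OPERATORS` table are made up, only the handful of tags used in the example are covered, and `value` is assumed to be the name of the contract's result variable.

```python
import xml.etree.ElementTree as ET

OPERATORS = {"opAdd": "+", "opEquals": "=="}

def to_source(node):
    """Turn one node of the XML-like tree back into D-style source text."""
    if node.tag == "ref-param":
        return node.get("name")
    if node.tag == "return-value":
        return "value"  # assumed name of the contract's result variable
    if node.tag == "paren":
        return "(" + to_source(node[0]) + ")"
    if node.tag in OPERATORS:
        left = to_source(node.find("left")[0])
        right = to_source(node.find("right")[0])
        return f"{left} {OPERATORS[node.tag]} {right}"
    if node.tag == "return":
        return "return " + to_source(node[0]) + ";"
    if node.tag == "assert":
        return "assert(" + to_source(node[0]) + ");"
    raise ValueError(f"no rule for <{node.tag}>")

SUM = ('<opAdd><left><ref-param name="a"/></left>'
       '<right><ref-param name="b"/></right></opAdd>')
body = ET.fromstring(f"<return><paren>{SUM}</paren></return>")
contract = ET.fromstring(
    f"<assert><opEquals><left>{SUM}</left>"
    "<right><return-value/></right></opEquals></assert>")

print(to_source(body))      # return (a + b);
print(to_source(contract))  # assert(a + b == value);
```

Because the parentheses are a node of their own (`<paren>`), the regenerated statement keeps them even though they are redundant, which is the sort of detail a formatting-preserving patch tool would rely on.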

Now to the real meat...

Defining the operations to ADD and REMOVE sections is easy enough, just treat
the tree in an in-order-traversal manner and linearly add/remove tags (start and
end tags must be matched, of course).  Process the two trees just as diff
processes two files, trying to match them up tag-by-tag wherever possible.  This
is a huge benefit for simple changes that have a major impact on the
formatting of a document.
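
The linear ADD/REMOVE idea can be sketched by flattening each tree into its sequence of start and end tags and handing the two sequences to an ordinary sequence matcher. Assumptions: Python's `difflib.SequenceMatcher` stands in for the matching step, `flatten` and `tag_diff` are made-up names, and attributes are sorted so that attribute order never counts as a change.

```python
import difflib
import xml.etree.ElementTree as ET

def flatten(node):
    """Linearize a tree: start tag, then the children in order, then end tag."""
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(node.attrib.items()))
    tags = [f"<{node.tag}{attrs}>"]
    for child in node:
        tags.extend(flatten(child))
    tags.append(f"</{node.tag}>")
    return tags

def tag_diff(old_xml, new_xml):
    """Return ADD/REMOVE operations that turn the old tag stream into the new."""
    old = flatten(ET.fromstring(old_xml))
    new = flatten(ET.fromstring(new_xml))
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind in ("delete", "replace"):
            ops.extend(("REMOVE", tag) for tag in old[i1:i2])
        if kind in ("insert", "replace"):
            ops.extend(("ADD", tag) for tag in new[j1:j2])
    return ops

old_xml = '<module name="addition"><import name="std.c.stdio"/></module>'
new_xml = ('<module name="addition"><import name="std.c.stdio"/>'
           '<alias type="d:int" name="myInt"/></module>')
ops = tag_diff(old_xml, new_xml)
print(ops)
```

The operations here carry no positions; a usable patch format would also have to record where in the stream each tag is added or removed.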

For example, with unix diff/patch, if you wrap a block of code in an if-statement
and indent it, all the affected lines are included in the diff.  But when using
/my/ utility, only the start and end tags of the if-statement are created and the
internal code block is left completely alone, making the diff much more compact.
Here, we win against the unix diff utility, whereas the worst case would be a
draw with the unix diff utility.
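
The indentation case can be put into numbers with a small comparison. This is a sketch under assumptions: Python's `difflib` plays the role of both tools, the statements and tag names are invented for the example, and "cost" simply counts how many items a diff would have to remove or add.

```python
import difflib

def diff_cost(old, new):
    """Count the items a diff of the two sequences must remove or add."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for kind, i1, i2, j1, j2 in matcher.get_opcodes()
               if kind != "equal")

# Line view: wrapping the block in an if-statement re-indents every line.
old_lines = ["x = a + b;", "y = x * 2;", "show(y);"]
new_lines = ["if (ok) {", "    x = a + b;", "    y = x * 2;", "    show(y);", "}"]

# Tag view: the same statements, untouched, inside a new pair of <if> tags.
old_tags = ["<assign x>", "</assign>", "<assign y>", "</assign>",
            "<call show>", "</call>"]
new_tags = ["<if ok>"] + old_tags + ["</if>"]

line_cost = diff_cost(old_lines, new_lines)
tag_cost = diff_cost(old_tags, new_tags)
print(line_cost, tag_cost)  # 8 2
```

The line diff has to drop all three old lines and add five new ones, while the tag diff only adds the opening and closing `<if>` tags.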

To try to complicate things, defining a MOVE operation without falling back to
an ADD and REMOVE operation should be considered.  Of course, in the initial
implementation it could just very well be not defined, and we could rely on
ADD/REMOVE, just as the unix diff/patch utilities do.  However, in the future
this could be a major source of improvement.
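
A MOVE can be layered on top of ADD/REMOVE as a post-processing step: whenever the same subtree is both removed and added, fold the pair into one operation. The sketch below assumes a made-up operation shape of `(kind, subtree, position)`, where `subtree` is some hashable description of the node and `position` is a hypothetical index among its siblings; Python is used as a stand-in language.

```python
def fold_moves(ops):
    """Rewrite matching REMOVE/ADD pairs of the same subtree as one MOVE."""
    adds = {}
    for index, (kind, subtree, _) in enumerate(ops):
        if kind == "ADD":
            adds.setdefault(subtree, []).append(index)

    partner = {}  # index of a REMOVE -> index of the ADD it is paired with
    for index, (kind, subtree, _) in enumerate(ops):
        if kind == "REMOVE" and adds.get(subtree):
            partner[index] = adds[subtree].pop(0)
    paired_adds = set(partner.values())

    folded = []
    for index, (kind, subtree, position) in enumerate(ops):
        if index in partner:
            destination = ops[partner[index]][2]
            folded.append(("MOVE", subtree, position, destination))
        elif index not in paired_adds:
            folded.append((kind, subtree, position))
    return folded

ops = [("REMOVE", "function add", 0),
       ("ADD", "alias myInt", 1),
       ("ADD", "function add", 2)]
folded = fold_moves(ops)
print(folded)
```

Matching only identical subtrees keeps the step simple; a subtree that is moved and edited at the same time would still show up as a separate REMOVE and ADD.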

Let me know what you all think!

James Dunne

Aug 19 2004

J C Calvarese <jcc7 cox.net> writes:

This discussion reminds me of the DML idea that was mentioned a while 
back (I think it was brought up 2 or 3 years ago):

http://jdanielsmith.org/DML/

I don't know how similar this is to what you're thinking, but it is XML-based.

-- 
Justin (a/k/a jcc7)
http://jcc_7.tripod.com/d/

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

Thanks for your comment, but I was just trying to convey the idea of the
syntactic tree using an XML-like form.  I'm not going to *actually use* any form
of XML or DML in the syntactic tree's definition.  I'll be keeping that all in
memory in a DOM structure.

Which reminds me, I did do a little poring over the DMD front-end code last
night, and it looks very clean and easy to port over to D for just the syntactic
analysis.  The data structures used are pretty clear and can easily be suited to
this project.

Really, now, the only design problem that I can see is how to define the patch
format, possibly in a human-readable way.  I'm leaning towards an extensible
binary format using chunks (like EBML does), since a text-based patch format
would get rather lengthy.
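
A chunked binary format of this kind can be sketched in a few lines. Note the assumptions: this is Python as a stand-in, the chunk ids (`b"ADD "`, `b"NOTE"`) are invented, and the layout is deliberately simplified to a fixed 4-byte id plus a 4-byte length, whereas EBML itself uses variable-length ids and sizes.

```python
import struct

HEADER = struct.Struct(">4sI")  # 4-byte chunk id, big-endian payload length

def write_chunk(chunk_id, payload):
    """Serialize one chunk as header followed by the raw payload bytes."""
    return HEADER.pack(chunk_id, len(payload)) + payload

def read_chunks(data):
    """Split a byte string back into (chunk id, payload) pairs."""
    chunks = []
    offset = 0
    while offset < len(data):
        chunk_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        chunks.append((chunk_id, data[offset:offset + length]))
        offset += length
    return chunks

patch = (write_chunk(b"ADD ", b'<alias type="d:int" name="myInt"/>')
         + write_chunk(b"NOTE", b"written by a newer tool version"))

# An older reader keeps the chunks it understands and skips the rest,
# which is what makes the format extensible.
known = [(cid, body) for cid, body in read_chunks(patch) if cid == b"ADD "]
print(known)
```

Since every chunk announces its own length, a reader never has to understand a chunk in order to step over it.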

Does anyone know if SVN needs the patch data to be in ASCII text format, or does
it not care?

James Dunne

Aug 20 2004

Ilya Minkov <minkov cs.tum.edu> writes:

If it is defined over a tree, i imagine it fairly unstable. Not that it 
couldn't be done, but i'm somewhat sceptical, also considering the 
extensible syntax which might come in 2.x. Are there any good tree 
diff/merge tools already? Any open-source ones? If there are such tools 
for XML, one could define some mapping between D and XML.

If you define it over a stream of lexemes, it will be wonderfully 
robust, but i don't imagine it being too useful. It will at most take 
care of formatting issues (different contributors prefer different 
formatting), but projects now use some kind of an auto-formatter with 
certain settings, which also provides an (admittedly much cruder) solution.
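
A diff over a stream of lexemes can be sketched as follows. Assumptions: Python is a stand-in, the regular expression is a toy tokenizer that knows nothing about comments, string literals, or most multi-character operators, and the names `lexemes` and `lexeme_changes` are made up.

```python
import difflib
import re

# Toy tokenizer: identifiers, integers, "==", or any single other symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|==|[^\sA-Za-z_0-9]")

def lexemes(source):
    """Split source text into tokens, discarding all whitespace."""
    return TOKEN.findall(source)

def lexeme_changes(old_source, new_source):
    """Return (old tokens, new tokens) for every differing region."""
    old, new = lexemes(old_source), lexemes(new_source)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(old[i1:i2], new[j1:j2])
            for kind, i1, i2, j1, j2 in matcher.get_opcodes()
            if kind != "equal"]

compact = "int add(int a,int b){return (a+b);}"
spread = """int add(int a, int b)
{
    return (a + b);
}"""
edited = "int add(int a,int b){return (a-b);}"

print(lexeme_changes(compact, spread))  # []
print(lexeme_changes(compact, edited))  # [(['+'], ['-'])]
```

Reformatting produces no changes at all, while the one real edit shows up as a single token replacement, which matches the "robust but limited" assessment above.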

What i would think of being more valuable for now, would be a 
documentation system and code formatter written completely in D.

-eye

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

Well, it would have to be defined with something a bit more complex than just a
tree structure.  A tree-based structure, like a DOM, would be ideal.  I don't
see how that'd be unstable.  It should be defined over a stream of lexemes, of
course.  That's what the DOM will hold.

I'm not too keen on having this be another implementation of a source code
re-formatter.  It's merely just a different way of patching source code using
the assumption that we're reading SOURCE CODE, not just arbitrary lines of text.
The code re-formatting comes out of the need to reproduce the code from the DOM.

A documentation system for D written entirely in D?  Just a few simple changes
to Doxygen it sounds like, minus the initial work of porting to D ;).

This could be a whole different pile of monkeys if class meta-data support was
in D *WINK WINK*.  I saw a few threads of discussion on meta-data, but it didn't
seem  to end up anywhere.  Gr.  I don't see what the big issue is, the symbol
table doesn't take up *that* much room.  I personally would like a bit more
flexibility at the cost of the executable size being bumped up a few KB.

BTW, could you elaborate a bit on your skepticism?  I'm a bit confused here.
Thanks!

James Dunne

Aug 19 2004

Ilya Minkov <minkov cs.tum.edu> writes:

Jaymz schrieb:

 Well, it would have to be defined with something a bit more complex than just a
 tree structure.  A tree-based structure, like a DOM, would be ideal.  I don't
 see how that'd be unstable.  It should be defined over a stream of lexemes, of
 course.  That's what the DOM will hold.

That might work... Although i'd like it somehow independent of most 
language constructs, and able to handle new syntax constructs 
gracefully... more or less like a highlighting editor with "levels" 
recognition does. Perhaps even some extensibility?
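
The "levels" idea can be sketched without any knowledge of the language's keywords: group a token stream purely by its bracket pairs. This is an illustration in Python as a stand-in; the function name `nest` and the choice of the three bracket pairs are assumptions, and the token `foreach_reverse_magic` is a deliberately invented construct to show that unknown syntax passes through untouched.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())

def nest(tokens):
    """Group a flat token list into nested lists, one level per bracket pair."""
    root = []
    stack = [(root, None)]  # (group being filled, closer it is waiting for)
    for token in tokens:
        if token in PAIRS:
            group = [token]
            stack[-1][0].append(group)
            stack.append((group, PAIRS[token]))
        elif token in CLOSERS:
            group, expected = stack.pop()
            if token != expected:
                raise ValueError(f"unexpected {token!r}")
            group.append(token)
        else:
            stack[-1][0].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root

tokens = ["foreach_reverse_magic", "(", "a", ",", "b", ")",
          "{", "return", "a", ";", "}"]
levels = nest(tokens)
print(levels)
```

A diff could then compare these nested groups level by level, and a future syntax extension would only need new tokens, not a new tree builder.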

 I'm not too keen on having this be another implementation of a source code
 re-formatter.  It's merely just a different way of patching source code using
 the assumption that we're reading SOURCE CODE, not just arbitrary lines of
 text.  The code re-formatting comes out of the need to reproduce the code from
 the DOM.

On the other hand the DIFF will not be very human-readable.

 A documentation system for D written entirely in D?  Just a few simple changes
 to Doxygen it sounds like, minus the initial work of porting to D ;).

Hr hr. :)

 This could be a whole different pile of monkeys if class meta-data support was
 in D *WINK WINK*.  I saw a few threads of discussion on meta-data, but it
 didn't seem to end up anywhere.  Gr.  I don't see what the big issue is, the
 symbol table doesn't take up *that* much room.  I personally would like a bit
 more flexibility at the cost of the executable size being bumped up a few KB.

The metadata was already there in DLI, and was incomplete, and only the 
DLI version of Phobos ever used it. The topic must be raised again in the 
post-1.0 era. For now, the consensus was that a parser and some custom 
code generators would have to do the work for the others, and relieve 
Walter from something unnecessary to do right now. Besides, the metadata 
was only intended to be used in a program itself.

I wonder whether i find some time to bake a D version of my favorite 
parser gen (COCO/R) and a corresponding D grammar... It would be a great 
help on creating tools. I started to port the Java version, but after 
seeing the C version i have come to dislike that for Java and will 
probably first hack up a C version which outputs D code, then someone 
else could finish porting it. I am sure that the tool can cope perfectly 
with D syntax, and the generated code is efficient.

 BTW, could you elaborate a bit on your skepticism?  I'm a bit confused here.
 Thanks!

I don't know, i'm totally new to the matter... That means i'm confused 
and skeptical. Still, are there any tree diffs out there?

One point to consider is that Bill Cox and i have raised the question 
of an extensible language, where libraries could introduce new syntax, 
like in OpenC++ and similar. Walter promised to consider this again in 
the post-1.0 era.

-eye

Aug 19 2004

pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:

In article <cg2qa4$28dh$1 digitaldaemon.com>, Jaymz says...
Let's see...

Upon first design, this could just be a simple stand-alone project implemented
for the D language, consisting of a defined patch-format and a patch/diff-like
toolset.  After all, we've got the front-end source to D already!  That could
*possibly* make this simpler to implement, as it contains all the data
structures necessary to parse, analyze, and possibly re-create the code with.

How I see the "diff" tool working:
1)  Lex & parse the source files
2)  Create semantic tree representation of the original & new code
3)  Compare new code's semantic tree with original code's semantic tree
4)  Output a series of simple, defined operations to transform the original
code's semantic tree into the new code's semantic tree.

And the "patch" tool would do basically the inverse of the diff tool:
1)  Lex & parse the target source file
2)  Create semantic tree representation of the target source code
3)  Apply defined operations on the semantic tree
4)  Rebuild the target code from the modified semantic tree, possibly conforming
to a given formatting standard, or using hints provided by the diff tool to
recreate the formatting of the original file.

This type of patch/diff toolset could handle the creation of an entire module,
simply defined by "create" operations on an "empty" semantic tree.
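
A rough illustration of the diff and patch steps listed above, as a minimal Python sketch and not anything taken from the D front-end. The `Node` class, the three operation names (`create`, `delete`, `update`) and the name-keyed children are all assumptions made up for the example. A real tool would work on the compiler's own tree and would also have to deal with renames and statement ordering.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a declaration with named child declarations."""
    kind: str                 # e.g. "module", "class", "function"
    name: str
    body: str = ""            # stand-in for the declaration's own contents
    children: dict = field(default_factory=dict)  # name -> Node


def diff(old, new, path=()):
    """Return a list of operations that transform `old` into `new`."""
    ops = []
    if old.body != new.body:
        ops.append(("update", path, new.body))
    for name in old.children:
        if name not in new.children:
            ops.append(("delete", path + (name,)))
    for name, child in new.children.items():
        before = old.children.get(name)
        if before is None:
            ops.append(("create", path + (name,), deepcopy(child)))
        elif before.kind != child.kind:
            # A changed kind is treated as a removal followed by a creation.
            ops.append(("delete", path + (name,)))
            ops.append(("create", path + (name,), deepcopy(child)))
        else:
            ops.extend(diff(before, child, path + (name,)))
    return ops


def patch(tree, ops):
    """Apply `ops` to a copy of `tree` and return the patched copy."""
    result = deepcopy(tree)
    for op, path, *args in ops:
        if not path:
            # Only "update" is ever emitted for the root node itself.
            result.body = args[0]
            continue
        parent = result
        for name in path[:-1]:
            parent = parent.children[name]
        if op == "create":
            parent.children[path[-1]] = deepcopy(args[0])
        elif op == "delete":
            del parent.children[path[-1]]
        elif op == "update":
            parent.children[path[-1]].body = args[0]
    return result
```

Diffing against an empty module yields nothing but `create` operations, which is the "create an entire module from an empty tree" case mentioned above.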

Let me know what you all think of this.  Thanks for your input, Pragma!


I can see the merit in a stand-alone server, but this may be the wrong way to
start out.  Honestly, I think an add-on module to an existing source control
tool might prove much more useful than an outright replacement.  Take
dsource.org for example: an entire website dedicated to D programming that is
backed on Subversion.  IMO an extension to Subversion would be far more useful
(and easier to implement) to the D community as a whole. 

All the same, please look at Mango over on dsource if you're going to write a
stand-alone server.  The I/O and socket portions of that library may give you a
good head-start.

That aside, I like where you're going with this, especially with 'creating a
semantic tree' of the code.  I gather this would be some form of pseudocode or
XML?  I can see this becoming useful for increasing performance if you keep the
current semantic tree version on hand at all times.  That way one can compare
the tree against their own local source to make sure they're not altering other
portions of the application too badly (i.e. trying not to violate contracts
across a whole project)

Another thing, a lot of the spirit of what you're proposing here is captured in
D's in/out/body and unittest contracting system.  Have you considered
incorporating these statements in particular to deepen the semantic meaning of
code when you assess it?  :)

- Pragma

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

In article <cg2vh6$2c5t$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
<<snip>>
I can see the merit in a stand-alone server, but this may be the wrong way to
start out.  Honestly, I think an add-on module to an existing source control
tool might prove much more useful than an outright replacement.  Take
dsource.org for example: an entire website dedicated to D programming that is
backed on Subversion.  IMO an extension to Subversion would be far more useful
(and easier to implement) to the D community as a whole. 

All the same, please look at Mango over on dsource if you're going to write a
stand-alone server.  The I/O and socket portions of that library may give you a
good head-start.

That aside, I like where you're going with this, especially with 'creating a
semantic tree' of the code.  I gather this would be some form of pseudocode or
XML?  I can see this becoming useful for increasing performance if you keep the
current semantic tree version on hand at all times.  That way one can compare
the tree against their own local source to make sure they're not altering other
portions of the application too badly (i.e. trying not to violate contracts
across a whole project)

Another thing, a lot of the spirit of what you're proposing here is captured in
D's in/out/body and unittest contracting system.  Have you considered
incorporating these statements in particular to deepen the semantic meaning of
code when you assess it?  :)

- Pragma

Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy
of the code on hand.  I do like the system and am using it personally, just
never cared to see its code ;).  But if you say it is easier to extend, then I
will believe you.  An SVN extension is definitely a possible direction in the
future for this project, assuming the proof-of-concept diff/patch toolset works.
After all, I didn't really have any ambition to create a new stand-alone server
in the first place.

You're saying I could gather contract information from the in, out, invariant,
etc. constructs that D provides and make sure the coder isn't going to violate
them with the code commit?  Wow, that takes balls.

Actually I don't think that's possible.  How are you to know at compile time if
the coder is violating any contracts?  Where do you get your values to test
against the contracts?  And finally, HOW do you represent a contract in an
evaluative way, assuming you magically have values provided by the committed
code to test against the contracts?  I don't think the contract information
would be too useful in a revision control system, and it wouldn't be very
language-independent either.  But it's a cool idea, nonetheless.

Er, anyway... The real intent behind building the semantic tree of the module is
to have a uniform way of accessing functions, structures, classes in order to
compare them and change them easily.  I could foresee this being defined by a
relatively large inter-related class hierarchy of things like expressions,
statements, etc...
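
To make the class hierarchy idea a little more concrete, here is a minimal Python sketch. Every class name in it (`Node`, `Expression`, `Statement`, `Function` and so on) is invented for illustration and does not mirror the D front-end's actual data structures. The point is only that once code is held as typed nodes, comparing two versions becomes a structural comparison and not a line comparison.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Common base for everything in the (hypothetical) tree."""


@dataclass(frozen=True)
class Expression(Node):
    pass


@dataclass(frozen=True)
class Identifier(Expression):
    name: str


@dataclass(frozen=True)
class BinaryOp(Expression):
    op: str
    left: Expression
    right: Expression


@dataclass(frozen=True)
class Statement(Node):
    pass


@dataclass(frozen=True)
class Return(Statement):
    value: Expression


@dataclass(frozen=True)
class Function(Node):
    name: str
    params: Tuple[str, ...]
    body: Tuple[Statement, ...]


def changed_functions(old, new):
    """Names of functions added, removed or altered between two versions."""
    old_by_name = {f.name: f for f in old}
    new_by_name = {f.name: f for f in new}
    names = old_by_name.keys() | new_by_name.keys()
    return sorted(n for n in names if old_by_name.get(n) != new_by_name.get(n))
```

Dataclass equality compares node types and fields recursively, so two functions built from differently formatted source text still compare equal as long as their trees match.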

.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh
God, my head...  Someone clarify myself for me.

James Dunne

Aug 19 2004

pragma <EricAnderton at yahoo dot com> <pragma_member pathlink.com> writes:

In article <cg31on$2ega$1 digitaldaemon.com>, Jaymz says...
Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy
of the code on hand.  I do like the system and am using it personally, just
never cared to see its code ;).  But if you say it is easier to extend, then I
will believe you.  An SVN extension is definitely a possible direction in the
future for this project, assuming the proof-of-concept diff/patch toolset works.

Well, I haven't done it personally, but word has it that it has an event model
of some kind that was written with extensibility in mind. :)

You're saying I could gather contract information from the in, out, invariant,
etc. constructs that D provides and make sure the coder isn't going to violate
them with the code commit?  Wow, that takes balls.

Um, thank you?  I wasn't aware of that statement being all that out there, but
in retrospect it's pretty bogus.  

It's probably all this going back and forth between ColdFusion (work) and D (here
in the NG).  But I am on the same page now and will restrain from making any
future "ballsy" comments. ;)

Actually I don't think that's possible.  How are you to know at compile time if
the coder is violating any contracts?  Where do you get your values to test
against the contracts?  And finally, HOW do you represent a contract in an
evaluative way, assuming you magically have values provided by the committed
code to test against the contracts?  I don't think the contract information
would be too useful in a revision control system, and it wouldn't be very
language-independent either.  But it's a cool idea, nonetheless.

Okay, I see where you're coming from now.  I was thinking more at the
compilation and unittest level, where testing DBC *really* comes into play.
You're right: you can't use that kind of information when you're just looking at
how the code is put together.

Of course there's no reason why you couldn't get a nightly or on-demand build to
do some analysis when an assert or static assert fires in a unittest. After all
that processing, wouldn't it be pretty easy to correlate a line number and error
message with a particular change ... especially since your semantic pass will
know how everything is interrelated? 

Er, anyway... The real intent behind building the semantic tree of the module is
to have a uniform way of accessing functions, structures, classes in order to
compare them and change them easily.  I could foresee this being defined by a
relatively large inter-related class hierarchy of things like expressions,
statements, etc...

Gotcha.  So if the revision system can actually "understand" the code it's
processing, then it'll be less prone to screwups and possibly catch developer
mistakes as well... 

.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh
God, my head...  Someone clarify myself for me.

I'll take a stab at that one.

I've always understood the semantics of a program to be derived from the syntax
used.  Yes, it's almost 1-for-1 between meaning and the syntax used, especially
in D.  The difference lies in how one can do some things in more than one way,
like using "?" instead of "if()" and so on: both have the same semantic meaning,
but the syntax is totally different.
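
As a small illustration of that last point, here is a Python analogue (Python's conditional expression stands in for D's "?", and nothing here is D-specific). It uses the standard `ast` module; the function name `pick` and the helper names are made up for the example. The two spellings produce different syntax trees yet behave identically, while a pure reformatting produces the very same tree.

```python
import ast

# Same behaviour written with a conditional expression and with if/else.
SRC_TERNARY = "def pick(c, a, b):\n    return a if c else b\n"
SRC_IF = (
    "def pick(c, a, b):\n"
    "    if c:\n"
    "        return a\n"
    "    else:\n"
    "        return b\n"
)
# Same conditional expression, only laid out differently.
SRC_TERNARY_REFORMATTED = (
    "def pick(c,a,b):\n"
    "        return (a if c\n"
    "                else b)\n"
)


def tree_dump(src):
    """Textual form of the syntax tree, without line/column information."""
    return ast.dump(ast.parse(src))


def load(src):
    """Execute the source and hand back the `pick` function it defines."""
    namespace = {}
    exec(src, namespace)
    return namespace["pick"]


syntax_differs = tree_dump(SRC_TERNARY) != tree_dump(SRC_IF)
layout_ignored = tree_dump(SRC_TERNARY) == tree_dump(SRC_TERNARY_REFORMATTED)
same_meaning = all(
    load(SRC_TERNARY)(c, "yes", "no") == load(SRC_IF)(c, "yes", "no")
    for c in (True, False)
)
```

A purely syntactic tree diff would therefore ignore reformatting but still report the ternary-to-if rewrite as a change; only a semantic comparison could call the two equivalent.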

- Pragma

Aug 19 2004

Jaymz <jdunne4 bradley.edu> writes:

In article <cg358p$2hho$1 digitaldaemon.com>, pragma <EricAnderton at yahoo dot
com> says...
In article <cg31on$2ega$1 digitaldaemon.com>, Jaymz says...
Well, I wouldn't know anything about extensibility with SVN, as I haven't a copy
of the code on hand.  I do like the system and am using it personally, just
never cared to see its code ;).  But if you say it is easier to extend, then I
will believe you.  An SVN extension is definitely a possible direction in the
future for this project, assuming the proof-of-concept diff/patch toolset works.

Well, I haven't done it personally, but word has it that it has an event model
of some kind that was written with extensibility in mind. :)

You're saying I could gather contract information from the in, out, invariant,
etc. constructs that D provides and make sure the coder isn't going to violate
them with the code commit?  Wow, that takes balls.

Um, thank you?  I wasn't aware of that statement being all that out there, but
in retrospect it's pretty bogus.  

It's probably all this going back and forth between ColdFusion (work) and D (here
in the NG).  But I am on the same page now and will restrain from making any
future "ballsy" comments. ;)

Actually I don't think that's possible.  How are you to know at compile time if
the coder is violating any contracts?  Where do you get your values to test
against the contracts?  And finally, HOW do you represent a contract in an
evaluative way, assuming you magically have values provided by the committed
code to test against the contracts?  I don't think the contract information
would be too useful in a revision control system, and it wouldn't be very
language-independent either.  But it's a cool idea, nonetheless.

Okay, I see where you're coming from now.  I was thinking more at the
compilation and unittest level, where testing DBC *really* comes into play.
You're right: you can't use that kind of information when you're just looking at
how the code is put together.

Of course there's no reason why you couldn't get a nightly or on-demand build to
do some analysis when an assert or static assert fires in a unittest. After all
that processing, wouldn't it be pretty easy to correlate a line number and error
message with a particular change ... especially since your semantic pass will
know how everything is interrelated? 

I thought you were gonna *restrain* from making future "ballsy" comments? ;).

What type of algorithm could be developed based on a commit and a static assert
firing to lead to a possible collection of offending line numbers?  Now THAT
would be really interesting, and is technically feasible!
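
One naive way to approach that question, sketched in Python under heavy assumptions: the tree already gives us each function's line span and a call graph, the commit gives us the changed line numbers, and everything lives in one file. The names `spans`, `calls`, `changed_lines` and `suspects` are all hypothetical. The idea is to find the function containing the failing assert, follow the call graph outward from it, and keep only the changed lines that fall inside something reachable.

```python
def enclosing(spans, line):
    """Name of the function whose (start, end) span contains `line`, if any."""
    for name, (start, end) in spans.items():
        if start <= line <= end:
            return name
    return None


def reachable(calls, root):
    """All functions reachable from `root` through the call graph, root included."""
    seen, stack = set(), [root]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(calls.get(fn, ()))
    return seen


def suspects(spans, calls, changed_lines, failing_line):
    """Changed lines lying inside code the failing assert depends on."""
    origin = enclosing(spans, failing_line)
    if origin is None:
        return []
    hits = set()
    for name in reachable(calls, origin):
        if name not in spans:
            continue  # called function defined elsewhere; ignored in this sketch
        start, end = spans[name]
        hits.update(l for l in changed_lines if start <= l <= end)
    return sorted(hits)
```

This only narrows the search to a collection of candidate lines; it says nothing about which of them actually caused the failure.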


Er, anyway... The real intent behind building the semantic tree of the module is
to have a uniform way of accessing functions, structures, classes in order to
compare them and change them easily.  I could foresee this being defined by a
relatively large inter-related class hierarchy of things like expressions,
statements, etc...

Gotcha.  So if the revision system can actually "understand" the code it's
processing, then it'll be less prone to screwups and possibly catch developer
mistakes as well... 

Well, the project's scope certainly has escalated from a simple syntactic tree
based revision control system to an intelligent learning machine that'll
automagically fix your mistakes and know what you *really* want to do.  LOL.
Not to pick on you, Pragma. ;)

Hey!  Why don't we just build a neural network of a few billion nodes and train
it on D grammar and semantics 'til it's sick?  Oh wait, we already got a couple
hundred of 'em walkin around.. Dammit.  lol.

.. Wait a tick... that's a SYNTACTIC tree... Aww dammit all.  My bad...

Well, a semantic tree is really an extension of a syntactic tree, isn't it?  Oh
God, my head...  Someone clarify myself for me.

I'll take a stab at that one.

I've always understood the semantics of a program to be derived from the syntax
used.  Yes, it's almost 1-for-1 between meaning and the syntax used, especially
in D.  The difference lies in how one can do some things in more than one way,
like using "?" instead of "if()" and so on: both have the same semantic meaning,
but the syntax is totally different.

- Pragma

<not snobby>I do realize the difference between syntax and semantics</not snobby>.
And in general, a language's syntax will strongly reflect its
semantics (unless you complain of such silly things as READABILITY ... damn VB
coders).

Regardless, should the revision control system be based on a /syntactic/ or
/semantic/ tree representation of the code?  To contradict myself, as I always
do, I don't see much benefit now in a /semantic/ tree for a simple revision
control system.  :)

I do hope I've successfully confused everyone now.  The master of deception and
contradiction will be back tomorrow morning.

James Dunne

Aug 19 2004
