digitalmars.D - Notepad++

Stewart Gordon <smjg_1998 yahoo.com> writes:

What's the best anybody's managed to get Notepad++ to syntax-highlight 
D?  (I'm on version 5.4.5, if that makes a difference.)

My userDefineLang.xml file is as given here
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
(note that I've fixed a few errors; I've no idea how they got there).

Notepad++ does a good job of syntax-highlighting PHP files, whose 
syntactic structure is more complex than that of D.  So clearly, 
Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever). 
  However, at the moment I can't even seem to get it up to C standard! 
(Can anybody find a full reference of the userDefineLang.xml format, for 
that matter?)

Maybe it's just a case in point of some comments here:
http://d.puremagic.com/issues/show_bug.cgi?id=3193

Anyway, attached is the result.  Can anybody do better (other than by 
telling it to treat D as C or some other language instead)?

Or maybe I should just go back to TextPad (which isn't perfect either) 
and put up with its not supporting Unicode....

Stewart.
Aug 12 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

 What's the best anybody's managed to get Notepad++ to syntax-highlight 
 D?  (I'm on version 5.4.5, if that makes a difference.)
 
 My userDefineLang.xml file is as given here
 http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
 (note that I've fixed a few errors I've no idea how got there).
 
 Notepad++ does a good job of syntax-highlighting PHP files, whose 
 syntactic structure is more complex than that of D.  So clearly, 
 Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever). 
   However, at the moment I can't even seem to get it up to C standard! 
 (Can anybody find a full reference of the userDefineLang.xml format, for 
 that matter?)

Scintilla uses plugins to highlight source. These plugins are written in C++ and have almost full access to the buffer, so the highlighter code may be arbitrarily complex. I actually wrote such a plugin to highlight D a while back:

http://dsource.org/projects/scrapple/browser/trunk/scilexer

It seems like the Notepad++ developers added their own highlighter plugin, which takes userDefineLang.xml as its configuration. Such a configurable plugin is presumably much less flexible than a pure C++ implementation for a particular language. It's very likely that the PHP highlighter is written in C++ and comes bundled with Scintilla.
Aug 12 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sergey Gromov wrote:
 Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

 Scintilla uses plugins to highlight source.  These plugins are written
 in C++ and have almost full access to the buffer so the highlighter code
 may be arbitrarily complex.  I actually wrote such a plugin to highlight
 D a while back:
 
 http://dsource.org/projects/scrapple/browser/trunk/scilexer

 "1. If you have SciTE 1.76 for Windows installed simply replace
 SciLexer.dll and d.properties with the supplied files.
 2. If you wish to build Scintilla from source:"

Can it be used in Scintilla-based editors besides SciTE, short of acquiring the whole Scintilla source and rebuilding it?

For the record, there's a SciLexer.dll in my Notepad++ dir, but no d.properties to be found. The SciLexer.dll reports itself as file version 1.7.8.0, product version 1.78. So maybe the question is of what effect replacing it with a fork of version 1.76 would have. (Do SciTE versions correspond directly to Scintilla versions?)
 It seems like Notepad++ developers added their own highlighter plugin
 which takes userDefineLang.xml as its configuration.  Such a
 configurable plugin is presumably much less flexible than pure C++
 implementation for a particular language.  It's very likely that PHP
 highlighter is written in C++ and comes bundled with Scintilla.

It puzzles me that they didn't make this plugin powerful enough to highlight the language it (and indeed the whole of Notepad++) is written in. Even more so considering the sheer number of C-like languages out there, which people are likely to want to use N++ to write.

Stewart.
Aug 12 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

 Scintilla uses plugins to highlight source.  These plugins are written
 in C++ and have almost full access to the buffer so the highlighter code
 may be arbitrarily complex.  I actually wrote such a plugin to highlight
 D a while back:
 
 http://dsource.org/projects/scrapple/browser/trunk/scilexer

"1. If you have SciTE 1.76 for Windows installed simply replace SciLexer.dll and d.properties with the supplied files. 2. If you wish to build Scintilla from source:" Can it be used in Scintilla-based editors besides SciTE short of acquiring the whole Scintilla source and rebuilding it?

There are two problems at least:

1. SciLexer.dll contains *all* of the built-in lexer modules. Replacing your DLL with another version will remove any extra lexers which a 3rd party put there, like the XML-configurable lexer in the case of Notepad++.

2. Lexers are written in C++ and interface with the rest of Scintilla via C++ classes. Therefore if a field is added or removed anywhere, or if you use a different compiler to build your DLL than that used to build Scintilla, you'll get a GPF, or worse.

The good news is that Notepad++ is on SourceForge, so the "from source" way is at least possible.
 For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
 d.properties to be found.  The SciLexer.dll reports itself as file 
 version 1.7.8.0, product version 1.78.  So maybe the question is of what 
 effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
 versions correspond directly to Scintilla versions?)

Yes, SciTE versions seem to be in sync with Scintilla versions.
 It seems like Notepad++ developers added their own highlighter plugin
 which takes userDefineLang.xml as its configuration.  Such a
 configurable plugin is presumably much less flexible than pure C++
 implementation for a particular language.  It's very likely that PHP
 highlighter is written in C++ and comes bundled with Scintilla.

It puzzles me that they didn't make this plugin powerful enough to highlight the language it (and indeed the whole of Notepad++) is written in. Even more so considering the sheer number of C-like languages out there, which people are likely to want to use N++ to write.

Well I think it's hard to create a regular expression engine flexible enough to allow arbitrary highlighting. I think the best such engine I've seen was Colorer by Igor Russkih, and even there I wasn't able to express D's WYSIWYG or delimited strings. You need a real programming language for that.

---

I've just had a look at Notepad++ sources. The Scintilla they use contains Scintilla's built-in D lexer. I think it's just not configured.

SciTE uses *.properties files to configure stuff. Notepad++ uses XML files for the same purpose. I think it's all in langs.model.xml. My current idea is to take d.properties from the corresponding release of SciTE and try to translate it into the langs.model.xml format. I'll probably try it later when I have time.

Of course it would be nice to replace the original D lexer with mine. Or, even better, to ask Scintilla developers to include my lexer into the official bundle. May be worth a try.
Aug 12 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 2.  Lexers are written in C++ and interface with the rest of Scintilla
 via C++ classes.  Therefore if a field is added or removed anywhere, or
 if you use a different compiler to build your DLL than that used to
 build Scintilla, you'll get GPF, or worse.

If they use binary interfacing with virtual functions a la COM's binary standard, then field presence shouldn't matter. Also, most compilers on Windows respect the basic ABI. No?

Andrei
Aug 12 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Wed, 12 Aug 2009 21:35:02 -0500, Andrei Alexandrescu wrote:

 Sergey Gromov wrote:
 2.  Lexers are written in C++ and interface with the rest of Scintilla
 via C++ classes.  Therefore if a field is added or removed anywhere, or
 if you use a different compiler to build your DLL than that used to
 build Scintilla, you'll get GPF, or worse.

If they use binary interfacing with virtual functions a la COM's binary standard, then field presence shouldn't matter.

They don't, unfortunately. Every lexer defines a static instance of a LexerModule class. The coloring function receives a reference to an Accessor class. They're full-blown classes, with fields and stuff.
 Also, most compilers on Windows respect the basic ABI. No?

Even though they don't use inheritance, and therefore most compilers will likely build identical data layouts for them, there is still zero compatibility between different versions of those classes.
Aug 12 2009
Kagamin <spam here.lot> writes:
Sergey Gromov Wrote:

 Or, even better, to ask Scintilla developers to include my lexer into
 the official bundle.  May be worth a try.

Uh... that's not an option.
Aug 13 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sergey Gromov wrote:
 Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:

 It puzzles me that they didn't make this plugin powerful enough to 
 highlight the language it (and indeed the whole of Notepad++) is written 
 in.  Even more so considering the sheer number of C-like languages out 
 there, which people are likely to want to use N++ to write.

Well I think it's hard to create a regular expression engine flexible enough to allow arbitrary highlighting.

I can't see how it can be at all complicated to find the beginning and end of a C string or character literal. This (POSIX?) regexp

"(\\.|[^\\"])*"

works when I try it (though not in the tiny subset of POSIX regexps that N++ understands). But that's an aside - you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.
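A minimal sketch of trying the regexp above from C, assuming a POSIX system that provides <regex.h>. The function name find_string_literal is hypothetical, chosen for illustration only; the pattern is the one from the post, with the extra backslashes that a C string literal needs.

```c
#include <regex.h>
#include <stddef.h>

/* Finds the first double-quoted literal in text.
 * On success returns 1 and stores the half-open range [*start, *end);
 * returns 0 if there is no complete literal.
 * (Hypothetical helper; not part of Notepad++ or Scintilla.) */
int find_string_literal(const char *text, size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int found = 0;

    /* The regexp itself is  "(\\.|[^\\"])*"  : an opening quote, then
     * any number of escape pairs or ordinary characters, then a quote. */
    if (regcomp(&re, "\"(\\\\.|[^\\\\\"])*\"", REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, text, 1, m, 0) == 0) {
        *start = (size_t)m[0].rm_so;
        *end   = (size_t)m[0].rm_eo;
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Because REG_NEWLINE is not set, a newline is treated as an ordinary character, so the match can span lines only if the whole buffer (rather than one line at a time) is handed to regexec.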
 I think the best such engine
 I've seen was Colorer by Igor Russkih, and even there I wasn't able to
 express D's WYSIWYG or delimited strings.  You need a real programming
 language for that.

For WYSIWYG strings, all that's needed is a generic highlighter that supports:

- the aforementioned string escapes
- multiple types of string literals distinguished by whether they support string escapes, and not just delimiters

TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.

But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)
 ---
 
 I've just had a look at Notepad++ sources.  The Scintilla they use
 contains Scintilla's built-in D lexer.  I think it's just not
 configured.

Sounds as though N++'s developers neglected to keep the configuration files up to date as new languages were added to Scintilla.
 SciTE uses *.properties files to configure stuff.
 Notepad++ uses XML files for the same purpose.  I think it's all in
 langs.model.xml.  My current idea is to take d.properties from the
 corresponding release of SciTE and try to translate it into the
 langs.model.xml format.  I'll probably try it later when I have time.
 
 Of course it would be nice to replace the original D lexer with mine.
 Or, even better, to ask Scintilla developers to include my lexer into
 the official bundle.  May be worth a try.

You have two good plans there.

Scintilla's definition of a plugin is confusing - normally plugins are things that can be dynamically loaded at runtime, rather than having to compile them in. If only....

Stewart.
Aug 13 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Well I think it's hard to create a regular expression engine flexible
 enough to allow arbitrary highlighting.

I can't see how it can be at all complicated to find the beginning and end of a C string or character literal. This (Posix?) regexp "(\\.|[^\\"])*" works as I try (though not in the tiny subset of Posix regexps that N++ understands). But that's an aside - you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.
 I think the best such engine
 I've seen was Colorer by Igor Russkih, and even there I wasn't able to
 express D's WYSIWYG or delimited strings.  You need a real programming
 language for that.

For WYSIWYG strings, all that's needed is a generic highlighter that supports: - the aforementioned string escapes - multiple types of string literals distinguished by whether they support string escapes, and not just delimiters TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals. But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)

Well, you can write a regexp to handle a simple C string. That is, if your regexp is matched against the whole file, which is usually not the case. Otherwise you'll have troubles with C string:

"foo\
bar"

or D string:

"foo
bar"

Then you want to highlight string escapes and probably format specifiers. Therefore you need not simple regexps but hierarchies of them, and also you need to know where *internals* of the string start and end.

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything. Still can be handled by extended regexps, even though they won't be regular expressions in scientific sense.

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}. Regexps cannot translate while substituting, so you must create regexps for all possible parens.

And of course q"BLAH
whatever BLAH here
BLAH", well, probably nice for help texts.

And these are only strings. Try to write regexp which treats .__15 as number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as number(2), operator(..), number(3).
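For comparison, here is a rough hand-written sketch of the three number/operator cases named above, in plain C rather than regexps. It covers only those cases (no suffixes, hex or exponents), it treats any digit run followed by a single dot as a number, and the names Tk and next_token are hypothetical, not taken from any real lexer.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { TK_END, TK_NUMBER, TK_OPERATOR, TK_IDENT, TK_OTHER } TkKind;

typedef struct {
    TkKind kind;
    size_t len;   /* number of characters the token covers */
} Tk;

static int is_digit(char c)       { return isdigit((unsigned char)c) != 0; }
static int is_ident_start(char c) { return isalpha((unsigned char)c) || c == '_'; }

/* Classify the token that starts at s[0] (simplified sketch). */
Tk next_token(const char *s)
{
    size_t i = 0;

    if (s[0] == '\0')
        return (Tk){ TK_END, 0 };

    if (s[0] == '.') {
        /* Look past underscores: a digit there means ".__15" is a number. */
        i = 1;
        while (s[i] == '_')
            ++i;
        if (is_digit(s[i])) {
            while (is_digit(s[i]) || s[i] == '_')
                ++i;
            return (Tk){ TK_NUMBER, i };
        }
        /* Otherwise it is the ".." or "." operator. */
        return (Tk){ TK_OPERATOR, s[1] == '.' ? 2u : 1u };
    }

    if (is_digit(s[0])) {
        while (is_digit(s[i]) || s[i] == '_')
            ++i;
        /* A fraction follows only if the dot does not begin "..". */
        if (s[i] == '.' && s[i + 1] != '.') {
            ++i;
            while (is_digit(s[i]) || s[i] == '_')
                ++i;
        }
        return (Tk){ TK_NUMBER, i };
    }

    if (is_ident_start(s[0])) {
        while (is_ident_start(s[i]) || is_digit(s[i]))
            ++i;
        return (Tk){ TK_IDENT, i };
    }

    return (Tk){ TK_OTHER, 1 };   /* whitespace or anything not covered */
}
```

The one-character lookahead after a dot is what separates 2..3 from a floating-point literal, and the skip over underscores is what separates .__15 from .__foo.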
 Scintilla's definition of a plugin is confusing - normally plugins are 
 things that can be dynamically loaded at runtime, rather than having to 
 compile them in.  If only....

I'm not sure they call them "plugins". They're lexer modules, made so that the lexer is relatively easy to extend.
Aug 14 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sergey Gromov wrote:
<snip>
 Well, you can write a regexp to handle a simple C string.  That is, if
 your regexp is matched against the whole file, which is usually not the
 case.  Otherwise you'll have troubles with C string:
 
 "foo\
 bar"
 
 or D string:
 
 "foo
 bar"

So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)
 Then you want to highlight string escapes and probably format
 specifiers.  Therefore you need not simple regexps but hierarchies of
 them, and also you need to know where *internals* of the string start
 and end.

Let's just concentrate for the moment on the simple process of finding the beginning and end of a string. Here's a snippet of a TextPad syntax file:

    StringsSpanLines = Yes
    StringStart = "
    StringEnd = "
    StringEsc = \

A possible snippet of lexer code to handle this (which FAIK might be near enough how TP does it):

    if (*c == StringStart) {
        beginHighlightString(c);
        /* scan to the closing delimiter, end of text or (if strings
           may not span lines) end of line */
        for (++c; *c != StringEnd && *c != '\0'
                && (StringsSpanLines || *c != '\n'); ++c) {
            if (*c == StringEsc) ++c;   /* skip the escaped character */
        }
        endHighlightString(c+1);
    }

It's simple and it should work. (OK, there are two assumptions made for simplicity: that line breaks are normalised to LF, and that the file is terminated by at least two null bytes in memory, but you get the idea.)

While it doesn't support highlighting of escapes, I can't see this fact as being the reason N++'s developers haven't implemented even this in the generic lexer module. I probably couldn't see it being the reason even if the C lexer did highlight escapes (which it doesn't).
 Then you have r"foo" which probably can be handled with regexps.
 
 Then you have q"/foo/" where "/" can be anything.  Still can be handled
 by extended regexps, even though they won't be regular expressions in
 scientific sense.
 
 Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
 Regexps cannot translate while substituting, so you must create regexps
 for all possible parens.

Yes, these aspects are more complicated. Both TP and N++ (out of the box, anyway) are probably far from being able to lex D2 properly. But they certainly could do better in supporting D1. Still, once N++ gains access to Scintilla's D lexer, things will certainly be better.
 And of course q"BLAH
 whatever BLAH here
 BLAH", well, probably nice for help texts.
 
 And these are only strings.  Try to write regexp which treats .__15 as
 number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
 number(2), operator(..), number(3).

We'd need many regexps to handle all possible cases, but a possible set to cover these cases and a few others (listed in a possible order of priority) is:

\._*[0-9][0-9_]*
([1-9][0-9]*)(\.\.)
[0-9]+\.[0-9]*
[1-9][0-9]*
\.\.
\.
[a-zA-Z_][a-zA-Z0-9_]*

Note the use of capturing groups to handle the 2..3 case. Each capturing group would match a token, while in the other cases the whole regexp matches a token.

Stewart.
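One way the prioritised regexp list could be driven, sketched in C and assuming POSIX <regex.h>. Each pattern is anchored with ^ and tried in order at the current position; the rule with two capturing groups emits two tokens. The names Token, Rule and tokenize are hypothetical, and characters that no rule covers are simply skipped.

```c
#include <regex.h>
#include <stddef.h>

typedef enum { TOK_NUMBER, TOK_OPERATOR, TOK_IDENT } TokKind;

typedef struct {
    TokKind kind;
    size_t  start;   /* offset into the source text */
    size_t  len;
} Token;

/* One rule per regexp, in priority order.  second_kind is -1 when the
 * whole match is one token; otherwise groups 1 and 2 are separate tokens. */
typedef struct {
    const char *pattern;
    int kind;
    int second_kind;
} Rule;

static const Rule rules[] = {
    { "^\\._*[0-9][0-9_]*",      TOK_NUMBER,   -1 },
    { "^([1-9][0-9]*)(\\.\\.)",  TOK_NUMBER,   TOK_OPERATOR },
    { "^[0-9]+\\.[0-9]*",        TOK_NUMBER,   -1 },
    { "^[1-9][0-9]*",            TOK_NUMBER,   -1 },
    { "^\\.\\.",                 TOK_OPERATOR, -1 },
    { "^\\.",                    TOK_OPERATOR, -1 },
    { "^[a-zA-Z_][a-zA-Z0-9_]*", TOK_IDENT,    -1 },
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* Splits src into at most max tokens; returns how many were stored. */
size_t tokenize(const char *src, Token *out, size_t max)
{
    regex_t re[NRULES];
    size_t n = 0, pos = 0;
    int i;

    for (i = 0; i < NRULES; ++i) {
        if (regcomp(&re[i], rules[i].pattern, REG_EXTENDED) != 0) {
            while (i-- > 0)
                regfree(&re[i]);
            return 0;
        }
    }

    while (src[pos] != '\0' && n < max) {
        regmatch_t m[3];
        int matched = 0;

        for (i = 0; i < NRULES && !matched; ++i) {
            if (regexec(&re[i], src + pos, 3, m, 0) != 0)
                continue;
            matched = 1;
            if (rules[i].second_kind < 0) {
                out[n++] = (Token){ (TokKind)rules[i].kind, pos,
                                    (size_t)m[0].rm_eo };
            } else {
                out[n++] = (Token){ (TokKind)rules[i].kind,
                                    pos + (size_t)m[1].rm_so,
                                    (size_t)(m[1].rm_eo - m[1].rm_so) };
                if (n < max)
                    out[n++] = (Token){ (TokKind)rules[i].second_kind,
                                        pos + (size_t)m[2].rm_so,
                                        (size_t)(m[2].rm_eo - m[2].rm_so) };
            }
            pos += (size_t)m[0].rm_eo;
        }
        if (!matched)
            ++pos;   /* skip whitespace or anything no rule covers */
    }

    for (i = 0; i < NRULES; ++i)
        regfree(&re[i]);
    return n;
}
```

Recompiling the patterns on every call keeps the sketch short; a real lexer would presumably compile them once and reuse them.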
Aug 14 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 
 "foo
 bar"

So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)

Highlighting the whole file every time a character is typed is slow. Scintilla doesn't do that. It provides the lexer with a range of changed lines. The lexer is then free to choose a larger range if it cannot deduce context from the initial range. I tried to ignore this range and re-highlight the whole file in my lexer. The performance was unacceptable.
 Then you want to highlight string escapes and probably format
 specifiers.  Therefore you need not simple regexps but hierarchies of
 them, and also you need to know where *internals* of the string start
 and end.

Let's just concentrate for the moment on the simple process of finding the beginning and end of a string. Here's a snippet of a TextPad syntax file: StringsSpanLines = Yes StringStart = " StringEnd = " StringEsc = \ A possible snippet of lexer code to handle this (which FAIK might be [...]

Sure, TextPad uses a dozen simple hacks specific to lexing programming languages. They're ad-hoc and they're limited to exactly what TextPad authors thought were important.

Regexps are a different approach. They are more generic but are limited, too, because they're slow and don't nest naturally. Slow means they must try to re-color as few lines as possible. Not nestable means you need to invent some framework around regexps which is another sort of description language. If you implement the former naively and ignore the latter you'll get what presumably N++ has: not a very powerful system.

It's actually trivial* to implement a lexer for Scintilla which would work exactly as TextPad does, including use of the same configuration files.

* That is, if you know exactly how TextPad works.
 And these are only strings.  Try to write regexp which treats .__15 as
 number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
 number(2), operator(..), number(3).

We'd need many regexps to handle all possible cases, but a possible set to cover these cases and a few others (listed in a possible order of priority) is:

\._*[0-9][0-9_]*
([1-9][0-9]*)(\.\.)
[0-9]+\.[0-9]*
[1-9][0-9]*
\.\.
\.
[a-zA-Z_][a-zA-Z0-9_]*

Basically yes, but they're going to be much more complex. 3Lu...5 is also a range. 0x3e22.f5p6fi is a valid floating-point number. And still, regexps don't nest. Don't you want to highlight DDoc sections and macros?
Aug 15 2009
bearophile <bearophileHUGS lycos.com> writes:
Sergey Gromov:
 Sure, TextPad uses a dozen of simple hacks specific to lexing
 programming languages.  They're ad-hoc and they're limited to exactly
 what TextPad authors thought were important.

Today the difference isn't very important because CPUs are fast. But on Windows with a Pentium 3, Scintilla was very slow. TextPad was fast enough even for very quick fingers. (TextPad may even contain some parts coded in assembly.) TextPad on Windows is very fast :-)

Bye,
bearophile
Aug 15 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sergey Gromov wrote:
 Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:
 
 Sergey Gromov wrote:
 "foo
 bar"

So there is a problem if the highlighter works by matching regexps on a line-by-line basis. But matching regexps over a whole file is no harder in principle than matching line-by-line and, when the maximal munch principle is never called to action, it can't be much less efficient. (The only bit of C or D strings that relies on maximal munch is octal escapes.)

Highlighting the whole file every time a character is typed is slow. Scintilla doesn't do that. It provides the lexer with a range of changed lines. The lexer is then free to choose a larger range if it cannot deduce context from the initial range. I tried to ignore this range and re-highlight the whole file in my lexer. The performance was unacceptable.

Of course. I suppose now that the right strategy is line-by-line with some preservation of state between lines:

- Keep a note of the state at the beginning of each line
- When something is changed, re-highlight those lines that have changed
- Carry on re-highlighting until the state is back in sync with what was there before. If this means going way beyond the visible area of the file, record the state of the next however many lines as unknown (so that it will have another go when/if those lines are later scrolled into view).
- If a range of lines that has just come into view begins in unknown state, it's up to the particular lexer module to start from the first visible line or backtrack as far as it likes to get some context.

Is this anything like how Scintilla works?

<snip>
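The first three steps of that strategy might look roughly like this in C. This is a sketch only: it tracks a single piece of state (inside or outside a non-nesting block comment), it does not model the "unknown" state for lines outside the visible area, and the names State, lex_line and relex_from are hypothetical rather than Scintilla's API.

```c
#include <stddef.h>

typedef enum { ST_CODE, ST_COMMENT } State;

/* Lexes one line starting in state st and returns the state at its end.
 * Only slash-star block comments are tracked here; a real lexer would
 * also assign a style to every character as it goes. */
State lex_line(const char *line, State st)
{
    const char *p = line;
    while (*p != '\0') {
        if (st == ST_CODE && p[0] == '/' && p[1] == '*') {
            st = ST_COMMENT;
            p += 2;
        } else if (st == ST_COMMENT && p[0] == '*' && p[1] == '/') {
            st = ST_CODE;
            p += 2;
        } else {
            ++p;
        }
    }
    return st;
}

/* start_state[i] is the remembered state at the start of line i; the
 * array has nlines + 1 entries, the last being the state at end of file.
 * After line `changed` has been edited, re-lex from there and stop as
 * soon as the state agrees with what was remembered.
 * Returns the number of lines that had to be re-lexed. */
size_t relex_from(const char **lines, size_t nlines,
                  State *start_state, size_t changed)
{
    size_t i = changed;
    while (i < nlines) {
        State end = lex_line(lines[i], start_state[i]);
        ++i;
        if (start_state[i] == end)
            break;               /* back in sync: later lines still valid */
        start_state[i] = end;    /* the change propagates to the next line */
    }
    return i - changed;
}
```

In the common case the edited line ends in the same state as before, so only that one line is re-lexed; opening a comment is the expensive case, since the change propagates until a line closes it.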
 It's actually trivial* to implement a lexer for Scintilla which would
 work exactly as TextPad does, including use of the same configuration
 files.
 
 * That is, if you know exactly how TextPad works.

It would also be straightforward to improve TextPad's scheme to support an arbitrary number of string/comment types. How about this as an all-in-one replacement for TP's comment and string syntax directives?

[DelimitedToken1]
Start = /**
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken2]
Start = /*!
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken3]
Start = /*
End = */
Type = Comment
SpanLines = Yes
Nest = No

[DelimitedToken4]
Start = /+
End = +/
Type = Comment
SpanLines = Yes
Nest = Yes

[DelimitedToken5]
Start = //
Type = Comment
SpanLines = No
Nest = No

[DelimitedToken6]
Start = r"
End = "
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken7]
Start = `
End = `
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken8]
Start = "
End = "
Esc = \
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken9]
Start = '
End = '
Esc = \
Type = Char
SpanLines = No
Nest = No

There, we have all of D1 covered now, and not a regexp in sight.

<snip>
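A sketch of how a generic lexer might interpret such a table, written in C. The struct fields mirror the proposed directives (Start, End, Esc, Type, SpanLines, Nest); everything else, including the names DelimRule, d1_rules and match_delimited, is hypothetical. Rules are tried in order, so longer openers must precede shorter ones.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *start;
    const char *end;        /* NULL: the token runs to the end of the line */
    char        esc;        /* 0: no escape character */
    const char *type;
    int         span_lines;
    int         nest;
} DelimRule;

/* The nine proposed D1 rules, in the order given in the post. */
static const DelimRule d1_rules[] = {
    { "/**", "*/", 0,    "DocComment", 1, 0 },
    { "/*!", "*/", 0,    "DocComment", 1, 0 },
    { "/*",  "*/", 0,    "Comment",    1, 0 },
    { "/+",  "+/", 0,    "Comment",    1, 1 },
    { "//",  NULL, 0,    "Comment",    0, 0 },
    { "r\"", "\"", 0,    "String",     1, 0 },
    { "`",   "`",  0,    "String",     1, 0 },
    { "\"",  "\"", '\\', "String",     1, 0 },
    { "'",   "'",  '\\', "Char",       0, 0 },
};

/* If a delimited token starts at s, stores its type and returns its
 * length (to end of text if unterminated); otherwise returns 0. */
size_t match_delimited(const char *s, const DelimRule *rules,
                       size_t nrules, const char **type)
{
    for (size_t r = 0; r < nrules; ++r) {
        const DelimRule *d = &rules[r];
        size_t slen = strlen(d->start);
        if (strncmp(s, d->start, slen) != 0)
            continue;

        size_t elen = d->end ? strlen(d->end) : 0;
        size_t i = slen;
        int depth = 1;
        while (s[i] != '\0') {
            if (s[i] == '\n' && !d->span_lines)
                break;                      /* may not cross a line break */
            if (d->esc && s[i] == d->esc && s[i + 1] != '\0') {
                i += 2;                     /* skip the escaped character */
                continue;
            }
            if (d->nest && strncmp(s + i, d->start, slen) == 0) {
                ++depth;                    /* nested opener */
                i += slen;
                continue;
            }
            if (d->end && strncmp(s + i, d->end, elen) == 0) {
                i += elen;
                if (--depth == 0)
                    break;
                continue;
            }
            ++i;
        }
        *type = d->type;
        return i;
    }
    return 0;
}
```

The caller would be expected to invoke this only at positions where a token can begin, so that the `r` at the end of an identifier is not mistaken for the opener of an r"..." string.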
 Basically yes, but they're going to be much more complex.  3Lu...5 is
 also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
 still, regexps don't nest.  Don't you want to highlight DDoc sections
 and macros?

That would be nice as well, as would being able to do things with Doxygen comments. But let's not try to run before we can walk.

Stewart.
Aug 17 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Highlighting the whole file every time a character is typed is slow.
 Scintilla doesn't do that.  It provides the lexer with a range of
 changed lines.  The lexer is then free to choose a larger range if it
 cannot deduce context from the initial range.  I tried to ignore this
 range and re-highlight the whole file in my lexer.  The performance was
 unacceptable.

Of course. I suppose now that the right strategy is line-by-line with some preservation of state between lines: - Keep a note of the state at the beginning of each line - When something is changed, re-highlight those lines that have changed - Carry on re-highlighting until the state is back in sync with what was there before. If this means going way beyond the visible area of the file, record the state of the next however many lines as unknown (so that it will have another go when/if those lines are later scrolled into view). - If a range of lines that has just come into view begins in unknown state, it's up to the particular lexer module to start from the first visible line or backtrack as far as it likes to get some context. Is this anything like how Scintilla works?

Exactly. There is a 32-bit "style" known for every character, plus another 32-bit field associated with every line. A lexer is free to use these fields for any purpose, except the lower byte of a style defines the characters' color.
 
 <snip>
 It's actually trivial* to implement a lexer for Scintilla which would
 work exactly as TextPad does, including use of the same configuration
 files.
 
 * That is, if you know exactly how TextPad works.

It would also be straightforward to improve TextPad's scheme to support an arbitrary number of string/comment types. How about this as an all-in-one replacement for TP's comment and string syntax directives?

[...]

[DelimitedToken8]
Start = "
End = "
Esc = \
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken9]
Start = '
End = '
Esc = \
Type = Char
SpanLines = No
Nest = No

There, we have all of D1 covered now, and not a regexp in sight.

Yes and no, because your ad-hoc format doesn't cover subtle differences between C and D strings. Like C strings don't support embedded EOLs. Though you may consider this minor.
 <snip>
 Basically yes, but they're going to be much more complex.  3Lu...5 is
 also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
 still, regexps don't nest.  Don't you want to highlight DDoc sections
 and macros?

That would be nice as well, as would being able to do things with Doxygen comments. But let's not try to run before we can walk.

This assumes that TextPad could run at some point. ;) This is exactly where I'm sceptical. I think that when it runs it'll have so many weird rules and settings that it won't be fun anymore. And they won't be powerful enough for anything authors didn't consider anyway.
Aug 17 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sergey Gromov wrote:
 Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:

 Is this anything like how Scintilla works?

Exactly. There is a 32-bit "style" known for every character, plus another 32-bit field associated with every line. A lexer is free to use these fields for any purpose, except the lower byte of a style defines the characters' color.

Does it keep around in memory the style of every character, or only the 32-bit field associated with the line so that the lexer can re-style the characters on repaint/scroll? <snip>
 [DelimitedToken9]
 Start = '
 End = '
 Esc = \
 Type = Char
 SpanLines = No
 Nest = No

 There, we have all of D1 covered now, and not a regexp in sight.

Yes and no, because your ad-hoc format doesn't cover subtle differences between C and D strings. Like C strings don't support embedded EOLs.

I don't understand. How does SpanLines not achieve this? Then what _does_ SpanLines achieve according to whatever conclusion you've come to?
 Though you may consider this minor.
 
 <snip>
 Basically yes, but they're going to be much more complex.  3Lu...5 is
 also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
 still, regexps don't nest.  Don't you want to highlight DDoc sections
 and macros?

That would be nice as well, as would being able to do things with Doxygen comments. But let's not try to run before we can walk.

This assumes that TextPad could run at some point.

You're right - it turns out TP doesn't get all the D floating point notations right. It appears that TP has hard-coded the syntax of C numeric literals. I must've just not noticed since I had never before changed the number colour from the same as the default text colour. Maybe we do want regexps for all these floating point notations after all.
 ;)  This is exactly where I'm sceptical.  I think that when it runs 
 it'll have so many weird rules and settings that it won't be fun 
 anymore.  And they won't be powerful enough for anything authors 
 didn't consider anyway.

Maybe someone can come up with something....

Stewart.
Aug 18 2009
Sergey Gromov <snake.scaly gmail.com> writes:
Tue, 18 Aug 2009 20:40:37 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Exactly.  There is a 32-bit "style" known for every character, plus
 another 32-bit field associated with every line.  A lexer is free to use
 these fields for any purpose, except the lower byte of a style defines
 the characters' color.

Does it keep around in memory the style of every character, or only the 32-bit field associated with the line so that the lexer can re-style the characters on repaint/scroll?

It can tell, for any character, which style it has. This is so that it can repaint unchanged lines without ever calling a lexer.
 <snip>
 [DelimitedToken9]
 Start = '
 End = '
 Esc = \
 Type = Char
 SpanLines = No
 Nest = No

 There, we have all of D1 covered now, and not a regexp in sight.

Yes and no, because your ad-hoc format doesn't cover subtle differences between C and D strings. Like C strings don't support embedded EOLs.

I don't understand. How does SpanLines not achieve this? Then what _does_ SpanLines achieve according to whatever conclusion you've come to?

Here's a string which is valid in D but is invalid in C:

"foo
bar"

Here's another string which is, on the contrary, valid in C but is invalid in D:

"foo\
bar"

They both "span lines."
Aug 20 2009
Stewart Gordon <smjg_1998 yahoo.com> writes:
Sergey Gromov wrote:
<snip>
 Here's a string which is valid in D but is invalid in C:
 
 "foo
 bar"
 
 Here's another string which is, on the contrary, valid in C but is
 invalid in D:
 
 "foo\
 bar"
 
 They both "span lines."

Doesn't quite relate to what I was querying ... but anyway, it's perfectly straightforward to add another rule like

LineSplice = \

among other possibilities.

You could argue over whether it's worth going to all this effort, if you think the only point is to support C, C++ and D. But really, there are many C-like languages out there with their own slightly different rules, and even the likes of Prolog, SQL and Unix shell scripts with their own variants of C string syntax. I think the scheme I've come up with would be a good way to capture the subtle differences between these languages' string syntaxes, while at the same time being something that the average user wanting to add a new language to the system should be able to get their head around sooner or later.

Stewart.
Aug 21 2009
prev sibling parent reply Don <nospam nospam.com> writes:
Sergey Gromov wrote:
 Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
 
 Sergey Gromov wrote:
 Well I think it's hard to create a regular expression engine flexible
 enough to allow arbitrary highlighting.

end of a C string or character literal. This (Posix?) regexp "(\\.|[^\\"])*" works as I try (though not in the tiny subset of Posix regexps that N++ understands). But that's an aside - you don't need regexps at all to get it working at this basic level, only a rudimentary concept of escape sequences.
 I think the best such engine
 I've seen was Colorer by Igor Russkih, and even there I wasn't able to
 express D's WYSIWYG or delimited strings.  You need a real programming
 language for that.

supports:
- the aforementioned string escapes
- multiple types of string literals distinguished by whether they support string escapes, and not just delimiters

TextPad's syntax highlighting engine manages 2/3 of this without any regexps (or anything to that effect). That said, I've just found that it can do a little bit of what remains: I can make it do `...` but not r"..." at the expense of distinguishing string and character literals.

But token-delimited strings are indeed more complex to deal with. (How many people do we have putting them to practical use at the moment, for that matter?)

Well, you can write a regexp to handle a simple C string. That is, if your regexp is matched against the whole file, which is usually not the case. Otherwise you'll have troubles with C string:

"foo\
bar"

or D string:

"foo
bar"

Then you want to highlight string escapes and probably format specifiers. Therefore you need not simple regexps but hierarchies of them, and also you need to know where *internals* of the string start and end.

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything. Still can be handled by extended regexps, even though they won't be regular expressions in scientific sense.

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}. Regexps cannot translate while substituting, so you must create regexps for all possible parens.
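
The whole-file versus per-line point can be checked with a short Python sketch. It uses the regexp quoted earlier in the thread, wrapped in quotes, with Python's `re.DOTALL` flag so that `.` also matches a newline; that flag and the sample input are assumptions chosen for the illustration.

```python
import re

# Opening quote, then any mix of escape pairs or ordinary characters,
# then the closing quote. DOTALL lets "\\." consume backslash + newline.
C_STRING = re.compile(r'"(\\.|[^\\"])*"', re.DOTALL)


def find_strings(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in C_STRING.finditer(text)]


# Two literals: one spliced across a line break, one with an escaped quote.
source = 'x = "foo\\\nbar"; y = "a\\"b";'

# Matching against the whole buffer finds both literals intact.
whole_file = find_strings(source)

# Matching one line at a time loses the context of the first line,
# so the second line is split at the wrong quotes.
per_line = [s for line in source.split("\n") for s in find_strings(line)]
```

Here `whole_file` holds the two real literals, whereas `per_line` holds the fragments `"; y = "` and `"b"`, neither of which is a string in the source.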

Remember that the whole point of q{} strings was that they should NOT be highlighted as strings!
Aug 17 2009
parent Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 17 Aug 2009 10:37:47 +0200, Don wrote:

 Sergey Gromov wrote:
 Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
 Regexps cannot translate while substituting, so you must create regexps
 for all possible parens.

Remember that the whole point of q{} strings was that they should NOT be highlighted as strings!

You confuse q{} and q"{}" here. The former is a token string which may contain only valid D tokens. The latter is a delimited string with nesting delimiters. Like q"<<a href="#hi">hello</a>>".
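
For what it's worth, the nesting case is short to handle with a counter instead of a regexp. The Python sketch below is a simplified illustration only: the function name is made up, identifier (heredoc-style) delimiters are not handled, and no check is made on which characters D actually permits as delimiters.

```python
# Opening delimiters that nest, mapped to their closers.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def scan_delimited(text, pos=0):
    """Return the index just past a q"..." literal at pos, or None."""
    if not text.startswith('q"', pos) or pos + 2 >= len(text):
        return None
    opener = text[pos + 2]
    closer = PAIRS.get(opener)
    i = pos + 3
    if closer is None:
        # Non-nesting delimiter such as '/': ends at the first  /"
        end = text.find(opener + '"', i)
        return None if end < 0 else end + 2
    depth = 1
    while i < len(text):
        if text[i] == opener:
            depth += 1
        elif text[i] == closer:
            depth -= 1
            if depth == 0:
                # The closing quote must follow the outermost closer.
                return i + 2 if text.startswith('"', i + 1) else None
        i += 1
    return None  # ran off the end without balancing


html = 'q"<<a href="#hi">hello</a>>"'
end_of_html = scan_delimited(html)
```

Applied to the example above, `end_of_html` equals `len(html)`: the embedded double quotes do not end the literal because the angle brackets are still open at that point.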
Aug 17 2009
prev sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Stewart Gordon wrote:
<snip>
 TextPad's syntax highlighting engine manages 2/3 of this without any 
 regexps (or anything to that effect).  That said, I've just found that 
 it can do a little bit of what remains: I can make it do `...` but not 
 r"..." at the expense of distinguishing string and character literals.

For the record, what I'd done is

StringStart = "
StringEnd = "
StringAlt = '
StringEsc = \
CharStart = `
CharEnd = `
CharEsc =

However, I've just found a bigger problem: only string literals, not char literals, can span lines in TP.

Stewart.
Aug 14 2009
prev sibling parent Kagamin <spam here.lot> writes:
Stewart Gordon Wrote:

 For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
 d.properties to be found.  The SciLexer.dll reports itself as file 
 version 1.7.8.0, product version 1.78.  So maybe the question is of what 
 effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
 versions correspond directly to Scintilla versions?)

The wrong lexer is being used here. Scintilla's built-in D lexer has supported nested comments and escape sequences since version 1.72, but support for multiline strings was only added in version 1.79.
Aug 13 2009
prev sibling next sibling parent Jussi Jumppanen <jussij zeusedit.com> writes:
Stewart Gordon Wrote:

 Or maybe I should just go back to TextPad (which isn't perfect 
 either) and put up with its not supporting Unicode....

FWIW Zeus is very similar to TextPad in feature set and the latest version also adds support for Unicode/UTF8.

http://www.zeusedit.com/

It will do D syntax highlighting and code folding out of the box.

It also comes with a version of ctags.exe made with these changes specifically for the D language:

http://www.zeusedit.com/z300/ctags_src.zip

meaning it can produce tags information for your D source files.

NOTE: Zeus like TextPad is shareware.

Jussi Jumppanen
Author: Zeus for Windows
Aug 12 2009
prev sibling parent reply Kagamin <spam here.lot> writes:
Stewart Gordon Wrote:

 Anyway, attached is the result.  Can anybody do better (other than by 
 telling it to treat D as C or some other language instead)?

I don't see how the lexer is being chosen. Programmer's Notepad does it correctly.
Aug 13 2009
parent Kagamin <spam here.lot> writes:
Nick Sabalausky Wrote:

 I don't see how the lexer is being chosen.
 Programmer's Notepad does it correctly.

I use Programmer's Notepad. It's good, but it still has some problems:

http://code.google.com/p/pnotepad/issues/detail?id=480
(Proper Highlighting for D's Wysiwyg Strings)

http://code.google.com/p/pnotepad/issues/detail?id=481
(In D, strings with embedded newlines are not highlighted correctly)

http://code.google.com/p/pnotepad/issues/detail?id=482
(Support for D's nested comments)

At least PN chooses a lexer. That's what I meant. These issues do not pertain to PN. They're RFEs for the Scintilla D lexer and, as I said, they were fixed in version 1.79. The PN developer just plans to upgrade to the new Scintilla in PN 3; in fact I compiled Scintilla 1.78 with the recent D lexer and it works fine. BTW bug 482 is invalid: support for nested comments was there from the start, so make sure you don't use the C lexer.
Aug 14 2009