digitalmars.D - Notepad++

Stewart Gordon <smjg_1998 yahoo.com> writes:

What's the best anybody's managed to get Notepad++ to syntax-highlight 
D?  (I'm on version 5.4.5, if that makes a difference.)

My userDefineLang.xml file is as given here
http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
(note that I've fixed a few errors; I've no idea how they got there).

Notepad++ does a good job of syntax-highlighting PHP files, whose 
syntactic structure is more complex than that of D.  So clearly, 
Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever). 
  However, at the moment I can't even seem to get it up to C standard! 
(Can anybody find a full reference of the userDefineLang.xml format, for 
that matter?)

Maybe it's just a case in point of some comments here:
http://d.puremagic.com/issues/show_bug.cgi?id=3193

Anyway, attached is the result.  Can anybody do better (other than by 
telling it to treat D as C or some other language instead)?

Or maybe I should just go back to TextPad (which isn't perfect either) 
and put up with its not supporting Unicode....

Stewart.

Aug 12 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

 What's the best anybody's managed to get Notepad++ to syntax-highlight 
 D?  (I'm on version 5.4.5, if that makes a difference.)
 
 My userDefineLang.xml file is as given here
 http://www.prowiki.org/wiki4d/wiki.cgi?EditorSupport/NotepadPlus
 (note that I've fixed a few errors; I've no idea how they got there).
 
 Notepad++ does a good job of syntax-highlighting PHP files, whose 
 syntactic structure is more complex than that of D.  So clearly, 
 Notepad++ is a powerful syntax-highlighter (or Scintilla is, whatever). 
   However, at the moment I can't even seem to get it up to C standard! 
 (Can anybody find a full reference of the userDefineLang.xml format, for 
 that matter?)

Scintilla uses plugins to highlight source.  These plugins are written
in C++ and have almost full access to the buffer so the highlighter code
may be arbitrarily complex.  I actually wrote such a plugin to highlight
D a while back:

http://dsource.org/projects/scrapple/browser/trunk/scilexer

It seems like Notepad++ developers added their own highlighter plugin
which takes userDefineLang.xml as its configuration.  Such a
configurable plugin is presumably much less flexible than pure C++
implementation for a particular language.  It's very likely that PHP
highlighter is written in C++ and comes bundled with Scintilla.

Aug 12 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sergey Gromov wrote:
 Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

<snip>
 Scintilla uses plugins to highlight source.  These plugins are written
 in C++ and have almost full access to the buffer so the highlighter code
 may be arbitrarily complex.  I actually wrote such a plugin to highlight
 D a while back:
 
 http://dsource.org/projects/scrapple/browser/trunk/scilexer

"1.  If you have SciTE 1.76 for Windows installed simply replace
SciLexer.dll and d.properties with the supplied files.

2.  If you wish to build Scintilla from source:"

Can it be used in Scintilla-based editors besides SciTE short of 
acquiring the whole Scintilla source and rebuilding it?

For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
d.properties to be found.  The SciLexer.dll reports itself as file 
version 1.7.8.0, product version 1.78.  So maybe the question is of what 
effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
versions correspond directly to Scintilla versions?)

 It seems like Notepad++ developers added their own highlighter plugin
 which takes userDefineLang.xml as its configuration.  Such a
 configurable plugin is presumably much less flexible than pure C++
 implementation for a particular language.  It's very likely that PHP
 highlighter is written in C++ and comes bundled with Scintilla.

It puzzles me that they didn't make this plugin powerful enough to 
highlight the language it (and indeed the whole of Notepad++) is written 
in.  Even more so considering the sheer number of C-like languages out 
there, which people are likely to want to use N++ to write.

Stewart.

Aug 12 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Wed, 12 Aug 2009 18:12:41 +0100, Stewart Gordon wrote:

 <snip>
 Scintilla uses plugins to highlight source.  These plugins are written
 in C++ and have almost full access to the buffer so the highlighter code
 may be arbitrarily complex.  I actually wrote such a plugin to highlight
 D a while back:
 
 http://dsource.org/projects/scrapple/browser/trunk/scilexer

 
 "1.  If you have SciTE 1.76 for Windows installed simply replace
 SciLexer.dll and d.properties with the supplied files.
 
 2.  If you wish to build Scintilla from source:"
 
 Can it be used in Scintilla-based editors besides SciTE short of 
 acquiring the whole Scintilla source and rebuilding it?

There are two problems at least:

1.  SciLexer.dll contains *all* of the built-in lexer modules.
Replacing your DLL with another version will remove any extra lexers
which 3rd party put there, like an XML-configurable lexer in case of
Notepad++.

2.  Lexers are written in C++ and interface with the rest of Scintilla
via C++ classes.  Therefore if a field is added or removed anywhere, or
if you use a different compiler to build your DLL than that used to
build Scintilla, you'll get a GPF, or worse.

The good news is that Notepad++ is on SourceForge, so the "from source"
way is at least possible.

 For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
 d.properties to be found.  The SciLexer.dll reports itself as file 
 version 1.7.8.0, product version 1.78.  So maybe the question is of what 
 effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
 versions correspond directly to Scintilla versions?)

Yes, SciTE versions seem to be in sync with Scintilla versions.

 It seems like Notepad++ developers added their own highlighter plugin
 which takes userDefineLang.xml as its configuration.  Such a
 configurable plugin is presumably much less flexible than pure C++
 implementation for a particular language.  It's very likely that PHP
 highlighter is written in C++ and comes bundled with Scintilla.

 
 It puzzles me that they didn't make this plugin powerful enough to 
 highlight the language it (and indeed the whole of Notepad++) is written 
 in.  Even more so considering the sheer number of C-like languages out 
 there, which people are likely to want to use N++ to write.

Well I think it's hard to create a regular expression engine flexible
enough to allow arbitrary highlighting.  I think the best such engine
I've seen was Colorer by Igor Russkih, and even there I wasn't able to
express D's WYSIWYG or delimited strings.  You need a real programming
language for that.

---

I've just had a look at Notepad++ sources.  The Scintilla they use
contains Scintilla's built-in D lexer.  I think it's just not
configured.  SciTE uses *.properties files to configure stuff.
Notepad++ uses XML files for the same purpose.  I think it's all in
langs.model.xml.  My current idea is to take d.properties from the
corresponding release of SciTE and try to translate it into the
langs.model.xml format.  I'll probably try it later when I have time.

Of course it would be nice to replace the original D lexer with mine.
Or, even better, to ask Scintilla developers to include my lexer into
the official bundle.  May be worth a try.

Aug 12 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Sergey Gromov wrote:
 2.  Lexers are written in C++ and interface with the rest of Scintilla
 via C++ classes.  Therefore if a field is added or removed anywhere, or
 if you use a different compiler to build your DLL than that used to
 build Scintilla, you'll get GPF, or worse.

If they use binary interfacing with virtual functions a la COM's binary 
standard, then field presence shouldn't matter. Also, most compilers on 
Windows respect the basic ABI. No?

Andrei

Aug 12 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Wed, 12 Aug 2009 21:35:02 -0500, Andrei Alexandrescu wrote:

 Sergey Gromov wrote:
 2.  Lexers are written in C++ and interface with the rest of Scintilla
 via C++ classes.  Therefore if a field is added or removed anywhere, or
 if you use a different compiler to build your DLL than that used to
 build Scintilla, you'll get GPF, or worse.

 
 If they use binary interfacing with virtual functions a la COM's
 binary standard, then field presence shouldn't matter.

They don't, unfortunately.  Every lexer defines a static instance of a
LexerModule class.  The coloring function receives a reference to an
Accessor class.  They're full-blown classes, with fields and stuff.

 Also, most compilers on Windows respect the basic ABI. No?

Even though they don't use inheritance, and therefore most compilers
will likely build identical data layouts for them, there is still zero
compatibility between different versions of those classes.
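
To make the versioning problem concrete, here is a minimal sketch. The two
structs are hypothetical stand-ins, not Scintilla's real Accessor; they only
assume that a field was added between two releases of the same class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layouts, NOT Scintilla's real Accessor: the "same" class as
// seen by a lexer built against one release and by a host built against
// the next one, where a field was added in the middle.
struct AccessorV1 {
    int startPos;
    int length;
};

struct AccessorV2 {
    int startPos;
    int codePage;   // field added in the newer release (assumed name)
    int length;
};

// Byte offset at which code compiled against each version expects `length`.
constexpr std::size_t lengthOffsetV1() { return offsetof(AccessorV1, length); }
constexpr std::size_t lengthOffsetV2() { return offsetof(AccessorV2, length); }

// True only if a lexer built against V1 could safely be handed a V2 object.
inline bool layoutsAgree() {
    return lengthOffsetV1() == lengthOffsetV2()
        && sizeof(AccessorV1) == sizeof(AccessorV2);
}
```

Under these assumptions layoutsAgree() returns false: code compiled against
the older header would read codePage where it expects length. An interface
made only of virtual functions would hide such fields from the lexer.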

Aug 12 2009

Kagamin <spam here.lot> writes:

Sergey Gromov Wrote:

 Or, even better, to ask Scintilla developers to include my lexer into
 the official bundle.  May be worth a try.

Uh... that's not an option.

Aug 13 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sergey Gromov wrote:
 Thu, 13 Aug 2009 01:40:47 +0100, Stewart Gordon wrote:

<snip>
 It puzzles me that they didn't make this plugin powerful enough to 
 highlight the language it (and indeed the whole of Notepad++) is written 
 in.  Even more so considering the sheer number of C-like languages out 
 there, which people are likely to want to use N++ to write.

 
 Well I think it's hard to create a regular expression engine flexible
 enough to allow arbitrary highlighting.

I can't see how it can be at all complicated to find the beginning and 
end of a C string or character literal.

This (Posix?) regexp

"(\\.|[^\\"])*"

works as I try (though not in the tiny subset of Posix regexps that N++ 
understands).  But that's an aside - you don't need regexps at all to 
get it working at this basic level, only a rudimentary concept of escape 
sequences.
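
As a rough illustration, the regexp above can be tried with std::regex from
the C++ standard library. The helper name firstStringLiteral is made up for
this sketch, and the default ECMAScript grammar is assumed rather than Posix.

```cpp
#include <regex>
#include <string>

// Returns the first double-quoted literal found in `text`, including the
// quotes, or an empty string if there is none. Uses the regexp from the
// post with std::regex's default ECMAScript grammar.
inline std::string firstStringLiteral(const std::string& text) {
    static const std::regex literal(R"rx("(\\.|[^\\"])*")rx");
    std::smatch m;
    return std::regex_search(text, m, literal) ? m.str() : std::string();
}
```

In this grammar `.` does not match a newline, so a backslash followed by a
line break is not accepted as an escape, while a bare line break inside the
quotes is accepted by the negated character class.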

 I think the best such engine
 I've seen was Colorer by Igor Russkih, and even there I wasn't able to
 express D's WYSIWYG or delimited strings.  You need a real programming
 language for that.

For WYSIWYG strings, all that's needed is a generic highlighter that 
supports:
- the aforementioned string escapes
- multiple types of string literals distinguished by whether they 
support string escapes, and not just delimiters

TextPad's syntax highlighting engine manages 2/3 of this without any 
regexps (or anything to that effect).  That said, I've just found that 
it can do a little bit of what remains: I can make it do `...` but not 
r"..." at the expense of distinguishing string and character literals.

But token-delimited strings are indeed more complex to deal with.  (How 
many people do we have putting them to practical use at the moment, for 
that matter?)

 ---
 
 I've just had a look at Notepad++ sources.  The Scintilla they use
 contains Scintilla's built-in D lexer.  I think it's just not
 configured.

Sounds as though N++'s developers neglected to keep the configuration 
files up to date as new languages have been added to Scintilla.

 SciTE uses *.properties files to configure stuff.
 Notepad++ uses XML files for the same purpose.  I think it's all in
 langs.model.xml.  My current idea is to take d.properties from the
 corresponding release of SciTE and try to translate it into the
 langs.model.xml format.  I'll probably try it later when I have time.
 
 Of course it would be nice to replace the original D lexer with mine.
 Or, even better, to ask Scintilla developers to include my lexer into
 the official bundle.  May be worth a try.

You have two good plans there.

Scintilla's definition of a plugin is confusing - normally plugins are 
things that can be dynamically loaded at runtime, rather than having to 
compile them in.  If only....

Stewart.

Aug 13 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Well I think it's hard to create a regular expression engine flexible
 enough to allow arbitrary highlighting.

 
 I can't see how it can be at all complicated to find the beginning and 
 end of a C string or character literal.
 
 This (Posix?) regexp
 
 "(\\.|[^\\"])*"
 
 works as I try (though not in the tiny subset of Posix regexps that N++ 
 understands).  But that's an aside - you don't need regexps at all to 
 get it working at this basic level, only a rudimentary concept of escape 
 sequences.
 
 I think the best such engine
 I've seen was Colorer by Igor Russkih, and even there I wasn't able to
 express D's WYSIWYG or delimited strings.  You need a real programming
 language for that.

 
 For WYSIWYG strings, all that's needed is a generic highlighter that 
 supports:
 - the aforementioned string escapes
 - multiple types of string literals distinguished by whether they 
 support string escapes, and not just delimiters
 
 TextPad's syntax highlighting engine manages 2/3 of this without any 
 regexps (or anything to that effect).  That said, I've just found that 
 it can do a little bit of what remains: I can make it do `...` but not 
 r"..." at the expense of distinguishing string and character literals.
 
 But token-delimited strings are indeed more complex to deal with.  (How 
 many people do we have putting them to practical use at the moment, for 
 that matter?)

Well, you can write a regexp to handle a simple C string.  That is, if
your regexp is matched against the whole file, which is usually not the
case.  Otherwise you'll have trouble with a C string:

"foo\
bar"

or a D string:

"foo
bar"

Then you want to highlight string escapes and probably format
specifiers.  Therefore you need not simple regexps but hierarchies of
them, and also you need to know where *internals* of the string start
and end.

Then you have r"foo" which probably can be handled with regexps.

Then you have q"/foo/" where "/" can be anything.  Still can be handled
by extended regexps, even though they won't be regular expressions in
the scientific sense.

Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
Regexps cannot translate while substituting, so you must create regexps
for all possible parens.

And of course q"BLAH
whatever BLAH here
BLAH", well, probably nice for help texts.

And these are only strings.  Try to write a regexp which treats .__15 as
number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
number(2), operator(..), number(3).
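
For comparison, the bracket and single-character forms of q"..." are short to
handle in ordinary code. The following is only a sketch: the function names
are invented here, the identifier-delimited (heredoc) form is left out, and
it is not taken from any existing lexer.

```cpp
#include <cstddef>
#include <string>

// Closing partner of a nesting delimiter, or '\0' if `open` is not one.
inline char closingBracket(char open) {
    switch (open) {
        case '(': return ')';
        case '[': return ']';
        case '<': return '>';
        case '{': return '}';
        default:  return '\0';
    }
}

// `pos` must point at the 'q' of a q"... literal. Returns the index one past
// the closing quote, or std::string::npos if the literal is unterminated.
// Identifier-delimited (heredoc) strings are deliberately not handled.
inline std::size_t delimitedStringEnd(const std::string& s, std::size_t pos) {
    if (pos + 2 >= s.size() || s[pos] != 'q' || s[pos + 1] != '"')
        return std::string::npos;
    const char open = s[pos + 2];
    const char close = closingBracket(open);
    std::size_t i = pos + 3;
    if (close != '\0') {
        // Bracket form: nested pairs of the same bracket are balanced.
        int depth = 1;
        for (; i < s.size(); ++i) {
            if (s[i] == open) ++depth;
            else if (s[i] == close && --depth == 0) break;
        }
    } else {
        // Single-character form: the same character closes the string.
        while (i < s.size() && s[i] != open) ++i;
    }
    // The closing delimiter must be followed immediately by a quote.
    if (i + 1 < s.size() && s[i + 1] == '"') return i + 2;
    return std::string::npos;
}
```

The mapping from opening to closing bracket is the "translation" step that a
plain regexp substitution cannot express, which is why one function covers
all four bracket pairs here.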

 Scintilla's definition of a plugin is confusing - normally plugins are 
 things that can be dynamically loaded at runtime, rather than having to 
 compile them in.  If only....

I'm not sure they call them "plugins".  They're lexer modules, designed so
that the lexer is relatively easy to extend.

Aug 14 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sergey Gromov wrote:
<snip>
 Well, you can write a regexp to handle a simple C string.  That is, if
 your regexp is matched against the whole file, which is usually not the
 case.  Otherwise you'll have troubles with C string:
 
 "foo\
 bar"
 
 or D string:
 
 "foo
 bar"

So there is a problem if the highlighter works by matching regexps on a 
line-by-line basis.  But matching regexps over a whole file is no harder 
in principle than matching line-by-line and, when the maximal munch 
principle is never called to action, it can't be much less efficient. 
(The only bit of C or D strings that relies on maximal munch is octal 
escapes.)

 Then you want to highlight string escapes and probably format
 specifiers.  Therefore you need not simple regexps but hierarchies of
 them, and also you need to know where *internals* of the string start
 and end.

Let's just concentrate for the moment on the simple process of finding 
the beginning and end of a string.  Here's a snippet of a TextPad syntax 
file:

StringsSpanLines = Yes
StringStart = "
StringEnd = "
StringEsc = \

A possible snippet of lexer code to handle this (which for all I know might 
be near enough how TP does it):

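if (*c == StringStart) {
     beginHighlightString(c);
     // Scan to the closing delimiter, the end of the buffer or, if
     // strings may not span lines, the end of the line.
     for (++c; *c != StringEnd && *c != '\0'
           &&(StringsSpanLines || *c != '\n'); ++c) {
         if (*c == StringEsc) ++c;   // skip the escaped character
     }
     endHighlightString(c+1);
}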

It's simple and it should work.  (OK, there are two assumptions made for 
simplicity: that line breaks are normalised to LF, and that the file is 
terminated by at least two null bytes in memory, but you get the idea.)

While it doesn't support highlighting of escapes, I can't see this fact 
as being the reason N++'s developers haven't implemented even this in 
the generic lexer module.  I probably couldn't see it being the reason 
even if the C lexer did highlight escapes (which it doesn't).

 Then you have r"foo" which probably can be handled with regexps.
 
 Then you have q"/foo/" where "/" can be anything.  Still can be handled
 by extended regexps, even though they won't be regular expressions in
 scientific sense.
 
 Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
 Regexps cannot translate while substituting, so you must create regexps
 for all possible parens.

Yes, these aspects are more complicated.  Both TP and N++ (out of the 
box, anyway) are probably far from being able to lex D2 properly.  But 
they certainly could do better in supporting D1.  Still, once N++ gains 
access to Scintilla's D lexer, things will certainly be better.

 And of course q"BLAH
 whatever BLAH here
 BLAH", well, probably nice for help texts.
 
 And these are only strings.  Try to write regexp which treats .__15 as
 number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
 number(2), operator(..), number(3).

<snip>

We'd need many regexps to handle all possible cases, but a possible set 
to cover these cases and a few others (listed in a possible order of 
priority) is:

\._*[0-9][0-9_]*
([1-9][0-9]*)(\.\.)
[0-9]+\.[0-9]*
[1-9][0-9]*
\.\.
\.
[a-zA-Z_][a-zA-Z0-9_]*

Note the use of capturing groups to handle the 2..3 case.  Each 
capturing group would match a token, while in the other cases the whole 
regexp matches a token.
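
One way to try this list out is a small priority-ordered matcher built on
std::regex. This is a sketch only: the function name tokenize is invented,
whitespace is not handled, and the ECMAScript grammar is assumed.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Splits `src` into tokens by trying the patterns from the post in priority
// order at the current position. Returns the tokens found before the first
// character that no pattern matches (whitespace is not handled here).
inline std::vector<std::string> tokenize(const std::string& src) {
    static const std::vector<std::regex> patterns = {
        std::regex(R"(\._*[0-9][0-9_]*)"),
        std::regex(R"(([1-9][0-9]*)(\.\.))"),
        std::regex(R"([0-9]+\.[0-9]*)"),
        std::regex(R"([1-9][0-9]*)"),
        std::regex(R"(\.\.)"),
        std::regex(R"(\.)"),
        std::regex(R"([a-zA-Z_][a-zA-Z0-9_]*)"),
    };
    std::vector<std::string> tokens;
    auto pos = src.cbegin();
    while (pos != src.cend()) {
        bool matched = false;
        for (const std::regex& re : patterns) {
            std::smatch m;
            // match_continuous anchors the match at `pos`.
            if (!std::regex_search(pos, src.cend(), m, re,
                                   std::regex_constants::match_continuous))
                continue;
            if (m.size() > 1) {
                // Capturing groups: each group is its own token.
                for (std::size_t g = 1; g < m.size(); ++g)
                    tokens.push_back(m[g].str());
            } else {
                tokens.push_back(m.str());
            }
            pos += m.length();
            matched = true;
            break;
        }
        if (!matched) break;  // unknown character: stop
    }
    return tokens;
}
```

With these patterns, "2..3" comes out as 2, .. and 3, ".__15" as a single
token, and ".__foo" as . followed by __foo.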

Stewart.

Aug 14 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 
 "foo
 bar"

 
 So there is a problem if the highlighter works by matching regexps on a 
 line-by-line basis.  But matching regexps over a whole file is no harder 
 in principle than matching line-by-line and, when the maximal munch 
 principle is never called to action, it can't be much less efficient. 
 (The only bit of C or D strings that relies on maximal munch is octal 
 escapes.)

Highlighting the whole file every time a character is typed is slow.
Scintilla doesn't do that.  It provides the lexer with a range of
changed lines.  The lexer is then free to choose a larger range if it
cannot deduce context from the initial range.  I tried to ignore this
range and re-highlight the whole file in my lexer.  The performance was
unacceptable.

 Then you want to highlight string escapes and probably format
 specifiers.  Therefore you need not simple regexps but hierarchies of
 them, and also you need to know where *internals* of the string start
 and end.

 
 Let's just concentrate for the moment on the simple process of finding 
 the beginning and end of a string.  Here's a snippet of a TextPad syntax 
 file:
 
 StringsSpanLines = Yes
 StringStart = "
 StringEnd = "
 StringEsc = \
 
 A possible snippet of lexer code to handle this (which FAIK might be 
 [...]

Sure, TextPad uses a dozen simple hacks specific to lexing
programming languages.  They're ad-hoc and they're limited to exactly
what TextPad's authors thought was important.

Regexps are a different approach.  They are more generic but are limited,
too, because they're slow and don't nest naturally.  Slow means they
must try to re-color as few lines as possible.  Not nestable means
you need to invent some framework around regexps which is another sort
of description language.  If you implement the former naively and ignore
the latter you'll get what presumably N++ has: not a very powerful
system.

It's actually trivial* to implement a lexer for Scintilla which would
work exactly as TextPad does, including use of the same configuration
files.

* That is, if you know exactly how TextPad works.

 And these are only strings.  Try to write regexp which treats .__15 as
 number(.__15), .__foo as operator(.), ident(__foo), and 2..3 as
 number(2), operator(..), number(3).

 <snip>
 
 We'd need many regexps to handle all possible cases, but a possible set 
 to cover these cases and a few others (listed in a possible order of 
 priority) is:
 
 \._*[0-9][0-9_]*
 ([1-9][0-9]*)(\.\.)
 [0-9]+\.[0-9]*
 [1-9][0-9]*
 \.\.
 \.
 [a-zA-Z_][a-zA-Z0-9_]*

Basically yes, but they're going to be much more complex.  3Lu...5 is
also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
still, regexps don't nest.  Don't you want to highlight DDoc sections
and macros?

Aug 15 2009

bearophile <bearophileHUGS lycos.com> writes:

Sergey Gromov:
 Sure, TextPad uses a dozen simple hacks specific to lexing
 programming languages.  They're ad-hoc and they're limited to exactly
 what TextPad's authors thought was important.

Today the difference doesn't matter much because CPUs are fast. But on Windows
with a Pentium 3, Scintilla was very slow. TextPad was fast enough even for very
quick fingers. (TextPad may even contain some parts coded in assembly.) TextPad
on Windows is very fast :-)

Bye,
bearophile

Aug 15 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sergey Gromov wrote:
 Sat, 15 Aug 2009 01:36:26 +0100, Stewart Gordon wrote:
 
 Sergey Gromov wrote:
 "foo
 bar"

 So there is a problem if the highlighter works by matching regexps on a 
 line-by-line basis.  But matching regexps over a whole file is no harder 
 in principle than matching line-by-line and, when the maximal munch 
 principle is never called to action, it can't be much less efficient. 
 (The only bit of C or D strings that relies on maximal munch is octal 
 escapes.)

 
 Highlighting the whole file every time a character is typed is slow.
 Scintilla doesn't do that.  It provides the lexer with a range of
 changed lines.  The lexer is then free to choose a larger range if it
 cannot deduce context from the initial range.  I tried to ignore this
 range and re-highlight the whole file in my lexer.  The performance was
 unacceptable.

Of course.  I suppose now that the right strategy is line-by-line with 
some preservation of state between lines:

- Keep a note of the state at the beginning of each line
- When something is changed, re-highlight those lines that have changed
- Carry on re-highlighting until the state is back in sync with what was 
there before.  If this means going way beyond the visible area of the 
file, record the state of the next however many lines as unknown (so 
that it will have another go when/if those lines are later scrolled into 
view).
- If a range of lines that has just come into view begins in unknown 
state, it's up to the particular lexer module to start from the first 
visible line or backtrack as far as it likes to get some context.

Is this anything like how Scintilla works?
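
The strategy in the list above can be sketched as follows. Everything here is
an assumption made for illustration: the names lexLine and relex, the use of
a plain vector of lines, and the fact that only /* ... */ comments are
tracked. It does not reproduce Scintilla's actual API, and the "unknown
state" handling for off-screen lines is omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum LineState { Normal = 0, InBlockComment = 1 };

// Hypothetical per-line lexer: given the state at the start of `line`,
// returns the state at its end. Only /* ... */ comments are tracked.
inline LineState lexLine(const std::string& line, LineState state) {
    for (std::size_t i = 0; i + 1 < line.size(); ++i) {
        if (state == Normal && line[i] == '/' && line[i + 1] == '*') {
            state = InBlockComment;
            ++i;
        } else if (state == InBlockComment && line[i] == '*' && line[i + 1] == '/') {
            state = Normal;
            ++i;
        }
    }
    return state;
}

// startState[n] is the state at the beginning of line n. Re-lexes from
// `changed` onwards and stops as soon as the recomputed state agrees with
// what was stored before the edit. Returns how many lines were re-lexed.
inline std::size_t relex(const std::vector<std::string>& lines,
                         std::vector<LineState>& startState,
                         std::size_t changed) {
    std::size_t count = 0;
    for (std::size_t n = changed; n < lines.size(); ++n) {
        const LineState end = lexLine(lines[n], startState[n]);
        ++count;
        if (n + 1 == lines.size()) break;
        if (startState[n + 1] == end) break;  // back in sync: stop here
        startState[n + 1] = end;
    }
    return count;
}
```

An edit that opens a comment forces re-lexing down to the line that closes
it, while an edit that leaves the end-of-line state unchanged touches only
one line.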

<snip>
 It's actually trivial* to implement a lexer for Scintilla which would
 work exactly as TextPad does, including use of the same configuration
 files.
 
 * That is, if you know exactly how TextPad works.

It would also be straightforward to improve TextPad's scheme to support 
an arbitrary number of string/comment types.  How about this as an 
all-in-one replacement for TP's comment and string syntax directives?

[DelimitedToken1]
Start = /**
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken2]
Start = /*!
End = */
Type = DocComment
SpanLines = Yes
Nest = No

[DelimitedToken3]
Start = /*
End = */
Type = Comment
SpanLines = Yes
Nest = No

[DelimitedToken4]
Start = /+
End = +/
Type = Comment
SpanLines = Yes
Nest = Yes

[DelimitedToken5]
Start = //
Type = Comment
SpanLines = No
Nest = No

[DelimitedToken6]
Start = r"
End = "
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken7]
Start = `
End = `
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken8]
Start = "
End = "
Esc = \
Type = String
SpanLines = Yes
Nest = No

[DelimitedToken9]
Start = '
End = '
Esc = \
Type = Char
SpanLines = No
Nest = No

There, we have all of D1 covered now, and not a regexp in sight.
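
A scanner driven by such sections might look roughly like this. It is a
sketch under assumptions: the struct fields mirror the proposed keys, the
names DelimitedToken, startsWith and scanDelimited are invented here, and
parsing of the configuration file itself is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One [DelimitedTokenN] section from the proposal. Field names mirror the
// proposed keys; the struct itself is a hypothetical illustration.
struct DelimitedToken {
    std::string start;
    std::string end;      // empty: the token runs to the end of the line
    char esc;             // '\0': no escape character
    std::string type;
    bool spanLines;
    bool nest;
};

inline bool startsWith(const std::string& s, std::size_t pos, const std::string& what) {
    return !what.empty() && s.compare(pos, what.size(), what) == 0;
}

// If a rule's Start matches at `pos`, returns the index one past the token
// and stores the rule's type in `type`. Rules are tried in order, so longer
// Start strings must come first. Returns `pos` if no rule matches.
inline std::size_t scanDelimited(const std::string& s, std::size_t pos,
                                 const std::vector<DelimitedToken>& rules,
                                 std::string& type) {
    for (const DelimitedToken& r : rules) {
        if (!startsWith(s, pos, r.start)) continue;
        type = r.type;
        std::size_t i = pos + r.start.size();
        int depth = 1;
        while (i < s.size()) {
            if (s[i] == '\n' && (r.end.empty() || !r.spanLines)) break;
            if (r.esc != '\0' && s[i] == r.esc) { i += 2; continue; }
            if (startsWith(s, i, r.end)) {
                i += r.end.size();
                if (--depth == 0) return i;
            } else if (r.nest && startsWith(s, i, r.start)) {
                i += r.start.size();
                ++depth;
            } else {
                ++i;
            }
        }
        return i < s.size() ? i : s.size();  // unterminated or end of line
    }
    return pos;
}
```

Because rules are tried in order, the sections would have to be listed as in
the proposal, with /** and /*! ahead of /*.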

<snip>
 Basically yes, but they're going to be much more complex.  3Lu...5 is
 also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
 still, regexps don't nest.  Don't you want to highlight DDoc sections
 and macros?

That would be nice as well, as would being able to do things with 
Doxygen comments.  But let's not try to run before we can walk.

Stewart.

Aug 17 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Highlighting the whole file every time a character is typed is slow.
 Scintilla doesn't do that.  It provides the lexer with a range of
 changed lines.  The lexer is then free to choose a larger range if it
 cannot deduce context from the initial range.  I tried to ignore this
 range and re-highlight the whole file in my lexer.  The performance was
 unacceptable.

 
 Of course.  I suppose now that the right strategy is line-by-line with 
 some preservation of state between lines:
 
 - Keep a note of the state at the beginning of each line
 - When something is changed, re-highlight those lines that have changed
 - Carry on re-highlighting until the state is back in sync with what was 
 there before.  If this means going way beyond the visible area of the 
 file, record the state of the next however many lines as unknown (so 
 that it will have another go when/if those lines are later scrolled into 
 view).
 - If a range of lines that has just come into view begins in unknown 
 state, it's up to the particular lexer module to start from the first 
 visible line or backtrack as far as it likes to get some context.
 
 Is this anything like how Scintilla works?

Exactly.  There is a 32-bit "style" known for every character, plus
another 32-bit field associated with every line.  A lexer is free to use
these fields for any purpose, except that the lower byte of a style defines
the character's color.

 
 <snip>
 It's actually trivial* to implement a lexer for Scintilla which would
 work exactly as TextPad does, including use of the same configuration
 files.
 
 * That is, if you know exactly how TextPad works.

 
 It would also be straightforward to improve TextPad's scheme to support 
 an arbitrary number of string/comment types.  How about this as an 
 all-in-one replacement for TP's comment and string syntax directives?
 
 [...]
 
 [DelimitedToken8]
 Start = "
 End = "
 Esc = \
 Type = String
 SpanLines = Yes
 Nest = No
 
 [DelimitedToken9]
 Start = '
 End = '
 Esc = \
 Type = Char
 SpanLines = No
 Nest = No
 
 There, we have all of D1 covered now, and not a regexp in sight.

Yes and no, because your ad-hoc format doesn't cover subtle differences
between C and D strings.  Like C strings don't support embedded EOLs.
Though you may consider this minor.

 <snip>
 Basically yes, but they're going to be much more complex.  3Lu...5 is
 also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
 still, regexps don't nest.  Don't you want to highlight DDoc sections
 and macros?

 
 That would be nice as well, as would being able to do things with 
 Doxygen comments.  But let's not try to run before we can walk.

This assumes that TextPad could run at some point.  ;)  This is exactly
where I'm sceptical.  I think that when it runs it'll have so many weird
rules and settings that it won't be fun anymore.  And they won't be
powerful enough for anything authors didn't consider anyway.

Aug 17 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sergey Gromov wrote:
 Mon, 17 Aug 2009 21:23:56 +0100, Stewart Gordon wrote:

<snip>
 Is this anything like how Scintilla works?

 
 Exactly.  There is a 32-bit "style" known for every character, plus
 another 32-bit field associated with every line.  A lexer is free to use
 these fields for any purpose, except that the lower byte of a style defines
 the character's color.

Does it keep around in memory the style of every character, or only the 
32-bit field associated with the line so that the lexer can re-style the 
characters on repaint/scroll?

<snip>
 [DelimitedToken9]
 Start = '
 End = '
 Esc = \
 Type = Char
 SpanLines = No
 Nest = No

 There, we have all of D1 covered now, and not a regexp in sight.

 
 Yes and no, because your ad-hoc format doesn't cover subtle differences
 between C and D strings.  Like C strings don't support embedded EOLs.

I don't understand.  How does SpanLines not achieve this?

Then what _does_ SpanLines achieve according to whatever conclusion 
you've come to?

 Though you may consider this minor.
 
 <snip>
 Basically yes, but they're going to be much more complex.  3Lu...5 is
 also a range.  0x3e22.f5p6fi is a valid floating-point number.  And
 still, regexps don't nest.  Don't you want to highlight DDoc sections
 and macros?

 That would be nice as well, as would being able to do things with 
 Doxygen comments.  But let's not try to run before we can walk.

 
 This assumes that TextPad could run at some point. 

You're right - it turns out TP doesn't get all the D floating point 
notations right.  It appears that TP has hard-coded the syntax of C 
numeric literals.  I must've just not noticed since I had never before 
changed the number colour from the same as the default text colour.

Maybe we do want regexps for all these floating point notations after all.
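
As a rough illustration of what such a regexp might look like, here is a simplified Python pattern for hexadecimal floating-point literals only.  It is a sketch, not the full D grammar: the name HEX_FLOAT is made up for this example, underscore placement is not checked, and decimal floats and the "3Lu..5" range case are not handled at all.

```python
import re

# Simplified pattern for hex float literals such as 0x3e22.f5p6fi.
# Assumption: the binary exponent (p...) is mandatory, and the optional
# suffix is f, F or L, optionally followed by i for an imaginary literal.
HEX_FLOAT = re.compile(r"""
    0[xX]
    (?: [0-9a-fA-F_]+ (?: \. [0-9a-fA-F_]* )?   # digits, optional fraction
      | \. [0-9a-fA-F_]+ )                      # or fraction only
    [pP] [+-]? [0-9_]+                          # binary exponent
    [fFL]? i?                                   # type suffix, imaginary marker
""", re.VERBOSE)

for literal in ("0x3e22.f5p6fi", "0x1p-3", "0x3e22.f5"):
    print(literal, bool(HEX_FLOAT.fullmatch(literal)))
```

Note that the "f" in "f5" is a hex digit while the "f" after "p6" is a suffix; the pattern only tells them apart because the exponent sits between them.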

 ;)  This is exactly where I'm sceptical.  I think that when it runs 
 it'll have so many weird rules and settings that it won't be fun 
 anymore.  And they won't be powerful enough for anything authors 
 didn't consider anyway.

Maybe someone can come up with something....

Stewart.

Aug 18 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Tue, 18 Aug 2009 20:40:37 +0100, Stewart Gordon wrote:

 Sergey Gromov wrote:
 Exactly.  There is a 32-bit "style" known for every character, plus
 another 32-bit field associated with every line.  A lexer is free to use
 these fields for any purpose, except the lower byte of a style defines
 the characters' color.

 
 Does it keep around in memory the style of every character, or only the 
 32-bit field associated with the line so that the lexer can re-style the 
 characters on repaint/scroll?

It can tell, for any character, which style it has.  This lets it
repaint unchanged lines without ever calling a lexer.
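
A toy model of that arrangement, in Python, might look like the following.  This is only a sketch of the idea and does not use Scintilla's real API: the names lex_line, Document, relex_from and edit are invented here, the per-line field holds nothing but the /+ +/ nesting depth at the end of the line, and the lexer knows about nothing except nested comments.

```python
DEFAULT, COMMENT = 0, 1   # hypothetical style numbers


def lex_line(text, depth):
    """Style one line, given the /+ +/ nesting depth carried in from the
    previous line.  Returns (styles, depth at end of line)."""
    styles = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "/+":
            depth += 1
            styles += [COMMENT, COMMENT]
            i += 2
        elif pair == "+/" and depth > 0:
            styles += [COMMENT, COMMENT]
            depth -= 1
            i += 2
        else:
            styles.append(COMMENT if depth > 0 else DEFAULT)
            i += 1
    return styles, depth


class Document:
    def __init__(self, lines):
        self.lines = list(lines)
        self.styles = [None] * len(self.lines)      # cached style per character
        self.end_state = [None] * len(self.lines)   # cached state per line
        self.relex_from(0)

    def relex_from(self, first):
        """Re-lex from line `first`, stopping as soon as a line ends in the
        same state as before.  Returns the number of lines lexed."""
        depth = self.end_state[first - 1] if first > 0 else 0
        lexed = 0
        for n in range(first, len(self.lines)):
            self.styles[n], depth = lex_line(self.lines[n], depth)
            lexed += 1
            unchanged = depth == self.end_state[n]
            self.end_state[n] = depth
            if unchanged:
                break   # later lines start in the same state as before
        return lexed

    def edit(self, n, text):
        self.lines[n] = text
        return self.relex_from(n)


doc = Document(["int a; /+ start", "still /+ nested +/ comment",
                "end +/ int b;", "int c;"])
print(doc.end_state)            # per-line nesting depth
print(doc.edit(3, "int d;"))    # lines re-lexed after a harmless edit
print(doc.edit(0, "int a;"))    # lines re-lexed after removing the opening /+
```

Repainting only reads doc.styles; the lexer runs again just for the edited line and for as many following lines as it takes for the end-of-line state to settle.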

 <snip>
 [DelimitedToken9]
 Start = '
 End = '
 Esc = \
 Type = Char
 SpanLines = No
 Nest = No

 There, we have all of D1 covered now, and not a regexp in sight.

 
 Yes and no, because your ad-hoc format doesn't cover subtle differences
 between C and D strings.  Like C strings don't support embedded EOLs.

 
 I don't understand.  How does SpanLines not achieve this?
 
 Then what _does_ SpanLines achieve according to whatever conclusion 
 you've come to?

Here's a string which is valid in D but is invalid in C:

"foo
bar"

Here's another string which is, on the contrary, valid in C but is
invalid in D:

"foo\
bar"

They both "span lines."

Aug 20 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sergey Gromov wrote:
<snip>
 Here's a string which is valid in D but is invalid in C:
 
 "foo
 bar"
 
 Here's another string which is, on the contrary, valid in C but is
 invalid in D:
 
 "foo\
 bar"
 
 They both "span lines."

Doesn't quite relate to what I was querying ... but anyway, it's 
perfectly straightforward to add another rule like

LineSplice = \

among other possibilities.
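
To make the interplay of these settings concrete, here is one possible reading of them as a small Python scanner.  It is a sketch under assumptions: the class DelimitedRule and the function scan are hypothetical names, the fields mirror the Start/End/Esc/SpanLines keys plus the proposed LineSplice, and an escape character directly before a newline is treated as invalid unless LineSplice covers it.

```python
from dataclasses import dataclass


@dataclass
class DelimitedRule:
    start: str
    end: str
    esc: str = ""             # escape character, "" for none
    span_lines: bool = False  # may a raw newline appear inside the token?
    line_splice: str = ""     # character that, right before EOL, continues it


def scan(rule, text, pos=0):
    """Return the index just past the closing delimiter, or None if the
    token is not validly terminated under `rule`."""
    if not text.startswith(rule.start, pos):
        raise ValueError("text does not start with the opening delimiter")
    i = pos + len(rule.start)
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if rule.line_splice and ch == rule.line_splice and nxt == "\n":
            i += 2                      # spliced line: token continues
        elif rule.esc and ch == rule.esc:
            if nxt in ("", "\n"):
                return None             # escape with nothing valid after it
            i += 2                      # skip the escaped character
        elif ch == "\n" and not rule.span_lines:
            return None                 # raw newline not allowed
        elif text.startswith(rule.end, i):
            return i + len(rule.end)
        else:
            i += 1
    return None


c_string = DelimitedRule('"', '"', esc="\\", span_lines=False, line_splice="\\")
d_string = DelimitedRule('"', '"', esc="\\", span_lines=True)

plain = '"foo\nbar"'        # raw newline inside the string
spliced = '"foo\\\nbar"'    # backslash directly before the newline

for name, rule in (("C", c_string), ("D", d_string)):
    print(name, scan(rule, plain) is not None, scan(rule, spliced) is not None)
```

With these two rules the same scanner accepts the first string only as D and the second only as C, which is the distinction raised above.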

You could argue over whether it's worth going to all this effort, if you 
think the only point is to support C, C++ and D.  But really, there are 
many C-like languages out there with their own slightly different rules, 
and even the likes of Prolog, SQL and Unix shell scripts with their own 
variants of C string syntax.  I think the scheme I've come up with would 
be a good way to capture the subtle differences between these languages' 
string syntaxes, while at the same time being something that the average 
user wanting to add a new language to the system should be able to get 
their head around sooner or later.

Stewart.

Aug 21 2009

Don <nospam nospam.com> writes:

Sergey Gromov wrote:
 Thu, 13 Aug 2009 22:57:24 +0100, Stewart Gordon wrote:
 
 Sergey Gromov wrote:
 Well I think it's hard to create a regular expression engine flexible
 enough to allow arbitrary highlighting.

 I can't see how it can be at all complicated to find the beginning and 
 end of a C string or character literal.

 This (Posix?) regexp

 "(\\.|[^\\"])*"

 works as I try (though not in the tiny subset of Posix regexps that N++ 
 understands).  But that's an aside - you don't need regexps at all to 
 get it working at this basic level, only a rudimentary concept of escape 
 sequences.

 I think the best such engine
 I've seen was Colorer by Igor Russkih, and even there I wasn't able to
 express D's WYSIWYG or delimited strings.  You need a real programming
 language for that.

 For WYSIWYG strings, all that's needed is a generic highlighter that 
 supports:
 - the aforementioned string escapes
 - multiple types of string literals distinguished by whether they 
 support string escapes, and not just delimiters

 TextPad's syntax highlighting engine manages 2/3 of this without any 
 regexps (or anything to that effect).  That said, I've just found that 
 it can do a little bit of what remains: I can make it do `...` but not 
 r"..." at the expense of distinguishing string and character literals.

 But token-delimited strings are indeed more complex to deal with.  (How 
 many people do we have putting them to practical use at the moment, for 
 that matter?)

 
 Well, you can write a regexp to handle a simple C string.  That is, if
 your regexp is matched against the whole file, which is usually not the
 case.  Otherwise you'll have troubles with C string:
 
 "foo\
 bar"
 
 or D string:
 
 "foo
 bar"
 
 Then you want to highlight string escapes and probably format
 specifiers.  Therefore you need not simple regexps but hierarchies of
 them, and also you need to know where *internals* of the string start
 and end.
 
 Then you have r"foo" which probably can be handled with regexps.
 
 Then you have q"/foo/" where "/" can be anything.  Still can be handled
 by extended regexps, even though they won't be regular expressions in
 scientific sense.
 
 Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
 Regexps cannot translate while substituting, so you must create regexps
 for all possible parens.

Remember that the whole point of q{} strings was that they should NOT be 
highlighted as strings!

Aug 17 2009

Sergey Gromov <snake.scaly gmail.com> writes:

Mon, 17 Aug 2009 10:37:47 +0200, Don wrote:

 Sergey Gromov wrote:
 Then you have q"{foo}" where "{" and "}" can be any of ()[]<>{}.
 Regexps cannot translate while substituting, so you must create regexps
 for all possible parens.

 
 Remember that the whole point of q{} strings was that they should NOT be 
 highlighted as strings!

You confuse q{} and q"{}" here.  The former is a token string which may
contain only valid D tokens.  The latter is a delimited string with
nesting delimiters.  Like q"<<a href="#hi">hello</a>>".
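
For what it's worth, the nesting form is easy to scan with a counter, which is exactly the part a plain regexp cannot express.  The Python below is a minimal sketch only: scan_nesting_delimited and PAIRS are names invented for this example, and it covers just the bracket-delimited form, not identifier or single-character delimiters.

```python
# Opening delimiter -> matching closing delimiter.  This lookup is the
# "translation" step mentioned earlier in the thread.
PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def scan_nesting_delimited(text, pos=0):
    """text[pos:] must start with q" followed by one of ( [ < {.
    Returns the index just past the closing quote, or None if the
    string is not properly terminated."""
    if not text.startswith('q"', pos) or text[pos + 2:pos + 3] not in PAIRS:
        raise ValueError("not a nesting delimited string")
    opener = text[pos + 2]
    closer = PAIRS[opener]
    depth = 1
    i = pos + 3
    while i < len(text):
        ch = text[i]
        if ch == opener:
            depth += 1
        elif ch == closer:
            depth -= 1
            if depth == 0:
                # The outermost closer must be followed directly by the quote.
                return i + 2 if text[i + 1:i + 2] == '"' else None
        i += 1
    return None


literal = 'q"<<a href="#hi">hello</a>>"'
source = literal + " ~ x"
end = scan_nesting_delimited(source)
print(source[:end] == literal)
```

The quotes around #hi are ignored because only the chosen bracket pair is counted while inside the literal.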

Aug 17 2009

Stewart Gordon <smjg_1998 yahoo.com> writes:

Stewart Gordon wrote:
<snip>
 TextPad's syntax highlighting engine manages 2/3 of this without any 
 regexps (or anything to that effect).  That said, I've just found that 
 it can do a little bit of what remains: I can make it do `...` but not 
 r"..." at the expense of distinguishing string and character literals.

<snip>

For the record, what I'd done is

StringStart = "
StringEnd = "
StringAlt = '
StringEsc = \
CharStart = `
CharEnd = `
CharEsc =

however, I've just found a bigger problem: only string literals, not 
char literals, can span lines in TP.

Stewart.

Aug 14 2009

Kagamin <spam here.lot> writes:

Stewart Gordon Wrote:

 For the record, there's a SciLexer.dll in my Notepad++ dir, but no 
 d.properties to be found.  The SciLexer.dll reports itself as file 
 version 1.7.8.0, product version 1.78.  So maybe the question is of what 
 effect replacing it with a fork of version 1.76 would have.  (Do SciTE 
 versions correspond directly to Scintilla versions?)

The wrong lexer is being used here. Scintilla's built-in D lexer has supported
nested comments and escape sequences since version 1.72, but support for
multiline strings was only added in version 1.79.

Aug 13 2009

Jussi Jumppanen <jussij zeusedit.com> writes:

Stewart Gordon Wrote:

 Or maybe I should just go back to TextPad (which isn't perfect 
 either) and put up with its not supporting Unicode....

FWIW Zeus is very similar to TextPad in feature set and the latest 
version also adds support for Unicode/UTF8.

   http://www.zeusedit.com/

It will do D syntax highlighting and code folding out of the box.

It also comes with a version of ctags.exe built with these 
changes specifically for the D language:

   http://www.zeusedit.com/z300/ctags_src.zip

meaning it can produce tags information for your D source files.

NOTE: Zeus, like TextPad, is shareware.

Jussi Jumppanen
Author: Zeus for Windows

Aug 12 2009

Kagamin <spam here.lot> writes:

Stewart Gordon Wrote:

 Anyway, attached is the result.  Can anybody do better (other than by 
 telling it to treat D as C or some other language instead)?

I don't see how the lexer is being chosen.
Programmer's Notepad does it correctly.

Aug 13 2009

Kagamin <spam here.lot> writes:

Nick Sabalausky Wrote:

 I don't see how the lexer is being chosen.
 Programmer's Notepad does it correctly.

 
 I use Programmer's Notepad. It's good, but it still has some problems:
 
 http://code.google.com/p/pnotepad/issues/detail?id=480 (Proper Highlighting 
 for D's Wysiwyg Strings)
 http://code.google.com/p/pnotepad/issues/detail?id=481 (In D, strings with 
 embedded newlines are not highlighted correctly)
 http://code.google.com/p/pnotepad/issues/detail?id=482 (Support for D's 
 nested comments)

At least PN chooses a lexer. That's what I meant.

These issues do not pertain to PN. They're RFEs for the Scintilla D lexer and, as I
said, they were fixed in version 1.79. The PN developer plans to upgrade to the new
Scintilla in PN 3; in fact I compiled Scintilla 1.78 with the recent D lexer and it
works fine. BTW bug 482 is invalid: support for nested comments was there from
the start, so make sure you don't use the C lexer.

Aug 14 2009
