
digitalmars.D.bugs - [Issue 1772] New: regexp.split behaves incorrectly for paths with captures

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772

           Summary: regexp.split behaves incorrectly for paths with captures
           Product: D
           Version: 1.025
          Platform: PC
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: bugzilla digitalmars.com
        ReportedBy: wbaxter gmail.com


I want to split columns out of a row of numbers.  They may be separated by
commas or by just white space.  So I tried this:

The splitter regexp
    auto re_splitter = new RegExp(r"(\s+|\s*,\s*)");
    char[][] numbers = re_splitter.split(line);

If the input is a line like:
410.90711,352.879

The output from that is the array
[410.90711,,,352.879]

After a bit of debugging, it turns out the problem is the grouping in the
regexp.
Removing the parens fixes the problem in this case, but there are cases where
you need parens for grouping and not for the capturing side effect.  So I think
this is a bug.  Only match 0 should be considered significant for splitting,
not the submatches.


-- 
Jan 07 2008
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


wbaxter gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|regexp.split behaves        |regexp.split behaves
                   |incorrectly for paths with  |incorrectly using regexps
                   |captures                    |with captures





fixed summary


-- 
Jan 07 2008
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


wbaxter gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|regexp.split behaves        |regexp.split behavior with
                   |incorrectly using regexps   |captures needs to be
                   |with captures               |documented





It seems I've been duped by writefln's output.
Further investigation shows that this:
   [410.90711,,,352.879]
is actually this:
   ["410.90711",  ",",  "352.879"]
and not a 4-element list with two empty strings as I thought.
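
The ambiguity is easy to reproduce. A minimal sketch in Python (not D): the `show_unquoted` helper below is hypothetical and only mimics an unquoted, comma-joined array printout, which is how the output above appears to have been formatted. Under that assumption, the 3-element list and the 4-element list print identically.

```python
def show_unquoted(items):
    """Format a list of strings without quotes, joined by commas."""
    return "[" + ",".join(items) + "]"

# What split actually returned: the separator "," is an element of its own.
three = ["410.90711", ",", "352.879"]
# What the printout seemed to show: two empty strings between the numbers.
four = ["410.90711", "", "", "352.879"]

print(show_unquoted(three))
print(show_unquoted(four))
# Both lists produce the same text, so the printout cannot tell them apart.
print(show_unquoted(three) == show_unquoted(four))
```

Printing each element on its own line, or with quotes, removes the ambiguity.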

I discovered this because I checked what Python does with captures, and it is
this:
"""
If capturing parentheses are used in pattern, then the text of all groups in
the pattern are also returned as part of the resulting list. 
"""
So that made me think maybe D could be trying to do something similar.

Apparently it is.  So please just document it.
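
For comparison, here is a small sketch of the Python behavior quoted above, using the same pattern and input line. It shows Python's `re.split` only. The `(?:...)` non-capturing group is Python syntax, and whether D1's `std.regexp` accepts it is not established in this report.

```python
import re

line = "410.90711,352.879"

# Capturing group: the matched separator is returned as a list element too.
with_capture = re.split(r"(\s+|\s*,\s*)", line)

# Non-capturing group: same grouping, but the separator is dropped.
without_capture = re.split(r"(?:\s+|\s*,\s*)", line)

print(with_capture)     # ['410.90711', ',', '352.879']
print(without_capture)  # ['410.90711', '352.879']
```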


-- 
Jan 08 2008
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 11 2009
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|andrei metalanguage.com     |dmitry.olsh gmail.com



17:48:10 PDT ---
Reassigning to Dmitry.

-- 
Jun 04 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772




03:28:16 PDT ---
https://github.com/D-Programming-Language/phobos/pull/491

-- 
Mar 12 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772




Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/62b464b48d61b076c89f7585dc0ac7632f57ba49
fix Issue 1772 - regexp.split behavior with captures needs to be documented

A documentation clarification, the report itself is largely outdated.

https://github.com/D-Programming-Language/phobos/commit/6d782c6efd9ba6a7b7a314002a52d3455fa00d8c


fix Issue 1772 - regexp.split behavior with captures needs to be documen...

-- 
Mar 14 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yebblies gmail.com



Is this fixed/D1 only now?

-- 
Mar 23 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|1.025                       |D1



05:17:17 PDT ---
I'm no expert on D1 stuff, but I believe the issue is still applicable for D1.
Come to think of it, I closed a few D1 issues like this in the past, so maybe we
should close this one too (marked as D1 for now).
D1/D2 regexp is broken in many ways and nobody is doing any work on Phobos/D1
to fix it AFAIK; Tango folks have their own regex anyway.

-- 
Mar 23 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=1772


yebblies <yebblies gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Platform|x86                         |All
         AssignedTo|dmitry.olsh gmail.com       |nobody puremagic.com
            Summary|regexp.split behavior with  |(D1 only) regexp.split
                   |captures needs to be        |behavior with captures
                   |documented                  |needs to be documented
         OS/Version|Windows                     |All



I guess it can be closed when D1 is discontinued at the end of the year.

-- 
Mar 23 2012