digitalmars.D - Volunteer for research project?

reply Brad Roberts <braddr puremagic.com> writes:
Would any of you be interested in helping out (read that as "doing") a research / data mining project for us?  I'd love to take all of the regressions this year (or for the last year, or whatever period of time can be reasonably accomplished) and track them back to which commit introduced each of them (already done for some of them).  From there, I'd like to see what sort of correlations can be found.  Is there a particular area of code that's responsible for them?  Is there a particular feature (spread across a lot of files, maybe) that's responsible?  Etc.

Maybe it's all over the map.  Maybe it will highlight one or a few areas to
take a harder look at.

Anyone interested?

Thanks,
Brad
Feb 20 2013
parent reply "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Thursday, 21 February 2013 at 07:03:08 UTC, Brad Roberts wrote:
> Would any of you be interested in helping out (read that as "doing") a research / data mining project for us?  I'd love to take all of the regressions this year (or for the last year, or whatever period of time can be reasonably accomplished) and track them back to which commit introduced each of them (already done for some of them).  From there, I'd like to see what sort of correlations can be found.  Is there a particular area of code that's responsible for them?  Is there a particular feature (spread across a lot of files, maybe) that's responsible?  Etc.
>
> Maybe it's all over the map.  Maybe it will highlight one or a few areas to take a harder look at.
>
> Anyone interested?
>
> Thanks,
> Brad

It sounds interesting, but what are you expecting to find? And how sure are you that you can find something? I would expect that code which fixes one aspect of a feature often breaks another aspect of the same feature, which is quite obvious. Sometimes one piece of code relies implicitly on the behavior of another, so when you change the latter, the former stops working correctly. You give the example of a feature spread across several files; how does knowing this help in reducing regressions?
Feb 21 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 22, 2013 at 06:51:53AM +0100, Maxim Fomin wrote:
> It sounds interesting, but what are you expecting to find? And how sure are you that you can find something? I would expect that code which fixes one aspect of a feature often breaks another aspect of the same feature, which is quite obvious. Sometimes one piece of code relies implicitly on the behavior of another, so when you change the latter, the former stops working correctly. You give the example of a feature spread across several files; how does knowing this help in reducing regressions?

I would think he's referring to issues that are filed in the bugtracker. Obviously, we have no way of knowing if a code change broke something if nobody found any bug afterwards!

So I'm thinking it's probably a matter of going through the regression bugs in the bugtracker, making test cases to reproduce them, and then using git bisect to figure out which commit introduced the problem.

T

-- 
Public parking: euphemism for paid parking. -- Flora
Feb 21 2013
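
For readers unfamiliar with the approach described above, here is a minimal Python sketch of the binary search that `git bisect` carries out over a commit range. It is an illustration only, not the tool's implementation: the function name `first_bad_commit`, the `is_broken` callback (standing in for "build this commit and run the bug's test case"), and the commit names are all made up for the example.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_broken: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_broken() is true.

    Assumptions: commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and the regression persists once introduced.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_broken(commits[mid]):
            bad = mid    # regression already present: look earlier
        else:
            good = mid   # still working: look later
    return commits[bad]


# Placeholder history; these are not real DMD commits.
history = [f"commit{i}" for i in range(10)]
# Pretend the bug's test case starts failing at commit6.
culprit = first_bad_commit(history, lambda c: int(c[len("commit"):]) >= 6)
print(culprit)
```

In a real checkout the same search is driven by `git bisect start <bad> <good>` followed by `git bisect run <script>`, where the script exits with 0 for a good commit and with a code from 1 to 127 (other than 125) for a bad one.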
parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Friday, 22 February 2013 at 06:02:20 UTC, H. S. Teoh wrote:
> I would think he's referring to issues that are filed in the bugtracker. Obviously, we have no way of knowing if a code change broke something if nobody found any bug afterwards!

Yes, it is obvious that he refers to bugzilla issues.

> So I'm thinking it's probably a matter of going through the regression bugs in the bugtracker, making test cases to reproduce them, and then using git bisect to figure out which commit introduced the problem.
>
> T

This is also obvious. The question is what to do with such information next: how to analyze it and interpret the results. For example, take http://d.puremagic.com/issues/show_bug.cgi?id=9406 (it names the commit which introduced the regression). What can you infer from fixed regressions (http://d.puremagic.com/issues/buglist.cgi?query_format=advanced&bug_severity=regression&bug_status=RESOLVED&resolution=FIXED) that would be useful in fighting the non-closed ones?

P.S. There is something wrong either with the forum or with the way you reply. The discussion in my mailbox is a single piece, but in the forum it is split into two threads. Posting a message in one thread in answer to a reply in another is strange. Do you use email or the forum for answering?
Feb 21 2013
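
As a small aside, a Bugzilla search URL like the one quoted above can be assembled programmatically rather than by hand. The sketch below uses only the query parameters that appear in this thread (`query_format`, `bug_severity`, `bug_status`, `resolution`); the helper name `regression_query` is hypothetical, and nothing here contacts the server.

```python
from urllib.parse import urlencode

# Base URL as quoted in the thread.
BUGLIST_URL = "http://d.puremagic.com/issues/buglist.cgi"


def regression_query(statuses, resolution=None):
    """Build a buglist.cgi URL selecting regressions with the given statuses."""
    params = [("query_format", "advanced"), ("bug_severity", "regression")]
    # bug_status may be repeated, once per status of interest.
    params += [("bug_status", status) for status in statuses]
    if resolution is not None:
        params.append(("resolution", resolution))
    return BUGLIST_URL + "?" + urlencode(params)


fixed_url = regression_query(["RESOLVED"], resolution="FIXED")
print(fixed_url)
```

Passing several statuses, for example `["NEW", "ASSIGNED", "REOPENED"]`, yields a query for regressions that are still open.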
prev sibling parent Brad Roberts <braddr puremagic.com> writes:
On 2/21/2013 10:00 PM, H. S. Teoh wrote:
> I would think he's referring to issues that are filed in the bugtracker. Obviously, we have no way of knowing if a code change broke something if nobody found any bug afterwards!
>
> So I'm thinking it's probably a matter of going through the regression bugs in the bugtracker, making test cases to reproduce them, and then using git bisect to figure out which commit introduced the problem.
>
> T

Pretty much that. (Nearly) every bug comes with a test case already. The part that will be work is taking that test case and finding the exact commit that broke it. By definition, a regression once worked and something changed that broke it. My hope is that one or more people can spend some time going through each regression report in bugzilla and tracking down the exact commit for each.

What will be uncovered by the effort? Who knows. It's better to not try to anticipate or predict since that can bias the analysis. The entire point of the exercise is to find out. If there are one or more obvious or detectable clusters, that gives us some interesting data. It might well point out a part of the code that's particularly sensitive to change. Or is very poorly covered by the test suite. Or is flawed in some other way. Regardless, if there are clusters, it's worth some study and pondering to consider what can be done to make it/them NOT hot beds of regressions.

It's a research project. It might turn out to yield nothing useful. That's certainly a risk. I suspect it won't turn out to be fruitless.

To seed the effort, here's all the regression bugs that have changed since the beginning of the year:

http://d.puremagic.com/issues/buglist.cgi?chfieldto=Now&query_format=advanced&chfieldfrom=2013-01-01&bug_severity=regression&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=RESOLVED&bug_status=VERIFIED&bug_status=CLOSED
Feb 21 2013
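
Once each regression has been traced to its introducing commit, one simple first pass at spotting clusters is to count how often each file and directory shows up among those commits. The Python sketch below assumes the data has already been collected as a mapping from issue to the files touched by the introducing commit. The function name `hot_spots`, the issue labels, and the file paths are invented placeholders, not real Bugzilla or DMD data.

```python
from collections import Counter
from pathlib import PurePosixPath


def hot_spots(files_by_issue):
    """Count regression-introducing commits per file and per directory.

    files_by_issue maps an issue label to the list of files changed by the
    commit that introduced that regression.
    """
    by_file, by_dir = Counter(), Counter()
    for files in files_by_issue.values():
        # Use sets so each issue counts a file or directory at most once.
        by_file.update(set(files))
        by_dir.update({str(PurePosixPath(f).parent) for f in files})
    return by_file, by_dir


# Placeholder input: labels and paths are made up for illustration.
sample = {
    "issue-1": ["frontend/a.c", "frontend/b.c"],
    "issue-2": ["frontend/a.c"],
    "issue-3": ["backend/c.c", "frontend/a.c"],
}
by_file, by_dir = hot_spots(sample)
print(by_file.most_common(1))
print(by_dir.most_common())
```

Raw counts like these favor files that change often for any reason, so a real analysis would probably want to compare them against how frequently each file is modified overall.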