www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - The delang is using merge instead of rebase/squash

reply deadalnix <deadalnix gmail.com> writes:
This is making the history very spaghettified. Is that possible 
to have the bot rebase/squash commits and then pushing ?
Mar 15 2017
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 15 March 2017 at 13:14:31 UTC, deadalnix wrote:
 This is making the history very spaghettified. Is that possible 
 to have the bot rebase/squash commits and then pushing ?
Arf I fat fingered the title, i meant the dlang bot.
Mar 15 2017
parent Seb <seb wilzba.ch> writes:
On Wednesday, 15 March 2017 at 13:23:00 UTC, deadalnix wrote:
 On Wednesday, 15 March 2017 at 13:14:31 UTC, deadalnix wrote:
 This is making the history very spaghettified. Is that 
 possible to have the bot rebase/squash commits and then 
 pushing ?
Arf I fat fingered the title, i meant the dlang bot.
I absolutely agree with you and in fact Andrei and I pushed for auto-merge-squash, which was enabled for the last three months. However, recently Martin disabled it to prevent "accidental squashes". Discussion: https://github.com/dlang-bots/dlang-bot/issues/64
Mar 15 2017
prev sibling parent reply Martin Nowak <code dawg.eu> writes:
On Wednesday, 15 March 2017 at 13:14:31 UTC, deadalnix wrote:
 This is making the history very spaghettified. Is that possible 
 to have the bot rebase/squash commits and then pushing ?
I don't really agree with the argument. A merge commit is a clear way to integrate changes from a PR/branch. Just rebasing a PR on top of master removes a lot of information from git, only leaving references to github. Can you be more specific, what you mean w/ spaghetti? The fact that reciew fixes are added to PRs. Also github's commit view misleadingly shows commits from merged PRs/branches, which aren't actually in master.
Mar 19 2017
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Monday, 20 March 2017 at 05:10:04 UTC, Martin Nowak wrote:
 On Wednesday, 15 March 2017 at 13:14:31 UTC, deadalnix wrote:
 This is making the history very spaghettified. Is that 
 possible to have the bot rebase/squash commits and then 
 pushing ?
I don't really agree with the argument. A merge commit is a clear way to integrate changes from a PR/branch. Just rebasing a PR on top of master removes a lot of information from git, only leaving references to github. Can you be more specific, what you mean w/ spaghetti? The fact that reciew fixes are added to PRs. Also github's commit view misleadingly shows commits from merged PRs/branches, which aren't actually in master.
Because a picture is clearer than a thousand words: | | | | | | | | * | | | | | | | 08ae52d8 The Dlang Bot RazvanN7/Update_generated | |_|_|_|_|/ / / |/| | | | | | | | | | | | | | | | * | | | | | | c6480976 RazvanN7 |/ / / / / / / Updated posix.mak makefile to use ../tools/checkwhitespace.d | | | | | | | * | | | | | | 1181fcf7 The Dlang Bot sprinkle131313/ignore-vscode-lib | | | | | | | | | * | | | | | | f1b8d0d4 sprinkle131313 | | | | | | | | Add temp/tmp folder to gitignore. | | | | | | | | | * | | | | | | b67bf9d1 sprinkle131313 | | |_|/ / / / Add vscode folder and lib files to gitignore. | |/| | | | | | | | | | | | * | | | | | | 0b41c996 The Dlang Bot wilzbach/fix-lref-links | | | | | | | | | * | | | | | | 090d5164 Sebastian Wilzbach |/ / / / / / / Fix links from $(LREF $(D ...)) -> $(LREF ...) | | | | | | | * | | | | | | f2a019df The Dlang Bot MartinNowak/merge_stable | | | | | | | | | * | | | | | | a6cb85b8 Sebastian Wilzbach | | | | | | | | Add safe to std.regex unittest | | | | | | | | | * | | | | | | ad70b082 Martin Nowak | |\ \ \ \ \ \ \ Merge remote-tracking branch 'upstream/stable' into merge_stable |/ / / / / / / / | | | | | | | | | * | | | | | | 694dd174 Stefan Koch DmitryOlshansky/fix-freeform-regex | | | | | | | | | | | * | | | | | | 62cf615d Dmitry Olshansky | |/ / / / / / / Fix issue 17212 std.regex doesn't ignore whitespace after character classes | | | | | | | | * | | | | | | | 5b07bd59 Sebastian Wilzbach | |_|_|_|/ / / [BOOKTABLES]: Add BOOKTABLE to |/| | | | | | | | | | | | | * | | | | | | 75059373 Jack Stouffer wilzbach/booktable-std-utf | |_|_|_|_|_|/ |/| | | | | | What the hell is going on in there ? In addition there are a bunch of practical issues with this way of doing things. First there is no given that any intermediate state is sound, or even builds at all. That makes it very hard to bissect anything. There are also a lot of errands and correction that are made during review that are not that interesting to keep in the project history. Knowing that someone did thing the A way and then changed it the B way after review is more noise than useful infos in the general case, and in the rare case when someone actually wants to know, the github PR is still out there (on that note, yes GH PR kind fo sucks, but that's another topic).
Mar 20 2017
next sibling parent Martin Nowak <code dawg.eu> writes:
On Monday, 20 March 2017 at 12:25:22 UTC, deadalnix wrote:
 In addition there are a bunch of practical issues with this way 
 of doing things. First there is no given that any intermediate 
 state is sound, or even builds at all. That makes it very hard 
 to bissect anything.
You bissect on master and there is one merge commit per PR, no intermediate states involved.
 There are also a lot of errands and correction that are made 
 during review that are not that interesting to keep in the 
 project history. Knowing that someone did thing the A way and 
 then changed it the B way after review is more noise than 
 useful infos in the general case, and in the rare case when 
 someone actually wants to know, the github PR is still out 
 there (on that note, yes GH PR kind fo sucks, but that's 
 another topic).
That's a conflict of interest. As said, GH's interface is targetted toward pushing review fixes as new commits, rather than ammending changes. And yes those commits mostly provide little information, but they're also on a separate branch. Using auto-squash before merging https://github.com/dlang-bots/dlang-bot/issues/64#issuecomment-284155249 made sense but isn't offered by GH's API and thus requires quite some work. Just squashing everything to a single commit and putting that on master (as done by GH's squash+rebase) doesn't preserve all information in git. Also smells like it might cause automation troubles at some point.
Mar 20 2017
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Monday, 20 March 2017 at 12:25:22 UTC, deadalnix wrote:
 Because a picture is clearer than a thousand words:
What this tells me is that the default way git-log presents history is not very useful. Consider this presentation of the same information: RazvanN7/Update_generated - c6480976 RazvanN7: Updated posix.mak makefile to use ../tools/checkwhitespace.d sprinkle131313/ignore-vscode-lib - f1b8d0d4 sprinkle131313: Add temp/tmp folder to gitignore. - b67bf9d1 sprinkle131313: Add vscode folder and lib files to gitignore. wilzbach/fix-lref-links - 090d5164 Sebastian Wilzbach: Fix links from $(LREF $(D ...)) -> $(LREF ...) MartinNowak/merge_stable - a6cb85b8 Sebastian Wilzbach: Add safe to std.regex unittest - ad70b082 Martin Nowak: Merge remote-tracking branch 'upstream/stable' into merge_st… DmitryOlshansky/fix-freeform-… - 62cf615d Dmitry Olshansky: Fix issue 17212 std.regex doesn't ignore whitespace … 5b07bd59 Sebastian Wilzbach: [BOOKTABLES]: Add BOOKTABLE to wilzbach/booktable-std-utf In particular, the origin commit of a branch is often not interesting; only the list of commits that are on one branch and aren't on another are. Anyway, personally I don't think there is a severe problem in dire need of fixing with the git log excerpt you pasted. If you're interested in looking at changes to the master branch, look for asterisks in the first column.
 In addition there are a bunch of practical issues with this way 
 of doing things.
There seem to be more practical issues with the opposite approach.
 First there is no given that any intermediate state is sound, 
 or even builds at all. That makes it very hard to bissect 
 anything.
Bisecting D is not something that can be reasonably done by looking at just one repository's history anyway; this is why we have D-dot-git and Digger. Either way, for pull requests that make non-trivial changes or additions, you will need to descend into the pull request itself.
 There are also a lot of errands and correction that are made 
 during review that are not that interesting to keep in the 
 project history. Knowing that someone did thing the A way and 
 then changed it the B way after review is more noise than 
 useful infos in the general case,
Agreed, this is one case where squashing is appropriate. However, consider the worst-case scenarios where either merge strategy is abused: - If a pull request that should have been squashed has been merged without squashing, the result is: - Some clutter in the git history; - Possible (but avoidable) complications when doing git-bisect on a single repository, which you shouldn't be doing anyway. - If a pull request that should not have been squashed has been squashed while merging, the result is: - Commit messages are lost and remain available only on GitHub. - Any logical separation of changes that might have been represented through separate commits is lost and remains available only on GitHub. - "git blame" becomes less useful because it can only lead to the big blob of the squashed changes. - "git blame" becomes less useful because in some situations it loses its ability to track moved code, which should and often is done in separate commits. - Bisection becomes more difficult because it is no longer easily possible to dive into a PR, as has been occasionally necessary. In general, I am not opposed to giving reviewers the option to merge pull requests with squashing, assuming we can all agree to not abuse it and only use it for PRs where there nothing useful can be gained by preserving the multiple commits as they are; however, their words and actions have shown that this doesn't seem to be an attainable point of agreement.
Mar 20 2017
parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 21 March 2017 at 01:39:39 UTC, Vladimir Panteleev 
wrote:
 On Monday, 20 March 2017 at 12:25:22 UTC, deadalnix wrote:
 Because a picture is clearer than a thousand words:
What this tells me is that the default way git-log presents history is not very useful. Consider this presentation of the same information:
It's not good either. Why would I want to look at a DAG when the serie of event is strictly linear to begin with ?
 In particular, the origin commit of a branch is often not 
 interesting; only the list of commits that are on one branch 
 and aren't on another are.
Yes, that's why rebasing makes thing clearer. Nobody care what the master commit was when the work was started.
 First there is no given that any intermediate state is sound, 
 or even builds at all. That makes it very hard to bissect 
 anything.
Bisecting D is not something that can be reasonably done by looking at just one repository's history anyway; this is why we have D-dot-git and Digger. Either way, for pull requests that make non-trivial changes or additions, you will need to descend into the pull request itself.
"Our source control is completely broken, but that's not a problem because we developed 3rd party tools to work around the brokenness" While I agree with you that things like bisecting are broken in D, I don't see it as a reason to screw things up even more. I'm not a big fan of "it's already broken, so we can break it even more". This should, and can, be fixed. https://danluu.com/monorepo/ Incidentally, I got a company contacting me last week willing to pay me good money to help them transition toward these kind of workflow.
 - If a pull request that should not have been squashed has been 
 squashed while merging, the result is:
   - Commit messages are lost and remain available only on 
 GitHub.
   - Any logical separation of changes that might have been 
 represented through separate commits is lost and remains 
 available only on GitHub.
   - "git blame" becomes less useful because it can only lead to 
 the big blob of the squashed changes.
   - "git blame" becomes less useful because in some situations 
 it loses its ability to track moved code, which should and 
 often is done in separate commits.
   - Bisection becomes more difficult because it is no longer 
 easily possible to dive into a PR, as has been occasionally 
 necessary.
Then it should have been 2 PR or more to begin with. Splitting PR in smaller ones is a good practice in general, there are ample proof that is increase the quality of the code review, reduce conflicts surface with other PR, makes reverting easier and more targeted when something happens, etc... Keeping this PR's commits is just a way to mitigate one of the negative consequences of kitchen sink PRs. It does so by impacting negatively others aspects of source control, and does nothing to mitigate other negatives aspects of kitchen sink PRs, such as review fatigue (see a specific example below).
 In general, I am not opposed to giving reviewers the option to 
 merge pull requests with squashing, assuming we can all agree 
 to not abuse it and only use it for PRs where there nothing 
 useful can be gained by preserving the multiple commits as they 
 are; however, their words and actions have shown that this 
 doesn't seem to be an attainable point of agreement.
If multiple commits are important for the PR, then the PR should have been several PR to begin with. Asking people to split s the way to go. Consider this PR: https://github.com/BitcoinUnlimited/BitcoinUnlimited/pull/164 You can see in the comments that I asked the original author to split it up because it was a kitchen sink and very hard to review in its current form. This was ignored. The PR ended up containing a bug that would cost about $12 500 to one of the users of the software, plus a fair amount of reputational damage. The change containing the bug did not need to be bundled with the rest of the PR, and would have almost certainly be noticed if it had been made on a PR of its own. Bundling several changes in the same PR has real world consequences that go beyond screwing up source control.
Mar 21 2017
next sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 21 March 2017 at 11:59:42 UTC, deadalnix wrote:
 It's not good either. Why would I want to look at a DAG when 
 the serie of event is strictly linear to begin with ?
Not sure what you mean here. The way it's presented is not a DAG.
 Yes, that's why rebasing makes thing clearer. Nobody care what 
 the master commit was when the work was started.
Sure, I'm not against rebasing. It's the squashing that's problematic.
 "Our source control is completely broken, but that's not a 
 problem because we developed 3rd party tools to work around the 
 brokenness"
That's fallacious.
 While I agree with you that things like bisecting are broken in 
 D, I don't see it as a reason to screw things up even more. I'm 
 not a big fan of "it's already broken, so we can break it even 
 more". This should, and can, be fixed.

 https://danluu.com/monorepo/

 Incidentally, I got a company contacting me last week willing 
 to pay me good money to help them transition toward these kind 
 of workflow.
I don't disagree with you, but this is a different discussion that's orthogonal to this one.
 Then it should have been 2 PR or more to begin with. Splitting 
 PR in smaller ones is a good practice in general,
You are changing the subject. I'll reply in another post with a different subject.
Mar 21 2017
parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 21 March 2017 at 12:45:45 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 21 March 2017 at 11:59:42 UTC, deadalnix wrote:
 It's not good either. Why would I want to look at a DAG when 
 the serie of event is strictly linear to begin with ?
Not sure what you mean here. The way it's presented is not a DAG.
Blue is red, up is down, and the commit graph is not a DAG.
 "Our source control is completely broken, but that's not a 
 problem because we developed 3rd party tools to work around 
 the brokenness"
That's fallacious.
If you can't bissect, it's broken. Listen, you know it's broken because you wrote tools to work around the brokenness. If it wasn't broken you wouldn't have written these tools as there would be no need to do so. So let's not play pretend.
Mar 21 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 21 March 2017 at 17:58:06 UTC, deadalnix wrote:
 On Tuesday, 21 March 2017 at 12:45:45 UTC, Vladimir Panteleev 
 wrote:
 On Tuesday, 21 March 2017 at 11:59:42 UTC, deadalnix wrote:
 It's not good either. Why would I want to look at a DAG when 
 the serie of event is strictly linear to begin with ?
Not sure what you mean here. The way it's presented is not a DAG.
Blue is red, up is down, and the commit graph is not a DAG.
Not sure what you mean. The commit graph is a DAG. The way you quoted my post made your remark seem to refer to my attempt to reformat it, which is not presented as a DAG.
 "Our source control is completely broken, but that's not a 
 problem because we developed 3rd party tools to work around 
 the brokenness"
That's fallacious.
If you can't bissect, it's broken.
By that definition of "broken", all git repositories which use branch merging are "broken". That includes some of the biggest open-source projects. Frankly, if you want to stick to that definition, I have nothing against it.
 Listen, you know it's broken because you wrote tools to work 
 around the brokenness. If it wasn't broken you wouldn't have 
 written these tools as there would be no need to do so. So 
 let's not play pretend.
Digger would probably have existed even if D were a monorepo and squashed PRs' commits from the start, because it also knows how to satisfy each prior version's build dependencies and how to invoke the build scripts. Regardless, D is perfectly suitable for automatic bisection, which is unreasonably awkward with git itself - Digger makes it much easier. I think there's no shame in writing domain-specific tools to enhance some functionality of standard ones.
Mar 21 2017
next sibling parent Sebastien Alaiwan <ace17 free.fr> writes:
It's common practice for "merge" commits to have the form:
"merge work from some/branch, fix PR #somenumber".

This basically tells me nothing about what this commit does.

We already know it's a merge commit, we don't care so much what 
branch it's from, and we don't want to dig into the bug tracker 
to translate the issue number into english.

We care more about how this merge modifies the code behaviour.

What if "merge" commits had better messages, not containing the 
word "merge" at all?
This way, the depth-0 history, which is always linear, would be 
human-readable and bisectable.
Mar 21 2017
prev sibling parent Daniel N <no public.email> writes:
On Wednesday, 22 March 2017 at 01:25:37 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 21 March 2017 at 17:58:06 UTC, deadalnix wrote:
 On Tuesday, 21 March 2017 at 12:45:45 UTC, Vladimir Panteleev 
 wrote:
 On Tuesday, 21 March 2017 at 11:59:42 UTC, deadalnix wrote:
 It's not good either. Why would I want to look at a DAG when 
 the serie of event is strictly linear to begin with ?
This is almost human readable... git log --first-parent --no-merges --decorate ... except if a merge commit is tagged, I haven't found any solution for that, can you? It's very important to be able to see tags, yet filter away merge commits. Fortunately I managed to convert my team to rebase, so I no longer suffer this problem at work, only with D. Even this simplest git commands break down: git show WTF? there was no difference? Ahh... I was supposed to type: git show --first-parent well at least this case can be solved by a simple alias, but log cannot.
Mar 22 2017
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 21 March 2017 at 11:59:42 UTC, deadalnix wrote:
 Then it should have been 2 PR or more to begin with. Splitting 
 PR in smaller ones is a good practice in general,
This is probably true for many cases, but I don't think it's a general truth. First, there are extreme cases like these: https://github.com/dlang/druntime/pull/1402 https://github.com/dlang/phobos/pull/260 I think we can agree that it would be better to have 1 pull request with 70 commits than 70 pull requests with 1 commit. Second, there are many cases that fall in the middle: some ancillary change (such as a minor refactoring, or a build file change) is needed by a bigger change, but it is too minor to be a PR of its own. Putting both in one commit is also unwarranted. I think that the philosophy to prefer squashing is not suited for projects such as D, where we care about history. Pushing for one change per PR also pushes people to put too many things in one commit, and write less descriptive commit messages. Finally, some of the biggest open source projects merge pull requests consisting of multiple commits, and encourage submitters to divide their changes into as many commits as is logical, which seems to be the workflow they consider optimal.
 there are ample proof that is increase the quality of the code 
 review,
OK, where is the proof? It is worth pointing out that GitHub's UI is heavily biased towards reviewing PRs in entirety, however it does allow reviewing PRs commit by commit, which is how I think non-trivial submissions and reviews should occur anyway.
 reduce conflicts surface with other PR, makes reverting easier 
 and more targeted when something happens, etc...
Sure, I'm not advocating that all submissions happen as one PR. The way I understand it, it is you who is advocating the extreme position that all PRs should always contain a single commit.
 Keeping this PR's commits is just a way to mitigate one of the 
 negative consequences of kitchen sink PRs. It does so by 
 impacting negatively others aspects of source control, and does 
 nothing to mitigate other negatives aspects of kitchen sink PRs,
Frankly I don't think this makes any sense at all.
 such as review fatigue (see a specific example below).
The other side of the coin is submitter fatigue. I've seen this happen: 1. Submitter submits a PR, containing two commits: a change, and a refactoring required for the change. 2. Reviewers ask the submitter to split it into two PRs. 3. Submitter resubmits the refactoring as a separate PR. 4. The refactoring PR sits in the review queue forever because at best, it does nothing, at worst it introduces a regression. Reviewers who did not see or bother to read the first PR ask what this refactoring is for and why it's needed. 5. Submitter is fed up and leaves.
 Consider this PR: 
 https://github.com/BitcoinUnlimited/BitcoinUnlimited/pull/164

 You can see in the comments that I asked the original author to 
 split it up because it was a kitchen sink and very hard to 
 review in its current form. This was ignored. The PR ended up 
 containing a bug that would cost about $12 500 to one of the 
 users of the software, plus a fair amount of reputational 
 damage. The change containing the bug did not need to be 
 bundled with the rest of the PR, and would have almost 
 certainly be noticed if it had been made on a PR of its own.

 Bundling several changes in the same PR has real world 
 consequences that go beyond screwing up source control.
I don't think that's a fair example at all. What exactly prevents reviewing a PR consisting of multiple commits differently from multiple PRs consisting of one commit? In both cases, you can: - look at each change individually - add review comments on the change, either on the change in entirety or on individual lines - selectively pick and merge a subset of the submitted changes (though, GitHub makes this more difficult for multi-commit PRs). I can only guess that by "difficult to review" you mean from only looking at the "diff" tab; however, I think it's disingenuous to say that if you are not using the tools properly. Anyway, to reiterate, this is a distinct argument from which merge strategy to use. However, generally, I think this approach is more bad than good because it pushes towards destroying information (commit messages and separation) and clumping too many minor changes into single commits.
Mar 21 2017
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 21 March 2017 at 12:49:22 UTC, Vladimir Panteleev 
wrote:
 there are ample proof that is increase the quality of the code 
 review,
OK, where is the proof?
Large companies such as Google or Facebook measure these things. You have presented 0 arguments so far, and dismissed both facts and argument that were presented to you (one of them as unfair, because fairness and correctness surely are correlated). But cool guys you are right, don't change anything. This is great. I have other things to do to convince you guys when other are paying me to do so.
Mar 21 2017
next sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 21 March 2017 at 18:07:57 UTC, deadalnix wrote:
 You have presented 0 arguments so far, and dismissed both facts 
 and argument that were presented to you (one of them as unfair, 
 because fairness and correctness surely are correlated).
This is factually wrong, as is obvious from reading the thread. If you are not interested in constructive discussion, then I'm sorry that both of our times have been wasted.
Mar 21 2017
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 21 March 2017 at 18:07:57 UTC, deadalnix wrote:
 Large companies such as Google or Facebook
A blind appeal to authority is fallacious, but it's still worthwhile to see what others are doing. I think it's important to look at projects that are similar to our own, so I looked at what other programming language implementations do. - Go is developed using Google's source code infrastructure, and code reviews happen using Gerrit. On Gerrit, every commit is reviewed separately (as I've been advocating). Furthermore, if you push multiple commits to Gerrit, this automatically creates one review page per commit, and marks them as inter-dependent in the commit order. This is an awesome approach, and I wish GitHub made this workflow more practical. Importantly, Gerrit does not squash commits - you are expected to squash fixup commits yourself. - Rust uses GitHub, and all merges seem to be done by a bot. We are heading in that direction too. The bot uses regular merges and does not squash commits or rebase them onto master. - Python: I looked at the CPython repository on GitHub. They seem to be using squashing exclusively, and only using branches for version maintenance. However, when I tried to find how they would deal with a contribution that would be desirable to be split into several PRs/commits, I couldn't find one on the first 5 pages of merged PRs. I guess the project is in the stage of mostly minor bugfixes only - we're certainly not there yet. Curiously, submitters are expected to resubmit the same PR themselves against every maintenance branch, e.g. here is the same PR submitted 4 times, to different branches: - https://github.com/python/cpython/pull/629 - https://github.com/python/cpython/pull/633 - https://github.com/python/cpython/pull/634 - https://github.com/python/cpython/pull/635 - Ruby uses Subversion, a GitHub mirror, and a bot which synchronizes between the two. I don't think there's anything we can learn from here. - OCaml uses GitHub PRs and regular git merges. - Clang and GHC use Phabricator. I'm not too familiar with it, but I understand it's not too different from Gerrit: it creates one review per commit, and you can push multiple commits at once which will do the right thing. To sum it up, I don't think we're doing anything too weird. Though it would be nice if GitHub's UI were to improve to better handle this workflow, I don't think it makes sense to force submitters to go through the busywork of creating one PR per commit for many cases.
Mar 22 2017
next sibling parent reply Daniel N <no public.email> writes:
On Wednesday, 22 March 2017 at 09:02:24 UTC, Vladimir Panteleev 
wrote:
 Importantly, Gerrit does not squash commits - you are expected 
 to squash fixup commits yourself.
You can configure Gerrit to do virtually anything, including squashing, even cherry-pick if you fancy.
Mar 22 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Wednesday, 22 March 2017 at 10:43:46 UTC, Daniel N wrote:
 On Wednesday, 22 March 2017 at 09:02:24 UTC, Vladimir Panteleev 
 wrote:
 Importantly, Gerrit does not squash commits - you are expected 
 to squash fixup commits yourself.
You can configure Gerrit to do virtually anything, including squashing, even cherry-pick if you fancy.
Ah, thanks. Could you link me to the relevant documentation? Looking at https://bugs.chromium.org/p/gerrit/issues/detail?id=1254, it seems Gerrit can't handle multiple commits in any scenarios right now. Either way, I guess it doesn't squash a series of inter-dependent commits/reviews into one :)
Mar 22 2017
parent reply Daniel N <no public.email> writes:
On Wednesday, 22 March 2017 at 10:49:37 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 22 March 2017 at 10:43:46 UTC, Daniel N wrote:
 On Wednesday, 22 March 2017 at 09:02:24 UTC, Vladimir 
 Panteleev wrote:
 Importantly, Gerrit does not squash commits - you are 
 expected to squash fixup commits yourself.
You can configure Gerrit to do virtually anything, including squashing, even cherry-pick if you fancy.
Ah, thanks. Could you link me to the relevant documentation? Looking at https://bugs.chromium.org/p/gerrit/issues/detail?id=1254, it seems Gerrit can't handle multiple commits in any scenarios right now. Either way, I guess it doesn't squash a series of inter-dependent commits/reviews into one :)
https://gerrit-review.googlesource.com/Documentation/project-configuration.html#submit_type It's also possible to extend the basic functionality with plugins.
Mar 22 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Wednesday, 22 March 2017 at 11:26:49 UTC, Daniel N wrote:
 On Wednesday, 22 March 2017 at 10:49:37 UTC, Vladimir Panteleev 
 wrote:
 On Wednesday, 22 March 2017 at 10:43:46 UTC, Daniel N wrote:
 You can configure Gerrit to do virtually anything, including 
 squashing, even cherry-pick if you fancy.
Ah, thanks. Could you link me to the relevant documentation?
https://gerrit-review.googlesource.com/Documentation/project-configuration.html#submit_type
Thanks. I don't see anything about squashing, though.
Mar 22 2017
parent Daniel N <no public.email> writes:
On Wednesday, 22 March 2017 at 11:35:11 UTC, Vladimir Panteleev 
wrote:
 On Wednesday, 22 March 2017 at 11:26:49 UTC, Daniel N wrote:
 On Wednesday, 22 March 2017 at 10:49:37 UTC, Vladimir 
 Panteleev wrote:
 On Wednesday, 22 March 2017 at 10:43:46 UTC, Daniel N wrote:
 You can configure Gerrit to do virtually anything, including 
 squashing, even cherry-pick if you fancy.
Ah, thanks. Could you link me to the relevant documentation?
https://gerrit-review.googlesource.com/Documentation/project-configuration.html#submit_type
Thanks. I don't see anything about squashing, though.
The documentation is not very obvious. IIRC it was this option... "Create a new change for every commit not in the target branch: false" ... but it also requires a specific submit type in combination with that option which submits what is in the change-review instead of what's in the branch. (this is useful because reviewers can simply opt to fix a spelling error directly in the browser rather than just commenting on it, totally avoiding the normal ping-pong).
Mar 22 2017
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 22 March 2017 at 09:02:24 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 21 March 2017 at 18:07:57 UTC, deadalnix wrote:
 Large companies such as Google or Facebook
A blind appeal to authority is fallacious, but it's still worthwhile to see what others are doing. I think it's important to look at projects that are similar to our own, so I looked at what other programming language implementations do.
The good new is, you provided much more authorities to confirm my claim, so is it so blind ?
 - Go is developed using Google's source code infrastructure, 
 and code reviews happen using Gerrit. On Gerrit, every commit 
 is reviewed separately (as I've been advocating). Furthermore, 
 if you push multiple commits to Gerrit, this automatically 
 creates one review page per commit, and marks them as 
 inter-dependent in the commit order. This is an awesome 
 approach, and I wish GitHub made this workflow more practical. 
 Importantly, Gerrit does not squash commits - you are expected 
 to squash fixup commits yourself.
So Go use squash.
 - Rust uses GitHub, and all merges seem to be done by a bot. We 
 are heading in that direction too. The bot uses regular merges 
 and does not squash commits or rebase them onto master.
So that's 1.
 - Python: I looked at the CPython repository on GitHub. They 
 seem to be using squashing exclusively, and only using branches 
 for version maintenance. However, when I tried to find how they 
 would deal with a contribution that would be desirable to be 
 split into several PRs/commits, I couldn't find one on the 
 first 5 pages of merged PRs. I guess the project is in the 
 stage of mostly minor bugfixes only - we're certainly not there 
 yet.

   Curiously, submitters are expected to resubmit the same PR 
 themselves against every maintenance branch, e.g. here is the 
 same PR submitted 4 times, to different branches:

   - https://github.com/python/cpython/pull/629
   - https://github.com/python/cpython/pull/633
   - https://github.com/python/cpython/pull/634
   - https://github.com/python/cpython/pull/635
So they use squash.
 - Ruby uses Subversion, a GitHub mirror, and a bot which 
 synchronizes between the two. I don't think there's anything we 
 can learn from here.
So they use squash (that's the only thing svn knows how to do).
 - OCaml uses GitHub PRs and regular git merges.
That's 2.
 - Clang and GHC use Phabricator. I'm not too familiar with it, 
 but I understand it's not too different from Gerrit: it creates 
 one review per commit, and you can push multiple commits at 
 once which will do the right thing.
Phabricator can be configured to do many things, pretty much like gerrit, but in the case of clang and LLVM, they use squash.
 To sum it up, I don't think we're doing anything too weird. 
 Though it would be nice if GitHub's UI were to improve to 
 better handle this workflow, I don't think it makes sense to 
 force submitters to go through the busywork of creating one PR 
 per commit for many cases.
4 out of your 6 examples use squash.
Mar 22 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Wednesday, 22 March 2017 at 15:59:29 UTC, deadalnix wrote:
 4 out of your 6 examples use squash.
No, and at this point I don't know if you're being willfully ignorant or plainly malicious. The Gerrit/Phabricator equivalent of squashing GitHub PRs would be to squash multiple inter-dependent reviewed changesets after they've all been reviewed. Suffice to say that this is not a thing that happens. There are two ways to attempt to use the Gerrit workflow on GitHub: 1. Use one PR per commit. This is pretty far from Gerrit, because there is a lot of overhead in creating and maintaining the PRs, and no way to indicate inter-dependent PRs in the system itself. Although possible in theory, it is too inconvenient to be practical (see earlier arguments in this thread). Gerrit has a lot of tooling and workflow conventions that are geared towards making this workflow practical, things which are completely absent in the GitHub world. 2. Use one PR per patch series, and review commit by commit. GitHub does allow reviewing a PR commit by commit, so even considering that it's more difficult to merge just some parts of a PR, and the results from the various merge strategies, I believe this to overall be the better solution. Without a doubt, if GitHub provided better tooling to submit a patch series in which each commit gets its own first-class review page, and allow easily updating each part of the patch series without causing severe maintenance overhead for the rest, it would be the way to go. As things are, consider, how would you submit a change set consisting of 5 commits, where each one depends on the previous one? You would need to either waste a lot of time updating dependent PRs as you update earlier commits, waste time waiting until each commit is reviewed before submitting the rest, or eschew git best practices and lump things into as few commits as you can get away with. Whereas, when all the commits are on a single branch, updating changes in an early commit is as easy as an interactive rebase and force push.
Mar 22 2017
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I'm a bit confused. This got settled a while ago, in part to avoid silly 
debates over the inconsequential. Our organization prefers squash before 
commit in the majority of cases. For a minority of pull requests (that 
touch many files, are semi-mechanical etc) multiple commits in one PR 
are fine within reason. These would be about one order of magnitude less 
frequent. -- Andrei
Mar 22 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Wednesday, 22 March 2017 at 17:16:09 UTC, Andrei Alexandrescu 
wrote:
 I'm a bit confused. This got settled a while ago, in part to 
 avoid silly debates over the inconsequential. Our organization 
 prefers squash before commit in the majority of cases. For a 
 minority of pull requests (that touch many files, are 
 semi-mechanical etc) multiple commits in one PR are fine within 
 reason. These would be about one order of magnitude less 
 frequent. -- Andrei
Well, I don't think we shouldn't keep researching for ways to improve wolkflow. I certainly don't think it's inconsequential, and anyone who has time and thinks they can bring fresh arguments to the table is welcome to do so. There are some very solid arguments in favor of moving to an exclusively one-commit-per-PR model, with no exceptions (with more involved contributions occurring in feature branches), the main obstacle for which is that the tooling isn't there. I also think we can do better for the current model - the diff tab is often misused when reviewing per-commit is more appropriate.
Mar 22 2017
next sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, March 23, 2017 02:57:04 Vladimir Panteleev via Digitalmars-d 
wrote:
 On Wednesday, 22 March 2017 at 17:16:09 UTC, Andrei Alexandrescu

 wrote:
 I'm a bit confused. This got settled a while ago, in part to
 avoid silly debates over the inconsequential. Our organization
 prefers squash before commit in the majority of cases. For a
 minority of pull requests (that touch many files, are
 semi-mechanical etc) multiple commits in one PR are fine within
 reason. These would be about one order of magnitude less
 frequent. -- Andrei
Well, I don't think we shouldn't keep researching for ways to improve wolkflow. I certainly don't think it's inconsequential, and anyone who has time and thinks they can bring fresh arguments to the table is welcome to do so. There are some very solid arguments in favor of moving to an exclusively one-commit-per-PR model, with no exceptions (with more involved contributions occurring in feature branches), the main obstacle for which is that the tooling isn't there. I also think we can do better for the current model - the diff tab is often misused when reviewing per-commit is more appropriate.
Honestly, I think that having only one commit per PR would encourage overly large commits. Being able to have a series of small commits merged together is a strength of git, whereas something like svn usually results in patches that are single, larger commits. I also don't like the idea that commits get squashed when merged. In theory, they were separate for a reason, and in the cases that squashing them all makes sense, the commits were probably too small to begin with. But there is a bit of an art to creating commits that are small enough to be sensible while not having too many of them, and I've definitely seen PRs for Phobos that had way too many small commits, because the person who created the PR didn't bother to squash stuff where it made sense. - Jonathan M Davis
Mar 22 2017
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/23/17 4:57 AM, Vladimir Panteleev wrote:
 On Wednesday, 22 March 2017 at 17:16:09 UTC, Andrei Alexandrescu wrote:
 I'm a bit confused. This got settled a while ago, in part to avoid
 silly debates over the inconsequential. Our organization prefers
 squash before commit in the majority of cases. For a minority of pull
 requests (that touch many files, are semi-mechanical etc) multiple
 commits in one PR are fine within reason. These would be about one
 order of magnitude less frequent. -- Andrei
Well, I don't think we shouldn't keep researching for ways to improve wolkflow. I certainly don't think it's inconsequential, and anyone who has time and thinks they can bring fresh arguments to the table is welcome to do so.
Of course, new research is always welcome! The more, the better. Bring it over! There's a spectrum at work; at one extreme there's be close-mindedness that keeps a rigid status quo and refuses to accept new evidence. At the other end of the spectrum there's frequent reopening of the same debate with the same arguments, then repeatedly agreeing to close it just to repeat the cycle at the next opportunity. Walter and I think the better course of action for the community is to favor small pull requests that are squashed upon committing. There are reasons and a body of evidence that has been hashed over several times. Clearly there are extreme cases that don't do well with this flow, which confirm our understanding that no rule is a replacement of good judgment. Such rare exceptions are fine with us. But we can't afford this incessant challenge of the slightest authority and this reopening of old decisions with no new arguments. Far as I understand (and please do correct me if I'm wrong) what's being discussed now does not qualify as new research and is a reopening of a previous discussion with no new evidence, in which one side of the dialog accuses the other of appeal to authority, while simultaneously invoking appeal to its own authority. I have spent a long time this day thinking how to reply to this so as to close the argument once and for all, after I had already spent more time than is reasonable thinking and discussing this in public and in private. There really is no time for this kind of stuff if we want to scale. We should discuss how to make exceptions nogc and reference counted strings and a bunch of other important and urgent things. What steps can we take on this particular matter so we save everybody involved time and cognitive load? Thanks, Andrei
Mar 23 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 23 March 2017 at 22:35:13 UTC, Andrei Alexandrescu 
wrote:
 Far as I understand (and please do correct me if I'm wrong) 
 what's being discussed now does not qualify as new research and 
 is a reopening of a previous discussion with no new evidence,
Actually I think there were some interesting new arguments presented in this thread, as well as useful ancillary information on Gerrit et al and other projects' practices.
 There really is no time for this kind of stuff if we want to 
 scale. We should discuss how to make exceptions  nogc and 
 reference counted strings and a bunch of other important and 
 urgent things.
I think that if you do not think that discussing this subject any further is worth your time, then you shouldn't allocate any of your time time towards it. As previously mentioned, I don't think the arguments presented here warrant changing the status quo, so it is all theorizing.
Mar 23 2017
parent reply Seb <seb wilzba.ch> writes:
On Friday, 24 March 2017 at 05:10:54 UTC, Vladimir Panteleev 
wrote:
 I think that if you do not think that discussing this subject 
 any further is worth your time, then you shouldn't allocate any 
 of your time time towards it. As previously mentioned, I don't 
 think the arguments presented here warrant changing the status 
 quo, so it is all theorizing.
FWIW (as mentioned before) the status quo is different to what's intended as auto-merge-squash has been disabled by Martin: https://github.com/dlang-bots/dlang-bot/issues/64
Mar 23 2017
parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Friday, 24 March 2017 at 05:56:57 UTC, Seb wrote:
 On Friday, 24 March 2017 at 05:10:54 UTC, Vladimir Panteleev 
 wrote:
 I think that if you do not think that discussing this subject 
 any further is worth your time, then you shouldn't allocate 
 any of your time time towards it. As previously mentioned, I 
 don't think the arguments presented here warrant changing the 
 status quo, so it is all theorizing.
FWIW (as mentioned before) the status quo is different to what's intended as auto-merge-squash has been disabled by Martin: https://github.com/dlang-bots/dlang-bot/issues/64
Yep, because of the misuse-worst-case arguments. Simple solutions that guard against such mistakes are welcome. E.g. we could allow squashing if all commits' commit messages except the first one's start with "[SQUASH] " or "fixup! ".
Mar 24 2017
parent deadalnix <deadalnix gmail.com> writes:
On Friday, 24 March 2017 at 09:27:54 UTC, Vladimir Panteleev 
wrote:
 Yep, because of the misuse-worst-case arguments. Simple 
 solutions that guard against such mistakes are welcome. E.g. we 
 could allow squashing if all commits' commit messages except 
 the first one's start with "[SQUASH] " or "fixup! ".
Because it is meant to be the default, doing only when some specific message exist is not going to fly. Using !donotsquash or alike in the commit message is, however, a good way to proceed.
Mar 24 2017
prev sibling parent Martin Nowak <code dawg.eu> writes:
On Tuesday, 21 March 2017 at 12:49:22 UTC, Vladimir Panteleev 
wrote:
 On Tuesday, 21 March 2017 at 11:59:42 UTC, deadalnix wrote:
 Then it should have been 2 PR or more to begin with. Splitting 
 PR in smaller ones is a good practice in general,
This is probably true for many cases, but I don't think it's a general truth.
Let's please not conflate a small technical discussion about how to preserve information in git (e.g. squash vs. merge vs. rebase) with a workflow debate about proper PR size.
Mar 24 2017
prev sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Monday, 20 March 2017 at 05:10:04 UTC, Martin Nowak wrote:
 On Wednesday, 15 March 2017 at 13:14:31 UTC, deadalnix wrote:
 This is making the history very spaghettified. Is that 
 possible to have the bot rebase/squash commits and then 
 pushing ?
I don't really agree with the argument. A merge commit is a clear way to integrate changes from a PR/branch. Just rebasing a PR on top of master removes a lot of information from git, only leaving references to github. Can you be more specific, what you mean w/ spaghetti? The fact that reciew fixes are added to PRs. Also github's commit view misleadingly shows commits from merged PRs/branches, which aren't actually in master.
There's a compromise, which I'm using right now. Always rebase and always merge. You can see that it branched off for a purpose (the argument for merging) and the history is much cleaner (the argument for rebasing). i.e.: git rebase master my_branch git checkout master git merge --no-ff my_branch gitlab supports doing this via the web interface, I don't know about github. Atila
Mar 21 2017
parent reply Martin Nowak <code dawg.eu> writes:
On Tuesday, 21 March 2017 at 20:16:00 UTC, Atila Neves wrote:
 git rebase master my_branch
 git checkout master
 git merge --no-ff my_branch
Yes, that's about what we aim for, rebase w/ --autosquash though, so that people can `git commit --fixup` new fixup commits to open PRs w/o leaving noise behind. https://github.com/dlang-bots/dlang-bot/issues/64 Requires a local checkout of the repo which the bot doesn't have atm.
Mar 24 2017
parent reply Meta <jared771 gmail.com> writes:
On Friday, 24 March 2017 at 16:34:46 UTC, Martin Nowak wrote:
 On Tuesday, 21 March 2017 at 20:16:00 UTC, Atila Neves wrote:
 git rebase master my_branch
 git checkout master
 git merge --no-ff my_branch
Yes, that's about what we aim for, rebase w/ --autosquash though, so that people can `git commit --fixup` new fixup commits to open PRs w/o leaving noise behind. https://github.com/dlang-bots/dlang-bot/issues/64 Requires a local checkout of the repo which the bot doesn't have atm.
Did we come to any consensus on this? I ran into a dilemma with https://github.com/dlang/phobos/pull/5577 where I added a couple fixup commits, and now I don't want to merge until somebody rebases it because the history will be polluted with those extra commits. Also, looking at the PRs linked in this thread, I see that they're still open so AFAICT there is no clear solution.
Nov 18 2017
parent timotheecour <timothee.cour2 gmail.com> writes:
On Sunday, 19 November 2017 at 04:44:24 UTC, Meta wrote:
 On Friday, 24 March 2017 at 16:34:46 UTC, Martin Nowak wrote:
 On Tuesday, 21 March 2017 at 20:16:00 UTC, Atila Neves wrote:
 git rebase master my_branch
 git checkout master
 git merge --no-ff my_branch
Yes, that's about what we aim for, rebase w/ --autosquash though, so that people can `git commit --fixup` new fixup commits to open PRs w/o leaving noise behind. https://github.com/dlang-bots/dlang-bot/issues/64 Requires a local checkout of the repo which the bot doesn't have atm.
Did we come to any consensus on this? I ran into a dilemma with https://github.com/dlang/phobos/pull/5577 where I added a couple fixup commits, and now I don't want to merge until somebody rebases it because the history will be polluted with those extra commits. Also, looking at the PRs linked in this thread, I see that they're still open so AFAICT there is no clear solution.
This is an important issue; rebase workflows are standard practice; http://kensheedlo.com/essays/why-you-should-use-a-rebase-workflow/ https://help.github.com/articles/configuring-commit-rebasing-for-pull-requests/
Feb 02 2018