digitalmars.D - Swift does away with pointers == pervasive ARC

Manu via Digitalmars-d (5/5) Jun 16 2014 What say you to that, Walter?

Ola Fosheim Grøstad (8/10) Jun 16 2014 All performance tests so far say Swift is slower than
Dicebot (3/9) Jun 16 2014 Good luck writing games in Swift.

Namespace (2/12) Jun 16 2014 https://github.com/fullstackio/FlappySwift

Benjamin Thaut (2/13) Jun 16 2014 That's not something I would consider a full blown game.

Dicebot (2/19) Jun 16 2014 And definitely not kind of games Manu usually speaks about.

Namespace (2/22) Jun 16 2014 You said "game" and it's a game. ;) But I know what you mean.

Wyatt (7/12) Jun 16 2014 Here's some kindling; a quick and dirty peek at a few performance
Walter Bright (7/12) Jun 16 2014 I know very little about Swift.

Sean Kelly (7/13) Jun 16 2014 Swift targets the same VM as ObjC so I think ARC was a foregone
Manu via Digitalmars-d (6/21) Jun 16 2014 Hmmm, I still don't buy it, but I'm thinking more that it might

Walter Bright (7/12) Jun 16 2014 I know, but you also have not responded to my explanation of why. Such a...

Manu via Digitalmars-d (26/39) Jun 16 2014 Granted. I don't really understand the situation well enough to

Walter Bright (15/25) Jun 17 2014 inc

Manu via Digitalmars-d (25/32) Jun 17 2014 ARC is useless without compiler support. If we can't experiment with

Walter Bright (19/48) Jun 17 2014 It is not possible to understand the tradeoffs with ARC without understa...

Sean Kelly (5/7) Jun 17 2014 One thing D has going for it compared to C++ for reference

Walter Bright (2/6) Jun 17 2014 Yup.

H. S. Teoh via Digitalmars-d (12/21) Jun 17 2014 [...]

Walter Bright (5/11) Jun 17 2014 Such a claim still cannot be made without understanding the costs incurr...

Kapps (13/35) Jun 17 2014 Is there a way to move the costs of exception handling to occur

Walter Bright (5/9) Jun 17 2014 That's already been done to a large extent. But you can see what's left ...

H. S. Teoh via Digitalmars-d (14/43) Jun 17 2014 [...]

Steven Schveighoffer (25/40) Jun 17 2014 This is the issue, because the release can do a LOT of shit. It could

Nick Sabalausky (8/12) Jun 16 2014 Pardon my ignorance here, but is this a cost that would *always* be

Walter Bright (4/5) Jun 17 2014 Yes.

Nick Sabalausky (12/18) Jun 17 2014 It's not that I don't believe you, I guess I must have just missed a lot...

Walter Bright (4/14) Jun 17 2014 Oh, I totally understand where it's coming from. I'm trying to point out...

Nick Sabalausky (8/23) Jun 17 2014 Oh right, I definitely don't mean to imply it's any sort of "no cost"
eles (14/19) Jun 17 2014 It is not (only) about cost, it is about determinism. Exceptions

Walter Bright (3/5) Jun 17 2014 Yes, I do understand that. What I am trying to get across is that ARC ca...

Nick Sabalausky (15/21) Jun 17 2014 That isn't necessarily a deal-breaker for certain applications. For

Walter Bright (6/8) Jun 17 2014 Yes, they are, and the problems/costs of ARC are dismissed as either

Jonathan M Davis via Digitalmars-d (5/8) Jun 17 2014 You probably explained it elsewhere and I just missed it, but what's a D...

Tofu Ninja (3/5) Jun 17 2014 I think he is probably talking about decrementing the ref count.

Walter Bright (2/5) Jun 17 2014 Yes

Jonathan M Davis via Digitalmars-d (4/12) Jun 17 2014 Okay, thanks.

whassup (4/4) Jun 18 2014 Doesn't Nimrod do deferred referencing counting(with backup

deadalnix (4/8) Jun 18 2014 Ho, that make me think, doing RC but not when unwinding exception

Jonathan M Davis via Digitalmars-d (8/9) Jun 17 2014 They could definitely appear in hot code if the exception is for a case ...
Jacob Carlborg (4/6) Jun 17 2014 It does not support exceptions.

Walter Bright (2/6) Jun 17 2014 I wouldn't be surprised if it doesn't in order to make ARC more palatabl...

Ola Fosheim Grøstad (7/9) Jun 17 2014 I don't know, but on iOS you are supposed to save state to disk

Walter Bright (6/12) Jun 17 2014 If an app can be cheaply restarted, an easy & fast way to do memory allo...

H. S. Teoh via Digitalmars-d (17/34) Jun 17 2014 I don't think the user would enjoy the app "randomly" shutting down and

Nick Sabalausky (6/19) Jun 17 2014 Sounds cool, but I would think in that case you may as well just stick

Walter Bright (2/4) Jun 17 2014 You can hijack the function that does it with your own, and do it how yo...

Adam D. Ruppe (3/5) Jun 17 2014 Or do some kind of functor thingy and pass the closed over

deadalnix (4/6) Jun 17 2014 Yes and no. You can create a delegate manually by creating a
H. S. Teoh via Digitalmars-d (15/38) Jun 17 2014 [...]

Nick Sabalausky (11/46) Jun 17 2014 Actually, I was just referring to traditional usages of region

H. S. Teoh via Digitalmars-d (19/48) Jun 17 2014 But this may wind up being worse than a traditional compacting GC,

Ola Fosheim Grøstad (12/17) Jun 17 2014 I think it was common to use a region allocator in ray tracing,

H. S. Teoh via Digitalmars-d (13/32) Jun 17 2014 [...]

Ola Fosheim Grøstad (15/23) Jun 17 2014 One key difference is that iOS is a controlled framework and

deadalnix (4/10) Jun 16 2014 http://stackoverflow.com/questions/24101718/swift-performance-sorting-ar...

Manu via Digitalmars-d (5/15) Jun 16 2014 -Ofast seems to perform the same as C++. -Ofast allegedly does

Ary Borenszweig (3/21) Jun 16 2014 But other languages are very fast without losing the bounds check...

Manu via Digitalmars-d (5/33) Jun 16 2014 The disassembly showed that without -Ofast, lots of redundant ref
Walter Bright (4/26) Jun 16 2014 It's also not a ref-counting benchmark.
Nick Sabalausky (11/34) Jun 16 2014 Well, I think interesting part we're trying to look at here is the ARC's...

Ola Fosheim Grøstad (37/43) Jun 16 2014 ARC without deep whole program analysis is bound to be slow. It

Nick Sabalausky (13/17) Jun 17 2014 Right, but what I'm mainly curious about is "How much slower?" Depending...

Ola Fosheim Grøstad (22/24) Jun 17 2014 Sure. Empirical data is needed on many levels:

Araq (2/5) Jun 17 2014 What's the point? Nimrod already exists and answers most of your

Ola Fosheim Grøstad (12/14) Jun 17 2014 If you know the answers then I am an eager listener. Go ahead! :-)

Manu via Digitalmars-d (30/51) Jun 17 2014 Andrei posted a document some time back comparing an advanced RC

ed (6/90) Jun 17 2014 Check out the compiler and start the experiment you keep talking
Walter Bright (4/5) Jun 17 2014 Instead of taking positions, you can do some research. Understand how sh...

Joseph Rushton Wakeling via Digitalmars-d (8/23) Jun 17 2014 I'm broadly sympathetic with your position here, but I think to be hones...

Walter Bright (3/6) Jun 17 2014 There's another way. Simply write the code that a hypothetical ARC would...

H. S. Teoh via Digitalmars-d (7/12) Jun 17 2014 [...]

Nick Sabalausky (12/21) Jun 17 2014 I think that's exactly the concern he's voicing.

Steven Schveighoffer (6/8) Jun 17 2014 This is not an issue that can be solved by removing separate compilation...

deadalnix (5/18) Jun 17 2014 It is certainly doable in release build, granted one have a

H. S. Teoh via Digitalmars-d (10/29) Jun 17 2014 Well, first we need an option to turn on the GC in dmd... ;-) Otherwise

w0rp (12/39) Jun 17 2014 The thing which slowed his loop down was the compiler not

Manu via Digitalmars-d (8/28) Jun 17 2014 Huh? I'm not sure what you mean?

Jacob Carlborg (5/10) Jun 17 2014 I think Swift is only intended for high level application development.

Paulo Pinto (7/18) Jun 17 2014 Why not? It enjoys feature parity with Objective-C and then some.

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

What say you to that, Walter?

Apple have committed to pervasive ARC, which you consistently argue is
not feasible...
Have I missed something, or is this a demonstration that it is
actually practical?

Jun 16 2014

"Ola Fosheim Grøstad" writes:

On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d 
wrote:
 Have I missed something, or is this a demonstration that it is
 actually practical?

All performance tests so far say Swift is slower than 
Objective-C, which is slow to begin with, though Swift is still in beta.

I don't think you are supposed to do signal processing in Swift, 
most apps can be done with higher level/script-like programming 
and leave the performance sensitive part to the rich iOS 
frameworks.

Jun 16 2014

"Dicebot" <public dicebot.lv> writes:

On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d 
wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently 
 argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

Good luck writing games in Swift.

Jun 16 2014

"Namespace" <rswhite4 googlemail.com> writes:

On Monday, 16 June 2014 at 16:19:55 UTC, Dicebot wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d 
 wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently 
 argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

 Good luck writing games in Swift.

https://github.com/fullstackio/FlappySwift

Jun 16 2014

Benjamin Thaut <code benjamin-thaut.de> writes:

Am 16.06.2014 18:23, schrieb Namespace:
 On Monday, 16 June 2014 at 16:19:55 UTC, Dicebot wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

 Good luck writing games in Swift.

 https://github.com/fullstackio/FlappySwift

That's not something I would consider a full blown game.

Jun 16 2014

"Dicebot" <public dicebot.lv> writes:

On Monday, 16 June 2014 at 17:12:27 UTC, Benjamin Thaut wrote:
 Am 16.06.2014 18:23, schrieb Namespace:
 On Monday, 16 June 2014 at 16:19:55 UTC, Dicebot wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via 
 Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you 
 consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it 
 is
 actually practical?

 Good luck writing games in Swift.

 https://github.com/fullstackio/FlappySwift

 That's not something I would consider a full blown game.

And definitely not the kind of games Manu usually speaks about.

Jun 16 2014

"Namespace" <rswhite4 googlemail.com> writes:

On Monday, 16 June 2014 at 17:21:20 UTC, Dicebot wrote:
 On Monday, 16 June 2014 at 17:12:27 UTC, Benjamin Thaut wrote:
 Am 16.06.2014 18:23, schrieb Namespace:
 On Monday, 16 June 2014 at 16:19:55 UTC, Dicebot wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via 
 Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you 
 consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it 
 is
 actually practical?

 Good luck writing games in Swift.

 https://github.com/fullstackio/FlappySwift

 That's not something I would consider a full blown game.

 And definitely not kind of games Manu usually speaks about.

You said "game" and it's a game. ;) But I know what you mean.

Jun 16 2014

"Wyatt" <wyatt.epp gmail.com> writes:

On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d 
wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently 
 argue is not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

Here's some kindling; a quick and dirty peek at a few performance 
indicators:
http://www.splasmata.com/?p=2798
(Code:  http://splasm.com/keithg/Swift%20Tests.zip )

-Wyatt

Jun 16 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/16/2014 8:16 AM, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

I know very little about Swift.

But I did not say it was "not feasible". I said pervasive ARC in D simply would 
not deliver the results you wanted. I doubt it delivers them in Swift, either, 
though of course I don't have experience with Swift.

I.e. pervasive ARC in D would not deliver performance unless memory safety was 
discarded.

Jun 16 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Monday, 16 June 2014 at 18:12:59 UTC, Walter Bright wrote:
 But I did not say it was "not feasible". I said pervasive ARC 
 in D simply would not deliver the results you wanted. I doubt 
 it delivers them in Swift, either, though of course I don't 
 have experience with Swift.

 I.e. pervasive ARC in D would not deliver performance unless 
 memory safety was discarded.

Swift targets the same VM as ObjC so I think ARC was a foregone
conclusion.  And for that case it makes sense, as predictable
performance is crucial on mobile platforms.  Since D has raw
pointers and inline assembly I don't see ARC as being terribly
practical here however.  It's kind of the same issue as having
write barriers to support an incremental GC.

Jun 16 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 04:13, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 6/16/2014 8:16 AM, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

 I know very little about Swift.

 But I did not say it was "not feasible". I said pervasive ARC in D simply
 would not deliver the results you wanted. I doubt it delivers them in Swift,
 either, though of course I don't have experience with Swift.

 I.e. pervasive ARC in D would not deliver performance unless memory safety
 was discarded.

Hmmm, I still don't buy it, but I'm thinking more that it might
benefit from something like a Rust borrowed pointer to maintain safety
and performance. You like the 'unsafe escape' defense, so we need to
address that, and I think Rust gave us the answer.

Jun 16 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/16/2014 5:48 PM, Manu via Digitalmars-d wrote:
 Hmmm, I still don't buy it,

I know, but you also have not responded to my explanation of why. Such as the 
"dec" being required to be inside an expensive exception handler.

Note that Swift seems to not do exceptions (I may be wrong, again, I know little 
about Swift), which is one way to avoid that problem.


 but I'm thinking more that it might
 benefit from something like a Rust borrowed pointer to maintain safety
 and performance. You like the 'unsafe escape' defense, so we need to
 address that, and I think Rust gave us the answer.

Have you or anyone you know written any non-trivial programs in Rust and tried 
this out?

Jun 16 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 13:18, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 6/16/2014 5:48 PM, Manu via Digitalmars-d wrote:
 Hmmm, I still don't buy it,


 I know, but you also have not responded to my explanation of why. Such as
 the "dec" being required to be inside an expensive exception handler.

Granted. I don't really understand the situation well enough to
comment with any authority. What are the conditions that create the
requirement, or could relax it?

The problem is that if something throws, it needs an implicit catch to
release the ref, right?
nothrow obviously relaxes this requirement. Also in cases where the
ref fiddling was able to be eliminated, which appear to be fairly
numerous.
I don't know enough about other circumstances, but I can see a few
possibilities. Also, just the frequency of pointer copying I see in my
own code is very low, and I would NEVER generate that code in hot
loops.
I find it very hard to convince myself either way without evidence :/

Exceptions are one of my biggest weaknesses. I've never used the
exception handler before, in C++ or D (on XBox360 for instance, the
exception handler is actually broken, generates bad code, and
Microsoft recommend to disable C++ exceptions). I've used scope(exit),
but never in hot code or criticised the codegen.

I can't imagine exceptions would appear in hot code very often/ever?


 Note that Swift seems to not do exceptions (I may be wrong, again, I know
 little about Swift), which is one way to avoid that problem.

I wonder if that was a deliberate choice based on a reaction to this problem?


 but I'm thinking more that it might
 benefit from something like a Rust borrowed pointer to maintain safety
 and performance. You like the 'unsafe escape' defense, so we need to
 address that, and I think Rust gave us the answer.


 Have you or anyone you know written any non-trivial programs in Rust and
 tried this out?

No, I intend to have a better play with Rust when I get some time. I
need to break through the syntax barrier first though! ;)
But this is the key innovation in Rust I'm interested to know more
about in practice.

Jun 16 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/16/2014 10:02 PM, Manu via Digitalmars-d wrote:
 Granted. I don't really understand the situation well enough to
 comment with any authority. What are the conditions that create the
 requirement, or could relax it?

inc
try {
    ... code that may throw an exception ...
}
finally {
    dec;
}
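
For readers who want to try the pattern above, here is a minimal C++ sketch of what that lowering might look like when written by hand. It is an assumption-laden illustration, not the output of any real compiler: `Counted`, `retain`, `release`, `mayThrow` and `useCounted` are hypothetical names, and deallocation at zero is left out.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical reference-counted object; freeing at zero is omitted.
struct Counted {
    int refs = 1;
};

inline void retain(Counted& c) { ++c.refs; }            // the "inc"
inline void release(Counted& c) {                       // the "dec"
    assert(c.refs > 0);
    --c.refs;
}

// Stand-in for "code that may throw an exception".
inline void mayThrow(bool fail) {
    if (fail) throw std::runtime_error("failure");
}

// Hand-written equivalent of inc / try / finally / dec.
// C++ has no finally, so the dec appears on both exit paths.
inline void useCounted(Counted& c, bool fail) {
    retain(c);
    try {
        mayThrow(fail);
    } catch (...) {
        release(c);   // dec on the exceptional path
        throw;        // keep unwinding
    }
    release(c);       // dec on the normal path
}
```

If `mayThrow` were known not to throw (the `nothrow` case mentioned below), the try/catch could be dropped and only the inc/dec pair would remain.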


 nothrow obviously relaxes this requirement.

Yes, it does.


 I don't know enough about other circumstances, but I can see a few
 possibilities. Also, just the frequency of pointer copying I see in my
 own code is very low, and I would NEVER generate that code in hot
 loops.
 I find it very hard to convince myself either way without evidence :/

I suggest writing some C++ code with shared_ptr<T>, and disassemble the result.


 I can't imagine exceptions would appear in hot code very often/ever?

I've tried to explain this to you for months. You don't believe my explanations, 
we just go round in circles. I strongly suggest you write some code with 
shared_ptr<T> and try it out. Disassemble the result. Benchmark it. Use 
Microsoft C++, so I won't be sabotaging your results and it won't be because I 
write crappy compilers :-)

Jun 17 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 18:18, Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 6/16/2014 10:02 PM, Manu via Digitalmars-d wrote:

 I can't imagine exceptions would appear in hot code very often/ever?


 I've tried to explain this to you for months. You don't believe my
 explanations, we just go round in circles. I strongly suggest you write some
 code with shared_ptr<T> and try it out. Disassemble the result. Benchmark
 it. Use Microsoft C++, so I won't be sabotaging your results and it won't be
 because I write crappy compilers :-)

ARC is useless without compiler support. If we can't experiment with
the compiler's ability to eliminate redundant RC related work, then we
aren't 'experimenting' with anything of interest.
We agree ARC isn't acceptable without compiler support. That's never
been up for debate.
I have no way of testing whether the compiler is able to produce
acceptable results in C or D today. shared_ptr will demonstrate what
we already know is not acceptable, not whether the results of compiler
support for RC optimisation are satisfactory.

I believe your explanations, but that's not where I'm hung up. In most
cases I can visualise, there is significant opportunity for the
compiler to eliminate redundant work, and in the remaining cases, I
can imagine numerous very simple approaches to remove the bumps from
hot code without interfering very much at all.
Andrei's detailed document from months ago demonstrating a good RC
implementation was encouraging, although he somehow presented it as
evidence against RC, which I never understood.

It's all about the compiler's ability to eliminate the redundant work,
and it's possible that in some instances, that might result in
slightly different usage or access patterns, which is a relevant part
of the experiment. I don't see how I can do any meaningful experiment
with shared_ptr, since the bumps will never go away no matter how
they're arranged.

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 4:36 AM, Manu via Digitalmars-d wrote:
 On 17 June 2014 18:18, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 6/16/2014 10:02 PM, Manu via Digitalmars-d wrote:

 I can't imagine exceptions would appear in hot code very often/ever?


 I've tried to explain this to you for months. You don't believe my
 explanations, we just go round in circles. I strongly suggest you write some
 code with shared_ptr<T> and try it out. Disassemble the result. Benchmark
 it. Use Microsoft C++, so I won't be sabotaging your results and it won't be
 because I write crappy compilers :-)

 ARC is useless without compiler support.

It is not possible to understand the tradeoffs with ARC without understanding 
the cost of the DEC.


 If we can't experiment with
 the compiler's ability to eliminate redundant RC related work, then we
 aren't 'experimenting' with anything of interest.

That is assuming that a sufficiently smart compiler can eliminate all costs 
associated with ARC, so there is no need to understand those costs. This really 
is a faulty assumption. No compiler has achieved this, or come even close.


 I have no way of testing whether the compiler is able to produce
 acceptable results in C or D today. shared_ptr will demonstrate what
 we already know is not acceptable, not whether the results of compiler
 support for RC optimisation is satisfactory.

What shared_ptr will demonstrate to you is the cost of a dec, which you do not 
believe to be significant, and how that is affected by exception handling, which 
you stated that you do not understand.

I don't think it is possible to make appropriate tradeoffs without understanding 
these costs, and when/why they are incurred. Nor is it possible, without that, to 
understand the tradeoffs in relation to the design decisions made by Rust/Swift.

To advocate ARC for D means understanding this stuff. It'll only take a few 
minutes of your time.


 Andrei's detailed document from months ago demonstrating a good RC
 implementation was encouraging, although he somehow presented it as
 evidence against RC, which I never understood.

Please, take some time to try out RC and disassemble the code.


 It's all about the compilers ability to eliminate the redundant work,
 and it's possible that in some instances, that might result in
 slightly different usage or access patterns, which is a relevant part
 of the experiment. I don't see how I can do any meaningful experiment
 with shared_ptr, since the bumps will never go away no matter how
 they're arranged.

If compilers were smart enough to remove the ref counts, you'd be right. But 
they aren't - not even close. Hence there are big time penalties for those 
bumps, and to understand the tradeoffs it is necessary to understand the height 
of them.

Jun 17 2014

"Sean Kelly" <sean invisibleduck.org> writes:

On Tuesday, 17 June 2014 at 18:15:24 UTC, Walter Bright wrote:
 It is not possible to understand the tradeoffs with ARC without 
 understanding the cost of the DEC.

One thing D has going for it compared to C++ for reference
counting is shared vs. unshared types.  At least ARC in D
wouldn't have to default to performing synchronized reference
counting like shared_ptr does in C++.
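
A rough C++ sketch of the distinction being drawn here, under the assumption that a compiler could pick the counter flavour from the type (as D's shared/unshared split might allow). `LocalCount` and `SharedCount` are hypothetical names, not part of any library:

```cpp
#include <atomic>
#include <cassert>

// Thread-local flavour: a plain integer, no synchronisation needed.
struct LocalCount {
    long refs = 1;
    void retain() { ++refs; }
    // Returns true when the last reference has been dropped.
    bool release() {
        assert(refs > 0);
        return --refs == 0;
    }
};

// Shared flavour: every inc/dec is an atomic read-modify-write.
struct SharedCount {
    std::atomic<long> refs{1};
    void retain() { refs.fetch_add(1, std::memory_order_relaxed); }
    // acq_rel so the thread that drops the last reference sees earlier writes.
    bool release() {
        return refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};
```

Both types expose the same interface; the difference is only in the instructions each inc/dec turns into, which is what disassembly or a benchmark would show.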

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 11:20 AM, Sean Kelly wrote:
 One thing D has going for it compared to C++ for reference
 counting is shared vs. unshared types.  At least ARC in D
 wouldn't have to default to performing synchronized reference
 counting like shared_ptr does in C++.

Yup.

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 11:15:29AM -0700, Walter Bright via Digitalmars-d wrote:
 On 6/17/2014 4:36 AM, Manu via Digitalmars-d wrote:

[...]
 If we can't experiment with the compiler's ability to eliminate
 redundant RC related work, then we aren't 'experimenting' with
 anything of interest.

 
 That is assuming that a sufficiently smart compiler can eliminate all
 costs associated with ARC, so there is no need to understand those
 costs. This really is a faulty assumption. No compiler has achieved
 this, or come even close.

[...]

I don't think he's claiming that the compiler can eliminate *all* costs
associated with ARC, just that it can do enough to be reasonably
performant. But as I said, until somebody actually implements such a
thing in a (github fork of a) D compiler, we have no real evidence as to
just how much the compiler can/cannot do.


T

-- 
I don't trust computers, I've spent too long programming to think that
they can get anything right. -- James Miller

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 11:31 AM, H. S. Teoh via Digitalmars-d wrote:
 I don't think he's claiming that the compiler can eliminate *all* costs
 associated with ARC, just that it can do enough to be reasonably
 performant.

Such a claim still cannot be made without understanding the costs incurred when 
it cannot be eliminated.


 But as I said, until somebody actually implements such a
 thing in a (github fork of a) D compiler, we have no real evidence as to
 just how much the compiler can/cannot do.

You can do data flow analysis in your head and mock up the results and test them 
without too much difficulty.

Jun 17 2014

"Kapps" <opantm2+spam gmail.com> writes:

On Tuesday, 17 June 2014 at 18:15:24 UTC, Walter Bright wrote:
 On 6/17/2014 4:36 AM, Manu via Digitalmars-d wrote:
 On 17 June 2014 18:18, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 6/16/2014 10:02 PM, Manu via Digitalmars-d wrote:

 I can't imagine exceptions would appear in hot code very 
 often/ever?


 I've tried to explain this to you for months. You don't 
 believe my
 explanations, we just go round in circles. I strongly suggest 
 you write some
 code with shared_ptr<T> and try it out. Disassemble the 
 result. Benchmark
 it. Use Microsoft C++, so I won't be sabotaging your results 
 and it won't be
 because I write crappy compilers :-)

 ARC is useless without compiler support.

 It is not possible to understand the tradeoffs with ARC without 
 understanding the cost of the DEC.

Is there a way to move the costs of exception handling to occur
only if an exception is actually encountered? Perhaps something
similar to the mark & sweep for a garbage collection that can
update reference counts? The idea being that code that cares
enough about lack of pauses is not generally going to be throwing
exceptions in the first place (though may not necessarily be able
to be nothrow due to rare circumstances).

I'm guessing there's also no way to actually estimate what sort
of performance hit ARC would bring? Even something like a 20%
performance hit in functions that are not specifically optimized
for it would be reasonable if it comes at the benefit of lack of
pauses.

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 2:51 PM, Kapps wrote:
 Is there a way to move the costs of exception handling to occur
 only if an exception is actually encountered?

That's already been done to a large extent. But you can see what's left for 
yourself - compile some shared_ptr<T> code with clang/g++/VC++ and take a look 
at the generated code.


 I'm guessing there's also no way to actually estimate what sort
 of performance hit ARC would bring?

Yes, there is. Write code that a hypothetical ARC would generate, and then test it.
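
One way such a mock-up might look in C++, as a sketch only: the same loop written once with the inc/dec a naive ARC lowering would emit, and once as an optimiser might leave it if it could prove the elision safe. `Node`, `Stats`, `sumNaive` and `sumElided` are hypothetical names, and the operation counter stands in for a real timing harness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
    long refs = 1;
    int value = 0;
};

// Counts refcount operations so the two variants can be compared.
struct Stats {
    std::size_t ops = 0;
};

inline void retain(Node& n, Stats& s) { ++n.refs; ++s.ops; }
inline void release(Node& n, Stats& s) {
    assert(n.refs > 0);
    --n.refs;
    ++s.ops;
}

// Naive lowering: each use of a node in the loop is bracketed by inc/dec.
inline long sumNaive(std::vector<Node>& nodes, Stats& s) {
    long total = 0;
    for (Node& n : nodes) {
        retain(n, s);
        total += n.value;
        release(n, s);
    }
    return total;
}

// Mocked-up "optimised" form: assumes the container outlives the loop,
// so no refcount traffic is needed at all.
inline long sumElided(std::vector<Node>& nodes, Stats&) {
    long total = 0;
    for (Node& n : nodes) total += n.value;
    return total;
}
```

Timing the two functions over a large vector, or reading their disassembly, gives a first estimate of what the remaining inc/dec pairs cost when they cannot be removed.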

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 09:36:09PM +1000, Manu via Digitalmars-d wrote:
 On 17 June 2014 18:18, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 6/16/2014 10:02 PM, Manu via Digitalmars-d wrote:

 I can't imagine exceptions would appear in hot code very
 often/ever?


 I've tried to explain this to you for months. You don't believe my
 explanations, we just go round in circles. I strongly suggest you
 write some code with shared_ptr<T> and try it out. Disassemble the
 result. Benchmark it. Use Microsoft C++, so I won't be sabotaging
 your results and it won't be because I write crappy compilers :-)

 
 ARC is useless without compiler support. If we can't experiment with
 the compiler's ability to eliminate redundant RC related work, then we
 aren't 'experimenting' with anything of interest.
 We agree ARC isn't acceptable without compiler support. That's never
 been on debate.
 I have no way of testing whether the compiler is able to produce
 acceptable results in C or D today. shared_ptr will demonstrate what
 we already know is not acceptable, not whether the results of compiler
 support for RC optimisation is satisfactory.
 
 I believe your explanations, but that's not where I'm hung up. In most
 cases I can visualise, there is significant opportunity for the
 compiler to eliminate redundant work, and in the remaining cases, I
 can imagine numerous very simple approaches to remove the bumps from
 hot code without interfering very much at all.

[...]

Perhaps the way to convince Walter, is to fork dmd on github, check it
out, and implement ARC in your fork (to whatever extent is necessary for
a first, possibly very crude, demonstration)? It doesn't have to work
100%, it doesn't even have to pass the test suite or compile the full D
language, it just has to do enough to prove that the compiler is capable
of automatically making performance-critical ARC optimizations on
whatever benchmark you choose to use.

This would add a lot of substance to all the hot air we've been pumping
at each other every time a GC/ARC-related thread comes up.


T

-- 
If it's green, it's biology, If it stinks, it's chemistry, If it has numbers
it's math, If it doesn't work, it's technology.

Jun 17 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 17 Jun 2014 01:02:32 -0400, Manu via Digitalmars-d  
<digitalmars-d puremagic.com> wrote:

 On 17 June 2014 13:18, Walter Bright via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 6/16/2014 5:48 PM, Manu via Digitalmars-d wrote:
 Hmmm, I still don't buy it,


 I know, but you also have not responded to my explanation of why. Such  
 as
 the "dec" being required to be inside an expensive exception handler.

 Granted. I don't really understand the situation well enough to
 comment with any authority. What are the conditions that create the
 requirement, or could relax it?

 The problem is if something throws, it need an implicit catch to
 release the ref right?

This is the issue, because the release can do a LOT of shit. It could  
release a whole tree of objects, it could start throwing its own  
exceptions.

It's like doing a lot of stuff inside a signal handler. Possible, but very  
hairy.

A possible solution might be to defer the releasing until the exception is  
caught, and have the catch statement call all the releases. Just speaking  
from ignorance, I don't know if this solves Walter's concern. The stack  
unwinding still would have to figure all this stuff out. But if we just  
add these calls to a release pool that is drained at the catch point,  
maybe it helps solve the problem.
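
A small C++ sketch of that release-pool idea, purely as an illustration and not a claim about how any compiler or runtime does it. `ReleasePool`, `DeferredRef`, `worker` and `runAndCatch` are hypothetical names; for simplicity the release is always deferred, and the destructor still runs during unwinding, so only the release work itself moves to the catch point.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

struct Counted {
    int refs = 1;
};

// Hypothetical pool of pending releases, drained at the catch point.
struct ReleasePool {
    std::vector<Counted*> pending;
    void defer(Counted& c) { pending.push_back(&c); }
    void drain() {
        for (Counted* c : pending) {
            assert(c->refs > 0);
            --c->refs;   // the actual "dec" happens here
        }
        pending.clear();
    }
};

// Holds one reference; on scope exit it only records the release.
// Simplification: push_back may allocate, which real unwinding code
// would want to avoid.
class DeferredRef {
public:
    DeferredRef(Counted& c, ReleasePool& p) : obj_(c), pool_(p) { ++obj_.refs; }
    ~DeferredRef() { pool_.defer(obj_); }
    DeferredRef(const DeferredRef&) = delete;
    DeferredRef& operator=(const DeferredRef&) = delete;
private:
    Counted& obj_;
    ReleasePool& pool_;
};

inline void worker(Counted& c, ReleasePool& pool) {
    DeferredRef ref(c, pool);
    throw std::runtime_error("failure");
}

// Returns the refcount observed at the catch point, before draining.
inline int runAndCatch(Counted& c, ReleasePool& pool) {
    int seenAtCatch = 0;
    try {
        worker(c, pool);
    } catch (const std::runtime_error&) {
        seenAtCatch = c.refs;  // release has not been applied yet
        pool.drain();          // all deferred releases run here
    }
    return seenAtCatch;
}
```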

BTW, as a general response, I don't think swift is as proven yet as  
Objective C. As has been pointed out many times, one of the greatest (and  
arguably worst) parts of Objective C is that any C code is valid Objective  
C code. This means, you can instantly go into unsafe C mode if you need  
the performance.

From what I can tell, swift does NOT allow that, there will need to be  
some sort of bridge there. I think it will make code that uses pure  
objective C calls and methods more pleasant to write (i.e. UI and  
framework code). I'm not sure how I feel about it yet, I have yet to use  
Swift. But I am hesitant to say that swift will be a step forward  
performance-wise. Apple claims it is, we shall see.

-Steve

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/16/2014 11:18 PM, Walter Bright wrote:
 On 6/16/2014 5:48 PM, Manu via Digitalmars-d wrote:
 Hmmm, I still don't buy it,

 I know, but you also have not responded to my explanation of why. Such
 as the "dec" being required to be inside an expensive exception handler.

Pardon my ignorance here, but is this a cost that would *always* be 
paid, or *only* while an exception is being thrown and the callstack is 
being unwound?

If it's only paid while an exception is thrown, then does it really 
matter? Exception throwing has never really come with any particular 
expectation of speed, being "exceptional" cases. Roughly how much of a 
slowdown are we talking here?

Jun 16 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/16/2014 10:30 PM, Nick Sabalausky wrote:
 is this a cost that would *always* be paid,

Yes.

I've probably written 30+ posts explaining this again and again. Nobody believes
me. I beg you to write some code and disassemble it and see for yourself.
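
For anyone who wants to try exactly that, a small C++ sketch to compile and disassemble (not D, and not what dmd emits; `Counted`, `retain`, `release`, `mayThrow`, `use` and `useAndReport` are invented names). It spells out by hand what has to be arranged around any call that might throw: the dec must be reachable on both the normal path and the unwind path, so the cleanup code and its unwind bookkeeping are part of the function whether or not anything is ever thrown. What that costs depends on the exception handling scheme, which is the thing to look for in the disassembly.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical ref-counted object.
struct Counted {
    int refs = 1;
};

inline void retain(Counted& c)  { ++c.refs; }                        // the "inc"
inline void release(Counted& c) { assert(c.refs > 0); --c.refs; }    // the "dec"

// Stand-in for any call the compiler cannot prove to be nothrow.
inline void mayThrow(bool fail) {
    if (fail) throw std::runtime_error("boom");
}

// Hand-written equivalent of "take a reference, call something, drop it".
void use(Counted& c, bool fail) {
    retain(c);
    try {
        mayThrow(fail);
    } catch (...) {
        release(c);   // dec on the unwind path
        throw;        // keep propagating
    }
    release(c);       // dec on the normal path
}

// Helper for experiments: returns true if use() threw.
bool useAndReport(Counted& c, bool fail) {
    try {
        use(c, fail);
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}
```

Comparing the generated code for `use` with a version that has no `retain`/`release` at all shows what the pairing adds.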

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 4:23 AM, Walter Bright wrote:
 On 6/16/2014 10:30 PM, Nick Sabalausky wrote:
 is this a cost that would *always* be paid,

 Yes.

 I've probably written 30+ posts explaining this again and again. Nobody
 believes me. I beg you to write some code and disassemble it and see for
 yourself.

It's not that I don't believe you, I guess I must have just missed a lot 
of those posts, or probably glanced through them too quickly. I'm more 
interested in this topic now than I was before, so I'm just trying to 
get up to speed (so to speak). I'll try to look up those posts again 
before I end up asking them to be repeated.

Keep in mind, for people in certain areas, the allure of a memory 
management system with no/minimal collection pauses, minimized memory 
requirements and good behavior in low-memory conditions (even if it all 
comes with a non-trivial overall performance cost and a little bit of 
care here and there) can be VERY strong. So, FWIW, that's the 
perspective where all this is coming from.

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 1:59 AM, Nick Sabalausky wrote:
 It's not that I don't believe you, I guess I must have just missed a lot of
 those posts, or probably glanced through them too quickly. I'm more interested
 in this topic now than I was before, so I'm just trying to get up to speed (so
 to speak). I'll try to look up those posts again before I end up asking them to
 be repeated.

 Keep in mind, for people in certain areas, the allure of a memory management
 system with no/minimal collection pauses, minimized memory requirements and good
 behavior in low-memory conditions (even if it all comes with a non-trivial
 overall performance cost and a little bit of care here and there) can be VERY
 strong. So, FWIW, that's the perspective where all this is coming from.


Oh, I totally understand where it's coming from. I'm trying to point out that 
ARC is not a magic zero-cost system. Its costs are SUBSTANTIAL. But in order to 
understand those costs, it is necessary to understand how exception handling works.

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 5:10 AM, Walter Bright wrote:
 On 6/17/2014 1:59 AM, Nick Sabalausky wrote:
 Keep in mind, for people in certain areas, the allure of a memory
 management
 system with no/minimal collection pauses, minimized memory
 requirements and good
 behavior in low-memory conditions (even if it all comes with a
 non-trivial
 overall performance cost and a little bit of care here and there) can
 be VERY
 strong. So, FWIW, that's the perspective where all this is coming from.


 Oh, I totally understand where it's coming from.

Fair enough.

 I'm trying to point out
 that ARC is not a magic zero-cost system. Its costs are SUBSTANTIAL. But
 in order to understand those costs, it is necessary to understand how
 exception handling works.

Oh right, I definitely don't mean to imply it's any sort of "no cost" 
deal. It's certainly all tradeoffs, of course. As you undoubtedly know, 
for some things, GC's costs can be high, too. But like I've said, I'm an 
ARC novice, so I'm just trying to gain an understanding of the extent of 
ARC's costs since it does, at the very least, have some intriguing 
properties.

Jun 17 2014

"eles" <eles eles.com> writes:

On Tuesday, 17 June 2014 at 09:10:14 UTC, Walter Bright wrote:
 On 6/17/2014 1:59 AM, Nick Sabalausky wrote:

 Oh, I totally understand where it's coming from. I'm trying to 
 point out that ARC is not a magic zero-cost system. Its costs 
 are SUBSTANTIAL. But in order to understand those costs, it is 
 necessary to understand how exception handling works.

It is not (only) about cost, it is about determinism. Exceptions 
are on the error recovery path, so they are less important to be 
deterministic.

What is really critical is not the execution time of such-and-such a 
function, but the moment when it is called. And on the 
normal execution path.

This is what manual memory management (and, to a degree, ARC) 
provides and the GC fails to provide: determinism of the calls on the 
normal execution path (and in the calls of finalizers).

Yes, both ARC and manual memory management may be 
non-deterministic, but not on the normal execution path and not 
with respect to the calls (a malloc is nondeterministic, but its 
call is deterministic; you could always statically allocate beforehand).

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 2:31 AM, eles wrote:
 It is not (only) about cost, it is about determinism. Exceptions are on the
 error recovery path, so they are less important to be deterministic.

Yes, I do understand that. What I am trying to get across is that ARC can often 
consume MORE aggregate time than GC.

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 2:00 PM, Walter Bright wrote:
 On 6/17/2014 2:31 AM, eles wrote:
 It is not (only) about cost, it is about determinism. Exceptions are
 on the
 error recovery path, so they are less important to be deterministic.

 Yes, I do understand that. What I am trying to get across is that ARC
 can often consume MORE aggregate time than GC.

That isn't necessarily a deal-breaker for certain applications. For 
example, most games will happily accept a slightly reduced "typical" 
framerate to reduce dropped frames or other time spikes. Framerate 
spikes can be far more jarring and noticeable for the player than 
losing a few fps.

Of course, as usual, the acceptability of tradeoffs all depends on the 
actual numbers. Ex: An ARC that bloats overall execution time by 100x is 
certainly not worthwhile, but one that increases execution time by 0.1% 
can definitely be a fantastic tradeoff for soft-realtime and low-memory 
environments. Obviously those numbers are exaggerated examples, but just 
to illustrate the point.

Again, nobody's saying "ARC is definitely better/worthwhile". It's just 
that, since we don't have numbers, we really can't say that ARC *isn't* 
worthwhile.

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 11:38 AM, Nick Sabalausky wrote:
 That isn't necessarily a deal-breaker for certain applications.

I do know that.


 Again, nobody's saying "ARC is definitely better/worthwhile".

Yes, they are, and the problems/costs of ARC are dismissed as either 
insignificant or something that ordinary compiler technology can overcome.

I can't even get anyone to examine what code is generated for a DEC. It's like 
arguing about the best route from Seattle to Denver while refusing to look at a map.

Jun 17 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tue, 17 Jun 2014 14:00:08 -0700
Walter Bright via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I can't even get anyone to examine what code is generated for a DEC.
 It's like arguing about the best route from Seattle to Denver while
 refusing to look at a map.

You probably explained it elsewhere and I just missed it, but what's a DEC?
I'm not finding much useful when I search for it.

- Jonathan M Davis

Jun 17 2014

"Tofu Ninja" <emmons0 purdue.edu> writes:

On Tuesday, 17 June 2014 at 23:45:20 UTC, Jonathan M Davis via 
Digitalmars-d wrote:
 You probably explained it elsewhere and I just missed it, but 
 what's a DEC?

I think he is probably talking about decrementing the ref count.

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 4:51 PM, Tofu Ninja wrote:
 On Tuesday, 17 June 2014 at 23:45:20 UTC, Jonathan M Davis via Digitalmars-d wrote:
 You probably explained it elsewhere and I just missed it, but what's a DEC?

 I think he is probably talking about decrementing the ref count.

Yes

Jun 17 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tue, 17 Jun 2014 18:35:37 -0700
Walter Bright via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 On 6/17/2014 4:51 PM, Tofu Ninja wrote:
 On Tuesday, 17 June 2014 at 23:45:20 UTC, Jonathan M Davis via
 Digitalmars-d wrote:
 You probably explained it elsewhere and I just missed it, but
 what's a DEC?

 I think he is probably talking about decrementing the ref count.

 Yes

Okay, thanks.

- Jonathan M Davis

Jun 17 2014

"whassup" <Whasss yahoo.com> writes:

  Doesn't Nimrod do deferred reference counting (with backup 
cycle detection)? Stack references don't need to DEC/INC. Do a 
conservative stack scan during collection. Nimrod doesn't have 
D's GC issues, so why not just do that?
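
A very rough C++ sketch of the general deferred reference counting idea, for readers who have not seen it. This is not Nimrod's implementation, all names (`Obj`, `DeferredRC`, `zeroCountTable`) are invented, and cycle detection is left out. Only references stored in heap fields are counted; objects whose heap count is zero sit in a table and are freed at collection time unless the stack scan finds them. The `stackRoots` vector stands in for whatever a real (conservative) stack scan would report.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

struct Obj {
    int heapRefs = 0;     // counts references held in heap fields only
    bool freed = false;   // stand-in for actually returning the memory
};

struct DeferredRC {
    std::set<Obj*> zeroCountTable;   // objects with no heap references right now

    void created(Obj* o) { zeroCountTable.insert(o); }

    // Called when a reference is stored into a heap field.
    void heapInc(Obj* o) {
        if (o->heapRefs++ == 0) zeroCountTable.erase(o);
    }

    // Called when a reference is removed from a heap field.
    void heapDec(Obj* o) {
        assert(o->heapRefs > 0);
        if (--o->heapRefs == 0) zeroCountTable.insert(o);
    }

    // Frees every zero-count object that is not referenced from the stack.
    // Returns how many objects were freed. A real collector would also
    // heapDec the children of each freed object; that is omitted here.
    int collect(const std::vector<Obj*>& stackRoots) {
        int freedCount = 0;
        for (auto it = zeroCountTable.begin(); it != zeroCountTable.end();) {
            Obj* o = *it;
            bool onStack = std::find(stackRoots.begin(), stackRoots.end(), o)
                           != stackRoots.end();
            if (onStack) { ++it; continue; }
            o->freed = true;
            ++freedCount;
            it = zeroCountTable.erase(it);
        }
        return freedCount;
    }
};
```

The point of the scheme is that locals and temporaries never touch a counter; the price is that objects are reclaimed at collection time rather than the moment the last reference goes away.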

Jun 18 2014

"deadalnix" <deadalnix gmail.com> writes:

On Wednesday, 18 June 2014 at 17:08:25 UTC, whassup wrote:
  Doesn't Nimrod do deferred reference counting (with backup 
 cycle detection)? Stack references don't need to DEC/INC. Do a 
 conservative stack scan during collection. Nimrod doesn't have 
 D's GC issues, so why not just do that?

Oh, that makes me think: doing RC, but not when unwinding exceptions, 
and backing the whole thing with a GC would probably be an 
interesting solution.

Jun 18 2014

Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:

On Tue, 17 Jun 2014 15:02:32 +1000
Manu via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 I can't imagine exceptions would appear in hot code very often/ever?

They could definitely appear in hot code if the exception is for a case that
is very rare, but it's certainly true that if an exception is likely to be
thrown frequently, that calls for rethinking the code so that exceptions
aren't thrown so frequently (and that's that much more true if you're talking
about hot code).

- Jonathan M Davis

Jun 17 2014

Jacob Carlborg <doob me.com> writes:

On 17/06/14 05:18, Walter Bright wrote:

 Note that Swift seems to not do exceptions (I may be wrong, again, I
 know little about Swift), which is one way to avoid that problem.

It does not support exceptions.

-- 
/Jacob Carlborg

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 5:21 AM, Jacob Carlborg wrote:
 On 17/06/14 05:18, Walter Bright wrote:

 Note that Swift seems to not do exceptions (I may be wrong, again, I
 know little about Swift), which is one way to avoid that problem.

 It does not support exceptions.

I wouldn't be surprised if it doesn't in order to make ARC more palatable.

Jun 17 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 17 June 2014 at 18:17:21 UTC, Walter Bright wrote:
 I wouldn't be surprised if it doesn't in order to make ARC more 
 palatable.

I don't know, but on iOS you are supposed to save state to disk 
continuously so that the app can die silently and reboot to where 
you left off. And mobile apps are supposed to boot real fast 
since they are used on the move.

So maybe they figured it was good enough to leave the try-catch 
complexity out…

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 11:25 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com> wrote:
 On Tuesday, 17 June 2014 at 18:17:21 UTC, Walter Bright wrote:
 I wouldn't be surprised if it doesn't in order to make ARC more palatable.

 I don't know, but on iOS you are supposed to save state to disk continuously so
 that the app can die silently and reboot to where you left off. And mobile apps
 are supposed to boot real fast since they are used on the move.

 So maybe they figured it was good enough to leave the try-catch complexity out…

If an app can be cheaply restarted, an easy & fast way to do memory allocation 
is to use a "bump allocator" like dmd does, and then restart when it runs out of
memory.

I'm not joking, this can actually be very practical for certain kinds of programs.
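
For readers unfamiliar with the term, a minimal C++ sketch of a bump allocator. This is an illustration under assumptions, not dmd's actual allocator; the class name `BumpAllocator` and its interface are made up. Allocation is a pointer increment, nothing is ever freed individually, and a null return is the signal that the program should restart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal bump allocator over a caller-supplied buffer.
class BumpAllocator {
public:
    BumpAllocator(void* buffer, std::size_t size)
        : base_(static_cast<unsigned char*>(buffer)), size_(size), used_(0) {}

    // Returns nullptr when the buffer is exhausted. align must be a power of two.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::uintptr_t aligned =
            (cur + (align - 1)) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t padding = static_cast<std::size_t>(aligned - cur);
        // Check remaining space without overflowing.
        if (padding > size_ - used_ || bytes > size_ - used_ - padding)
            return nullptr;
        used_ += padding;
        void* p = base_ + used_;
        used_ += bytes;   // the "bump"
        return p;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```

There is deliberately no `free`: the only way to get memory back is to throw the whole buffer away, which is what restarting the process amounts to.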

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 11:39:50AM -0700, Walter Bright via Digitalmars-d wrote:
 On 6/17/2014 11:25 AM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com> wrote:
On Tuesday, 17 June 2014 at 18:17:21 UTC, Walter Bright wrote:
I wouldn't be surprised if it doesn't in order to make ARC more
palatable.

I don't know, but on iOS you are supposed to save state to disk
continuously so that the app can die silently and reboot to where you
left off. And mobile apps are supposed to boot real fast since they
are used on the move.

So maybe they figured it was good enough to leave the try-catch
complexity out…

 
 If an app can be cheaply restarted, an easy & fast way to do memory
 allocation is to use a "bump allocator" like dmd does, and then
 restart when it runs out of memory.

I don't think the user would enjoy the app "randomly" shutting down and
starting up again on him. :-)

One idea that occurs to me, though, is to split the app into a frontend
that does not allocate during runtime, and a backend, which may. Design
it in such a way that the backend can freely restart anytime without
adversely affecting the frontend; then you can maintain an appearance of
continuous execution across backend restarts.

If the restart time can be reduced to within a single animation frame,
for example, one could actually write a game engine that never
deallocates, it just restarts itself when it runs out of memory and the
frontend maintains the façade of continuous execution. This will trump
GC, ARC, malloc, indeed, any memory allocation scheme beyond
bump-a-pointer. :-P


T

-- 
That's not a bug; that's a feature!

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 3:04 PM, H. S. Teoh via Digitalmars-d wrote:
 I don't think the user would enjoy the app "randomly" shutting down and
 starting up again on him. :-)

 One idea that occurs to me, though, is to split the app into a frontend
 that does not allocate during runtime, and a backend, which may. Design
 it in such a way that the backend can freely restart anytime without
 adversely affecting the frontend; then you can maintain an appearance of
 continuous execution across backend restarts.

 If the restart time can be reduced to within a single animation frame,
 for example, one could actually write a game engine that never
 deallocates, it just restarts itself when it runs out of memory and the
 frontend maintains the façade of continuous execution. This will trump
 GC, ARC, malloc, indeed, any memory allocation scheme beyond
 bump-a-pointer. :-P

Sounds cool, but I would think in that case you may as well just stick 
with region allocators. Same effect with less overhead and more 
fine-tuning control. Or a global region allocator or something.

Speaking of manual memory management, is it currently possible to 
manually allocate closures?

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 1:36 PM, Nick Sabalausky wrote:
 Speaking of manual memory management, is it currently possible to manually
 allocate closures?

You can hijack the function that does it with your own, and do it how you like.

Jun 17 2014

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Tuesday, 17 June 2014 at 21:02:14 UTC, Walter Bright wrote:
 You can hijack the function that does it with your own, and do 
 it how you like.

Or do some kind of functor thingy and pass the closed over 
variables manually.
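
A C++ sketch of what the "functor thingy" looks like, since the same pattern carries over to a D struct with a method or `opCall`. The names `AddOffset`, `applyTwice` and `demo` are invented. The captured variable becomes an ordinary field, so the caller decides where the object lives and no hidden closure allocation takes place.

```cpp
#include <cassert>

// Hand-written "closure": the closed-over variable is an explicit field.
struct AddOffset {
    int offset;                                   // passed in manually
    int operator()(int x) const { return x + offset; }
};

// Accepts any callable by value; nothing is heap-allocated here.
template <typename F>
int applyTwice(F f, int x) {
    return f(f(x));
}

// Usage: the functor lives on the stack of demo().
int demo() {
    AddOffset add5{5};
    assert(add5(1) == 6);
    return applyTwice(add5, 1);
}
```

The same struct could just as well be placed in a region or a manually managed block, which is the part a compiler-generated closure does not let you choose.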

Jun 17 2014

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 17 June 2014 at 20:36:58 UTC, Nick Sabalausky wrote:
 Speaking of manual memory management, is it currently possible 
 to manually allocate closures?

Yes and no. You can create a delegate manually by creating a
struct with a method.

You can also force allocation on the stack by using scope.

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 04:36:49PM -0400, Nick Sabalausky via Digitalmars-d wrote:
 On 6/17/2014 3:04 PM, H. S. Teoh via Digitalmars-d wrote:
I don't think the user would enjoy the app "randomly" shutting down
and starting up again on him. :-)

One idea that occurs to me, though, is to split the app into a
frontend that does not allocate during runtime, and a backend, which
may. Design it in such a way that the backend can freely restart
anytime without adversely affecting the frontend; then you can
maintain an appearance of continuous execution across backend
restarts.

If the restart time can be reduced to within a single animation
frame, for example, one could actually write a game engine that never
deallocates, it just restarts itself when it runs out of memory and
the frontend maintains the façade of continuous execution. This will
trump GC, ARC, malloc, indeed, any memory allocation scheme beyond
bump-a-pointer. :-P

 
 Sounds cool, but I would think in that case you may as well just stick
 with region allocators. Same effect with less overhead and more
 fine-tuning control. Or a global region allocator or something.

[...]

Hmm, that's an idea. Instead of having a region allocator bound to a
particular function scope, have it at global level, and when it runs out
of memory, it will invoke a delegate that deallocates it, create a new
allocator, and reload all the objects back (presumably, the latest saved
copy of the objects is good enough to continue running with).

OTOH, thinking about this more carefully, it's no different in essence
from a compacting GC (bump the pointer until you run out of memory, then
move all live objects to the bottom of the heap (i.e. restart the
backend and reload all "live objects" back into memory) and start over),
so this isn't exactly treading on new territory.


T

-- 
"I speak better English than this villain Bush" -- Mohammed Saeed al-Sahaf,
Iraqi Minister of Information

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 5:13 PM, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jun 17, 2014 at 04:36:49PM -0400, Nick Sabalausky via Digitalmars-d wrote:
 On 6/17/2014 3:04 PM, H. S. Teoh via Digitalmars-d wrote:
 I don't think the user would enjoy the app "randomly" shutting down
 and starting up again on him. :-)

 One idea that occurs to me, though, is to split the app into a
 frontend that does not allocate during runtime, and a backend, which
 may. Design it in such a way that the backend can freely restart
 anytime without adversely affecting the frontend; then you can
 maintain an appearance of continuous execution across backend
 restarts.

 If the restart time can be reduced to within a single animation
 frame, for example, one could actually write a game engine that never
 deallocates, it just restarts itself when it runs out of memory and
 the frontend maintains the façade of continuous execution. This will
 trump GC, ARC, malloc, indeed, any memory allocation scheme beyond
 bump-a-pointer. :-P

 Sounds cool, but I would think in that case you may as well just stick
 with region allocators. Same effect with less overhead and more
 fine-tuning control. Or a global region allocator or something.

 [...]

 Hmm, that's an idea. Instead of having a region allocator bound to a
 particular function scope, have it at global level, and when it runs out
 of memory, it will invoke a delegate that deallocates it, create a new
 allocator, and reload all the objects back (presumably, the latest saved
 copy of the objects is good enough to continue running with).

Actually, I was just referring to traditional usages of region 
allocators, but that's an idea too.

 OTOH, thinking about this more carefully, it's no different in essence
 from a compacting GC (bump the pointer until you run out of memory, then
 move all live objects to the bottom of the heap (i.e. restart the
 backend and reload all "live objects" back into memory) and start over),
 so this isn't exactly treading on new territory.

I think there is a notable difference between this and compacting GC 
though. With this, you're trading off the inconvenience of paying 
careful attention to data persistence and fast reloading, for the 
benefit of eliminating the potentially-doubled memory requirements of a 
traditional compacting GC.

It's an interesting thought. Especially since the persistence stuff has 
other benefits too, like improved fault tolerance (well, as long as the 
persisted data doesn't get corrupted).

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 05:30:46PM -0400, Nick Sabalausky via Digitalmars-d wrote:
 On 6/17/2014 5:13 PM, H. S. Teoh via Digitalmars-d wrote:
On Tue, Jun 17, 2014 at 04:36:49PM -0400, Nick Sabalausky via Digitalmars-d wrote:
On 6/17/2014 3:04 PM, H. S. Teoh via Digitalmars-d wrote:



[...]
Hmm, that's an idea. Instead of having a region allocator bound to a
particular function scope, have it at global level, and when it runs
out of memory, it will invoke a delegate that deallocates it, create
a new allocator, and reload all the objects back (presumably, the
latest saved copy of the objects is good enough to continue running
with).

 
 Actually, I was just referring to traditional usages of region
 allocators, but that's an idea too.
 
OTOH, thinking about this more carefully, it's no different in
essence from a compacting GC (bump the pointer until you run out of
memory, then move all live objects to the bottom of the heap (i.e.
restart the backend and reload all "live objects" back into memory)
and start over), so this isn't exactly treading on new territory.

 
 I think there is a notable difference between this and compacting GC
 though.  With this, you're trading off the inconvenience of paying
 careful attention to data persistence and fast reloading, for the
 benefit of eliminating the potentially-doubled memory requirements of
 a traditional compacting GC.

But this may wind up being worse than a traditional compacting GC,
because now you have to manually do essentially what the compacting GC
does for you -- decide (on slow-access persistent storage, no less!)
which objects are live and which are dead. So you end up reimplementing
your own GC. And when memory runs out, you have to load things back from
said slow-access storage, whereas a compacting GC would have the benefit
of copying only within the relatively-faster RAM.


 It's an interesting thought. Especially since the persistence stuff
 has other benefits too, like improved fault tolerance (well, as long
 as the persisted data doesn't get corrupted).

I suppose if your program already has persistence built in, then you
might as well take advantage of it by forgoing any kind of memory
management and just restart when memory runs out. I'm not sure if it's
actually any *better*, though. All of this rests on the assumption that
it's possible to (re)load all your live objects from persistent storage
within the space of a single animation frame, which may not even be
possible to begin with! :-)


T

-- 
Elegant or ugly code as well as fine or rude sentences have something in
common: they don't depend on the language. -- Luca De Vitis

Jun 17 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 17 June 2014 at 18:39:46 UTC, Walter Bright wrote:
 If an app can be cheaply restarted, an easy & fast way to do 
 memory allocation is to use a "bump allocator" like dmd does, 
 and then restart when it runs out of memory.

 I'm not joking, this can actually be very practical for certain 
 kinds of programs.

I think it was common to use a region allocator in ray tracing, 
per ray-path. Meaning you calculate the worst case memory 
consumption and malloc one big block (or put it in a segment in 
the exec), then just move a pointer ahead for allocations, and 
reset the pointer before you fire the next ray from the camera.
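
A small C++ sketch of that per-ray pattern, with made-up names (`Hit`, `RayArena`, `traceAll`) and a fixed-capacity buffer standing in for the worst-case block. Each ray starts by resetting the pointer, so nothing from the previous ray is ever freed individually.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical per-ray record; trivially destructible, so reset() needs no cleanup.
struct Hit {
    float t;
    int objectId;
};

template <std::size_t Capacity>
class RayArena {
public:
    // Returns nullptr if the worst-case estimate (Capacity) was too small.
    Hit* newHit(float t, int id) {
        std::size_t offset =
            (used_ + alignof(Hit) - 1) / alignof(Hit) * alignof(Hit);
        if (offset > Capacity || sizeof(Hit) > Capacity - offset) return nullptr;
        used_ = offset + sizeof(Hit);
        return new (storage_ + offset) Hit{t, id};
    }

    // Everything allocated for the previous ray is discarded at once.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    alignas(std::max_align_t) unsigned char storage_[Capacity];
    std::size_t used_ = 0;
};

// "Traces" a number of rays, each allocating hitsPerRay records.
// Returns the peak arena usage in bytes across all rays.
template <std::size_t Capacity>
std::size_t traceAll(RayArena<Capacity>& arena, int rays, int hitsPerRay) {
    assert(rays >= 0 && hitsPerRay >= 0);
    std::size_t peak = 0;
    for (int r = 0; r < rays; ++r) {
        arena.reset();                       // before firing the next ray
        for (int h = 0; h < hitsPerRay; ++h) {
            if (!arena.newHit(1.0f * h, h)) break;
        }
        if (arena.used() > peak) peak = arena.used();
    }
    return peak;
}
```

However many rays are traced, usage never grows beyond what a single ray needs, which is why the worst case can be computed up front.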

iOS ARC does use Autorelease Pool Blocks, which take care of 
release() at the end of the event loop. That probably affects ARC 
implementation/code gen. It basically means that you don't 
release allocated memory except at the end of an iteration. That 
won't work very well in D though?

https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 07:19:54PM +0000, via Digitalmars-d wrote:
 On Tuesday, 17 June 2014 at 18:39:46 UTC, Walter Bright wrote:
If an app can be cheaply restarted, an easy & fast way to do memory
allocation is to use a "bump allocator" like dmd does, and then
restart when it runs out of memory.

I'm not joking, this can actually be very practical for certain kinds
of programs.

 
 I think it was common to use a region allocator in ray tracing, per
 ray-path. Meaning you calculate the worst case memory consumption and
 malloc one big block (or put it in a segment in the exec), then just
 move a pointer ahead for allocations, and reset the pointer before you
 fire the next ray from the camera.
 
 iOS ARC does use Autorelease Pool Blocks, which take care of release()
 at the end of the event loop. That probably affects ARC
 implementation/code gen. It basically means that you don't release
 allocated memory except at the end of an iteration. That won't work
 very well in D though?

[...]

Why wouldn't it? I thought that was the whole point of Andrei's work on
std.allocator, specifically, a pool allocator. You allocate a pool at
the beginning of an iteration, and it can be as simple as a
bump-the-pointer allocator inside the pool, then at the end of the
iteration you free the entire pool all at once. Presumably Andrei would
come up with some way of making sure that built-in constructs like ~
would allocate from the pool instead of the GC heap, then you don't even
need to restrict the code that runs inside each iteration.


T

-- 
Famous last words: I wonder what will happen if I do *this*...

Jun 17 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 17 June 2014 at 19:28:26 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 Why wouldn't it? I thought that was the whole point of Andrei's 
 work on
 std.allocator, specifically, a pool allocator. You allocate a 
 pool at
 the beginning of an iteration, and it can be as simple as a
 bump-the-pointer allocator inside the pool, then at the end of 
 the
 iteration you free the entire pool all at once.

One key difference is that iOS is a controlled framework and 
Apple ARC presumes that you only allocate a modest amount of ARC 
objects per event. So you have to take special care if you 
allocate lots of memory per event (create your own pools).

AFAIK,  Apple-style ARC is only safe to use if you don't treat it 
as an optional library solution. The AutoReleasePool does not 
free the entire pool, only the ones that have a retain-count of 
1. The basic idea is that you don't have to match up the initial 
allocation-retain() with a release(). So I believe the codegen 
only has to do retain()/release() when an ARC object might 
escape the "event handler" call chain…
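
A toy C++ model of the behaviour described above, just to pin down the semantics. It is not Apple's API; `Object`, `AutoreleasePool`, `retain` and `release` here are simplified stand-ins. The pool owes each registered object exactly one release(), delivered at drain time, so only objects still at retain-count 1 (nobody else kept them) actually go away.

```cpp
#include <cassert>
#include <vector>

struct Object {
    int retainCount = 1;       // starts at 1 for the creating reference
    bool deallocated = false;  // stand-in for actually freeing the memory
};

inline void retain(Object& o) { ++o.retainCount; }

inline void release(Object& o) {
    assert(o.retainCount > 0);
    if (--o.retainCount == 0) o.deallocated = true;
}

class AutoreleasePool {
public:
    // Hands the creating reference to the pool: the matching release()
    // happens at drain() instead of at the allocation site.
    Object& autorelease(Object& o) {
        pending_.push_back(&o);
        return o;
    }

    // Called at the end of an event-loop iteration.
    // Returns the number of objects that were actually deallocated.
    int drain() {
        int freed = 0;
        for (Object* o : pending_) {
            release(*o);
            if (o->deallocated) ++freed;
        }
        pending_.clear();
        return freed;
    }

private:
    std::vector<Object*> pending_;
};
```

An object that something else retained during the event survives the drain with its count back at 1; a temporary that nobody retained is deallocated.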

Sure, you can do it for D, but you have to deal with additional 
issues (compared to GC).

Jun 17 2014

"deadalnix" <deadalnix gmail.com> writes:

On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d 
wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently 
 argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays

Does it answer the question ?

Jun 16 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 10:08, deadalnix via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?


 http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays

 Does it answer the question ?

-Ofast seems to perform the same as C++. -Ofast allegedly does
basically what '-release -noboundscheck' does. You'd never try and
benchmark D code without those flags.
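
To make concrete what kind of work gets switched off, a C++ analogy (not Swift, not D, and not a claim about what either compiler does internally; the function names are made up). `std::vector::at` performs a range test on every access, while `operator[]` does not; the second function does one test up front and then runs an unchecked loop.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Checked: every access goes through at(), which throws std::out_of_range.
long sumChecked(const std::vector<int>& v, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) total += v.at(i);
    return total;
}

// One range test up front, then operator[] with no per-element check.
long sumHoisted(const std::vector<int>& v, std::size_t n) {
    if (n > v.size()) throw std::out_of_range("n exceeds v.size()");
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(i < v.size());   // debug-only; compiled out under NDEBUG
        total += v[i];
    }
    return total;
}

// Helper: reports whether f(v, n) throws std::out_of_range.
bool throwsOutOfRange(long (*f)(const std::vector<int>&, std::size_t),
                      const std::vector<int>& v, std::size_t n) {
    try {
        f(v, n);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Both versions reject an out-of-range `n`; they differ only in how many times the test runs, which is the sort of difference a benchmark on array sorting ends up measuring.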

Jun 16 2014

Ary Borenszweig <ary esperanto.org.ar> writes:

On 6/16/14, 9:22 PM, Manu via Digitalmars-d wrote:
 On 17 June 2014 10:08, deadalnix via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?


 http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays

 Does it answer the question ?

 -Ofast seems to perform the same as C++. -Ofast allegedly does
 basically what '-release -noboundscheck' does. You'd never try and
 benchmark D code without those flags.

But other languages are very fast without losing the bounds check... 
Other languages don't sacrifice safety and yet are very performant.

Jun 16 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 10:57, Ary Borenszweig via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 6/16/14, 9:22 PM, Manu via Digitalmars-d wrote:
 On 17 June 2014 10:08, deadalnix via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?




 http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays

 Does it answer the question ?


 -Ofast seems to perform the same as C++. -Ofast allegedly does
 basically what '-release -noboundscheck' does. You'd never try and
 benchmark D code without those flags.


 But other languages are very fast without losing the bounds check... Other
 languages don't sacrifice safety and yet are very performant.

The disassembly showed that without -Ofast, lots of redundant ref
fiddling remained, and also many bounds checks. It might be something
like rigid safe-ty without -Ofast.

Jun 16 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/16/2014 5:57 PM, Ary Borenszweig wrote:
 On 6/16/14, 9:22 PM, Manu via Digitalmars-d wrote:
 On 17 June 2014 10:08, deadalnix via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?


 http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays

 Does it answer the question ?

 -Ofast seems to perform the same as C++. -Ofast allegedly does
 basically what '-release -noboundscheck' does. You'd never try and
 benchmark D code without those flags.

 But other languages are very fast without losing the bounds check... Other
 languages don't sacrifice safety and yet are very performant.

It's also not a ref-counting benchmark.

BTW, just pushed a new optimization to dmd's back end that dramatically reduces 
the number of bounds checks. But that won't help with ref-counting, either.

Jun 16 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/16/2014 8:57 PM, Ary Borenszweig wrote:
 On 6/16/14, 9:22 PM, Manu via Digitalmars-d wrote:
 On 17 June 2014 10:08, deadalnix via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?


 http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays


 Does it answer the question ?

 -Ofast seems to perform the same as C++. -Ofast allegedly does
 basically what '-release -noboundscheck' does. You'd never try and
 benchmark D code without those flags.

 But other languages are very fast without losing the bounds check...
 Other languages don't sacrifice safety and yet are very performant.

Well, I think the interesting part we're trying to look at here is ARC's
impact on speed. We already know bounds-/overflow-checks can slow things 
down, so I'm not sure the -O3 and -O0 timings are relevant to the 
analysis of ARC's impact. (If anything, I have a hunch they're more 
indicative of Swift's current immaturity.)

But, the comments in that thread seem to suggest that -Ofast *keeps* the 
ARC. If that's so, then the -Ofast timings seem to suggest ARC might not 
necessarily be a performance killer. Although direct side-by-side 
comparison with a D equivalent (or an otherwise no-ARC version) would be 
more meaningful.

Jun 16 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky wrote:
 Well, I think interesting part we're trying to look at here is 
 the ARC's impact on speed.

ARC without deep whole program analysis is bound to be slow. It 
turns reads into writes. You even have to do writes to access 
read-only data-structures that will never be freed, due to 
separate compilation units.

ARC with multi-threading (and without language level 
transactions) is even worse. If you restrict ARC to thread-only 
then you might as well try to implement thread-local GC too.

Sure, with single thread restrictions, deep semantic analysis and 
heavy "templating" of functions you can get low overhead by 
inferring "borrowed pointer semantics" in the call tree, after 
taking ownership of an object, and let it propagate down the call 
chain.  But to get there you also need to deal with aggregates 
and arrays, so you need to take "over-reaching" ownership (e.g. 
take ownership of the entire array, graph or large struct) to 
avoid ref-counting every single pointer in the array/aggregate.
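
To make the "reads into writes" point concrete, here is a minimal C++ sketch that uses std::shared_ptr as a stand-in for a counted reference. It is not what Swift or any D proposal emits. The names sum_counted and sum_borrowed are made up for illustration. The by-value version writes to the shared counter on entry and exit even though it only reads the data. The borrowed version touches no counter at all.

```cpp
#include <cassert>
#include <memory>
#include <numeric>
#include <vector>

using IntVec = std::vector<int>;

// By-value parameter: copying the shared_ptr increments the shared counter on
// entry and decrements it on exit, although the function only reads the data.
// count_seen reports the owner count observed inside the call.
long sum_counted(std::shared_ptr<const IntVec> v, long& count_seen) {
    count_seen = v.use_count();
    return std::accumulate(v->begin(), v->end(), 0L);
}

// Borrowed parameter: a plain reference, so no counter traffic at all.
// The caller must keep the vector alive for the duration of the call.
long sum_borrowed(const IntVec& v) {
    return std::accumulate(v.begin(), v.end(), 0L);
}
```

Called with a single owner, sum_counted observes a count of 2 during the call, while sum_borrowed(*v) leaves the count untouched. Inferring that the second form is safe is what the "borrowed pointer semantics" above would have to do automatically.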

 We already know bounds-/overflow-checks can slow things down, 
 so I'm not sure the -O3 and -O0 timings are relevant to the 
 analysis of ARC's impact. (If anything, I have a hunch they're 
 more indicative of Swift's current immaturity.)

I somehow suspect that Apple is content if they can bring Swift 
within 10-20% of Objective-C's performance, which isn't 
impressive to begin with. The goal is to get programmers more 
productive, having a REPL etc. Swift's main competitors are 
ECMAScript6, Dart, HTML5 and cross-platform mobile scripting 
frameworks. JITs are now at about 40-20% of raw C speed, with 
multiple JITs to get fast spin-up time, that's good enough for 
most apps and even most of the code in an average mobile game.

When devs go that route in order to cut costs, iOS loses 
out on iOS-only apps. The surge in mobile CPU/GPU processing 
power makes that a real threat, since yesterday's C app, run 
on a 2-4x faster CPU, can be implemented in a scripting language 
with comparable performance.

So Apple needs to cut dev costs for apps and Swift is a very 
attractive solution. Swift will probably never get the speed 
enhancements that involve challenging semantics due to the desire 
to keep the language simple (so businesses can hire cheaper 
programmers). That's the main motivation for having ARC 
everywhere IMO.

With Swift, Apple can provide better tooling, not C/C++ level 
performance.

Jun 16 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 2:56 AM, "Ola Fosheim Grøstad" 
<ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky wrote:
 Well, I think interesting part we're trying to look at here is the
 ARC's impact on speed.

 ARC without deep whole program analysis is bound to be slow.[...]

Right, but what I'm mainly curious about is "How much slower?" Depending 
on how the numbers play out, as Manu has mentioned, it could be that 
the relaxed memory requirements and amortized cost are enough to make it 
a good tradeoff for a lot of people (Like Manu, I have some interest in 
soft-realtime as well).

But I'm new to ARC, never even used ObjC, so I don't really even have 
much frame of reference or ballpark ideas here. So that's why I'm 
interested in the whole "How much slower?" Your descriptions of the ins 
and outs of it, and Apple's motivations, are definitely interesting. But 
even if nothing else, Manu's certainly right about one thing: What we 
need is some hard empirical data.

Jun 17 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 17 June 2014 at 08:36:10 UTC, Nick Sabalausky wrote:
 But even if nothing else, Manu's certainly right about one 
 thing: What we need is some hard empirical data.

Sure. Empirical data is needed on many levels:

1. How fast can you get the GC if you exploit all possibilities 
for semantic annotations or even constrain existing semantics?

2. How fast can you get the GC if you segment the collection (by 
thread, type, clustering of objects etc) and how does that affect 
semantics?

3. How fast can you get the GC if you change memory layout etc in 
order to limit the amount of touched cache lines?

4. How fast can you make transaction-based multithreading when 
you have Haswell-style hardware support in the CPU cache?

5. How far can you get by using region based allocators inferred 
by semantic analysis?

6. Can you exploit bit patterns on 64-bit architectures if you 
provide your own malloc?

7. How far can you get by having type-based pools?

8. Can you deal with multiple pointer types if everything is 
templated and then reduced by "de-foresting" of the AST (like 
common sub-expression elimination)?

I think D2 has too many competing features to experiment, so an 
experimental D-- implemented in D2 would be most interesting IMO. 
But it takes a group effort… :-/

Jun 17 2014

"Araq" <rumpf_a web.de> writes:

 I think D2 has too many competing features to experiment, so an 
 experimental D-- implemented in D2 would be most interesting 
 IMO. But it takes a group effort… :-/

What's the point? Nimrod already exists and answers most of your
questions. If only you knew how to ask...

Jun 17 2014

"Ola Fosheim Grøstad" writes:

On Tuesday, 17 June 2014 at 09:24:32 UTC, Araq wrote:
 What's the point? Nimrod already exists and answers most of your
 questions. If only you would know how to ask...

If you know the answers then I am an eager listener. Go ahead! :-)

But isn't Nimrod a source-2-source compiler/translator?

In order to get efficient heap, GC, exception handling etc you 
need to go all the way down to exploiting the underlying hardware 
and memory layout on modern 64-bit architectures and leave some 
of the C-cruft behind. It basically requires runtime and 
backend-level changes.

You can get there by experimenting with one feature at a time, 
but if you add all features at once you paint yourself into a 
corner, IMO. So having a minimal and clean C-ish base as a 
starting point would be valuable.

Jun 17 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 18:36, Nick Sabalausky via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 6/17/2014 2:56 AM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky wrote:
 Well, I think interesting part we're trying to look at here is the
 ARC's impact on speed.


 ARC without deep whole program analysis is bound to be slow.[...]


 Right, but what I'm mainly curious about is "How much slower?" Depending how
 the numbers play out, then as Manu has mentioned, it could be that the
 relaxed memory requirements and amortized cost are enough to make it a good
 tradeoff for a lot of people (Like Manu, I have some interest in
 soft-realtime as well).

 But I'm new to ARC, never even used ObjC, so I don't really even have much
 frame of reference or ballpark ideas here. So that's why I'm interested in
 the whole "How much slower?" Your descriptions of the ins and outs of it,
 and Apple's motivations, are definitely interesting. But even if nothing
 else, Manu's certainly right about one thing: What we need is some hard
 empirical data.

Andrei posted a document some time back comparing an advanced RC
implementation with "the best GC", and it performed remarkably well,
within 10%!
D does not have 'the best GC'. I doubt D's GC is within 10% of 'the best GC'.
In addition, my colleagues have reported no significant pain working
with ARC on iOS, whereas Android developers are always crying about
the GC by contrast.

I can visualise Walter's criticisms, but what I don't know is whether
his criticisms are actually as costly as they may seem? I also haven't
seen the compiler's ability to eliminate or simplify that work, and the
circumstances in which it fails. It's conceivable that simply
rearranging an access pattern slightly may offer the compiler the
structure it needs to properly eliminate the redundant work.

The thing is, I don't know! I really don't know, and I don't know any
practical way to experiment with this. D theoretically offers many
opportunities for ARC optimisation that other languages don't via its
rich type system, so direct comparisons with Obj-C could probably be
reasonably considered to be quite conservative.

Here's what I do know though: nobody has offered a conception of a GC
that may be acceptable on a memory limited device, and it's also not
very acceptable just by nature (destructors are completely broken;

cost).
As far as I know, there is NO OTHER CHOICE.
Either somebody invents the fantasy GC, or we actually *experiment* with ARC...

We know: GC is unacceptable, and nobody has any idea how to make one that is.
We don't know: ARC is acceptable/unacceptable. Why.

What other position can I take on this issue?

Jun 17 2014

"ed" <gmail gmail.com> writes:

On Tuesday, 17 June 2014 at 11:59:23 UTC, Manu via Digitalmars-d 
wrote:
 On 17 June 2014 18:36, Nick Sabalausky via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On 6/17/2014 2:56 AM, "Ola Fosheim Grøstad"
 <ola.fosheim.grostad+dlang gmail.com>" wrote:
 On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky 
 wrote:
 Well, I think interesting part we're trying to look at here 
 is the
 ARC's impact on speed.


 ARC without deep whole program analysis is bound to be 
 slow.[...]


 Right, but what I'm mainly curious about is "How much slower?" 
 Depending how
 the numbers play out, then as Manu has mentioned, it could be 
 that the
 relaxed memory requirements and amortized cost are enough to 
 make it a good
 tradeoff for a lot of people (Like Manu, I have some interest 
 in
 soft-realtime as well).

 But I'm new to ARC, never even used ObjC, so I don't really 
 even have much
 frame of reference or ballpark ideas here. So that's why I'm 
 interested in
 the whole "How much slower?" Your descriptions of the ins and 
 outs of it,
 and Apple's motivations, are definitely interesting. But even 
 if nothing
 else, Manu's certainly right about one thing: What we need is 
 some hard
 empirical data.

 Andrei posted a document some time back comparing an advanced RC
 implementation with "the best GC", and it performed remarkably 
 well,
 within 10%!
 D does not have 'the best GC'. I doubt D's GC is within 10% of 
 'the best GC'.
 In addition, my colleagues have reported no significant pain 
 working
 with ARC on iOS, whereas Android developers are always crying 
 about
 the GC by contrast.

 I can visualise Walter's criticisms, but what I don't know is 
 whether
 his criticisms are actually as costly as they may seem? I also 
 haven't
 seen the compilers ability to eliminate or simplify that work, 
 and the
 circumstances in which it fails. It's conceivable that simply
 rearranging an access pattern slightly may offer the compiler 
 the
 structure it needs to properly eliminate the redundant work.

 The thing is, I don't know! I really don't know, and I don't 
 know any
 practical way to experiment with this. D theoretically offers 
 many
 opportunities for ARC optimisation that other languages don't 
 via it's
 rich type system, so direct comparisons via O-C could probably 
 be
 reasonably considered to be quite conservative.

 Here's what I do know though; nobody has offered conception of 
 a GC
 that may be acceptable on a memory limited device, and it's 
 also not
 very acceptable just by nature (destructors are completely 
 broken;

 amortised
 cost).
 As far as I know, there is NO OTHER CHOICE.
 Either somebody invents the fantasy GC, or we actually 
 *experiment* with ARC...

 We know: GC is unacceptable, and nobody has any idea how to 
 make one that is.
 We don't know: ARC is acceptable/unacceptable. Why.

 What other position can I take on this issue?

Check out the compiler and start the experiment you keep talking 
about.

Cheers,
ed

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 4:59 AM, Manu via Digitalmars-d wrote:
 What other position can I take on this issue?

Instead of taking positions, you can do some research. Understand how shared_ptr
works and what it costs, for starters. Try out Rust/Swift, disassemble the code,
and try to understand the tradeoffs those languages make.
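
As a starting point for that kind of investigation, here is a small C++ sketch of where shared_ptr updates its count. The function names are invented for illustration. Copying adds an owner, which means a counter update that is typically atomic in multithreaded builds. Moving transfers ownership without changing the count.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Copying a shared_ptr adds an owner: the count in the control block goes up.
long owners_after_copy() {
    auto a = std::make_shared<int>(42);
    auto b = a;                  // counter incremented here
    return b.use_count();        // two owners: a and b
}

// Moving transfers ownership: no counter update, and the source becomes empty.
long owners_after_move() {
    auto a = std::make_shared<int>(42);
    auto b = std::move(a);       // a is now empty
    return b.use_count();        // one owner: b
}
```

Disassembling these two functions at different optimisation levels is one cheap way to see which counter operations a given compiler keeps and which it removes.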

Jun 17 2014

Joseph Rushton Wakeling via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17/06/14 13:59, Manu via Digitalmars-d wrote:
 The thing is, I don't know! I really don't know, and I don't know any
 practical way to experiment with this. D theoretically offers many
 opportunities for ARC optimisation that other languages don't via its
 rich type system, so direct comparisons via O-C could probably be
 reasonably considered to be quite conservative.

 Here's what I do know though; nobody has offered conception of a GC
 that may be acceptable on a memory limited device, and it's also not
 very acceptable just by nature (destructors are completely broken;

 cost).
 As far as I know, there is NO OTHER CHOICE.
 Either somebody invents the fantasy GC, or we actually *experiment* with ARC...

 We know: GC is unacceptable, and nobody has any idea how to make one that is.
 We don't know: ARC is acceptable/unacceptable. Why.

 What other position can I take on this issue?

I'm broadly sympathetic with your position here, but I think to be honest the 
only real answer is: the other position you can take is to grab the compiler 
sources and implement the support that you want to see.

I understand if you're worried that it might be a big investment of time and 
effort with uncertain prospects of acceptance, but I'm sure that if you were 
able to demonstrate something working effectively, it would shift the terms of 
the debate dramatically.

Jun 17 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 6/17/2014 2:47 PM, Joseph Rushton Wakeling via Digitalmars-d wrote:
 but I think to be honest the
 only real answer is: the other position you can take is to grab the compiler
 sources and implement the support that you want to see.

There's another way. Simply write the code that a hypothetical ARC would 
generate, and then compile/test it.
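
A rough idea of what such hand-written code could look like, sketched in C++ rather than D. Everything here is an assumption made for illustration: the Obj layout, the retain/release/assign names and the g_frees counter are not taken from any real ARC implementation, and the count is non-atomic, so this models the single-threaded case only.

```cpp
#include <cassert>

// Hypothetical object header with an intrusive reference count.
struct Obj {
    long refs = 1;   // the creator holds the first reference
    int value = 0;
};

long g_frees = 0;    // instrumentation only: number of objects destroyed

inline void retain(Obj* o) {
    if (o) ++o->refs;
}

inline void release(Obj* o) {
    if (o && --o->refs == 0) {
        delete o;
        ++g_frees;
    }
}

// What `dst = src;` might lower to under ARC: retain the new value first,
// then release the old one, so that self-assignment stays safe.
inline void assign(Obj*& dst, Obj* src) {
    retain(src);
    Obj* old = dst;
    dst = src;
    release(old);
}
```

Writing loops and call chains in this style, then timing them against a version with the retain/release calls removed, gives a first estimate of the overhead before anyone touches the compiler.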

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 06:56:11AM +0000, via Digitalmars-d wrote:
 On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky wrote:
Well, I think interesting part we're trying to look at here is the ARC's
impact on speed.

 
 ARC without deep whole program analysis is bound to be slow.

[...]

But isn't whole program analysis impractical with D's separate
compilation model?


T

-- 
My program has no bugs! Only unintentional features...

Jun 17 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 6/17/2014 12:13 PM, H. S. Teoh via Digitalmars-d wrote:
 On Tue, Jun 17, 2014 at 06:56:11AM +0000, via Digitalmars-d wrote:
 On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky wrote:
 Well, I think interesting part we're trying to look at here is the ARC's
 impact on speed.

 ARC without deep whole program analysis is bound to be slow.

 [...]

 But isn't whole program analysis impractical with D's separate
 compilation model?

I think that's exactly the concern he's voicing.

Limitations of the separate compilation model have come up before. For 
example, the inability to retrieve all the subclasses of a base class. 
Pretty sure there's been others, too.

Since whole-program-at-once *is* a common way to use D, I've always felt 
that (issues of manpower aside, of course) whole-program-at-once users 
shouldn't necessarily be robbed of abilities just because *other* 
projects can't use them due to their own usage of separate-compilation. 
Sometimes things are mutually exclusive. Doesn't mean we can't let the 
user choose their tradeoffs for themselves.

But that's kinda venturing off onto a tangent.

Jun 17 2014

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 17 Jun 2014 13:55:33 -0400, Nick Sabalausky  
<SeeWebsiteToContactMe semitwist.com> wrote:

 Limitations of the separate compilation model have come up before. For  
 example, the inability to retrieve all the subclasses of a base class.

This is not an issue that can be solved by removing separate compilation.  
Dynamically loading objects could easily add to or remove from this list  
at runtime.

-Steve

Jun 17 2014

"deadalnix" <deadalnix gmail.com> writes:

On Tuesday, 17 June 2014 at 16:15:27 UTC, H. S. Teoh via
Digitalmars-d wrote:
 On Tue, Jun 17, 2014 at 06:56:11AM +0000, via Digitalmars-d 
 wrote:
 On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky 
 wrote:
Well, I think interesting part we're trying to look at here 
is the ARC's
impact on speed.

 
 ARC without deep whole program analysis is bound to be slow.

 [...]

 But isn't whole program analysis impractical with D's separate
 compilation model?


 T

It is certainly doable in a release build, provided one has a
powerful enough computer. Probably not realistic for fast debug
builds.

Jun 17 2014

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 17, 2014 at 06:04:56PM +0000, deadalnix via Digitalmars-d wrote:
 On Tuesday, 17 June 2014 at 16:15:27 UTC, H. S. Teoh via
 Digitalmars-d wrote:
On Tue, Jun 17, 2014 at 06:56:11AM +0000, via Digitalmars-d wrote:
On Tuesday, 17 June 2014 at 05:52:37 UTC, Nick Sabalausky wrote:
Well, I think interesting part we're trying to look at here is the
ARC's impact on speed.

ARC without deep whole program analysis is bound to be slow.

[...]

But isn't whole program analysis impractical with D's separate
compilation model?


T

 
 It is certainly doable in a release build, provided one has a
 powerful enough computer. Probably not realistic for fast debug
 builds.

Well, first we need an option to turn on the GC in dmd... ;-) Otherwise
we'll run out of memory as soon as std.algorithm is involved. :-P

Frankly, I think dmd should enable its GC when compiling for -release;
it's the release build, you aren't expecting a fast code-compile-test
cycle anymore, and you probably want maximum optimization, so why not
take the time to conserve memory that can be used for other things, like
whole-program optimization, or a longer but more powerful optimization
pass?


--T

Jun 17 2014

"w0rp" <devw0rp gmail.com> writes:

On Tuesday, 17 June 2014 at 00:22:55 UTC, Manu via Digitalmars-d 
wrote:
 On 17 June 2014 10:08, deadalnix via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Monday, 16 June 2014 at 15:16:44 UTC, Manu via 
 Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently 
 argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?


 http://stackoverflow.com/questions/24101718/swift-performance-sorting-arrays

 Does it answer the question ?

 -Ofast seems to perform the same as C++. -Ofast allegedly does
 basically what '-release -noboundscheck' does. You'd never try 
 and
 benchmark D code without those flags.

The thing which slowed his loop down was the compiler not 
removing the retain and release calls for ARC in that case. As he 
says.

 With -O3 I get something that was beyond my wildest 
 imagination. The inner loop spans 88 lines of assembly code. I 
 did not try to understand all of it, but the most suspicious 
 parts are 13 invocations of "callq _swift_retain" and another 
 13 invocations of "callq _swift_release". That is, 26 
 subroutine calls in the inner loop!

Obviously they can optimise these retain and release calls away. 
However, I don't believe it will work with the right 
optimisations in place for hot code. I think the only thing which 
can work reliably in hot code is memory you manage yourself. I 
don't think there's an automatic memory management scheme which 
will do all of the work for you in most situations without 
incurring some additional runtime cost.

Jun 17 2014

Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:

On 17 June 2014 22:26, w0rp via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Tuesday, 17 June 2014 at 00:22:55 UTC, Manu via Digitalmars-d wrote:
 -Ofast seems to perform the same as C++. -Ofast allegedly does
 basically what '-release -noboundscheck' does. You'd never try and
 benchmark D code without those flags.


 The thing which slowed his loop down was the compiler not removing the
 retain and release calls for ARC in that case. As he says.

 With -O3 I get something that was beyond my wildest imagination. The inner
 loop spans 88 lines of assembly code. I did not try to understand all of it,
 but the most suspicious parts are 13 invocations of "callq _swift_retain"
 and another 13 invocations of "callq _swift_release". That is, 26 subroutine
 calls in the inner loop!


 Obviously they can optimise these retain and release calls away. However, I
 don't believe it will work with the right optimisations in place for hot
 code.

Huh? I'm not sure what you mean?
But yes, it appears that -Ofast enabled RC optimisations, which seems
like an important optimisation for an ARC based language.

 I think the only thing which can work reliably in hot code is memory
 you manage yourself. I don't think there's an automatic memory management
 scheme which will do all of the work for you in most situations without
 incurring some additional runtime cost.

Maybe. But 'hot' code would never be crafted so carelessly. It's very
easy to refactor reference capturing outside of loops. It would become
second nature in no time if it was a requirement for good performance.
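
A sketch of that refactoring, in C++ with a hand-rolled count. The Buffer type, the g_rc_ops counter and both function names are made up for illustration, and freeing at zero is left out to keep it short. The naive loop pays one retain/release pair per iteration. The hoisted loop pays one pair in total.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ref-counted buffer (intrusive, non-atomic count).
struct Buffer {
    long refs = 1;
    std::vector<int> data;
};

long g_rc_ops = 0;   // instrumentation only: counts retain/release operations

inline void retain(Buffer* b)  { ++b->refs; ++g_rc_ops; }
inline void release(Buffer* b) { --b->refs; ++g_rc_ops; }  // freeing omitted in this sketch

// Naive: every iteration captures and drops a reference.
long sum_naive(Buffer* b) {
    long s = 0;
    for (std::size_t i = 0; i < b->data.size(); ++i) {
        retain(b);
        s += b->data[i];
        release(b);
    }
    return s;
}

// Hoisted: one retain/release pair around the whole loop.
long sum_hoisted(Buffer* b) {
    retain(b);
    long s = 0;
    for (std::size_t i = 0; i < b->data.size(); ++i) {
        s += b->data[i];
    }
    release(b);
    return s;
}
```

For a four-element buffer the naive version performs 8 counter operations and the hoisted version 2, with the same result.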

Jun 17 2014

Jacob Carlborg <doob me.com> writes:

On 16/06/14 17:16, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

I think Swift is only intended for high level application development. 
It doesn't feel like a true systems or general purpose language.

-- 
/Jacob Carlborg

Jun 17 2014

"Paulo Pinto" <pjmlp progtools.org> writes:

On Tuesday, 17 June 2014 at 12:10:13 UTC, Jacob Carlborg wrote:
 On 16/06/14 17:16, Manu via Digitalmars-d wrote:
 What say you to that, Walter?

 Apple have committed to pervasive ARC, which you consistently 
 argue is
 not feasible...
 Have I missed something, or is this a demonstration that it is
 actually practical?

 I think Swift is only intended for high level application 
 development. It doesn't feel like a true systems or general 
 purpose language.

Why not? It enjoys feature parity with Objective-C and then some.

If you need to do typical unsafe C-style coding, 
it is possible:

https://developer.apple.com/library/prerelease/ios/documentation/Swift/Conceptual/BuildingCocoaApps/InteractingWithCAPIs.html#//apple_ref/doc/uid/TP40014216-CH8-XID_13


--
Paulo

Jun 17 2014
