digitalmars.D.announce - Work on ARM backend for DMD started

solidstate1991 <laszloszeremi outlook.com> writes:

While I currently don't have any ARM-based hardware that would be 
easy to develop on, I'm planning to use QEMU to emulate some form 
of ARMv6 CPU, which will be the main target, as it's still 
used in devices like the Raspberry Pi. ARMv5 is being considered 
if it doesn't need a lot of work, although I don't see much 
reason for it besides the possibility of enabling homebrew 
development for the GBA, NDS, GP32, etc.

As I recently became unemployed, I have a lot more time for 
development, so time isn't an issue now. Or at least not until I 
find a job, which is hard due to my status as a college student, 
a status I'm on the verge of losing.

I would welcome your input on various things, like whether I should 
make some adjustments to the inline assembler, whether I should 
care about Thumb (the reduced-size instruction set, not available on 
some newer targets) or not, etc. I got my hands on an official 
reference manual, but it wouldn't hurt if I could research other 
ones too.

Jul 03 2017

Iain Buclaw <ibuclaw gdcproject.org> writes:

On Monday, 3 July 2017 at 23:16:07 UTC, solidstate1991 wrote:
 [...]

I'm aware that this topic comes up occasionally, but since 
someone is proposing to go from idea to implementation, it seems 
like a good time to point this out.

Someone did this 5 years ago as part of splitting the backend 
into interfaces - or at least as a working concept that the new 
interfaces actually allowed you to implement a new target.

Maybe you should use their work as a starting or reference point. 
You'd probably save yourself most of the trouble of working out how 
things connect.

Iain.

Jul 03 2017

Brad Roberts via Digitalmars-d-announce writes:

On 7/3/2017 11:50 PM, Iain Buclaw via Digitalmars-d-announce wrote:
 On Monday, 3 July 2017 at 23:16:07 UTC, solidstate1991 wrote:
 [...]

 Someone did this 5 years ago as part of splitting the backend into 
 interfaces - or at least as a working concept that the new interfaces 
 actually allowed you to implement a new target.

 [...]

Unless someone else toyed with it too, it was me.  There's a branch 
called 'arm' in my fork of dmd that has a lot of groundwork.  I'm sure 
it's somewhat bitrotten in the few years since I last looked at it.  I 
got as far as being able to emit some _extremely_ basic functions (like 
calls to libc -- printf worked) and link.  I wrote the asm code as an 
exercise, forcing myself to encode much of the ARM instruction set 
(if I remember right, pretty much everything except the NEON vector 
instructions, and maybe even part of that set) in code structs.  But I 
didn't get to writing the ARM versions of almost any of the cd* 
functions that translate the IR into actual code objects.

Honestly, it's a pretty bad proposition.  I did what I did as much to 
learn about the arm instruction set as to get an arm dmd backend.  It 
did teach me a lot and I don't consider it entirely wasted time, but if 
the aim is to do anything beyond learning, I'd urge looking for a 
different project.  Just getting code of really bad quality emitted will 
be a lot of work (on top of all the parts I did).  Getting mediocre code 
will be another large amount of work. Getting code close to ldc or gdc 
is unlikely to ever happen.

So, look closely at your motivations and available time.

Jul 04 2017

Martin Nowak <code dawg.eu> writes:

On Monday, 3 July 2017 at 23:16:07 UTC, solidstate1991 wrote:
 While I currently don't have an ARM based hardware that would 
 be easy to develop on, I'm planning to use QEMU to emulate some 
 form of ARMv6 CPU, as it'll be the main target, as it's still 
 being used in devices like the Raspberry Pi.

Nice initiative.

Let me still point out the obvious: we already have working 
ARM backends in both gdc and ldc.
https://gdcproject.org/downloads
https://wiki.dlang.org/LDC#ARM

If you're interested in spending that amount of time on ARM 
development, you might find improving bare-metal ARM support for 
embedded systems (noeabi) or AArch64 support in druntime/phobos 
equally interesting projects with a bit more impact.

Jul 04 2017

Stefan Koch <uplink.coder googlemail.com> writes:

On Monday, 3 July 2017 at 23:16:07 UTC, solidstate1991 wrote:
 [...]

Far be it from me to discourage such efforts.
But you should be aware that writing a backend for dmd from 
scratch is not an easy task.
It will take a lot of time, even if you have previous 
experience with codegen.

And it is unlikely to yield satisfactory results.
Most ARM implementations are not as forgiving as contemporary x86 
processors when it comes to bad register scheduling and the like.

What exactly is your motivation for doing this?

Jul 04 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/4/2017 1:15 PM, Stefan Koch wrote:
 Most ARM implementations are not as forgiving as contemporary x86 processors
 when it comes to bad register scheduling and the like.

The backend's scheduler is actually very effective. It mattered with the
Pentium and Pentium Pro processors, but not anymore. But the code is still
there, and still works, and the algorithm is sound.

   https://github.com/dlang/dmd/blob/master/src/ddmd/backend/cgsched.c

The backend has also been accused of not doing data flow analysis. It does as 
good a flow analysis as any compiler.

The areas where the backend has fallen behind are:

1. loop unrolling
2. better inlining
3. SROA (scalar replacement of aggregates)
4. vectorization

Jul 04 2017

Stefan Koch <uplink.coder googlemail.com> writes:

On Tuesday, 4 July 2017 at 21:10:45 UTC, Walter Bright wrote:
 On 7/4/2017 1:15 PM, Stefan Koch wrote:
 Most ARM implementations are not as forgiving as contemporary 
 x86 processors when it comes to bad register scheduling and 
 the like.

 The backend's scheduler is actually very effective. It mattered 
 with the Pentium and Pentium Pro processors, but not anymore. 
 But the code is still there, and still works, and the algorithm 
 is sound.

   
 https://github.com/dlang/dmd/blob/master/src/ddmd/backend/cgsched.c

At first glance it looks highly x86-specific.
I am not sure how much of this really lends itself to being 
applied to ARM.
The backend IR does not seem to be able to express some ARM 
concepts such as predicated instructions. While those may be 
shoehorned in, it is likely to be impractical to reuse most of 
this code.

Jul 04 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/4/2017 2:25 PM, Stefan Koch wrote:
 At first glance it looks highly x86-specific.

The algorithm is not. The details are, of course, since if you read the Intel 
CPU manual there is an incredible amount of detail.

 I am not sure how much of this really lends itself to being applied to ARM.
 The backend IR does not seem to be able to express some ARM concepts such as 
 predicated instructions.

Predicated instructions are just a larger pattern to the code generator. I 
didn't see anything in the LLVM IR that is specific to it.
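To make "predicated instructions are just a larger pattern" concrete, here is a minimal sketch of if-conversion written as a peephole pattern over a toy IR. It is written in C, the language of the backend sources linked in this thread. The IR and the names `Insn`, `Cond`, and `if_convert` are hypothetical illustrations, not dmd's actual data structures. The pattern rewrites a conditional branch over a single move into one predicated move, which removes the branch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR; not dmd's actual instruction representation. */
typedef enum { OP_MOV, OP_BRANCH_IF, OP_LABEL } Opcode;
typedef enum { COND_AL, COND_EQ, COND_NE } Cond;  /* AL = always */

typedef struct {
    Opcode op;
    Cond cond;  /* OP_BRANCH_IF: branch condition; OP_MOV: predicate */
    int dst;    /* OP_MOV: destination register */
    int imm;    /* OP_MOV: immediate value */
    int label;  /* OP_BRANCH_IF: target label; OP_LABEL: label id */
} Insn;

static Cond invert(Cond c) {
    switch (c) {
    case COND_EQ: return COND_NE;
    case COND_NE: return COND_EQ;
    default:      return COND_AL;
    }
}

/* Rewrites "b<cc> L; mov rd, #imm; L:" into "mov<!cc> rd, #imm; L:" in place.
   The label is kept because other branches might still target it.
   Returns the new instruction count. */
size_t if_convert(Insn *code, size_t n) {
    assert(code != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 2 < n
            && code[i].op == OP_BRANCH_IF && code[i].cond != COND_AL
            && code[i + 1].op == OP_MOV && code[i + 1].cond == COND_AL
            && code[i + 2].op == OP_LABEL
            && code[i + 2].label == code[i].label) {
            Insn mov = code[i + 1];
            Insn label = code[i + 2];
            /* the move must run exactly when the branch would NOT be taken */
            mov.cond = invert(code[i].cond);
            code[out++] = mov;
            code[out++] = label;
            i += 2;  /* branch, mov and label have all been consumed */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

For example, `beq L1; mov r0, #5; L1:` becomes `movne r0, #5; L1:`. A real ARM code generator would also need to check that the skipped block is short and does not itself change the flags.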

 While those may be shoehorned in, it is likely to be 
 impractical to reuse most of this code.

The algorithm (which is not trivial) can be used. The rest is constructing the 
table of dependencies and special cases.
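As a rough illustration of "the algorithm plus a table of dependencies", here is a textbook single-issue list scheduler driven entirely by a dependency table. This is not the algorithm in cgsched.c. The names `DepTable` and `list_schedule` and the latency values are hypothetical. The point is that the only target-specific input is the table itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INSNS 16

/* Hypothetical dependency table: dep[i][j] is true when instruction j
   needs the result of instruction i. Assumes i < j (program order). */
typedef struct {
    int n;
    int latency[MAX_INSNS];
    bool dep[MAX_INSNS][MAX_INSNS];
} DepTable;

/* Single-issue list scheduler: each cycle, issue the ready instruction with
   the longest latency-weighted path to the end of the block. Writes the
   issue order to `order` and returns the number of cycles until the last
   instruction issues (stall cycles included). */
int list_schedule(const DepTable *t, int order[]) {
    int prio[MAX_INSNS], ready_at[MAX_INSNS];
    bool done[MAX_INSNS] = { false };
    assert(t->n <= MAX_INSNS);

    /* Priorities, computed backwards since dependencies only point forward. */
    for (int i = t->n - 1; i >= 0; i--) {
        int best = 0;
        for (int j = i + 1; j < t->n; j++)
            if (t->dep[i][j] && prio[j] > best)
                best = prio[j];
        prio[i] = t->latency[i] + best;
        ready_at[i] = 0;
    }

    int cycle = 0, issued = 0;
    while (issued < t->n) {
        int pick = -1;
        for (int i = 0; i < t->n; i++) {
            if (done[i] || ready_at[i] > cycle)
                continue;
            bool preds_issued = true;
            for (int p = 0; p < i; p++)
                if (t->dep[p][i] && !done[p]) { preds_issued = false; break; }
            if (preds_issued && (pick < 0 || prio[i] > prio[pick]))
                pick = i;
        }
        if (pick >= 0) {
            done[pick] = true;
            order[issued++] = pick;
            /* dependents cannot start until this result is available */
            for (int j = 0; j < t->n; j++)
                if (t->dep[pick][j] && ready_at[j] < cycle + t->latency[pick])
                    ready_at[j] = cycle + t->latency[pick];
        }
        cycle++;  /* if nothing was ready, this is a stall cycle */
    }
    return cycle;
}
```

Take a 3-cycle load (0) feeding an add (1), plus an independent load (2) and move (3). The scheduler issues them as 0, 2, 3, 1 in 4 cycles. Strict program order would take 6 cycles, because the add would stall waiting for the first load.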

Jul 04 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/4/2017 2:25 PM, Stefan Koch wrote:
 I am not sure how much of this really lends itself to being applied to ARM.

The code generator started out as 16 bits, and was that way for 10 years or so. 
x87 got added in later. Then it was adapted for 32 bits. Another 10 years went 
by, then 64 bits, and then XMM vector instructions.

So I think it has proven itself to not be horribly locked in to one 
architecture. x86, x87, and XMM are quite different.

There were also at one time back ends for the 68000 and the PowerPC that used 
the same optimizer and IR.

Jul 04 2017

Johan Engelen <j j.nl> writes:

On Tuesday, 4 July 2017 at 21:10:45 UTC, Walter Bright wrote:
 The backend has also been accused of not doing data flow 
 analysis. It does as good a flow analysis as any compiler.

Please...
DMD: https://goo.gl/wHTPzz
GDC & LDC: https://godbolt.org/g/QFSgaX

-Johan

Jul 04 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/4/2017 4:09 PM, Johan Engelen wrote:
 On Tuesday, 4 July 2017 at 21:10:45 UTC, Walter Bright wrote:
 The backend has also been accused of not doing data flow analysis. It does as 
 good a flow analysis as any compiler.

 
 Please...
 DMD: https://goo.gl/wHTPzz
 GDC & LDC: https://godbolt.org/g/QFSgaX

With this PR:

   https://github.com/dlang/dmd/pull/6968

The code:

   int basicfunc(int i) {
     return i;
   }

   int dataflow(int b) {
     int ret;

     if (b==4)
       ret = 3;
     else
       ret = 5;

     if (ret == 4)
       return 0;
     else
       return 1;
   }

Produces on Win32:

   _D5test49basicfuncFiZi  comdat
                 ret   // this is not a bug, as `i` is passed in EAX

   _D5test48dataflowFiZi   comdat
                 mov     EAX,1
                 ret

I'm sure you can find a case where LLVM does a better job. But I think I've
made the point :-)
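The folding of `dataflow` down to `mov EAX,1` follows from reaching definitions. The only values of `ret` that can reach `if (ret == 4)` are 3 and 5, so the comparison is always false. Below is a C transliteration of the function, plus a hypothetical `fold_eq` helper that stands in for that constant-set check. The helper is a sketch of the reasoning, not dmd's code.

```c
#include <assert.h>

/* C transliteration of the D `dataflow` function above. */
int dataflow(int b) {
    int ret;
    if (b == 4)
        ret = 3;
    else
        ret = 5;
    if (ret == 4)  /* only 3 or 5 can reach here, so this is never true */
        return 0;
    else
        return 1;
}

typedef enum { FOLD_UNKNOWN, FOLD_ALWAYS_FALSE, FOLD_ALWAYS_TRUE } Fold;

/* Hypothetical helper: given every constant that can reach a use of x,
   decide whether `x == k` can be folded at compile time. */
Fold fold_eq(const int *reaching, int count, int k) {
    int matches = 0;
    assert(count > 0);
    for (int i = 0; i < count; i++)
        if (reaching[i] == k)
            matches++;
    if (matches == 0)
        return FOLD_ALWAYS_FALSE;
    if (matches == count)
        return FOLD_ALWAYS_TRUE;
    return FOLD_UNKNOWN;  /* result depends on which path was taken */
}
```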

Jul 04 2017

"H. S. Teoh via Digitalmars-d-announce" writes:

On Tue, Jul 04, 2017 at 02:10:45PM -0700, Walter Bright via
Digitalmars-d-announce wrote:
 On 7/4/2017 1:15 PM, Stefan Koch wrote:
 Most ARM implementations are not as forgiving as contemporary x86
 processors when it comes to bad register scheduling and the like.

 
 The backend's scheduler is actually very effective. It mattered with
 the Pentium and Pentium Pro processors, but not anymore. But the code
 is still there, and still works, and the algorithm is sound.
 
   https://github.com/dlang/dmd/blob/master/src/ddmd/backend/cgsched.c
 
 The backend has also been accused of not doing data flow analysis. It
 does as good a flow analysis as any compiler.
 
 Where the backend has fallen behind are:
 
 1. loop unrolling
 2. better inlining

I'd argue these are most important for output code quality, because
performance bottlenecks are usually found in loops, and inlining is a
key component to enabling further reduction of loop complexity during
loop optimization.  Inlining is also critical in range-based code, which
is fast becoming the de facto D coding style these days.

Also, loop unrolling is only the beginning.  Other loop optimizations
are just as important, like strength reduction, hoisting, etc. (Caveat:
I haven't checked whether DMD specifically performs these optimizations.
But based on looking at previous dmd output, I'm leaning towards no.)

It would be nice if the dmd backend at least got a facelift in these
areas, even if you didn't have the time to do a full-fledged backend
update...
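For readers unfamiliar with the two terms, here is a source-level sketch of strength reduction and loop-invariant hoisting in C. The function names are hypothetical. A real optimizer performs these rewrites on its IR, not on source text.

```c
#include <assert.h>
#include <stddef.h>

/* Before: i * stride and a * b are recomputed on every iteration. */
long sum_before(const int *data, size_t n, size_t stride, int a, int b) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)data[i * stride] * (a * b);
    return sum;
}

/* After: a * b is loop-invariant, so it is hoisted and computed once;
   the multiply i * stride is strength-reduced to an index that is
   bumped by `stride` each iteration. */
long sum_after(const int *data, size_t n, size_t stride, int a, int b) {
    long sum = 0;
    const int k = a * b;  /* hoisted out of the loop */
    size_t idx = 0;       /* always equals i * stride */
    for (size_t i = 0; i < n; i++, idx += stride)
        sum += (long)data[idx] * k;
    return sum;
}
```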


 3. SROA

This may be important for optimizing range-based code.
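Here is a source-level sketch of what SROA (scalar replacement of aggregates) does to range-style code. It is written in C with a hypothetical `IotaRange` struct standing in for a D range. After inlining, the struct's fields are split into independent scalars that can live in registers instead of memory.

```c
#include <assert.h>

/* Hypothetical iota-like range, modelled loosely on a D input range. */
typedef struct { int front; int end; } IotaRange;

static int  range_empty(const IotaRange *r) { return r->front >= r->end; }
static void range_pop(IotaRange *r)          { r->front++; }

/* Before SROA: the loop state lives in an aggregate accessed via pointers. */
int sum_range(int lo, int hi) {
    IotaRange r = { lo, hi };
    int sum = 0;
    for (; !range_empty(&r); range_pop(&r))
        sum += r.front;
    return sum;
}

/* After inlining + SROA: r.front and r.end become separate scalars. */
int sum_range_sroa(int lo, int hi) {
    int r_front = lo, r_end = hi;
    int sum = 0;
    for (; r_front < r_end; r_front++)
        sum += r_front;
    return sum;
}
```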


T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and
over.

Jul 04 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/4/2017 4:14 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 Also, loop unrolling is only the beginning.  Other loop optimizations
 are just as important, like strength reduction, hoisting, etc. (Caveat:
 I haven't checked whether DMD specifically performs these optimizations.

It does.

 But based on looking at previous dmd output, I'm leaning towards no.)

I wish people would look at it before assuming. It's not like it's a secret.

   https://github.com/dlang/dmd/blob/master/src/ddmd/backend/gloop.c

Read the comments.

Jul 04 2017

"H. S. Teoh via Digitalmars-d-announce" writes:

On Tue, Jul 04, 2017 at 05:11:55PM -0700, Walter Bright via
Digitalmars-d-announce wrote:
 On 7/4/2017 4:14 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 Also, loop unrolling is only the beginning.  Other loop
 optimizations are just as important, like strength reduction,
 hoisting, etc. (Caveat: I haven't checked whether DMD specifically
 performs these optimizations.

 
 It does.
 
 But based on looking at previous dmd output, I'm leaning towards
 no.)

 
 I wish people would look at it before assuming. It's not like it's a
 secret.
 
   https://github.com/dlang/dmd/blob/master/src/ddmd/backend/gloop.c
 
 Read the comments.

I did a simple test to see which loop optimizations dmd did, vs. gdc.
Here's the test code:

------
int func(int[] data)
{
	int i, j;
	for (i = 0; i < 10; i++) {
		data[i*10] = i;
		j = data[0] * 10;
	}
	return j;
}
void main() {
	import std.stdio;
	auto data = new int[100];
	writeln(func(data));
}
------

Here's the output of dmd -O (git HEAD):

------
0000000000046b00 <_D4test4funcFAiZi>:
   46b00:	55                   	push   %rbp
   46b01:	48 8b ec             	mov    %rsp,%rbp
   46b04:	48 89 fa             	mov    %rdi,%rdx
   46b07:	49 89 f1             	mov    %rsi,%r9
   46b0a:	45 31 c0             	xor    %r8d,%r8d
   46b0d:	31 c9                	xor    %ecx,%ecx
   46b0f:	48 63 c1             	movslq %ecx,%rax
   46b12:	48 3b c2             	cmp    %rdx,%rax
   46b15:	72 11                	jb     46b28 <_D4test4funcFAiZi+0x28>
   46b17:	be 05 00 00 00       	mov    $0x5,%esi

   46b23:	e8 64 0a 00 00       	callq  4758c <_d_arrayboundsp>
   46b28:	45 89 04 81          	mov    %r8d,(%r9,%rax,4)
   46b2c:	48 85 d2             	test   %rdx,%rdx
   46b2f:	75 11                	jne    46b42 <_D4test4funcFAiZi+0x42>
   46b31:	be 06 00 00 00       	mov    $0x6,%esi

   46b3d:	e8 4a 0a 00 00       	callq  4758c <_d_arrayboundsp>
   46b42:	41 8b 01             	mov    (%r9),%eax
   46b45:	44 8d 1c 80          	lea    (%rax,%rax,4),%r11d
   46b49:	45 03 db             	add    %r11d,%r11d
   46b4c:	83 c1 0a             	add    $0xa,%ecx
   46b4f:	41 ff c0             	inc    %r8d
   46b52:	41 83 f8 0a          	cmp    $0xa,%r8d
   46b56:	72 b7                	jb     46b0f <_D4test4funcFAiZi+0xf>
   46b58:	41 8b c3             	mov    %r11d,%eax
   46b5b:	5d                   	pop    %rbp
   46b5c:	c3                   	retq   
   46b5d:	00 00                	add    %al,(%rax)
	...
------

Note: for conciseness' sake I omitted the disassembly of main(), since
it's not directly relevant here.

Here are some pertinent points of observation:

- Strength reduction was done, as seen at 46b4c, corresponding
  to the array index computation i*10.

- Code hoisting was NOT done (in this case): the second line in the loop
  body does not depend on the loop index, but dmd did not hoist it out
  of the loop. This can be seen from the end-of-loop branch at 46b56:
  the branch destination is 46b0f, near the beginning of the function,
  and the code path from there includes the code for the assignment to
  j. While some clever tricks were done to avoid using the mul
  instruction for computing data[0]*10, this computation was
  unfortunately repeated 10 times even though it only needed to be
  computed once. In particular, the load of data[0] on line 46b42 is
  repeated 10 times, followed by the *10 computation.

- There are two calls to _d_arrayboundsp inside the loop body, along
  with branches around them. This seems needless, since one bounds check
  ought to be enough to ensure the array lookups are within bounds.
  Also, there are 2 branches within the loop body (not counting the
  end-of-loop branch), whereas it could have been simplified to one
  (fewer branch hazards on the CPU pipeline).


In comparison, here's the output of gdc -O (gdc 6.3.0):

------
0000000000020080 <_D4test4funcFAiZi>:
   20080:	48 85 ff             	test   %rdi,%rdi
   20083:	74 33                	je     200b8 <_D4test4funcFAiZi+0x38>
   20085:	48 89 f9             	mov    %rdi,%rcx
   20088:	49 89 f0             	mov    %rsi,%r8
   2008b:	c7 06 00 00 00 00    	movl   $0x0,(%rsi)
   20091:	ba 0a 00 00 00       	mov    $0xa,%edx
   20096:	b8 01 00 00 00       	mov    $0x1,%eax
   2009b:	48 39 d1             	cmp    %rdx,%rcx
   2009e:	76 18                	jbe    200b8 <_D4test4funcFAiZi+0x38>
   200a0:	41 89 04 90          	mov    %eax,(%r8,%rdx,4)
   200a4:	83 c0 01             	add    $0x1,%eax
   200a7:	48 83 c2 0a          	add    $0xa,%rdx
   200ab:	83 f8 0a             	cmp    $0xa,%eax
   200ae:	75 eb                	jne    2009b <_D4test4funcFAiZi+0x1b>
   200b0:	8b 06                	mov    (%rsi),%eax
   200b2:	8d 04 80             	lea    (%rax,%rax,4),%eax
   200b5:	01 c0                	add    %eax,%eax
   200b7:	c3                   	retq   
   200b8:	48 83 ec 08          	sub    $0x8,%rsp
   200bc:	ba 05 00 00 00       	mov    $0x5,%edx
   200c1:	bf 06 00 00 00       	mov    $0x6,%edi

<_IO_stdin_used+0x4e>
   200cd:	e8 8e a4 03 00       	callq  5a560 <_d_arraybounds>
------

Comparing this with dmd's output, we see:

- Strength reduction was done on the i*10 computation (line 200a7), just
  as in the dmd output.

- Code hoisting was also done (unlike dmd): the computation data[0]*10 was
  hoisted out of the loop (line 200b2), and only computed once after the
  end of the loop, as opposed to computed 10 times. Notably, we're no
  longer loading data[0] 10 times, but just once at the end of the loop.

- One of the bounds checks is moved out of the loop body, so there is
  only 1 branch inside the loop (fewer branch hazards on the CPU
  pipeline).

- The function is noticeably smaller than dmd's output, due to gdc
  merging the calls to _d_arraybounds into a single path.


Now, granted, my test case could be construed as unfair, because the
assignment to j depends on the result of the first loop iteration
(data[0] is assigned to before it's read by the assignment to j). So
it's not truly loop-invariant in the strict sense.  However, as the gdc
output shows, the compiler ought to be able to refactor things so that
the assignment is moved out of the loop.
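The gdc listing above corresponds roughly to a source-level rewrite with four steps:

- peel iteration 0;
- check `data[0]` once;
- strength-reduce `i*10`;
- compute `j` once after the loop.

Here is a C sketch of it for illustration. `func_naive`, `func_peeled` and `BOUNDS_CHECK` are hypothetical names. `assert` stands in for D's array bounds check, which throws instead of aborting. `j` starts at 0 because D default-initializes `int j;`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for D's bounds check (D throws a RangeError; assert aborts). */
#define BOUNDS_CHECK(idx, len) assert((size_t)(idx) < (len))

/* Direct transliteration of the D `func`. */
int func_naive(int *data, size_t len) {
    int j = 0;
    for (int i = 0; i < 10; i++) {
        BOUNDS_CHECK(i * 10, len);
        data[i * 10] = i;
        BOUNDS_CHECK(0, len);
        j = data[0] * 10;
    }
    return j;
}

/* Roughly what gdc's output does: iteration 0 is peeled, data[0] is
   checked once, i*10 becomes an index bumped by 10, and j is computed
   once after the loop. */
int func_peeled(int *data, size_t len) {
    BOUNDS_CHECK(0, len);
    data[0] = 0;                       /* iteration i == 0 */
    size_t idx = 10;                   /* always equals i * 10 */
    for (int i = 1; i < 10; i++, idx += 10) {
        BOUNDS_CHECK(idx, len);
        data[idx] = i;
    }
    return data[0] * 10;               /* idx >= 10, so data[0] is not rewritten */
}
```

Both functions perform the same stores in the same order and fail at the same index when `len` is too small. Only the redundant reloads of `data[0]` and the second bounds check inside the loop disappear.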

So while I was wrong about dmd not doing strength reduction, my
conclusion is still that dmd's codegen for loops leaves more to be
desired.  In particular, it doesn't seem to do code hoisting, at least
not for this case, whereas gdc does (and consistently so in other loop
code I've looked at in the past).


T

-- 
An imaginary friend squared is a real enemy.

Jul 05 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/5/2017 4:48 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 In particular, it doesn't seem to do code hoisting, at least
 not for this case,

It does not in this case because:

     data[0]

is actually:

     *data.ptr

i.e. a read through a pointer. Inside the loop, there is also:

     data[i * 10] = ...

which is an assignment through a pointer. The assignment through the pointer 
makes a read through a pointer not loop invariant. It's only possible to pull 
out the assignment to j if loop unrolling is done, which as I said is not done 
by DMD.
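A small C sketch (hypothetical function names) shows why a possible alias blocks the hoist. If the store `p[i]` can hit the same memory as `*q`, then reading `*q` once before the loop gives a different answer than reloading it every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Reloads *q on every iteration, as the source says. */
int reload_each_time(int *p, const int *q, size_t n) {
    int j = 0;
    for (size_t i = 0; i < n; i++) {
        p[i] = (int)i;
        j = *q * 10;  /* p[i] may have just overwritten *q */
    }
    return j;
}

/* Unsafe hoist: reads *q once, before any store. This is only equivalent
   when p[0..n) cannot overlap *q, which the compiler cannot assume here. */
int hoisted_unsafely(int *p, const int *q, size_t n) {
    if (n == 0)
        return 0;
    int j = *q * 10;
    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    return j;
}
```

Suppose the arrays start filled with 7s and `q` points at `p[3]`. Then the reloading version returns 30 and the hoisted version returns a stale 70. With `q` pointing at a separate variable, both return 70.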

Loop invariant removal (aka code hoisting) *is* done in the optimizer.

   https://github.com/dlang/dmd/blob/master/src/ddmd/backend/gloop.c#L678


 There are two calls to _d_arrayboundsp inside the loop body, along with
 branches around them. This seems needless, since one bounds check ought to
 be enough to ensure the array lookups are within bounds. Also, there are 2
 branches within the loop body (not counting the end-of-loop branch), whereas
 it could have been simplified to one (fewer branch hazards on the CPU
 pipeline).

Yes, that's true, I'm not sure why that isn't happening. It should be.

Jul 05 2017

Temtaime <temtaime gmail.com> writes:

DMD is a piece of shit, and adding another ARM backend, with 
all those bugs and low performance, instead of improving ldc is 
a waste of effort.
The only use of dmd is development, because of its compilation speed.
But some people have a "slow brain" ("cerveau lent") and just cannot realise it.

Jul 07 2017

solidstate1991 <laszloszeremi outlook.com> writes:

On Friday, 7 July 2017 at 11:09:27 UTC, Temtaime wrote:
 DMD is a piece of shit, and adding another ARM backend, with 
 all those bugs and low performance, instead of improving ldc is 
 a waste of effort.
 The only use of dmd is development, because of its compilation speed.
 But some people have a "slow brain" ("cerveau lent") and just cannot realise it.

A few things you should be aware of before you trash the reference 
compiler for D:

- Most of DMD's frontend and part of the backend are in D. 
This means better productivity in the long run, especially once 
the whole backend is ported to D.
- Well, it's the reference compiler. I understand that you would 
like to see many of the DMD devs move to LDC instead. 
I myself like some healthy competition.
- The performance issues can be fixed in the long run. I myself 
am thinking of fixing some of DMD's issues, like its SIMD 
support (which might end up as a DIP for better support of the 
hardware functions).

I think I might first learn how the current codegen works and 
make some improvements, as learning how the ARM architecture works 
is hard work. I don't even know what to do with condition codes 
(ignore them completely, or use them in certain situations to 
save a few conditional jumps?), Thumb (yet another attribute to 
force the compiler to use Thumb for part of the code?), etc. 
I'll recycle some of the preexisting code made by 
another user.

Jul 20 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 7/20/2017 9:22 AM, solidstate1991 wrote:
 A few things you should be aware of before you trash the reference compiler for D:

I wouldn't be discouraged by the nay-sayers. If you want to build an ARM back 
end for it, do it! About every project I've ever embarked on, including D, 
started with everyone nay-saying it.

Jul 20 2017

Dejan Lekic <dejan.lekic gmail.com> writes:

On Thursday, 20 July 2017 at 22:08:16 UTC, Walter Bright wrote:
 I wouldn't be discouraged by the nay-sayers. If you want to 
 build an ARM back end for it, do it! About every project I've 
 ever embarked on, including D, started with everyone nay-saying 
 it.

Keep it that way and thanks for it!! :)

Jul 06 2018

Joakim <dlang joakim.fea.st> writes:

On Thursday, 20 July 2017 at 16:22:59 UTC, solidstate1991 wrote:
 On Friday, 7 July 2017 at 11:09:27 UTC, Temtaime wrote:
 [...]

 A few things you should be aware of before you trash the reference 
 compiler for D:

 [...]

Btw, if you're still interested in this, AArch64 would be a 
better target, as 32-bit ARM is being replaced by it.

Jul 06 2018
