digitalmars.D - Javascript bytecode

reply Walter Bright <newshound2 digitalmars.com> writes:
An interesting datapoint in regards to bytecode is Javascript. Note that 
Javascript is not distributed in bytecode form. There is no Javascript VM. It
is 
distributed as source code. Sometimes, that source code is compressed and 
obfuscated, nevertheless it is still source code.

How the end system chooses to execute the js is up to that end system, and 
indeed there are a great variety of methods in use.

Javascript proves that bytecode is not required for "write once, run 
everywhere", which was one of the pitches for bytecode.

What is required for w.o.r.e. is a specification for the source code that 
precludes undefined and implementation defined behavior.

Note also that Typescript compiles to Javascript. I suspect there are other 
languages that do so, too.
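As an illustration of the compiles-to-JavaScript point: a hedged sketch (the greet function is invented for this example, not from the thread) of how a TypeScript annotation type-checks at compile time, then erases, so what actually ships is plain JavaScript source:

```javascript
// Invented example: the TypeScript declaration
//     function greet(name: string): string { return "hi " + name; }
// erases its type annotations when compiled, and the distributed
// artifact is ordinary JavaScript like this:
function greet(name) {
  return "hi " + name;
}

console.log(greet("world")); // hi world
```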
Dec 18 2012
next sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 I suspect there are other languages that do so, too.

Including (a buggy, incomplete subset of) D! https://github.com/adamdruppe/dmd/tree/dtojs
Dec 18 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 An interesting datapoint in regards to bytecode is Javascript. 
 Note that Javascript is not distributed in bytecode form. There 
 is no Javascript VM. It is distributed as source code. 
 Sometimes, that source code is compressed and obfuscated, 
 nevertheless it is still source code.

 How the end system chooses to execute the js is up to that end 
 system, and indeed there are a great variety of methods in use.

 Javascript proves that bytecode is not required for "write 
 once, run everywhere", which was one of the pitches for 
 bytecode.

 What is required for w.o.r.e. is a specification for the source 
 code that precludes undefined and implementation defined 
 behavior.

 Note also that Typescript compiles to Javascript. I suspect 
 there are other languages that do so, too.

Actually, they call JavaScript an IL for the next ten years.
Dec 18 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Tuesday, 18 December 2012 at 18:22:40 UTC, Max Samukha wrote:

 Actually, they call JavaScript an IL for the next ten years.

s/an/the
Dec 18 2012
prev sibling next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 Javascript proves that bytecode is not required for "write 
 once, run everywhere", which was one of the pitches for 
 bytecode.

 What is required for w.o.r.e. is a specification for the source 
 code that precludes undefined and implementation defined 
 behavior.

Yes, bytecode isn't strictly required, but it's certainly desirable. Bytecode is much easier to interpret, much easier to compile to, and more compact. The downside of bytecode is loss of high-level meaning... but that depends on the bytecode. There's nothing stopping the bytecode from being a serialised AST (actually, that would be ideal).
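A serialised AST need not be exotic. A minimal sketch (node shapes invented for illustration, not any real VM's format) of an AST held as plain data and executed by walking the tree directly:

```javascript
// Invented node shapes ("num", "add", "mul") standing in for a
// serialised AST; the interpreter simply walks the tree.
const ast = { op: "add",
              lhs: { op: "num", value: 2 },
              rhs: { op: "mul",
                     lhs: { op: "num", value: 3 },
                     rhs: { op: "num", value: 4 } } };

function evaluate(node) {
  switch (node.op) {
    case "num": return node.value;
    case "add": return evaluate(node.lhs) + evaluate(node.rhs);
    case "mul": return evaluate(node.lhs) * evaluate(node.rhs);
    default: throw new Error("unknown node: " + node.op);
  }
}

console.log(evaluate(ast)); // 2 + 3 * 4 = 14
```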
 Note also that Typescript compiles to Javascript. I suspect 
 there are other languages that do so, too.

There are lots. It's probably the most compiled-to high-level language out there (including C).
Dec 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/18/2012 10:29 AM, Peter Alexander wrote:
 On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 Javascript proves that bytecode is not required for "write once, run
 everywhere", which was one of the pitches for bytecode.

 What is required for w.o.r.e. is a specification for the source code that
 precludes undefined and implementation defined behavior.

Yes, bytecode isn't strictly required, but it's certainly desirable. Bytecode is much easier to interpret, much easier to compile to, and more compact.

Bytecode would have added nothing to js but complexity. I think you're seriously overestimating the cost of compilation.
 The downside of bytecode is loss of high-level meaning... but that depends on
 the bytecode. There's nothing stopping the bytecode from being a serialised AST
 (actually, that would be ideal).

As I pointed out to Andrei, Java bytecode *is* a serialized AST.
Dec 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/18/2012 12:38 PM, Peter Alexander wrote:
 On Tuesday, 18 December 2012 at 19:25:01 UTC, Walter Bright wrote:
 On 12/18/2012 10:29 AM, Peter Alexander wrote:
 On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 Javascript proves that bytecode is not required for "write once, run
 everywhere", which was one of the pitches for bytecode.

 What is required for w.o.r.e. is a specification for the source code that
 precludes undefined and implementation defined behavior.

Yes, bytecode isn't strictly required, but it's certainly desirable. Bytecode is much easier to interpret, much easier to compile to, and more compact.

Bytecode would have added nothing to js but complexity. I think you're seriously overestimating the cost of compilation.

When I say "easier", I'm talking about implementation cost. Consider how easy it is to write a conforming Java byte code interpreter compared to a conforming Java interpreter/compiler. Parsing and semantic analysis are much easier to get wrong than a byte code spec. At the bytecode level, you don't need to worry about function overloading, symbol tables, variable scoping, type inference, forward references etc. etc. All those things are intentional complexities meant to make life easier for the programmer, not the computer. A bytecode doesn't need them.
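The implementation-cost point can be sketched with a toy stack machine (opcodes invented for illustration): once a front end has flattened the source, the interpreter itself needs no parsing, scoping, or overload resolution at all.

```javascript
// Three invented opcodes for a toy stack machine; the whole
// "conforming interpreter" is one loop and a switch.
const PUSH = 0, ADD = 1, MUL = 2;

function run(code) {
  const stack = [];
  for (let pc = 0; pc < code.length; pc++) {
    switch (code[pc]) {
      case PUSH: stack.push(code[++pc]); break;          // operand follows opcode
      case ADD:  stack.push(stack.pop() + stack.pop()); break;
      case MUL:  stack.push(stack.pop() * stack.pop()); break;
      default:   throw new Error("bad opcode: " + code[pc]);
    }
  }
  return stack.pop();
}

// 2 + 3 * 4, already flattened by a hypothetical front end
console.log(run([PUSH, 2, PUSH, 3, PUSH, 4, MUL, ADD])); // 14
```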

D is open source. There is little implementation cost to doing a compiler for it. It's a solved problem. A bytecode requires another spec to be written, and if you think it's easy to make a conformant Java VM bytecode interpreter, think again :-)
 The downside of bytecode is loss of high-level meaning... but that depends on
 the bytecode. There's nothing stopping the bytecode from being a serialised AST
 (actually, that would be ideal).

As I pointed out to Andrei, Java bytecode *is* a serialized AST.

It's not a lossless serialisation -- especially not after optimisation. For example, it's non-trivial to reconstruct the AST of a for loop from optimised bytecode (or even regular bytecode).

Yes, it is trivial. The only thing that is lost are local variable names and comments.
Dec 18 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/18/2012 1:57 PM, deadalnix wrote:
 Yes, it is trivial. The only thing that is lost are local variable names and
 comments.

You'll find tools that compact your whole project, losing all names in the process.

I believe you're conflating releasing a "whole project" with what I'm talking about, which is releasing modules meant to be incorporated into a user project, which won't work if the names are changed. And besides, changing the names doesn't change the fact that Java .class files include 100% of the type information.
Dec 18 2012
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2012 at 10:11:37AM -0800, Walter Bright wrote:
 An interesting datapoint in regards to bytecode is Javascript. Note
 that Javascript is not distributed in bytecode form. There is no
 Javascript VM. It is distributed as source code. Sometimes, that
 source code is compressed and obfuscated, nevertheless it is still
 source code.
 
 How the end system chooses to execute the js is up to that end
 system, and indeed there are a great variety of methods in use.
 
 Javascript proves that bytecode is not required for "write once, run
 everywhere", which was one of the pitches for bytecode.

I never liked that motto of Java's. It's one of those things that are too good to be true, and papers over very real, complex cross-platform compatibility issues. I prefer "write once, debug everywhere". :-P
 What is required for w.o.r.e. is a specification for the source code
 that precludes undefined and implementation defined behavior.

What would you do with system-specific things like filesystem manipulation, though? That has to be implementation-defined by definition.

And IME, any abstraction that's both (1) completely defined without any implementation differences and (2) covers every possible platform that ever existed and will exist, is either totally useless from being over-complex and over-engineered, or completely fails to capture the complexity of real-world systems and the details required to work with them efficiently.


T

-- 
WINDOWS = Will Install Needless Data On Whole System -- CompuMan
Dec 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/18/2012 11:41 AM, H. S. Teoh wrote:
 What is required for w.o.r.e. is a specification for the source code
 that precludes undefined and implementation defined behavior.

What would you do with system-specific things like filesystem manipulation, though? That has to be implementation-defined by definition. And IME, any abstraction that's both (1) completely-defined without any implementation differences and (2) covers every possible platform that ever existed and will exist, is either totally useless from being over-complex and over-engineered, or completely fails to capture the complexity of real-world systems and the details required to work with them efficiently.

Well, I was thinking of the language, not the runtime library, which is a separate issue. And no, I don't think D can be a systems language *and* eliminate all undefined and implementation defined behavior.
Dec 18 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 3:35 PM, Walter Bright wrote:
 And no, I don't think D can be a systems language *and* eliminate all
 undefined and implementation defined behavior.

The SafeD subset takes care of that. Andrei
Dec 18 2012
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/18/12 7:29 PM, H. S. Teoh wrote:
 On Tue, Dec 18, 2012 at 07:08:04PM -0500, Andrei Alexandrescu wrote:
 On 12/18/12 3:35 PM, Walter Bright wrote:
 And no, I don't think D can be a systems language *and* eliminate all
 undefined and implementation defined behavior.

The SafeD subset takes care of that.

Which right now suffers from some silly things like writefln not being able to be made @safe, just because some obscure formatting parameter is un@safe. Which is exactly how @safe was designed, of course. Except that it makes SafeD ... a bit of a letdown, shall we say? - when it comes to practical real-world applications. (And just to be clear, I'm all for SafeD, but it does still have a ways to go.)

Yes, there are several bugs related to SafeD. Andrei
Dec 18 2012
next sibling parent reply Brad Roberts <braddr slice-2.puremagic.com> writes:
On Tue, 18 Dec 2012, Andrei Alexandrescu wrote:

 On 12/18/12 7:29 PM, H. S. Teoh wrote:
 Which right now suffers from some silly things like writefln not being
 able to be made @safe, just because some obscure formatting parameter is
 un@safe. Which is exactly how @safe was designed, of course.  Except
 that it makes SafeD ... a bit of a letdown, shall we say? - when it
 comes to practical real-world applications.
 
 (And just to be clear, I'm all for SafeD, but it does still have a ways
 to go.)

Yes, there are several bugs related to SafeD. Andrei

Are the remaining issues at the compiler, runtime, or phobos levels (or what combination of the three)? Are the bugs filed?
Dec 18 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/19/12 8:13 AM, David Nadlinger wrote:
 On Wednesday, 19 December 2012 at 07:14:30 UTC, Rob T wrote:
 On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M Davis wrote:
 Such operations should be @system but are currently considered @safe.
 Who
 knows how many others we've missed beyond what's currently in bugzilla.

 - Jonathan M Davis

Unfortunately fixing these will break existing code, or can the behavior be deprecated?

We *must* take the liberty to fix them; if SafeD is not sound, it's hardly worth its salt. David

Yes. Andrei
Dec 19 2012
prev sibling next sibling parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Tuesday, December 18, 2012 17:57:50 Brad Roberts wrote:
 On Tue, 18 Dec 2012, Andrei Alexandrescu wrote:
 On 12/18/12 7:29 PM, H. S. Teoh wrote:
 Which right now suffers from some silly things like writefln not being
 able to be made @safe, just because some obscure formatting parameter is
 un@safe. Which is exactly how @safe was designed, of course. Except
 that it makes SafeD ... a bit of a letdown, shall we say? - when it
 comes to practical real-world applications.
 
 (And just to be clear, I'm all for SafeD, but it does still have a ways
 to go.)

Yes, there are several bugs related to SafeD. Andrei

Are the remaining issues at the compiler, runtime, or phobos levels (or what combination of the three)? Are the bugs filed?

Quite a few are, but it wouldn't surprise me at all if there are quite a few which aren't. For instance, AFAIK, no one ever brought up the issue of slicing static arrays being unsafe until just a couple of months ago: http://d.puremagic.com/issues/show_bug.cgi?id=8838 Such operations should be @system but are currently considered @safe. Who knows how many others we've missed beyond what's currently in bugzilla. - Jonathan M Davis
Dec 18 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M Davis 
wrote:
 Are the remaining issues at the compiler, runtime, or phobos 
 levels (or
 what combination of the three)? Are the bugs filed?

Quite a few are, but it wouldn't surprise me at all if there are quite a few which aren't. For instance, AFAIK, no one ever brought up the issue of slicing static arrays being unsafe until just a couple of months ago: http://d.puremagic.com/issues/show_bug.cgi?id=8838 Such operations should be @system but are currently considered @safe. Who knows how many others we've missed beyond what's currently in bugzilla.

This is a chicken-and-egg issue. Due to limitations, enforcing @safe is hard to do in much code that is actually safe. So you don't notice that some things are considered/not considered @safe/@system when they should be.
Dec 18 2012
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
On 12/18/2012 5:58 PM, Jonathan M Davis wrote:
 On Tuesday, December 18, 2012 17:57:50 Brad Roberts wrote:
 On Tue, 18 Dec 2012, Andrei Alexandrescu wrote:
 On 12/18/12 7:29 PM, H. S. Teoh wrote:
 Which right now suffers from some silly things like writefln not being
 able to be made @safe, just because some obscure formatting parameter is
 un@safe. Which is exactly how @safe was designed, of course. Except
 that it makes SafeD ... a bit of a letdown, shall we say? - when it
 comes to practical real-world applications.

 (And just to be clear, I'm all for SafeD, but it does still have a ways
 to go.)

Yes, there are several bugs related to SafeD. Andrei

Are the remaining issues at the compiler, runtime, or phobos levels (or what combination of the three)? Are the bugs filed?

Quite a few are, but it wouldn't surprise me at all if there are quite a few which aren't. For instance, AFAIK, no one ever brought up the issue of slicing static arrays being unsafe until just a couple of months ago: http://d.puremagic.com/issues/show_bug.cgi?id=8838 Such operations should be @system but are currently considered @safe. Who knows how many others we've missed beyond what's currently in bugzilla. - Jonathan M Davis

The part I'm particularly interested in is the compiler layer.
Dec 18 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M Davis 
wrote:
 Such operations should be @system but are currently considered 
 @safe. Who
 knows how many others we've missed beyond what's currently in 
 bugzilla.

 - Jonathan M Davis

Unfortunately fixing these will break existing code, or can the behavior be deprecated? --rt
Dec 18 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Wednesday, 19 December 2012 at 07:14:30 UTC, Rob T wrote:
 On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M 
 Davis wrote:
 Such operations should be @system but are currently considered 
 @safe. Who
 knows how many others we've missed beyond what's currently in 
 bugzilla.

 - Jonathan M Davis

Unfortunately fixing these will break existing code, or can the behavior be deprecated?

We *must* take the liberty to fix them; if SafeD is not sound, it's hardly worth its salt. David
Dec 19 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Wednesday, 19 December 2012 at 07:14:30 UTC, Rob T wrote:
 On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M 
 Davis wrote:
 Such operations should be @system but are currently considered 
 @safe. Who
 knows how many others we've missed beyond what's currently in 
 bugzilla.

 - Jonathan M Davis

Unfortunately fixing these will break existing code, or can the behavior be deprecated?

The code is already broken. The compiler detecting more faulty code is a plus.
Dec 19 2012
prev sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 13:13:32 UTC, David Nadlinger 
wrote:
 On Wednesday, 19 December 2012 at 07:14:30 UTC, Rob T wrote:
 On Wednesday, 19 December 2012 at 01:58:54 UTC, Jonathan M 
 Davis wrote:
 Such operations should be @system but are currently 
 considered @safe. Who
 knows how many others we've missed beyond what's currently in 
 bugzilla.

 - Jonathan M Davis

Unfortunately fixing these will break existing code, or can the behavior be deprecated?

We *must* take the liberty to fix them; if SafeD is not sound, it's hardly worth its salt. David

Don't get me wrong, I agree that broken behavior must always be fixed and never left in as a "feature". Probably the priority bugs should be the ones where the fix ends up breaking existing code. The sooner these are gotten rid of the better. --rt
Dec 19 2012
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2012 at 07:08:04PM -0500, Andrei Alexandrescu wrote:
 On 12/18/12 3:35 PM, Walter Bright wrote:
And no, I don't think D can be a systems language *and* eliminate all
undefined and implementation defined behavior.

The SafeD subset takes care of that.

Which right now suffers from some silly things like writefln not being able to be made @safe, just because some obscure formatting parameter is un@safe. Which is exactly how @safe was designed, of course. Except that it makes SafeD ... a bit of a letdown, shall we say? - when it comes to practical real-world applications.

(And just to be clear, I'm all for SafeD, but it does still have a ways to go.)


T

-- 
Elegant or ugly code as well as fine or rude sentences have something in common: they don't depend on the language. -- Luca De Vitis
Dec 18 2012
prev sibling next sibling parent "DypthroposTheImposter" <mcbracket gmail.com> writes:
  There is Emscripten, which compiles LLVM to JavaScript, so you 
could probably get D into JS that way also.

https://github.com/kripken/emscripten
Dec 18 2012
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2012-12-18 19:11, Walter Bright wrote:

 Note also that Typescript compiles to Javascript. I suspect there are
 other languages that do so, too.

CoffeeScript and Dart, to mention two other languages that compile to JavaScript. -- /Jacob Carlborg
Dec 18 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 An interesting datapoint in regards to bytecode is Javascript. 
 Note that Javascript is not distributed in bytecode form. There 
 is no Javascript VM. It is distributed as source code. 
 Sometimes, that source code is compressed and obfuscated, 
 nevertheless it is still source code.

 How the end system chooses to execute the js is up to that end 
 system, and indeed there are a great variety of methods in use.

 Javascript proves that bytecode is not required for "write 
 once, run everywhere", which was one of the pitches for 
 bytecode.

Well, my experience is more like write once, debug everywhere. For both java AND javascript.
 What is required for w.o.r.e. is a specification for the source 
 code that precludes undefined and implementation defined 
 behavior.

Isn't SafeD supposed to provide that (as long as you never go through @trusted code)?
 Note also that Typescript compiles to Javascript. I suspect 
 there are other languages that do so, too.

Most things can compile to JavaScript. The motto right now seems to be that if you can do it in JavaScript, someone will. It doesn't mean someone should.
Dec 18 2012
prev sibling next sibling parent "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Tuesday, 18 December 2012 at 19:25:01 UTC, Walter Bright wrote:
 On 12/18/2012 10:29 AM, Peter Alexander wrote:
 On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright 
 wrote:
 Javascript proves that bytecode is not required for "write 
 once, run
 everywhere", which was one of the pitches for bytecode.

 What is required for w.o.r.e. is a specification for the 
 source code that
 precludes undefined and implementation defined behavior.

Yes, bytecode isn't strictly required, but it's certainly desirable. Bytecode is much easier to interpret, much easier to compile to, and more compact.

Bytecode would have added nothing to js but complexity. I think you're seriously overestimating the cost of compilation.

When I say "easier", I'm talking about implementation cost. Consider how easy it is to write a conforming Java byte code interpreter compared to a conforming Java interpreter/compiler. Parsing and semantic analysis are much easier to get wrong than a byte code spec. At the bytecode level, you don't need to worry about function overloading, symbol tables, variable scoping, type inference, forward references etc. etc. All those things are intentional complexities meant to make life easier for the programmer, not the computer. A bytecode doesn't need them.
 The downside of bytecode is loss of high-level meaning... but 
 that depends on
 the bytecode. There's nothing stopping the bytecode from being 
 a serialised AST
 (actually, that would be ideal).

As I pointed out to Andrei, Java bytecode *is* a serialized AST.

It's not a lossless serialisation -- especially not after optimisation. For example, it's non-trivial to reconstruct the AST of a for loop from optimised bytecode (or even regular bytecode).
Dec 18 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Tuesday, 18 December 2012 at 21:30:04 UTC, Walter Bright wrote:
 D is open source. There is little implementation cost to doing 
 a compiler for it. It's a solved problem.

Let me emit some doubt about that. First, D is difficult to compile because of the compile-time features. The DMD front end is not the best piece of software I've seen from an extensibility point of view.

Plus, if D is open source in its license, it isn't in its way of doing things. When you drop functionality into master for reasons the community isn't even aware of, you don't act like an open source project. You'll find a huge gap between adopting a license and adopting the cultural switch that is required to benefit from open source.

Right now, it is painfully hard to implement a D compiler, for various reasons:
 - No language spec exists (and dmd, dlang.org and TDPL often contradict each other).
 - The language spec changes constantly.
 - Sometimes by surprise!
 - Many behaviors now considered standard are historical dmd quirks that are hard to reproduce in any implementation not based on dmd.
 - Nothing can be anticipated because goals are not public.
 - Sometimes you'll find 2 specs (-property) for the same thing.
 - Many things are deprecated but it is pretty hard to know which ones.
 - It is unknown how to resolve paradoxes created by compile-time features.
 - I can go on and on.

Right now only dmd-based front ends are production quality, and almost no tooling exists around the language. You'll find very good reasons for that in the points listed above.
 A bytecode requires another spec to be written, and if you 
 think it's easy to make a conformant Java VM bytecode 
 interpreter, think again :-)

Nobody ever said that.
 Yes, it is trivial. The only thing that is lost are local 
 variable names and comments.

You'll find tools that compact your whole project, losing all names in the process.
Dec 18 2012
prev sibling next sibling parent reply "F i L" <witte2008 gmail.com> writes:
Without bytecode, the entire compiler becomes a dependency of an 
AOT/JIT compiled program... not only does bytecode allow for 
faster on-site compilations, it also means half the compiler can 
be stripped away (so i'm told, i'm not claiming to be an expert 
here).

I'm actually kinda surprised there hasn't been more of an AOT/JIT 
compiling push within the D community... D's the best there is at 
code specialization, but half of that battle seems to be hardware 
specifics only really known on-site... like SIMD for example. 
I've been told many game companies compile against SIMD 3.1 
because that's the base-line x64 instruction set. If you could 
query the hardware post-distribution (vs pre-distribution) 
without any performance loss or code complication (to the 
developer), that would be incredibly ideal. (ps. I acknowledge 
that this would probably _require_ the full compiler, so there's 
probably not too much value in a D-bytecode).

The D compiler is small enough for distribution I think (only 
~10mb compressed?), but the back-end license restricts it right?
Dec 18 2012
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/18/2012 11:04 PM, Rob T wrote:
 I'm not claiming to be an expert in this area either, however it seems obvious
 that there are significant theoretical and practical advantages with using the
 bytecode concept.

Evidently you've dismissed all of my posts in this thread on that topic :-)
Dec 18 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2012 12:19 AM, Max Samukha wrote:
 Evidently you've dismissed all of my posts in this thread on that topic :-)


And I gave detailed reasons why.
 Such as it being a
 standardized AST representation for multiple languages. CLI is all about that,
 which is reflected in its name. LLVM is used almost exclusively for that
purpose
 (clang is great).

My arguments were all based on the idea of distributing "compiled" source code in bytecode format. The idea of using some common intermediate format to tie together multiple front ends and multiple back ends is something completely different.

And, surprise (!), I've done that, too. The original C compiler I wrote for many years was a multipass affair that communicated the data from one pass to the next via an intermediate file. I was forced into such a system because DOS just didn't have enough memory to combine the passes. I dumped it when more memory became available, as it was the source of major slowdowns in the compilation process.

Note that such a system need not be *bytecode* at all, it can just hand the data structure off from one pass to the next. In fact, an actual bytecode requires a serialization of the data structures and then a reconstruction of them - rather pointless.
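The serialise-then-reconstruct step described as redundant above can be sketched like this (JSON stands in for a bytecode encoding here; this is an illustration with an invented mini-AST, not a real compiler pass):

```javascript
// An in-memory hand-off gives the next pass the same tree for free; a
// serialised hand-off must encode and then rebuild an equivalent tree.
const ast = { op: "add",
              lhs: { op: "num", value: 1 },
              rhs: { op: "num", value: 2 } };

// In-memory: the next pass receives the very same object.
const nextPassInput = ast;
console.log(nextPassInput === ast); // true - no copying at all

// Serialised: encode, ship, decode - extra work for the same structure.
const wire = JSON.stringify(ast);
const rebuilt = JSON.parse(wire);
console.log(rebuilt === ast);         // false - a freshly built tree
console.log(rebuilt.rhs.value === 2); // true - equivalent content
```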
 Not advocating bytecode here but you claiming it is completely useless is so
 D-ish :).

I'm not without experience doing everything bytecode is allegedly good at. As for CLI, it is great for implementing C#. For other languages, not so much. There turned out to be no way to efficiently represent D slices in it, for example.
Dec 19 2012
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2012 1:10 AM, Rob T wrote:
 Using the JS code as an example, you are stating that the JS source code itself
 could just as well be viewed as the "bytecode", and therefore given what I
 previously wrote concerning the "advantages", I could replace "bytecode" with
 "JS source code" and achieve the exact same result. Am I Correct?

Yes.
 I thought that transforming source code into bytecode was an optimization
 technique intended to improve interpretation performance while preserving
 portability across architectures, i.e., the bytecode language was designed
 specifically to improve interpretation performance - but you say that the costs
 of performing the transformations from a high-level language into the optimized
 bytecode language far outweigh the advantages of leaving it as-is, i.e.,
 whatever performance gains you get through the transformation is not
significant
 enough to justify the costs of performing the transformation.

 Is my understanding of your POV correct?

Mostly. If you use bytecode, you have Yet Another Spec that has to be defined and conformed to. This has a lot of costs.
 What I'm having trouble understanding is this:

 If the intention of something like the Java VM was to create a portable
 virtualized machine that could be used to execute any language, then would it
 not make sense to select a common bytecode language that was optimized for
 execution performance, rather than using another common language that was not
 specifically designed for that purpose?

Java as we know it evolved from a language that (as I recall) used bytecode to run on embedded systems of very low power. This use failed, and Java was re-imagined to be a network language that transmitted the bytecode over the internet. The rest was attempts to justify the investment in bytecode, or perhaps it simply never occurred to the Java engineers that the bytecode was redundant. (Bytecode can make sense on 8 bit machines where the target machine simply has no capacity to run even a simple compiler. Such underpowered machines were obsolete by the time Java was developed, but the old ideas died hard.)
 Do you have a theory or insight that can explain why a situation like the Java
 bytecode VM came to be and why it persists despite your suggestion that it is
 not required or of enough advantage to justify using it (may as well use Java
 source directly)?

Consider the US space shuttle design. It's probably the most wrong-headed engineering design ever, and it persisted because too many billions of dollars and careers were invested into it. Nobody could admit that it was an extremely inefficient and rather crazy design. A couple NASA engineers have admitted to me privately that they knew this, but to keep their careers they kept their mouths shut.
Dec 19 2012
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/19/12 4:25 AM, Walter Bright wrote:
 On 12/19/2012 1:10 AM, Rob T wrote:
 Using the JS code as an example, you are stating that the JS source
 code itself
 could just as well be viewed as the "bytecode", and therefore given
 what I
 previously wrote concerning the "advantages", I could replace
 "bytecode" with
 "JS source code" and achieve the exact same result. Am I Correct?

Yes.

I thought the claim was about ASTs vs. bytecode, which slowly segued into source code vs. byte code.

Are we in agreement there is a cost of translating JS source code to AST format? (The cost may be negligible to some applications but it's there.)

There's also the serialization aspect. Serializing and deserializing an AST takes extra effort because pointers must be fixed. Bytecode can be designed to avoid most of that cost.

On these two accounts alone, one may as well choose bytecode if it ever needs to be read and written. Defining a strategy for pointer serialization is comparable work.
 Do you have a theory or insight that can explain why a situation like
 the Java
 bytecode VM came to be and why it persists despite your suggestion
 that it is
 not required or of enough advantage to justify using it (may as well
 use Java
 source directly)?

Consider the US space shuttle design. It's probably the most wrong-headed engineering design ever, and it persisted because too many billions of dollars and careers were invested into it. Nobody could admit that it was an extremely inefficient and rather crazy design. A couple NASA engineers have admitted to me privately that they knew this, but to keep their careers they kept their mouths shut.

That's not answering the question.

Andrei
Dec 19 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2012 7:44 AM, Andrei Alexandrescu wrote:
 I thought the claim was about ASTs vs. bytecode, which slowly segued into
source
 code vs. byte code.

Originally, the claim was how modules should be imported in some binary format rather than as source code.
 Are we in agreement that there is a cost to translating JS source
 code to AST format? (The cost may be negligible to some applications but it's
 there.)

There is a cost, and it is a killer if you've got an 8 bit CPU with a 2K EPROM as a target. This is no longer relevant.
 There's also the serialization aspect. Serializing and deserializing an AST
 takes extra effort because pointers must be fixed. Bytecode can be designed to
 avoid most of that cost.

Bytecode does not avoid that cost, in fact, bytecode *is* a serialized AST format. (And, btw, the first thing a JIT compiler does is convert bytecode back into an AST.)
 On these two accounts alone, one may as well choose bytecode if it ever needs
to
 be read and written. Defining a strategy for pointer serialization is
comparable
 work.

You're not saving anything.
 That's not answering the question.

Analogies are legitimate answers to questions about motivation.
Dec 19 2012
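To make the "bytecode is a serialized AST" claim from the exchange above concrete, here is a minimal sketch (all names hypothetical, not from the thread): a post-order walk flattens an expression tree into a pointer-free postfix "bytecode", and a stack-based walk rebuilds the identical tree, which is exactly the pointer fixup work Andrei mentions.

```javascript
// AST for (1 + 2) * 3
const ast = { op: "*",
              lhs: { op: "+", lhs: { num: 1 }, rhs: { num: 2 } },
              rhs: { num: 3 } };

// Serialize: a post-order walk emits a flat instruction list with no pointers.
function toBytecode(node, out = []) {
  if ("num" in node) { out.push(["push", node.num]); return out; }
  toBytecode(node.lhs, out);
  toBytecode(node.rhs, out);
  out.push([node.op]);
  return out;
}

// Deserialize: an explicit stack rebuilds the tree, restoring the "pointers".
function toAst(code) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "push") stack.push({ num: arg });
    else {
      const rhs = stack.pop(), lhs = stack.pop();
      stack.push({ op, lhs, rhs });
    }
  }
  return stack.pop();
}

const code = toBytecode(ast);
console.log(JSON.stringify(code)); // flat and position-independent
console.log(JSON.stringify(toAst(code)) === JSON.stringify(ast)); // true
```

The round trip is lossless for this toy expression language, which illustrates why the two representations carry the same information; whether the flat form is worth defining as a standard is the point actually in dispute in the thread.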
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2012 2:05 AM, eles wrote:
 Consider the US space shuttle design. It's probably the most wrong-headed
 engineering design ever, and it persisted because too many billions of dollars
 and careers were invested into it. Nobody could admit that it was an extremely
 inefficient and rather crazy design.

Hey, this is really OT, but I'm interested in. Why do you consider it is such a bad design? Because the shuttle is intended to be reentrant and this is costly? Some other issue? Is about the design or about the entire idea?

It boils down to this: the overriding expense in spaceflight is weight. There's the notion of "payload", which is the weight of whatever does something useful in space - the whole point of the mission. Every bit of weight adds a great deal more weight in terms of cost to push it all into orbit.

To make the shuttle return and land, you've got wings, rudder, landing gear, flight control system - basically a huge amount of weight devoted to that. That weight subtracts from what you can push up as payload. All of the lifting capability for that also must be insanely reliable. (And never mind needing things like a custom 747 to transport it around because it's too big to go on the roads, all that money spent trying to make a reusable heat shield, etc.)

Now consider that the only thing that actually has to return is the astronauts. And all they actually need to return is a heatshield and a parachute - i.e. an Apollo capsule.

Thinking about it from basic principles, you need:

1. astronauts
2. payload
3. a way to get the astronauts back

So the idea then is to have two launches:

1. an insanely reliable rocket to push the astronauts up, and nothing else
2. a less reliable (and hence cheap) heavy lift rocket to push the payload up

The two launches dock in space, the astronauts do their job, and the astronauts return via their Apollo-style capsule. Mission accomplished at far, far less cost.
Dec 19 2012
prev sibling next sibling parent reply David Gileadi <gileadis NSPMgmail.com> writes:
On 12/19/12 3:05 AM, eles wrote:
 Consider the US space shuttle design. It's probably the most
 wrong-headed engineering design ever, and it persisted because too
 many billions of dollars and careers were invested into it. Nobody
 could admit that it was an extremely inefficient and rather crazy design.

Hey, this is really OT, but I'm interested in. Why do you consider it is such a bad design? Because the shuttle is intended to be reentrant and this is costly? Some other issue? Is about the design or about the entire idea? Thanks for answering and I promise to not further hijack this thread.

I had the same question, and Google found me a 2003 article http://www.spacedaily.com/news/oped-03l.html which in the wake of Columbia is largely about safety but also about efficiency. Interestingly the article claims that the shuttle flaws were largely the result of a) the desire to carry large payloads along with astronauts (as Walter mentions) and b) the choice of fuel, which led to several other expensive and dangerous design choices.
Dec 19 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2012 4:09 PM, Rob T wrote:
 As always the answer is never as simple as it seems (just as it is with
bytecode
 if I'm to attempt to stay on topic). One of subgoals of the space shuttle was
 for it to be able to return not just people back, but also to capture and
return
 back to earth an orbiting payload. It also carnied along instrumentation such
as
 the Canadarm, a very expensive device that you normally would not want to throw
 away. The arm was used for deploying the payload and also for performing repair
 work. It is hard to imagine a throw away rocket booster approach meeting all of
 these design goals, and I'm leaving out other abilities you cannot get from a
 simple return capsule approach.

I find it hard to believe that the Canadarm cost more than wings, landing gear, a custom 747, etc. (That custom 747 probably cost a cool billion all by itself.) Secondly, the cost of the Canadarm consists of two parts: engineering design, and construction. Once the design is done, the incremental construction cost of making multiple ones is way, way, way cheaper. As for returning an orbiting payload, has that ever happened? And still, one could launch a shell with a heatshield and parachute on it, put that payload into the shell, and drop it into the atmosphere. I can see needing to return spy satellites with their film canisters, but film is hopelessly obsolete now, and I can't see any such satellites these days. The shuttle concept was so expensive that it severely stunted what we could do in space, and finally sank the whole manned space program.
Dec 19 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/19/2012 4:54 PM, Rob T wrote:
 One question I have for you, is what percentage performance gain can you expect
 to get by using a well chosen bytecode-like language verses interpreting
 directly from source code?

I know of zero claims that making a bytecode standard for javascript will improve performance.
 The other question, is are there better alternative techniques? For example,
 compiling regular source directly to native using a JIT approach. In many ways,
 this seems like the very best approach, which I suppose is precisely what
you've
 been arguing about all this time. So perhaps I've managed to convince myself
 that you are indeed correct. I'll take that stance and see if it sticks.

Not exactly; I argue that having a bytecode standard is useless. How a compiler works internally is fairly irrelevant.
Dec 19 2012
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2012 1:30 PM, deadalnix wrote:
 Note that in the first place, bytecode discussion has started with the need of
 provide a CTFEable module that do not contains more information that what is in
 a DI file, as it is a concern for some companies.

 Bytecode can solve that problem nicely IMO. You mentioned that DI is superior
 here, but I don't really understand how.

No, it doesn't solve that problem at all. I explained why repeatedly.
Dec 20 2012
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2012 10:05 PM, deadalnix wrote:
 On Friday, 21 December 2012 at 05:43:18 UTC, Walter Bright wrote:
 On 12/20/2012 1:30 PM, deadalnix wrote:
 Note that in the first place, bytecode discussion has started with the need of
 provide a CTFEable module that do not contains more information that what is in
 a DI file, as it is a concern for some companies.

 Bytecode can solve that problem nicely IMO. You mentioned that DI is superior
 here, but I don't really understand how.

No, it doesn't solve that problem at all. I explained why repeatedly.

No you explained that java's bytecode doesn't solve that problem. Which is quite different.

Please reread all of my messages in the thread. I addressed this.
Dec 20 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that problem. Which is
quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.
Dec 20 2012
next sibling parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 21.12.2012 08:02, Walter Bright wrote:
 On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that problem.
 Which is quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.

Sorry, can't resist: How about feeding the x86 machine byte code (including some fixup information) into an interpreter? Maybe not realistic, but a data point in the field of possible "byte codes". The interpreter might even enjoy hardware support ;-) That might not cover all possible architectures, but if the distributed library is compiled for one platform only, CTFEing against another won't make much sense anyway.
Dec 21 2012
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 12/21/2012 09:37 AM, Rainer Schuetze wrote:
 On 21.12.2012 08:02, Walter Bright wrote:
 On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that problem.
 Which is quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.

Sorry, can't resist: How about feeding the x86 machine byte code (including some fixup information) into an interpreter? Maybe not realistic,

http://bellard.org/jslinux/
 but a data point in the field of possible "byte codes". The
 interpreter might even enjoy hardware support ;-)

Direct hardware support is not achievable because CTFE needs to be pure and safe.
 That might not cover all possible architectures, but if the distributed
 library is compiled for one platform only, CTFEing against another won't
 make much sense anyway.

Dec 21 2012
parent Rainer Schuetze <r.sagitario gmx.de> writes:
On 21.12.2012 10:20, Timon Gehr wrote:
 On 12/21/2012 09:37 AM, Rainer Schuetze wrote:
 On 21.12.2012 08:02, Walter Bright wrote:
 On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that problem.
 Which is quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.

Sorry, can't resist: How about feeding the x86 machine byte code (including some fixup information) into an interpreter? Maybe not realistic,

http://bellard.org/jslinux/

Incredible ;-)
 but a data point in the field of possible "byte codes". The
 interpreter might even enjoy hardware support ;-)

Direct hardware support is not achievable because CTFE needs to be pure and safe.

True, you would have to trust the library code not to perform impure/unsafe operations. Some of this might be verifiable, e.g. not allowing fixups into mutable global memory.
 That might not cover all possible architectures, but if the distributed
 library is compiled for one platform only, CTFEing against another won't
 make much sense anyway.


Dec 21 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/21/2012 12:37 AM, Rainer Schuetze wrote:
 On 21.12.2012 08:02, Walter Bright wrote:
 On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that problem.
 Which is quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.

Sorry, can't resist: How about feeding the x86 machine byte code (including some fixup information) into an interpreter? Maybe not realistic, but a data point in the field of possible "byte codes". The interpreter might even enjoy hardware support ;-)

Not going to work, as CTFE needs type information. CTFE needs to interact with the current symbols and types in the compilation unit. Just think about what you'd need to do to get CTFE to read the object file for a module and try to execute the code in it, feeding it data and types and symbols from the current compilation unit. Consider:

    add EAX,37
    mov [EAX],EBX

What the heck is EAX pointing at?
Dec 21 2012
parent reply Rainer Schuetze <r.sagitario gmx.de> writes:
On 21.12.2012 11:28, Walter Bright wrote:
 On 12/21/2012 12:37 AM, Rainer Schuetze wrote:
 On 21.12.2012 08:02, Walter Bright wrote:
 On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that problem.
 Which is quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.

Sorry, can't resist: How about feeding the x86 machine byte code (including some fixup information) into an interpreter? Maybe not realistic, but a data point in the field of possible "byte codes". The interpreter might even enjoy hardware support ;-)

Not going to work, as CTFE needs type information. CTFE needs to interact with the current symbols and types in the compilation unit. Just think about what you'd need to do to get CTFE to read the object file for a module and try to execute the code in it, feeding it data and types and symbols from the current compilation unit. Consider:

    add EAX,37
    mov [EAX],EBX

What the heck is EAX pointing at?

I think you don't need to care. The CPU can execute it as well without type information. If the data layout of the interpreter values is the same as for the interpreted architecture, all you need to know is the calling convention, the types of the arguments to the function to be executed, and the return type.

I'd intercept calls to other functions because the interpreter might want to replace them with non-native versions (e.g. new, or functions where the source code exists). The types of the data passed when executing these calls are known as well.
Dec 21 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/21/2012 3:52 AM, Rainer Schuetze wrote:
 I think you don't need to care. The CPU can execute it as well without type
 information.
 If the data layout of the interpreter values is the same as for the interpreted
 architecture, all you need to know is the calling convention and the types of
 the arguments to the function to be executed and the return type.

CPU instructions are as unportable as you can get. All type information is lost, as well as all information as to where the values for things come from. Hence, such a format has dependencies on every module and switch it was compiled with, dependencies that cannot be accounted for if they change.

It cannot be inlined, no inferences can be made as to purity, and it's hard to see how CTFE could determine whether a particular path through that code is supported by CTFE or not.

In fact, it is useless as a means of importing module information - you might as well just link to that object code at link time.
Dec 21 2012
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/21/2012 12:07 PM, jerro wrote:
 Optimized LLVM bytecode look like a good candidate for the job. Note that I'm
 not suggesting this as a spec, but as an example of possible solution.

It's true that it couldn't be automatically decompiled to something equivalent to the original D source, but it does contain type information. Its human readable form (llvm assembly language) is easier to understand than assembly.

I haven't looked at the format, but if it's got type information, that goes quite a long way towards supporting automatic decompilation.
Dec 21 2012
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely pointless.

I'll bite. What is its advantage over source code?
Dec 21 2012
parent Walter Bright <newshound2 digitalmars.com> writes:
On 12/21/2012 2:37 AM, Araq wrote:
 On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely pointless.

I'll bite. What is its advantage over source code?

Interpreting the AST directly: requires recursion.

Interpreting a (stack-based) bytecode: does not require recursion.

That's what an AST-to-bytecode transformation does; it eliminates the recursion. And that is far from being useless.

Sorry, I don't get this at all. Every bytecode scheme I've seen had a stack and recursion. Furthermore, that's not an argument that transmission of code (and importation of modules) is better done as bytecode than source code.
Dec 21 2012
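Araq's recursion point in the exchange above can be sketched with a toy example (hypothetical names, not code from the thread): the same expression evaluated by a recursive AST walker, whose call depth mirrors the tree, and by a flat stack-machine interpreter, which is a single loop over pre-flattened instructions.

```javascript
// AST for 4 + 5 * 6
const ast = { op: "+", lhs: { num: 4 },
              rhs: { op: "*", lhs: { num: 5 }, rhs: { num: 6 } } };

// AST interpreter: the recursion mirrors the shape of the tree.
function evalAst(n) {
  if ("num" in n) return n.num;
  const l = evalAst(n.lhs), r = evalAst(n.rhs);
  return n.op === "+" ? l + r : l * r;
}

// Postfix bytecode for the same expression: 4 5 6 * +
const code = [["push", 4], ["push", 5], ["push", 6], ["mul"], ["add"]];

// Bytecode interpreter: one loop and an explicit value stack, no recursion.
function evalBytecode(code) {
  const s = [];
  for (const [op, arg] of code) {
    if (op === "push") s.push(arg);
    else {
      const b = s.pop(), a = s.pop();
      s.push(op === "add" ? a + b : a * b);
    }
  }
  return s.pop();
}

console.log(evalAst(ast), evalBytecode(code)); // 34 34
```

Both evaluators compute the same value; the recursion has simply been traded for an explicit stack, which is the transformation being debated - Walter's counterpoint being that bytecode VMs in practice still have a stack and recursion at the language level.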
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 12/19/12 4:10 AM, Rob T wrote:
 Do you have a theory or insight that can explain why a situation like
 the Java bytecode VM came to be and why it persists despite your
 suggestion that it is not required or of enough advantage to justify
 using it (may as well use Java source directly)?

I think the important claim here is that an AST and a bytecode have the same "power". Parsing source code into an AST clearly has a cost, and that cost is understood by everyone in this discussion.

Andrei
Dec 19 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 01:09:14 UTC, F i L wrote:
 Without bytecode, the entire compiler becomes a dependency of a 
 AOT/JIT compiled program.. not only does bytecode allow for 
 faster on-site compilations, it also means half the compiler 
 can be stripped away (so i'm told, i'm not claiming to be an 
 expert here).

 I'm actually kinda surprised there hasn't been more of a 
 AOT/JIT compiling push within the D community.. D's the best 
 there is at code specialization, but half of that battle seems 
 to be hardware specifics only really known on-site... like SIMD 
 for example. I've been told many game companies compile against 
 SIMD 3.1 because that's the base-line x64 instruction set. If 
 you could query the hardware post-distribution (vs 
 pre-distribution) without any performance loss or code 
 complication (to the developer), that would be incredibly idea. 
 (ps. I acknowledge that this would probably _require_ the full 
 compiler, so there's probably not be to much value in a 
 D-bytecode).

 The D compiler is small enough for distribution I think (only 
 ~10mb compressed?), but the back-end license restricts it right?

I'm not claiming to be an expert in this area either; however, it seems obvious that there are significant theoretical and practical advantages to using the bytecode concept.

My understanding is that with bytecode and a suitable VM to process it, one can abstract away the underlying high-level language that was used to produce the bytecode. It is therefore possible to use alternate high-level languages with front ends that compile to the same common bytecode instruction set. This is exactly what is being done with the D front end, and other front ends, for GCC - except that the machine code produced needs a physical CPU to process it, and there is no machine code instruction set that is common across all architectures.

Effectively, the bytecode serves as the common native machine code for a standardized virtualized CPU (the VM), and the VM can sit on top of any given architecture (more or less). Of course there are significant execution inefficiencies with this method; however, bytecode can be compiled into native code - keeping in mind that you never had to transport the high-level language that was compiled into the bytecode for this to be possible.

So in summary, the primary purpose of bytecode is to serve as an intermediate common language that can be run directly on a VM, or compiled directly into native machine code. There's no need to transport, or even know, what language was used to produce the bytecode.

As a reminder, this is "my understanding", which may be incorrect in one or more areas, so if I'm wrong, I'd like to be corrected.

Thanks --rt
Dec 18 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Tuesday, 18 December 2012 at 18:11:37 UTC, Walter Bright wrote:
 An interesting datapoint in regards to bytecode is Javascript. 
 Note that Javascript is not distributed in bytecode form. There 
 is no Javascript VM. It is distributed as source code. 
 Sometimes, that source code is compressed and obfuscated, 
 nevertheless it is still source code.

 How the end system chooses to execute the js is up to that end 
 system, and indeed there are a great variety of methods in use.

 Javascript proves that bytecode is not required for "write 
 once, run everywhere", which was one of the pitches for 
 bytecode.

 What is required for w.o.r.e. is a specification for the source 
 code that precludes undefined and implementation defined 
 behavior.

 Note also that Typescript compiles to Javascript. I suspect 
 there are other languages that do so, too.

True, however JavaScript's case is similar to C's. Many compilers make use of C as a high-level assembler, and JavaScript, like it or not, is the C of the Internet.

-- Paulo
Dec 18 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Wednesday, 19 December 2012 at 07:22:45 UTC, Walter Bright 
wrote:
 On 12/18/2012 11:04 PM, Rob T wrote:
 I'm not claiming to be an expert in this area either, however 
 it seems obvious
 that there are significant theoretical and practical 
 advantages with using the
 bytecode concept.

Evidently you've dismissed all of my posts in this thread on that topic :-)

As you dismissed all points in favor of bytecode. Such as it being a standardized AST representation for multiple languages. CLI is all about that, which is reflected in its name. LLVM is used almost exclusively for that purpose (clang is great). Not advocating bytecode here but you claiming it is completely useless is so D-ish :).
Dec 19 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 07:22:45 UTC, Walter Bright 
wrote:
 On 12/18/2012 11:04 PM, Rob T wrote:
 I'm not claiming to be an expert in this area either, however 
 it seems obvious
 that there are significant theoretical and practical 
 advantages with using the
 bytecode concept.

Evidently you've dismissed all of my posts in this thread on that topic :-)

I really am trying to understand your POV, but I'm having a difficult time with the point concerning performance.

Using the JS code as an example, you are stating that the JS source code itself could just as well be viewed as the "bytecode", and therefore, given what I previously wrote concerning the "advantages", I could replace "bytecode" with "JS source code" and achieve the exact same result. Am I correct?

I will agree that the bytecode could be encoded as JS (or as another language) and used as a common base for interpretation or compilation to machine code. I can also agree that other languages can be "compiled" into the common "bytecode" language provided that it is versatile enough, so from that POV I will agree that you are correct.

I thought that transforming source code into bytecode was an optimization technique intended to improve interpretation performance while preserving portability across architectures, i.e., the bytecode language was designed specifically to improve interpretation performance. But you say that the costs of transforming a high-level language into the optimized bytecode language far outweigh the advantages, i.e., whatever performance gains you get through the transformation are not significant enough to justify the costs of performing it. Is my understanding of your POV correct?

What I'm having trouble understanding is this: if the intention of something like the Java VM was to create a portable virtualized machine that could be used to execute any language, then would it not make sense to select a common bytecode language that was optimized for execution performance, rather than another common language that was not specifically designed for that purpose?

Do you have a theory or insight that can explain why a situation like the Java bytecode VM came to be, and why it persists despite your suggestion that it is not required or of enough advantage to justify using it (may as well use Java source directly)?

--rt
Dec 19 2012
prev sibling next sibling parent "eles" <eles eles.com> writes:
 Consider the US space shuttle design. It's probably the most 
 wrong-headed engineering design ever, and it persisted because 
 too many billions of dollars and careers were invested into it. 
 Nobody could admit that it was an extremely inefficient and 
 rather crazy design.

Hey, this is really OT, but I'm interested in. Why do you consider it is such a bad design? Because the shuttle is intended to be reentrant and this is costly? Some other issue? Is about the design or about the entire idea? Thanks for answering and I promise to not further hijack this thread.
Dec 19 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Wednesday, 19 December 2012 at 08:45:20 UTC, Walter Bright 
wrote:
 On 12/19/2012 12:19 AM, Max Samukha wrote:
 Evidently you've dismissed all of my posts in this thread on 
 that topic :-)


And I gave detailed reasons why.
 Such as it being a
 standardized AST representation for multiple languages. CLI is 
 all about that,
 which is reflected in its name. LLVM is used almost 
 exclusively for that purpose
 (clang is great).

My arguments were all based on the idea of distributing "compiled" source code in bytecode format. The idea of using some common intermediate format to tie together multiple front ends and multiple back ends is something completely different.

And, surprise (!), I've done that, too. The original C compiler I wrote for many years was a multipass affair that communicated the data from one pass to the next via an intermediate file. I was forced into such a system because DOS just didn't have enough memory to combine the passes. I dumped it when more memory became available, as it was the source of major slowdowns in the compilation process.

Note that such a system need not be *bytecode* at all; it can just hand the data structure off from one pass to the next. In fact, an actual bytecode requires a serialization of the data structures and then a reconstruction of them - rather pointless.

I understand that but cannot fully agree. The problem is that the components of such a system are distributed and not binary-compatible. The data structures are intended to be transferred over a stream, and you *have* to serialize at one end and deserialize at the other.

For example, we serialize a D host app and a C library into portable pnacl bitcode and transfer it to Chrome for compilation and execution. There is no point in having C, D (or whatever other languages people are going to invent) front ends on the receiving side. The same applies to JS - people "serialize" ASTs generated from, say, a CoffeeScript source into JS and transfer that to the browser, which "deserializes" the JS into an internal AST representation.

Note that I am not arguing that bytecode is the best kind of standard AST representation. I am arguing that there *is* a point in such a serialized representation. Hence, your claim that ILs are *completely* useless is not quite convincing.

When we have a single God language (I wouldn't object if it were D, but it is not yet ;)), then there would be no need for complications like ILs.
 Not advocating bytecode here but you claiming it is completely 
 useless is so
 D-ish :).

I'm not without experience doing everything bytecode is allegedly good at.

I am not doubting your experience, but that might be an argument from authority.
 As for CLI, it is great for implementing C#. For other 
 languages, not so much. There turned out to be no way to 
 efficiently represent D slices in it, for example.

That is the limitation of CLI, not the concept. LLVM does not have that problem.
Dec 19 2012
prev sibling next sibling parent "foobar" <foo bar.com> writes:
On Wednesday, 19 December 2012 at 08:45:20 UTC, Walter Bright 
wrote:
 On 12/19/2012 12:19 AM, Max Samukha wrote:
 Evidently you've dismissed all of my posts in this thread on 
 that topic :-)


And I gave detailed reasons why.
 Such as it being a
 standardized AST representation for multiple languages. CLI is 
 all about that,
 which is reflected in its name. LLVM is used almost 
 exclusively for that purpose
 (clang is great).

My arguments were all based on the idea of distributing "compiled" source code in bytecode format. The idea of using some common intermediate format to tie together multiple front ends and multiple back ends is something completely different.

And, surprise (!), I've done that, too. The original C compiler I wrote for many years was a multipass affair that communicated the data from one pass to the next via an intermediate file. I was forced into such a system because DOS just didn't have enough memory to combine the passes. I dumped it when more memory became available, as it was the source of major slowdowns in the compilation process.

Note that such a system need not be *bytecode* at all; it can just hand the data structure off from one pass to the next. In fact, an actual bytecode requires a serialization of the data structures and then a reconstruction of them - rather pointless.
 Not advocating bytecode here but you claiming it is completely 
 useless is so
 D-ish :).

I'm not without experience doing everything bytecode is allegedly good at. As for CLI, it is great for implementing C#. For other languages, not so much. There turned out to be no way to efficiently represent D slices in it, for example.

There's another part of an intermediate representation which you ignored: attaching *multiple backends*, which is important for portability and the web.

Applications could be written in SafeD (a subset that is supposed to have no implementation-defined or undefined behaviors) and compiled to such an intermediate representation (let's call it common-IR, since you don't like "bytecode"). Now, each client platform has its own backend for our common-IR. We can have install-time compilation as in .NET, or JIT as in Java, or both, or maybe some other such method. Having such a format allows us to add distribution to the system. Serialization and deserialization are only pointless when done on the same machine.

Another use case could be a compilation server - we can put only the front end on client machines and do the optimizations and native code generation on the server. This could be used, for example, in a browser to allow D scripting. Think for a moment about smart-phone browsers.

The dreaded "bytecode" helps to solve all those use cases.
Dec 19 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 10:55:43 UTC, foobar wrote:
 The other part of an intermediate representation which you 
 ignored is attaching *multiple backends*, which is important for 
 portability and the web.
 Applications could be written in safeD (a subset that is 
 supposed to have no implementation defined or undefined 
 behaviors) and compiled to such an intermediate representation 
 (let's call it common-IR since you don't like "bytecode"). Now, 
 each client platform has its own backend for our common-IR. We 
 can have install-time compilation like in .NET or JIT as in 
 Java or both, or maybe some other such method.
 Having such a format allows adding distribution to the system. 
 The serialization and de-serialization is only pointless when 
 done on the same machine.

 Another usecase could be a compilation server - we can put only 
 the front-end on client machines and do the optimizations and 
 native code generation on the server. This can be used for 
 example in a browser to allow D scripting. Think for instance 
 about smart-phone browsers.

 The dreaded "bytecode" helps to solve all those use cases.

Imagine if the compiler were built in a user-extensible way, such that one could write a plugin module that outputs the compiled code in the form of JVM bytecode, which could run directly on the JVM.

It doesn't really matter if Walter is correct or not concerning his views; what matters is that people want X, Y and Z, and no matter how silly it may seem to some people, the silliness is never going to change. Why fight it? Why not do ourselves a favor and embrace it? We don't have to do anything silly, just provide the means to allow people to do what they want to do. Most of it will be bad, which is the case right now and always will be, but every so often someone will do something brilliant.

Why not make D the platform of choice, meaning you really do have a choice?

--rt
Dec 19 2012
prev sibling next sibling parent "ixid" <nuaccount gmail.com> writes:
On Wednesday, 19 December 2012 at 21:00:20 UTC, Walter Bright 
wrote:
 On 12/19/2012 2:05 AM, eles wrote:
 Consider the US space shuttle design. It's probably the most 
 wrong-headed
 engineering design ever, and it persisted because too many 
 billions of dollars
 and careers were invested into it. Nobody could admit that it 
 was an extremely
 inefficient and rather crazy design.

Hey, this is really OT, but I'm interested in it. Why do you consider it such a bad design? Because the shuttle is intended to be reentrant and this is costly? Some other issue? Is it about the design or about the entire idea?

It boils down to this: the overriding expense in spaceflight is weight. There's the notion of "payload", which is the weight of whatever does something useful in space - the whole point of the mission. Every bit of weight adds a great deal more weight in terms of the cost to push it all into orbit.

To make the shuttle return and land, you've got wings, rudder, landing gear, flight control system - basically a huge amount of weight devoted to that. That weight subtracts from what you can push up as payload. All of the lifting capability for that also must be insanely reliable. (And never mind needing things like a custom 747 to transport it around because it's too big to go on the roads, all that money spent trying to make a reusable heat shield, etc.)

Now consider that the only thing that actually has to return is the astronauts. And all they actually need to return is a heat shield and a parachute - i.e. an Apollo capsule.

Thinking about it from basic principles, you need:

1. astronauts
2. payload
3. a way to get the astronauts back

So the idea then is to have two launches:

1. an insanely reliable rocket to push the astronauts up, and nothing else
2. a less reliable (and hence cheap) heavy-lift rocket to push the payload up

The two launches dock in space, the astronauts do their job, and the astronauts return via their Apollo-style capsule. Mission accomplished at far, far less cost.

The shuttle was originally intended to be a lot smaller and to sit atop the central booster, avoiding the issues that caused the Columbia disaster. I believe that design may have been intended to operate in the manner you suggest; however, the CIA demanded that the shuttle be made much larger to accommodate large military satellites, distorting the design and making it a lot less efficient.
Dec 19 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 21:24:46 UTC, David Gileadi 
wrote:
 I had the same question, and Google found me a 2003 article
 http://www.spacedaily.com/news/oped-03l.html
 which in the wake of Columbia is largely about safety but also 
 about efficiency. Interestingly the article claims that the 
 shuttle flaws were largely the result of a) the desire to carry 
 large payloads along with astronauts (as Walter mentions) and 
 b) the choice of fuel, which led to several other expensive and 
 dangerous design choices.

As always, the answer is never as simple as it seems (just as it is with bytecode, if I'm to attempt to stay on topic). One of the subgoals of the space shuttle was for it to be able to return not just people, but also to capture and return to earth an orbiting payload. It also carried along instrumentation such as the Canadarm, a very expensive device that you normally would not want to throw away. The arm was used for deploying the payload and also for performing repair work.

It is hard to imagine a throw-away rocket booster approach meeting all of these design goals, and I'm leaving out other abilities you cannot get from a simple return capsule approach. A mistake would be to use the shuttle for purposes it was not suited for, such as situations that did not need its unique abilities and could be done more cheaply.

--rt
Dec 19 2012
prev sibling next sibling parent "Rob T" <rob ucora.com> writes:
On Wednesday, 19 December 2012 at 09:25:54 UTC, Walter Bright 
wrote:
 Mostly. If you use bytecode, you have Yet Another Spec that has 
 to be defined and conformed to. This has a lot of costs.

But those are mostly one-time costs, and for software that has to run millions of times over, if there are enough performance gains from first compiling to bytecode, it could be worth the costs over the long term. If there are better methods of producing the same or better results that are not strictly bytecode, then that's another story; however, one goal is to have a common language that amalgamates everything together under a common roof.

One question I have for you is: what percentage performance gain can you expect to get by using a well-chosen bytecode-like language versus interpreting directly from source code?

The other question is: are there better alternative techniques? For example, compiling regular source directly to native using a JIT approach. In many ways, this seems like the very best approach, which I suppose is precisely what you've been arguing all this time. So perhaps I've managed to convince myself that you are indeed correct. I'll take that stance and see if it sticks.

BTW, I'm not a fan of interpreted languages, except for situations where you want to transport code in the form of data, or be able to store it for later portable execution. Lua embedded into a game engine is a good use case example (although why not D!).

--rt
Dec 19 2012
prev sibling next sibling parent "Peter Sommerfeld" <noreply rubrica.at> writes:
Am 20.12.2012, 01:54 Uhr, schrieb Rob T <rob ucora.com>:

 I'm not a fan of interpreted languages, except for situations 
 where you want to transport code in the form of data, or be 
 able to store it for later portable execution. Lua embedded 
 into a game engine is a good use case example (although why 
 not D!).

Because you need a D programmer to program in D. ;) Scripting languages like Lua reduce the complexity of programming to fit the needs of their users, who are often not programmers. There is a lot more needed to program in D than in Lua.

BTW: LuaJIT uses the source code, not Lua's byte code.

Peter
Dec 19 2012
prev sibling next sibling parent "eles" <eles eles.com> writes:
 The shuttle concept was so expensive that it severely stunted 
 what we could do in space, and finally sank the whole manned 
 space program.

Thank you to all of you that expressed viewpoints on this issue. I found the discussion valuable and reasonable arguments were made (both sides). Anyway, it is too off-topic, so I consider this hijack ended.
Dec 20 2012
prev sibling next sibling parent "Joakim" <joakim airpost.net> writes:
On Thursday, 20 December 2012 at 01:41:38 UTC, Walter Bright 
wrote:
 On 12/19/2012 4:54 PM, Rob T wrote:
 One question I have for you, is what percentage performance 
 gain can you expect
 to get by using a well chosen bytecode-like language verses 
 interpreting
 directly from source code?

I know of zero claims that making a bytecode standard for javascript will improve performance.

The fastest JS implementation these days is v8, which is implemented as a javascript compiler, not with bytecode: http://wingolog.org/archives/2011/08/02/a-closer-look-at-crankshaft-v8s-optimizing-compiler

More interesting posts about v8 on that blog: http://wingolog.org/tags/v8
Dec 20 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Thursday, 20 December 2012 at 01:41:38 UTC, Walter Bright 
wrote:
 Not exactly, I argue that having a bytecode standard is 
 useless. How a compiler works internally is fairly irrelevant.

Note that in the first place, the bytecode discussion started with the need to provide a CTFE-able module that does not contain more information than what is in a DI file, as that is a concern for some companies.

Bytecode can solve that problem nicely, IMO. You mentioned that DI is superior here, but I don't really understand how.
Dec 20 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 21 December 2012 at 05:43:18 UTC, Walter Bright wrote:
 On 12/20/2012 1:30 PM, deadalnix wrote:
 Note that in the first place, the bytecode discussion started 
 with the need to provide a CTFE-able module that does not 
 contain more information than what is in a DI file, as that is 
 a concern for some companies.

 Bytecode can solve that problem nicely IMO. You mentioned that 
 DI is superior
 here, but I don't really understand how.

No, it doesn't solve that problem at all. I explained why repeatedly.

No, you explained that Java's bytecode doesn't solve that problem, which is quite different.
Dec 20 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Thursday, 20 December 2012 at 21:30:44 UTC, deadalnix wrote:
 On Thursday, 20 December 2012 at 01:41:38 UTC, Walter Bright 
 wrote:
 Not exactly, I argue that having a bytecode standard is 
 useless. How a compiler works internally is fairly irrelevant.

Note that in the first place, the bytecode discussion started with the need to provide a CTFE-able module that does not contain more information than what is in a DI file, as that is a concern for some companies. Bytecode can solve that problem nicely, IMO. You mentioned that DI is superior here, but I don't really understand how.

Walter is right that bytecode doesn't solve that problem at all. High level bytecodes like Microsoft IL are trivially decompiled into very readable source code. I did that frequently at one of my jobs when I needed to debug third-party .NET libraries that we didn't have source code for. The advantage of bytecode is not in obfuscation. What Walter is wrong about is that bytecode is entirely pointless.
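CPython's bytecode makes the same point easy to see (it stands in for IL here only as an analogy, not the same format): a high-level bytecode keeps variable and function names, which is precisely what makes it easy to decompile into readable source.

```python
import dis

def tax(price, rate):
    total = price * (1 + rate)
    return round(total, 2)

# The compiled code object still carries the parameter, local, and
# callee names - exactly the information a decompiler needs to emit
# readable source again.
print(tax.__code__.co_varnames)   # ('price', 'rate', 'total')
print(tax.__code__.co_names)      # ('round',)

dis.dis(tax)   # a readable listing of the bytecode itself
```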
Dec 21 2012
prev sibling next sibling parent "Araq" <rumpf_a web.de> writes:
On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely 
 pointless.

I'll bite. What is its advantage over source code?

Interpreting the AST directly: requires recursion. Interpreting a (stack-based) bytecode: does not require recursion. That's what an AST-to-bytecode transformation does; it eliminates the recursion. And that is far from being useless.
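A sketch of that transformation (Python for illustration; the AST shape and op names are invented for this example):

```python
# A tiny AST: plain ints, or ("add", lhs, rhs) tuples.
ast = ("add", ("add", 1, 2), 3)

# Interpreting the AST directly - recursion on the host call stack.
def eval_ast(node):
    if isinstance(node, int):
        return node
    _, lhs, rhs = node
    return eval_ast(lhs) + eval_ast(rhs)

# AST -> stack bytecode. The transformation itself recurses once,
# but the resulting code runs in a flat, non-recursive loop.
def compile_ast(node, out):
    if isinstance(node, int):
        out.append(("push", node))
    else:
        _, lhs, rhs = node
        compile_ast(lhs, out)
        compile_ast(rhs, out)
        out.append(("add",))
    return out

def run(code):
    stack = []  # the stack machine's explicit operand stack
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        else:  # "add"
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

print(eval_ast(ast), run(compile_ast(ast, [])))  # 6 6
```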
Dec 21 2012
prev sibling next sibling parent "Mafi" <mafi example.org> writes:
On Friday, 21 December 2012 at 10:37:05 UTC, Araq wrote:
 On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright 
 wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely 
 pointless.

I'll bite. What is its advantage over source code?

Interpreting the AST directly: requires recursion. Interpreting a (stack-based) bytecode: does not require recursion. That's what an AST-to-bytecode transformation does; it eliminates the recursion. And that is far from being useless.

I don't think that this is such a big deal. Either way you need one stack: either the call stack or the stack machine's stack. It doesn't seem to make a big difference. Am I wrong?
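One concrete difference (a Python sketch; the "bytecode" is invented for this example): the stack machine's stack lives on the heap, so it isn't bounded by the host's call-stack limit, while the recursive walk is.

```python
# "Bytecode" for ((...(1 + 1)...) + 1) with 10_000 additions - far
# deeper than CPython's default recursion limit of ~1000 frames.
depth = 10_000
code = [("push", 1)] + [("push", 1), ("add",)] * depth

def run(code):
    stack = []  # explicit operand stack, allocated on the heap
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        else:  # "add"
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack.pop()

result = run(code)  # the flat loop handles any nesting depth

# The equivalent recursive walk needs one host stack frame per level
# of nesting, and overflows:
def eval_chain(n):
    return 1 if n == 0 else eval_chain(n - 1) + 1

overflowed = False
try:
    eval_chain(depth)
except RecursionError:
    overflowed = True
print(result, overflowed)
```

So the two stacks are not interchangeable in every host environment, even though logically there is just one stack either way.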
Dec 21 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely 
 pointless.

I'll bite. What is its advantage over source code?

It is not about bytecode vs source code. It is about a common platform-independent intermediate representation for multiple languages. JS is such a representation in the browsers, and it is widely used. Is it entirely pointless? I am not convinced it is.
Dec 21 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Friday, 21 December 2012 at 11:00:01 UTC, Max Samukha wrote:
 On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright 
 wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely 
 pointless.

I'll bite. What is its advantage over source code?

It is not about bytecode vs source code. It is about a common platform-independent intermediate representation for multiple languages. JS is such a representation in the browsers, and it is widely used. Is it entirely pointless? I am not convinced it is.

Another example: many of us here are talking in an intermediate language, which is not English :) The concept of a common representation works pretty well here.
Dec 21 2012
prev sibling next sibling parent "Simen Kjaeraas" <simen.kjaras gmail.com> writes:
On 2012-12-21 12:12, Max Samukha <maxsamukha gmail.com> wrote:

 On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely pointless.

I'll bite. What is its advantage over source code?

It is not about bytecode vs source code. It is about a common platform-independent intermediate representation for multiple languages. JS is such a representation in the browsers, and it is widely used. Is it entirely pointless? I am not convinced it is.

But Walter has said that for exactly this purpose, bytecode is useful. What he's said is that in the case proposed (using bytecode instead of source code for CTFE), bytecode offers absolutely no advantage over source. Now can we move on? It's been said so many times now, and we all know Walter is not a pushover. If nobody can present irrefutable, solid, peer-reviewed, and definite proof that bytecode has significant advantages over source code for the purpose of CTFE, such an implementation will never be done, and the world will be better off for it. -- Simen
Dec 21 2012
prev sibling next sibling parent "Max Samukha" <maxsamukha gmail.com> writes:
On Friday, 21 December 2012 at 17:08:28 UTC, Simen Kjaeraas wrote:
 On 2012-00-21 12:12, Max Samukha <maxsamukha gmail.com> wrote:

 On Friday, 21 December 2012 at 10:30:21 UTC, Walter Bright 
 wrote:
 On 12/21/2012 2:13 AM, Max Samukha wrote:
 What Walter is wrong about is that bytecode is entirely 
 pointless.

I'll bite. What is its advantage over source code?

It is not about bytecode vs source code. It is about a common platform-independent intermediate representation for multiple languages. JS is such a representation in the browsers, and it is widely used. Is it entirely pointless? I am not convinced it is.

But Walter has said that for exactly this purpose, bytecode is useful.

Really? He sounded like the whole world should repent for using IRs. Maybe I misunderstood.
 What he's said is that in the case proposed (using bytecode 
 instead of
 source code for CTFE), bytecode offers absolutely no advantage 
 over
 source.

 Now can we move on? It's been said so many times now, and we 
 all know
 Walter is not a pushover. If nobody can present irrefutable, 
 solid,
 peer-reviewed, and definite proof that bytecode has significant
 advantages over source code for the purpose of CTFE, such an
 implementation will never be done, and the world will be better 
 off
 for it.

I am not arguing that.
Dec 21 2012
prev sibling next sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 21 December 2012 at 07:03:13 UTC, Walter Bright wrote:
 On 12/20/2012 10:05 PM, deadalnix wrote:
 No you explained that java's bytecode doesn't solve that 
 problem. Which is quite
 different.

I did, but obviously you did not find that satisfactory. Let me put it this way: Design a bytecode format, and present it here, that is CTFEable and is not able to be automatically decompiled.

Optimized LLVM bytecode looks like a good candidate for the job. Note that I'm not suggesting this as a spec, but as an example of a possible solution.
Dec 21 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 Optimized LLVM bytecode looks like a good candidate for the 
 job. Note that I'm not suggesting this as a spec, but as an 
 example of a possible solution.

It's true that it couldn't be automatically decompiled to something equivalent to the original D source, but it does contain type information. Its human-readable form (LLVM assembly language) is easier to understand than assembly.
Dec 21 2012
prev sibling parent "deadalnix" <deadalnix gmail.com> writes:
On Friday, 21 December 2012 at 20:08:00 UTC, jerro wrote:
 Optimized LLVM bytecode looks like a good candidate for the 
 job. Note that I'm not suggesting this as a spec, but as an 
 example of a possible solution.

It's true that it couldn't be automatically decompiled to something equivalent to the original D source, but it does contain type information. Its human readable form (llvm assembly language) is easier to understand than assembly.

Once the optimizer has run, a lot of it is lost. It is easier to understand than pure x86 assembly, but it is still quite opaque.
Dec 21 2012