www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Rich Hickey's slides from jvm lang summit - worth a read?

reply language_fan <foo bar.com.invalid> writes:
http://wiki.jvmlangsummit.com/images/a/ab/HickeyJVMSummit2009.pdf
Sep 19 2009
parent reply Lutger <lutger.blijdestijn gmail.com> writes:
Hell yeah, super interesting! Also extremely well presented.

I found the conclusion of the presentation on youtube, anybody knows if the 
full presentation is (or will be) on the internet?

http://www.youtube.com/watch?v=zRTx1oGG_1Y&feature=channel_page
Sep 20 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
Executive summary: pure functions and immutable data structures help 
manage program complexity.
Sep 24 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Executive summary: pure functions and immutable data structures help 
 manage program complexity.

At the moment in D there aren't many immutable data structures available, but of course they can be written. Such data structures often put the GC under some/high pressure. I don't know if the current D GC is able to cope. Bye, bearophile
Sep 24 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 
 Executive summary: pure functions and immutable data structures help 
 manage program complexity.

At the moment in D there aren't many immutable data structures available, but of course they can be written. Such data structures often put the GC under some/high pressure. I don't know if the current D GC is able to cope.

I don't understand your comment. With immutable being transitive, any data structure can be made immutable. The GC doesn't care if a data structure is immutable or not, I don't see how that affects its operation.
Sep 25 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Walter Bright Wrote:

 bearophile wrote:
 Walter Bright:
 
 Executive summary: pure functions and immutable data structures help 
 manage program complexity.

At the moment in D there aren't many immutable data structures available, but of course they can be written. Such data structures often put the GC under some/high pressure. I don't know if the current D GC is able to cope.

I don't understand your comment. With immutable being transitive, any data structure can be made immutable. The GC doesn't care if a data structure is immutable or not, I don't see how that affects its operation.

Think about what happens when you want to mutate an immutable data structure. You must create one or more new objects. That increases the number of objects for the GC to handle
Sep 25 2009
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Jason House wrote:
 Walter Bright Wrote:
 
 bearophile wrote:
 Walter Bright:
 
 Executive summary: pure functions and immutable data structures
 help manage program complexity.

available, but of course they can be written. Such data structures often put the GC under some/high pressure. I don't know if the current D GC is able to cope.

any data structure can be made immutable. The GC doesn't care if a data structure is immutable or not, I don't see how that affects its operation.

Think about what happens when you want to mutate an immutable data structure. You must create one or more new objects. That increases the number of objects for the GC to handle

On the other hand, immutability means that diverse data structures can share parts of them, reducing the number of objects. I find that immutable strings means my programs allocate far fewer strings, as it is no longer necessary to defensively make copies "just in case" something else changes them. I've been thinking of transitioning dmd's semantic analysis to using immutable data structures because it will reduce allocations, not increase them.
Sep 25 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

I've been thinking of transitioning dmd's semantic analysis to using immutable
data structures because it will reduce allocations, not increase them.<

As usual I am ignorant about such matters, so I can't give you a good answer. But from what I've seen in immutable-based languages, they produce a sustained flux of small/little immutable objects that the GC has to free and allocate all the time. This stresses the GC. Surely people here that know Clojure or Scala, Haskell or F# may give you a better answer. Maybe even numbers of the amount of object flux they face in normal programs. So the situation with strings may be not generalizable to the other immutable data structures a program may need to use. Bye, bearophile
Sep 25 2009
next sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 
 I've been thinking of transitioning dmd's semantic analysis to
 using immutable data structures because it will reduce allocations,
 not increase them.<

As usual I am ignorant about such matters, so I can't give you a good answer. But from what I've seen in immutable-based languages, they produce a sustained flux of small/little immutable objects that the GC has to free and allocate all the time. This stresses the GC. Surely people here that know Clojure or Scala, Haskell or F# may give you a better answer. Maybe even numbers of the amount of object flux they face in normal programs. So the situation with strings may be not generalizable to the other immutable data structures a program may need to use.

Some languages generate lots of allocations for other reasons, such as lack of value aggregates that can be put on the stack.
Sep 25 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Some languages generate lots of allocations for other reasons, such as 
 lack of value aggregates that can be put on the stack.

This is true. But I have no idea how much Haskell/Clojure/F#/OCaML/etc use the stack to allocate temporary things. I'd like to know more about this. I'll have to ask. Bye, bearophile
Sep 25 2009
prev sibling parent reply Chris Nicholson-Sauls <ibisbasenji gmail.com> writes:
bearophile wrote:
 Walter Bright:
 
 I've been thinking of transitioning dmd's semantic analysis to using immutable
data structures because it will reduce allocations, not increase them.<

As usual I am ignorant about such matters, so I can't give you a good answer. But from what I've seen in immutable-based languages, they produce a sustained flux of small/little immutable objects that the GC has to free and allocate all the time. This stresses the GC. Surely people here that know Clojure or Scala, Haskell or F# may give you a better answer. Maybe even numbers of the amount of object flux they face in normal programs. So the situation with strings may be not generalizable to the other immutable data structures a program may need to use. Bye, bearophile

My personal experience with Scala, at least, has been that it doesn't hurt anything. Even just considering the most common kinds of operations: ( 1 :: 2 :: 3 :: 4 :: 5 :: Nil ) creates a Scala list of five values... but in the process creates five different lists! Consider it "unrolled" to something like this: tmp = 5 :: Nil tmp = 4 :: tmp tmp = 3 :: tmp tmp = 2 :: tmp tmp = 1 :: tmp tmp Yipes. BUT... think of it as a single-linked-list structure, and its not really that bad, because each of those objects is quite small, and all five of them are preserved in the original list. Compare: A = 3 :: 4 :: Nil B = 2 :: a C = 1 :: a Here we have two lists (B & C) being made, each with four items... but only four items total, not eight, because the '4' and '3' cells in A are being reused between them. I think that kind of capability is what has Walter interested. -- Chris Nicholson-Sauls P.S. -- Just to make your head hurt... "::" isn't even an operator, its the name of a class (& a singleton too...) defined in the stdlib as a derivative of List. Scala is just weird like that.
Sep 29 2009
next sibling parent Rainer Deyke <rainerd eldwood.com> writes:
Chris Nicholson-Sauls wrote:
 My personal experience with Scala, at least, has been that it doesn't
 hurt anything.

Scala uses the Java garbage collector, which can take all kinds of abuse without flinching. -- Rainer Deyke - rainerd eldwood.com
Sep 29 2009
prev sibling parent Justin Johansson <procode adam-dott-com.au> writes:
Chris Nicholson-Sauls Wrote:

 bearophile wrote:
 Walter Bright:
 
 I've been thinking of transitioning dmd's semantic analysis to using immutable
data structures because it will reduce allocations, not increase them.<

As usual I am ignorant about such matters, so I can't give you a good answer. But from what I've seen in immutable-based languages, they produce a sustained flux of small/little immutable objects that the GC has to free and allocate all the time. This stresses the GC. Surely people here that know Clojure or Scala, Haskell or F# may give you a better answer. Maybe even numbers of the amount of object flux they face in normal programs. So the situation with strings may be not generalizable to the other immutable data structures a program may need to use. Bye, bearophile

My personal experience with Scala, at least, has been that it doesn't hurt anything. Even just considering the most common kinds of operations: ( 1 :: 2 :: 3 :: 4 :: 5 :: Nil ) creates a Scala list of five values... but in the process creates five different lists! Consider it "unrolled" to something like this: tmp = 5 :: Nil tmp = 4 :: tmp tmp = 3 :: tmp tmp = 2 :: tmp tmp = 1 :: tmp tmp Yipes. BUT... think of it as a single-linked-list structure, and its not really that bad, because each of those objects is quite small, and all five of them are preserved in the original list. Compare: A = 3 :: 4 :: Nil B = 2 :: a C = 1 :: a Here we have two lists (B & C) being made, each with four items... but only four items total, not eight, because the '4' and '3' cells in A are being reused between them. I think that kind of capability is what has Walter interested. -- Chris Nicholson-Sauls P.S. -- Just to make your head hurt... "::" isn't even an operator, its the name of a class (& a singleton too...) defined in the stdlib as a derivative of List. Scala is just weird like that.

<<<< "But from what I've seen in immutable-based languages, they produce a sustained flux of small/little immutable objects that the GC"




It's not just small/little fragments .. often the entire list that the system/library/language just made a copy of in order to preserve immutability. Take a classical Lisp cons list for example. Its a singly linked list referenced by just a single pointer to the head of the list, and with each cell comprising of a datum and a link to the next cell in the list. Now appending a single item to the head of the list happens in O(n) time and preserves immutability (by default) of the original list by virtue of the intrinsic recursive structure of a cons list. OTOH, and assuming you want list immutability, appending items to the end of the list happens in quadratic time since a copy must be made of the list that you are appending to. By this rational, it's not just GC of individual (as in small/little) cons cells, it's garbage collection of entire lists of the tiny tots to worry about. btw. It wasn't that long ago that I wrote an immutable list library in JavaScript using JS dynamic arrays and taking advantage of common structure. My goal was to make prepending items to front of the list and appending items to end of the list happen in more-or-less the same linear time. Glad to say that I succeeded and did a benchmark comparison against Python. My JS implementation (yes, non-native list processing written in JS) beat Python hands down in an overall performance benchmark for both prepend and append operations. From what I've read of Clojure, found out by playing with it, and perusing its source, it's no slouch either when it comes to making as much use of common data as possible in order to make for immutability without performance-cripple. Ah, there's a glimmer of hope with D dynamic arrays to achieve similarly. -- Justin Johansson
Sep 29 2009
prev sibling next sibling parent Lutger <lutger.blijdestijn gmail.com> writes:
Walter Bright wrote:

 Executive summary: pure functions and immutable data structures help
 manage program complexity.

I think so too, but you left out the time and identity part related to stm and multiversion concurrency. You could argue these notions are a possible consequence of immutable data structures and pure functions. But the time thingy somehow seems more fundamental than that. Anyhow it nicely hints at what we can do with these concepts. It seems to me that this is the fundamental part of functional programming, and not functions as first class citizens. This is also the part that most modern languages that claim to be hybrid OO / functional do not support.
Sep 24 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 Executive summary: pure functions and immutable data structures help 
 manage program complexity.

There's something missing in most of the articles I've read that praise pure functions and immutable data structures. When I write a 500-lines long Python program I often start using almost pure data structures and pure functions, they help me avoid bugs and keep things tidy. Then if the performance is good enough I may stop (I add more tests, docs, etc). If the performance isn't good enough I often profile the code and (beside trying to improve algorithms and data structures, catching and pre-processing, etc), I also sometimes reduce the purity of some performance critical functions/methods and change the code in some spots so it modifies some big data structures in place, to avoid repeated allocations and reduce GC activity. This speeds up the code. I'd like a language that helps me perform such changes from almost purity to a code that in certain spots is less functional/pure. I'd like such language to give me quick ways to go back to a more pure code, to help me modify/debug the code further, because during program design or during debugging purity/immutability help. But once I have optimized my Python code, I have lost some of such simplicity, and it requires work if you want to go back to a more pure code to perform more debugging/code improvements. I think such ideas may be applied to D programs too. Bye, bearophile
Sep 24 2009
parent reply Rich Hickey <richhickey gmail.com> writes:
bearophile Wrote:

 Walter Bright:
 
 Executive summary: pure functions and immutable data structures help 
 manage program complexity.

There's something missing in most of the articles I've read that praise pure functions and immutable data structures. When I write a 500-lines long Python program I often start using almost pure data structures and pure functions, they help me avoid bugs and keep things tidy. Then if the performance is good enough I may stop (I add more tests, docs, etc). If the performance isn't good enough I often profile the code and (beside trying to improve algorithms and data structures, catching and pre-processing, etc), I also sometimes reduce the purity of some performance critical functions/methods and change the code in some spots so it modifies some big data structures in place, to avoid repeated allocations and reduce GC activity. This speeds up the code. I'd like a language that helps me perform such changes from almost purity to a code that in certain spots is less functional/pure. I'd like such language to give me quick ways to go back to a more pure code, to help me modify/debug the code further, because during program design or during debugging purity/immutability help. But once I have optimized my Python code, I have lost some of such simplicity, and it requires work if you want to go back to a more pure code to perform more debugging/code improvements. I think such ideas may be applied to D programs too.

You might find this interesting: http://clojure.org/transients It supports using the exact same immutable data structures in/out of your (externally pure) function, as well as the exact same functional "shape" of your code when using transients (roughly, mutables) internally. Rich
Sep 25 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Rich Hickey:

 You might find this interesting:
 http://clojure.org/transients
 It supports using the exact same immutable data structures in/out of your
(externally pure) function, as well as the exact same functional "shape" of
your code when using transients (roughly, mutables) internally.

Yes, it's interesting and cute, thank you for the link. D2 too is leading in that direction, allowing mutables and imperative code inside pure functions. (That's also why I have asked Don to fix pure/notpure functions inside pure functions, so now inside pure functions you can have complex stuff (inner functions are a bit slower, but probably that's often acceptable)). Near the end of that page there's written:
(time (def v (vrange 1000000)))

(time (def v2 (vrange2 1000000))) "Elapsed time: 34.428 msecs" Oh, yeah, transients are fast!< 34 ms is the timing with transients. The problem is that "fast" is almost meaningless. When you give a timing you have to always give ways to perform an absolute comparison too, relative comparisons aren't enough. When you show Clojure timings you have to specify the CPU and give timings for equivalent C and Java code. For example on a ~2GHz Celeron in D that allocation + initialization loop takes about 7.3 ms with DMD. If you use memory from the C heap (avoiding the initial useless clearing of the array of integers) that code runs in about 3.6 ms. Almost ten times faster than that "fast" Clojure version. (I'd like the D compiler to recognize the basic situations where the code clears a vector of values that are then immediately fully initialized, and avoid the clearing, to save some time. It's a very basic optimization, it's a very common situation, and avoids the programmer to manually write such boring optimization. Thank you.) Bye, bearophile
Sep 25 2009
parent reply Rich Hickey <richhickey gmail.com> writes:
bearophile Wrote:

 Rich Hickey:
 
 You might find this interesting:
 http://clojure.org/transients
 It supports using the exact same immutable data structures in/out of your
(externally pure) function, as well as the exact same functional "shape" of
your code when using transients (roughly, mutables) internally.

Yes, it's interesting and cute, thank you for the link. D2 too is leading in that direction, allowing mutables and imperative code inside pure functions. (That's also why I have asked Don to fix pure/notpure functions inside pure functions, so now inside pure functions you can have complex stuff (inner functions are a bit slower, but probably that's often acceptable)). Near the end of that page there's written:
(time (def v (vrange 1000000)))

(time (def v2 (vrange2 1000000))) "Elapsed time: 34.428 msecs" Oh, yeah, transients are fast!< 34 ms is the timing with transients. The problem is that "fast" is almost meaningless.

"fast" is always a subjective term.
When you give a timing you have to always give ways to perform an absolute
comparison too, relative comparisons aren't enough. When you show Clojure
timings you have to specify the CPU and give timings for equivalent C and Java
code.

No, I don't. The post is for Clojure users and shows them a relative comparison they care about, persistent vs transient.
 For example on a ~2GHz Celeron in D that allocation + initialization loop
takes about 7.3 ms with DMD. If you use memory from the C heap (avoiding the
initial useless clearing of the array of integers) that code runs in about 3.6
ms. Almost ten times faster than that "fast" Clojure version.
 

Unless that code is returning a persistent vector and provides thread isolation, it is not doing the same job. I don't have time to spend coding up a persistent vector and thread-isolated transients in C/C++/D. It might be faster. I didn't say this was "fastest", that wasn't the point. Best of luck with D. Rich (Early Zortech user and fan, btw)
Sep 25 2009
next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Rich Hickey wrote:
 Rich (Early Zortech user and fan, btw)

Thanks, and thanks for stopping by and commenting in this thread.
Sep 25 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Rich Hickey:

No, I don't. The post is for Clojure users and shows them a relative comparison
they care about, persistent vs transient.<
Unless that code is returning a persistent vector and provides thread
isolation, it is not doing the same job.<

Sorry, I didn't mean to offend you in any way. Sometimes I am not gentle enough. I am sorry. You are right that my comparison with D was wrong because the semantics of that Clojure code and data structures is richer. On the other hand a comparison with a simpler/basic implementation (like just a C array) can be quite useful anyway, to know how much you have to pay for such useful extra semantics. I have seen several online benchmarks of many languages, Clojure too. Often they show only numbers for a single language. In such situations a comparison with other languages is useful to put numbers in perspective. So a comparison with (for example) a baseline C program is useful (hopefully with the same semantics, where possible). For Clojure a comparison with Java can be equally useful. Bye, bearophile
Sep 25 2009
parent Jeremie Pelletier <jeremiep gmail.com> writes:
language_fan wrote:
 Fri, 25 Sep 2009 20:17:12 -0400, bearophile thusly wrote:
 
 Rich Hickey:

 No, I don't. The post is for Clojure users and shows them a relative
 comparison they care about, persistent vs transient.< Unless that code
 is returning a persistent vector and provides thread isolation, it is
 not doing the same job.<

enough. I am sorry. You are right that my comparison with D was wrong because the semantics of that Clojure code and data structures is richer. On the other hand a comparison with a simpler/basic implementation (like just a C array) can be quite useful anyway, to know how much you have to pay for such useful extra semantics. I have seen several online benchmarks of many languages, Clojure too. Often they show only numbers for a single language. In such situations a comparison with other languages is useful to put numbers in perspective. So a comparison with (for example) a baseline C program is useful (hopefully with the same semantics, where possible). For Clojure a comparison with Java can be equally useful.

Often the geniuses behind the best languages are not interested in implementing everything themselves, instead they focus on building a framework for others to build on and specifying the semantics of the language. The skill for designing a good language is a gift, and not everyone has it. On the other hand, if you hand out instructions, even uneducated novice coders from the 3rd world's high schools can implement the best compiler in the world. So fact that the language is fast and can be made faster (if not optimally fast) with more effort, often suffices.

I disagree about it being a gift, someone may just enjoy coming up with ideas but not enjoy developing them, that doesn't mean he doesn't have a gift for development, he just doesn't enjoy it. What most people see as a "gift" I see as just enjoying what you do. Just like when I was in music school, I saw people whom everyone thought had no talent, something believed to be genetic and a gift, rise up to become much better musicians than people who had the "gift". It really just comes down to enjoying youself, and that's something anyone can learn.
 This is similar to gotos and higher order functions / dynamic dispatch. 
 The low level language designers focus on optimizing the goto statement 
 on all possible architectures using the fastest sse4 instructions. 
 However, their language does not scale to 500+ LOC programs because it is 
 totally unreadable. Each time they manage to cut off yet another 12 ns 
 from an inner loop, they post the news to reddit and praise their savior 
 who never listens to their ideas. The higher level language designers 
 focus on designing new language theory and complex control and data 
 structures that make the language more usable in the future.

Optimizing goto's using SSE4? Thank god you're not optimizing languages :p The balance lies between low level optimizations and high level structures. The best way to develop is often to focus on that balance, and then come back after profiling to optimize the bottlenecks. If you focus on complex structures to make your life simpler only, you will make it hell to optimize so you're not winning anything in the end, if you focus only on low level optimizations you get code that makes you go nuts when you read it back 3 months later and is therefore harder to modify. Jeremie
Sep 26 2009
prev sibling next sibling parent language_fan <foo bar.com.invalid> writes:
Thu, 24 Sep 2009 18:00:56 -0400, bearophile thusly wrote:

 Walter Bright:
 
 Executive summary: pure functions and immutable data structures help
 manage program complexity.

There's something missing in most of the articles I've read that praise pure functions and immutable data structures. When I write a 500-lines long Python program I often start using almost pure data structures and pure functions, they help me avoid bugs and keep things tidy. Then if the performance is good enough I may stop (I add more tests, docs, etc). If the performance isn't good enough I often profile the code and (beside trying to improve algorithms and data structures, catching and pre-processing, etc), I also sometimes reduce the purity of some performance critical functions/methods and change the code in some spots so it modifies some big data structures in place, to avoid repeated allocations and reduce GC activity. This speeds up the code.

Well, there are type and effect systems that allow in-place modifications without breaking the type rules, but many common functional languages just do not use these systems. If you break free from the safe part of the language, it is like driving a car without seat belt and/or brakes. If you are a hardcore professional, you know how to avoid some of the dangers, but often it is not possible to avoid them all. Static type checking has a crucial property: it can prove some properties (like the absence of some errors) without decreasing the runtime performance. Another alternative are the unit tests, but no matter how many tests you add to your test suite, you can never prove the absence of errors, only their presence. Switching to Python is in one way a step in the wrong direction - you lose something you already had for free - instead you need to emulate it with tons of additional unit tests to achieve acceptable levels of quality. Type systems, like the pure data structures, have their roots in formal logic. I recommend taking a cursory look at the history of types. It all started in 1870s, and even though there were functional languages like Lisp already in 1970, it was only then that someone (Reynolds, Milner) came up with modern concepts like polymorphisms. E.g. the type system for stuff like int get_length_of_a_list(T)(List!(T) list); was invented in the 70s. Lisp could not provide this kind of safety for lists, there was only the core stuff for lambda calculus ready back then. As another example I could mention typesafe printf that was invented in the 80s. The good thing is, some serious advancements have happened since then, but more is definitely yet to come. It would be foolish to think that functional languages have somehow remained stagnant since the introduction of Lisp. On the contrary, the mainstream C-like languages have refused to take advantage of type systems developed in the 90s or in the 21th century. Now that multi-cores are becoming the norm, a lot of research is happening also on that domain.
Sep 24 2009
prev sibling parent language_fan <foo bar.com.invalid> writes:
Fri, 25 Sep 2009 20:17:12 -0400, bearophile thusly wrote:

 Rich Hickey:
 
No, I don't. The post is for Clojure users and shows them a relative
comparison they care about, persistent vs transient.< Unless that code
is returning a persistent vector and provides thread isolation, it is
not doing the same job.<

Sorry, I didn't mean to offend you in any way. Sometimes I am not gentle enough. I am sorry. You are right that my comparison with D was wrong because the semantics of that Clojure code and data structures is richer. On the other hand a comparison with a simpler/basic implementation (like just a C array) can be quite useful anyway, to know how much you have to pay for such useful extra semantics. I have seen several online benchmarks of many languages, Clojure too. Often they show only numbers for a single language. In such situations a comparison with other languages is useful to put numbers in perspective. So a comparison with (for example) a baseline C program is useful (hopefully with the same semantics, where possible). For Clojure a comparison with Java can be equally useful.

Often the geniuses behind the best languages are not interested in implementing everything themselves, instead they focus on building a framework for others to build on and specifying the semantics of the language. The skill for designing a good language is a gift, and not everyone has it. On the other hand, if you hand out instructions, even uneducated novice coders from the 3rd world's high schools can implement the best compiler in the world. So fact that the language is fast and can be made faster (if not optimally fast) with more effort, often suffices. This is similar to gotos and higher order functions / dynamic dispatch. The low level language designers focus on optimizing the goto statement on all possible architectures using the fastest sse4 instructions. However, their language does not scale to 500+ LOC programs because it is totally unreadable. Each time they manage to cut off yet another 12 ns from an inner loop, they post the news to reddit and praise their savior who never listens to their ideas. The higher level language designers focus on designing new language theory and complex control and data structures that make the language more usable in the future.
Sep 26 2009