CodeHappy

July 4, 2007

Lambdas, closures, and the risks of Ruby Magic

Filed under: Ruby On Rails — pwrighta @ 6:59 pm


I’ll admit it. I’m somewhat privileged to work with a bunch of techie geniuses who from time to time make me feel a little inferior. I am, after all, a further education dropout. I don’t have a degree, and was too busy writing code to pass my college (high school) exams. Most of the people I work with went far beyond me, and many of them have a very good academic understanding of programming principles and theory. When I mentioned ‘closures’ to a couple of them last week I got a very knowing look from them. I hadn’t even heard of ‘closures’ before last week. So, presuming there are a lot of people out there like me, people who came up through the ranks from Assembler, through Basic, C and Pascal and then found themselves in a world of dynamic, highly object oriented scripting languages, I set about trying to find out more in order to blog it.

The whole subject came up when I was browsing the source code for Rails. I’m a language and frameworks guy. I love to understand how a framework does its job, and I revel in learning the inner workings of language compilers and interpreters. So, my interest was piqued when I came across this, the source behind Rails 1.2’s new respond_to method.

def respond_to(*types, &block)
  raise ArgumentError, "..." unless types.any? ^ block
  block ||= lambda { |responder| types.each { |type| responder.send(type) } }
  responder = Responder.new(self)
  block.call(responder)
  responder.respond
end

The bit that caught my eye was the ‘lambda’ call.

lambda { |responder| types.each { |type| responder.send(type) } }

I know about blocks in Ruby, but I hadn’t seen lambda used before so I started to dig. In digging, I came across the subject of closures, and how they are a solution to the FUNARG problem. Huh? Ok, back to basics we go.

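First up. Lambda. You can mostly ignore this one. lambda is very nearly a synonym for Proc.new. Both return a Proc object wrapping a block that you can assign to a variable and pass around; the differences are that a lambda checks the number of arguments it is called with, and a return inside a lambda returns from the lambda itself rather than from the enclosing method. Neither difference matters here, so the above code could equally well have been written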

Proc.new { |responder| types.each { |type| responder.send(type) } }
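To see for yourself how close the two really are, here is a minimal sketch. The helper build_both and the greeting strings are made up for illustration; only Proc.new, lambda, Proc#lambda? and Proc#call are real Ruby.

```ruby
# Hypothetical helper: builds the same block body both ways so they can be compared.
def build_both(greeting)
  [Proc.new { |name| "#{greeting}, #{name}" },
   lambda   { |name| "#{greeting}, #{name}" }]
end

via_proc, via_lambda = build_both("Hi")

puts via_proc.class       # both objects are instances of Proc
puts via_lambda.class
puts via_proc.lambda?     # only the second one is flagged as a lambda
puts via_lambda.lambda?

# A plain proc tolerates a surplus argument; a lambda raises ArgumentError.
puts via_proc.call("Ann", "extra")
begin
  via_lambda.call("Ann", "extra")
rescue ArgumentError => e
  puts "lambda refused: #{e.message}"
end
```

Both objects capture greeting in exactly the same way, which is the part that matters for the rest of this post.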

In Ruby you can then assign these anonymous functions to variables and call them at will. For example

say_hi = Proc.new { puts "Hi there" }
say_hi.call

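If you want to pass this block to a method, you simply prefix the variable name with an ampersand

# Define the method first; &the_block captures the passed block as a Proc.
def some_method(&the_block)
  the_block.call
end

# The ampersand hands the Proc over as the method's block.
some_method(&say_hi)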

Notice how, in the method that gets passed the block, the variable is used without the leading ampersand. The ampersand appears in only two places: at the call site, where it tells Ruby to pass the Proc as the method’s block, and in the parameter list of the method definition, where it captures the incoming block as a Proc.
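As a rough sketch of the two ways a method can receive the same block, here is one method that names the block with an ampersand parameter and one that uses yield. The method names call_explicitly and call_with_yield are invented for this example.

```ruby
# Receives the block as a Proc object through the &parameter.
def call_explicitly(&the_block)
  the_block.call("explicit")
end

# Receives the block implicitly and invokes it with yield.
def call_with_yield
  return "no block given" unless block_given?
  yield "yield"
end

shout = Proc.new { |how| "called via #{how}" }

puts call_explicitly(&shout)   # the Proc is converted back into a block
puts call_with_yield(&shout)   # the same Proc works with yield too
puts call_with_yield { |how| "literal block via #{how}" }
puts call_with_yield           # no block at all
```

Either way the caller writes the same thing; the ampersand parameter is only needed when the method wants to hold on to the block as an object.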

Proc objects are used throughout Rails as a way for developers to define arbitrary code and pass it to the framework, effectively providing a way for the framework to be customized on the fly without the need for inheritance, overloading and all the other traditional clunkiness.

In researching these things though, I found out that lambda/Proc.new returns a Proc object which is a ‘closure’. That’s not something I’d heard of before. And thus the research began.

Way back in the mists of time (yes, even before the XBOX was invented), most of us were familiar with stack frames. Working with the stack was an integral part of any assembler programming (that’s how I got started) and it ran things behind the scenes in most of the languages-du-jour, such as Basic and C. For example, in C, if you defined a bunch of local variables in a function and then called another function, the variables would be stored in a ‘stack frame’. When the called function returned, its own frame would be popped off the stack, leaving the caller’s frame, and the local variables in it, on top again. Simple enough.

The problem, which came to be known as the FUNARG (or Function Argument) problem, reared its head when you started thinking about implementing functions as first class objects. To put it another way, you had a problem with the traditional stack-frame/dynamic scoping way of doing things if you wanted to pass a function around inside a variable (as we do in the Ruby code above).

For example, if you define a function inline, within another function, what happens when that new anonymous function decides to access a local variable? The correct behavior is for the new anonymous function to have access to the same local variables as the function or scope defining it. The Funarg problem then has two flavors: upwards funarg and downwards funarg.

Upwards Funarg is the problem language implementors have to overcome when a function is returned from a called function. A calls B, B returns a new function C, inside a variable. C then accesses B’s local variables. That’s a problem. Typically in a dynamic scoping language, the stack frame (and local variables) in B get trashed as soon as B returns. So, for C to work the language needs to keep those variables around potentially for quite a long time after B returns. The problem is exacerbated if B is repeatedly called. You could end up with a bunch of C functions each of which needs access to the local variables in B as they were when the function was defined in the first place. You can see why this is a problem I’m sure.

Downwards funarg is the reverse. If A defines B and passes it to C, then B somehow needs to be able to trawl through the stack and find A’s local variables in order for it to execute on demand. Not quite as complex as an upwards funarg, but nasty nonetheless. In fact, funargs are generally nasty enough that as more and more developers demanded functions as first class objects, the whole concept of dynamic scoping gave way to lexical scoping as we find it in most modern languages (C#, Python, and of course Ruby, for example).

Lexical scoping brings with it ‘closures’ as the solution to the problem.

Simply put, a closure is the combination of an anonymous function and the variables existing in the scope that defined it. So, if A creates B to pass on, a closure is created including the function, ‘B’, and all the variables in ‘A’ that it needs access to. If ‘A’ were called multiple times, you’d get multiple closures. Each ‘instance’ of B would be a closure with access to the variables in ‘A’ that were in existence at the time the instance of ‘B’ was created. Simple really.
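The "multiple closures" point is easy to check with a small counter factory. This is only an illustrative sketch; make_counter is a made-up name, playing the part of ‘A’, and the lambda it returns plays the part of ‘B’.

```ruby
# Each call to make_counter creates a fresh `count` variable,
# and the returned lambda closes over that particular variable.
def make_counter(start)
  count = start
  lambda { count += 1 }
end

first  = make_counter(0)
second = make_counter(100)

first.call
first.call
puts first.call    # third call on the first closure
puts second.call   # first call on the second closure, untouched by `first`
```

The two lambdas share the same code but not the same count; each carries the variables that existed when it was created.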

Well, theoretically. Closures are a great idea, and they solve the funarg problem. In a lexical scoping based language that supports functions as first class objects, you have to have closures. The problem comes from not understanding them. In Ruby, the problem is exacerbated by Ruby Magic; closures enable much of the Ruby magic that made programmers fall in love with Ruby, but with great power…yadda yadda.

Take a look at this.

def method_that_takes_a_block( &the_block )
  local_variable = 12
  the_block[ "inside a method" ]
end

local_variable = 13
local_block = Proc.new { |where| puts "Block called from #{where}. Local variable is #{local_variable}" }

local_block.call( "inline" )

method_that_takes_a_block( &local_block )

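This is basically the Ruby solution to the downwards funarg. In both cases, when the block is called, local_variable is shown to contain 13. The closure was created in the outer scope, where local_variable is assigned the value 13, and it stays bound to that outer variable. The local_variable inside the method is a separate variable that merely shares the name, so the proc never sees it.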
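One subtlety worth spelling out: the closure holds on to the variable itself, not a snapshot of its value. A minimal sketch, using the invented helpers run_block and binding_demo, shows the proc ignoring the method’s same-named local while still noticing a later reassignment in its own defining scope.

```ruby
def run_block(&the_block)
  local_variable = 12   # a different variable that happens to share the name
  the_block.call
end

def binding_demo
  local_variable = 13
  reader = Proc.new { local_variable }

  before = run_block(&reader)   # reads the variable from the defining scope
  local_variable = 99           # reassign it in the defining scope
  after = run_block(&reader)    # the proc sees the reassignment

  [before, after]
end

p binding_demo
```

So the value the block reports can change, but only through the scope that defined it, never through the scope that happens to call it.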

The risk here is the age-old Ruby conciseness versus explicitness problem. It would be very easy to see local_variable assigned in the method and assume that the block will pick that up. It won’t, because of the closure.

That’s a minor problem though. Take a look at this one, Ruby’s solution to the upwards funarg.

def method_that_returns_a_block( x )
  some_value = x * 12

  return Proc.new { puts "The value of X *was* #{x}, causing some_value to be #{some_value}"}

end

block = method_that_returns_a_block(5)
block.call

Looks simple, doesn’t it? The method returns a closure that keeps access to the method’s local variables ‘x’ and ‘some_value’. Even though the method returns, the variables in scope when it was executing are still available to the block. This is much more insidious. Imagine if that method contained an instance of ActiveRecord which the block modified. That would be a hard problem to chase down; why is an out of scope object being modified? How?
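To make that worry concrete without dragging in Rails, here is a sketch where a plain Struct stands in for the ActiveRecord instance. Account and open_account are hypothetical names invented for this example.

```ruby
# Hypothetical stand-in for a model object held in a method's local variable.
Account = Struct.new(:balance)

def open_account(initial)
  account = Account.new(initial)   # local to this method
  deposit = lambda { |amount| account.balance += amount }
  report  = lambda { account.balance }
  [deposit, report]                # both closures keep `account` alive
end

deposit, report = open_account(10)
deposit.call(5)    # mutates an object nothing outside the closures can name
puts report.call
```

The method returned long ago, yet its local object is still being modified, and nothing at the call site names that object. That is the kind of action at a distance to watch for.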

Closures allow us to use functions as first class objects, and provide Ruby with much of its power. In addition, closures are what make much of the Rails framework, and all its magic, possible. But, be careful when you work with these things. It’s so incredibly easy to tie yourself up in knots and not realize it. It’s that great power, great responsibility thing all over again.

11 Comments »

  1. The #lambda kernel method is not the same as Proc#new. It’s basically the same, but not exactly. Here’s the documentation for lambda:

    Equivalent to Proc.new, except the resulting Proc objects check the number of parameters passed when called.

    Example:

    $ irb
    irb(main):001:0> Proc.new{|x,y| }.call 1,2,3
    => nil
    irb(main):002:0> lambda {|x,y| }.call 1,2,3
    ArgumentError: wrong number of arguments (3 for 2)
    from (irb):2
    from (irb):2:in `call'
    from (irb):2
    from :0
    irb(main):003:0>

    Comment by Simen — July 6, 2007 @ 6:51 am | Reply

  2. That’s a good point – thanks for pointing it out.

    In practice (in terms of the type of object it returns, the way it manages closures, and so on) Lambda is identical to Proc.new, but yes, you are right, lambda gives you more runtime safety because of the parameter count checking.

    Comment by pwrighta — July 6, 2007 @ 8:03 am | Reply

  3. [...] the piece I wrote on lambdas and closures in Ruby struck a nerve. Thanks to those responsible for getting it onto the front page of the mighty [...]

    Pingback by Lambdas and closures piece hits Reddit « CodeHappy — July 6, 2007 @ 8:49 am | Reply

  4. While the latter issue you bring up is somewhat understandable, I don’t think the former one is a problem at all. Within the function definition, you invoke the block as a black box. Thus, it seems fairly clear (at least to me) that it behaves as a black box — i.e., acquires no behavior from the current function. Similarly, when you define the block and pass it to the function, the function acts basically as a black box — you don’t see that code in the immediate vicinity, so there’s no reason to think it would affect the block’s code.

    I do think the latter problem can be fixed by using good judgement. Returning blocks isn’t something that’s often seen, simply because it can be somewhat confusing — amongst other things, for precisely the reason you mention. Blocks represent a nice way to encapsulate functionality you need immediately in the current scope and pass it elsewhere where it proves useful. Returning a block sort of goes against that, as it means that you haven’t defined it locally — you’re using someone else’s block, with all the odd potential side-effects that brings with it. More dangerous, less used :)

    Comment by Shadowfiend — July 6, 2007 @ 9:30 am | Reply

  5. It’s not the only difference.

    def try_ret_procnew
      ret = Proc.new { return "Baaam" }
      ret.call
      "This is not reached"
    end

    # prints "Baaam"
    puts try_ret_procnew

    While return from lambda acts more conventionally, returning to its caller:

    def try_ret_lambda
      ret = lambda { return "Baaam" }
      ret.call
      "This is printed"
    end

    # prints "This is printed"
    puts try_ret_lambda

    Comment by Pedro — July 6, 2007 @ 10:50 am | Reply

  6. Another great point. I’ll write up an addendum post to this later. Thanks for all the great feedback

    Comment by pwrighta — July 6, 2007 @ 11:34 am | Reply

  7. Found this and sounds like JavaScript journey to me, but I know little of it so irrelevant..

    Out of interest, those “Assembler, through Basic, C and Pascal and then found themselves in a world of dynamic, highly object oriented scripting languages” guys.. What are their beliefs today?

    Have they seen any light out in either lambda or Ruby or whatever icecream comes next. I mean do they believe that they have seen some persuasive, enabling, machine abstraction that wasn’t provided for them before in their extensive journey? (apart from that productivity boost that trades some fragility potential in return)

    Cheers

    http://sixyears.wordpress.com

    Comment by sixyears — July 6, 2007 @ 4:28 pm | Reply

  8. [...] read more | digg story [...]

    Pingback by Great explination of lambdas and closures in ruby. — July 6, 2007 @ 8:31 pm | Reply

  9. That’s a nice post, congratulations!

    Erratum: the 3rd line of method_that_takes_a_block definition should use parenthesis instead of brackets in the the_block call.

    Finally, for those interested in the origin of the term “closure”, this is an extract from the mathematical definition of closure:

    “Given an operation on a set X, one can define the closure C(S) of a subset S in X to be the smallest subset closed under that operation that contains S as a subset. For example, the closure of a subset of a group is the subgroup generated by that set.”

    source: http://en.wikipedia.org/wiki/Closure_%28mathematics%29

    Comment by Adriano — July 31, 2007 @ 1:15 am | Reply

  10. Nice post. For me as a guy with a C background that cleared up the whole “mess” (at least in my mind) about the FUNARGS problem.
    On the other hand, after you’ve grokked how closures work, what you call problems or behaviour to be wary of is just…normal. No?

    Next I should read the lambda papers…

    Comment by sys — July 29, 2008 @ 8:22 am | Reply

