A PlanarFe Adventure

LearnLoveCode

Tap, Inject, and Each_With_Object


I realized the other day that, while I’ve been using Object#tap, Enumerator#with_object (as each.with_object), and Enumerable#each_with_object for some time now, I wasn’t completely clear on the difference between these methods or how best to employ them. Mostly I just threw in Object#tap wherever I saw sandwich code and called it a day.

Time for a…

http://40.media.tumblr.com/33dded80b572fc3f9733ba5616b7e6ed/tumblr_mv4yonGTMV1rh0tlxo2_1280.jpg

A good place to start is Ruby Docs for the method definitions:

Object#tap -
Yields self to the block, and then returns self. The primary purpose of this method is to “tap into” a method chain, in order to perform operations on intermediate results within the chain.

Enumerable#each_with_object -
Iterates the given block for each element with an arbitrary object given, and returns the initially given object.
If no block is given, returns an enumerator.

Enumerable#inject -
Combines all elements of enum by applying a binary operation, specified by a block or a symbol that names a method or operator.

If you specify a block, then for each element in enum the block is passed an accumulator value (memo) and the element. If you specify a symbol instead, then each element in the collection will be passed to the named method of memo. In either case, the result becomes the new value for memo. At the end of the iteration, the final value of memo is the return value for the method.

If you do not explicitly specify an initial value for memo, then the first element of collection is used as the initial value of memo.

Object#tap

I’ll start with this one, mostly because it’s the one that I have thrown around the most with the least amount of thought. So far I’ve mostly just used it for refactors like this: (cue extremely contrived example…)

  class Human
    attr_accessor :age
    def can_drive?
      age >= 16
    end
  end

  # Sandwich code
  mason = Human.new
  mason.age = rand(1..100)
  mason

  # tapped
  mason = Human.new.tap { |person| person.age = rand(1..100) }

On a personal note, I’m feeling conflicted about this method lately. I like that it eliminates the need to explicitly return the ‘Human’ object. I like it aesthetically, it reads cleanly to my eye, but sometimes it feels a bit like using it just to use it. Maybe even a bit trivial. Not everyone instantly recognizes it and it actually seems to obfuscate your intended return value a bit. Anyway…

While the above implementation certainly works, reading the documentation it isn’t quite the use for which the method was designed.

"The primary purpose of this method is to “tap into” a method chain."

It may clean up the code, but maybe the #tap method is better reserved for cases where returning the original object provides a bit more function:

  mason = Human.new
  mason.age = rand(1..100)
  mason.can_drive?

  # or

  mason = Human.new.tap { |person| person.age = rand(1..100) }.can_drive?

Here the return value of person.age = rand(1..100) is an integer and #can_drive? is a method on an instance of the class Human. Without #tap you would not be able to chain these methods together. This feels like a better use case to me; the #tap method serves a more significant purpose, allowing you to build an object, set its age, and call an instance method on it all in one shot. Just note that in the chained version mason ends up holding the boolean returned by #can_drive?, not the Human instance.
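For comparison, here is a minimal sketch of the “tap into a method chain” use that the documentation describes: peeking at an intermediate result without breaking the chain. The numbers and the puts call are my own illustration, not something taken from the docs.

```ruby
# Square the even numbers from 1 to 10, peeking at the intermediate result.
evens_squared = (1..10)
  .select(&:even?)
  .tap { |evens| puts "evens so far: #{evens.inspect}" } # side effect only
  .map { |n| n * n }

# tap handed the array of evens back untouched, so map still ran on it.
p evens_squared
# => [4, 16, 36, 64, 100]
```

Deleting the .tap line changes nothing about the result, which is really the point: it observes the chain without taking part in it.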

Enumerable#each_with_object

So this one is a bit different but kind of in the same vein. It’s in the Enumerable module so we know that #each_with_object is called on some kind of collection, like an array or a hash, and iterates through the items in the collection, just like the plain-old #each method. What puts #each_with_object in the family of #tap is the ...with_object bit. It allows us to pass in an arbitrary object and return that object from the method. Where #tap is called on the object you wish to return, #each_with_object takes the return object as an argument. For me the importance is the concept. It couples the iteration and the return.

  my_strings = ['hello', 'destiny', 'how', 'are', 'the', 'wife', 'and', 'kids']

  # with tap
  [].tap { |collector| my_strings.each { |word| collector << word.upcase } }
  # or for better readability
  [].tap do |collector|
    my_strings.each do |word|
      collector << word.upcase
    end
  end

  # with each_with_object
  my_strings.each_with_object([]) { |word, collector| collector << word.upcase }

So, rather than iterating through the collection explicitly inside #tap’s block, we let #each_with_object handle the iteration for us. I’ve seen #tap used this way but to my eye #each_with_object more clearly and succinctly communicates what you seek to accomplish. If performance is critical, go ahead and use #tap this way; otherwise #each_with_object might help out those developers coming behind you.

Alternatively, each.with_object (that is, Enumerator#with_object) behaves the same as #each_with_object and might even make its link with iterating over a collection a bit clearer.
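A quick sketch to back that up, assuming a shortened version of the word list from above. The each_with_index part is my own addition, just to show that with_object can hang off other enumerators too.

```ruby
my_strings = ['hello', 'destiny']

with_each_with_object   = my_strings.each_with_object([]) { |word, collector| collector << word.upcase }
with_chained_enumerator = my_strings.each.with_object([]) { |word, collector| collector << word.upcase }

p with_each_with_object == with_chained_enumerator
# => true

# with_object also chains onto other enumerators (illustration only).
indexed = my_strings.each_with_index.with_object({}) do |(word, index), lookup|
  lookup[index] = word
end
p indexed
```

In both spellings the block receives the element first and the collector second, which is the opposite of the argument order #inject uses.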

Enumerable#inject aka Enumerable#reduce

This is the odd-ball of the group. The idea is that you have an object (noted as memo in the documentation) to which some sort of changes are applied based on the objects over which you are iterating. The simplest example is addition:

  [1, 2, 3, 4, 5].inject { |memo_object, number| memo_object + number }
  # => 15

#inject comes in two flavors, taking a symbol or a block, and each of those flavors can also optionally take an initial value as an argument. If no initial value is passed, the first object in the collection becomes the initial value by default.

Passed a symbol:

  [2,4,6,8].inject(:+)
  # => 20
  # or equivalently -> [2,4,6,8].reduce(:+)
  # Takes 2 as the initial value,
  # adds 4 to it, returns 6 to the next
  # iteration. Takes the 6 returned to the
  # second iteration, adds 6 to it, returns
  # 12 to the next iteration...

  # Passed a symbol and an initial value
  [2,4,6,8].inject(10, :+)
  # => 30
  # Takes 10 as the initial value,
  # adds 2 to it, and returns 12 to
  # the next iteration...

Passed a block:

  my_strings = ['hello ', 'destiny ', 'how ', 'are ', 'the ', 'wife ', 'and ', 'kids?']

  # Passed a block
  my_strings.inject { |sentence, word| sentence + word.capitalize }
  # => "hello Destiny How Are The Wife And Kids?"

  # Passed a block and an initial value
  my_strings.inject("Oh, ") { |sentence, word| sentence + word.capitalize }
  # => "Oh, Hello Destiny How Are The Wife And Kids?"

So I accidentally came up with a better example than I thought for passing a block without an initial value since it demonstrates that nothing is done to the first object in the collection. It is passed straight to the block execution for the next object in the collection as the memo object.
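To make that visible, here is a small sketch that records every pair of arguments the block receives. The seen arrays are just hypothetical helpers for the demonstration.

```ruby
# Without an initial value: the first element is never yielded as an element,
# it only ever shows up as the starting memo.
seen_without_initial = []
[1, 2, 3].inject do |memo, number|
  seen_without_initial << [memo, number]
  memo + number
end
p seen_without_initial
# => [[1, 2], [3, 3]]

# With an initial value: every element gets its own turn through the block.
seen_with_initial = []
[1, 2, 3].inject(0) do |memo, number|
  seen_with_initial << [memo, number]
  memo + number
end
p seen_with_initial
# => [[0, 1], [1, 2], [3, 3]]
```

So for three elements the block runs twice without an initial value and three times with one.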

There’s one more layer to #inject. The return value of the method execution is the return value of the last execution of the block, not the memo object.

  [1, 2, 3, 4, 5].inject do |sum, number|
    if number != 5
      sum += number
    end
  end
  # => nil

Since if number != 5; sum += number; end returns nil when number == 5, and 5 is the last object in the collection, the return value of the entire method is nil. You could do something kinda ugly like this:

  [1, 2, 3, 4, 5].inject do |sum, number|
    if number != 5
      sum += number
    end
    sum
  end
  # => 10

but at that point maybe you’re better off just using a different method like:

  sum = 0

  [1, 2, 3, 4, 5].each do |number|
    if number != 5
      sum += number
    end
  end

  sum
  # => 10

At least it’s a bit more obvious at first glance what you want back.
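Two other ways to get the same answer, sketched here as options rather than recommendations: make sure every branch of the block returns the memo, or filter the collection before summing. The variable names are mine.

```ruby
numbers = [1, 2, 3, 4, 5]

# Every branch of the ternary returns a number, so the memo is never nil.
total_with_ternary = numbers.inject(0) do |sum, number|
  number == 5 ? sum : sum + number
end

# Or drop the unwanted element first and let the symbol form do the adding.
total_with_reject = numbers.reject { |number| number == 5 }.inject(0, :+)

p total_with_ternary
# => 10
p total_with_reject
# => 10
```

The second version reads closest to the intent ("sum everything except 5"), at the cost of building an intermediate array.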

The Inspiration

Here’s the cool bit that got me thinking about these methods, specifically #inject. I ran into a problem where I had a hash and I needed to return a hash with mutations applied to both the keys and values in the original hash. Changing values is easy. Changing keys takes a bit more work.

You could update the keys in place by iterating through the keys, creating new mutated key and value pairs, and deleting the old pairs:

  markup = 1.1
  toys = {
          "rubber_ducky" => 5,
          "slinky"       => 3,
          "lawn_dart"    => 7
         }
  toys.keys.each do |name|
    toys[name.upcase.to_sym] = toys[name] * markup
    toys.delete(name)
  end
  toys
  # => {:RUBBER_DUCKY=>5.5, :SLINKY=>3.30..., :LAWN_DART=>7.70...}

You could use the Enumerable#map method to build an array of key-value pairs and pass it to Hash[]:

  new_prices = Hash[ toys.map do |name, price|
                      [name.upcase.to_sym, price * markup]
                     end
                    ]
  # => {:RUBBER_DUCKY=>5.5, :SLINKY=>3.30..., :LAWN_DART=>7.70...}

Or you could use the inject method to pass around a hash and merge new key-value pairs into it.

  toys.inject({}) do |newly_priced, (name, price)|
    newly_priced.merge( name.upcase.to_sym => price * markup )
  end
  # => {:RUBBER_DUCKY=>5.5, :SLINKY=>3.30..., :LAWN_DART=>7.70...}

I’d be wary of mutating the original hash by mutating the keys, assigning the new keys values, and deleting the old keys. If I were to use simple iteration to solve this problem I would probably instantiate a new hash and assign that hash key-value pairs from within the each block. Nothing fancy, but it works.
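That plain-iteration version might look something like this. It is only a sketch using the same toys and markup as above, with newly_priced as the name for the new hash.

```ruby
markup = 1.1
toys = {
  "rubber_ducky" => 5,
  "slinky"       => 3,
  "lawn_dart"    => 7
}

# Build a brand new hash and leave toys alone.
newly_priced = {}
toys.each do |name, price|
  newly_priced[name.upcase.to_sym] = price * markup
end

p newly_priced.keys
# => [:RUBBER_DUCKY, :SLINKY, :LAWN_DART]
p toys.keys
# => ["rubber_ducky", "slinky", "lawn_dart"]
```

The original hash comes out untouched, so running the code twice gives the same answer both times.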

Using Hash[] and #map looks a bit odd to me. I haven’t seen anyone build hashes this way. The #map inside of Hash[] is confusing unless you recall that #map always returns an array and that passing an array of two-element arrays to Hash[] creates key-value pairs. This solution feels like it is asking a bit more of the next developer to come along than the other options.

Now #inject… I like this one. I like it despite the fact that IT ASKS A LOT!!! of the developers to follow and a lot happens in very few lines, buuuuut it’s a pretty slick piece of code. And sometimes I am a sucker for a slick piece of code. So, what does it assume? Well, you have to know:

1. What #inject is.

2. That with #inject you have a memo object to pass to each execution of the block.

3. That the key-value pairs in toys are passed into the block as an array.

4. That you can use parentheses and multiple assignment to make the block take only two arguments (the memo and the array containing the key and value) and instantly set the value of name and price.

5. That the return value of the #inject method is the return value of the last execution of the block, so you have to return the hash object from the block.

6. And finally, that you can add key-value pairs to a hash using the Hash#merge method, essentially taking the key-value assignment as a mini-hash and merging it with the existing hash. Note that #merge returns a new hash rather than changing the receiver, which is exactly why the block’s return value works as the next memo. Maybe this would be a bit more clear: newly_priced.merge( {name.upcase.to_sym => price * markup} )
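Points 3, 4, and 6 are easier to believe when you see them run. A small sketch, assuming a trimmed-down toys hash; base and merged are hypothetical names used only here.

```ruby
toys = { "slinky" => 3, "lawn_dart" => 7 }

# Point 3: each key-value pair arrives in the block as a two-element array.
pairs = toys.inject([]) { |seen, pair| seen << pair }
p pairs
# => [["slinky", 3], ["lawn_dart", 7]]

# Point 4: parentheses in the block arguments split that array into name and price.
labels = toys.inject([]) { |seen, (name, price)| seen << "#{name}: #{price}" }
p labels
# => ["slinky: 3", "lawn_dart: 7"]

# Point 6: Hash#merge returns a new hash and leaves the receiver alone.
base   = { a: 1 }
merged = base.merge({ b: 2 })
p base
# => {:a=>1} (newer Rubies print this as {a: 1})
p merged.keys
# => [:a, :b]
```

Because #merge builds a fresh hash on every pass, the #inject version allocates one hash per element, which is worth keeping in mind for large collections.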

Performance

N = 1_000_000

markup = 1.1
toys = {
        "rubber_ducky" => 5,
        "slinky"       => 3,
        "lawn_dart"    => 7
       }

require 'benchmark'
Benchmark.bmbm do |x|
  x.report('#each'){ N.times{
                              toys.keys.each do |name|
                                toys[name.upcase.to_sym] = toys[name] * markup
                                toys.delete(name)
                              end
                              toys
                            }
                    }

  x.report('#each.with_object'){ N.times{
                                          toys.each.with_object({}) do |(toy, price), newly_priced|
                                            newly_priced[toy] = price * markup
                                          end
                                        }
                                }

  x.report('#map/Hash Literal'){ N.times{
                                          new_prices = Hash[
                                                            toys.map do |name, price|
                                                              [name.upcase.to_sym, price * markup]
                                                            end
                                                           ]
                                        }
                                }

  x.report('#inject'){ N.times{
                              toys.inject({}) do |newly_priced, (name, price)|
                                newly_priced.merge( name.upcase.to_sym => price * markup )
                              end
                              }
                    }
end

#  Rehearsal -----------------------------------------------------
#  #each               0.200000   0.000000   0.200000 (  0.198086)
#  #each.with_object   0.590000   0.010000   0.600000 (  0.594112)
#  #map/Hash Literal   0.380000   0.000000   0.380000 (  0.384121)
#  #inject             0.300000   0.000000   0.300000 (  0.299615)
#  -------------------------------------------- total: 1.480000sec
#
#                          user     system      total        real
#  #each               0.180000   0.000000   0.180000 (  0.180199)
#  #each.with_object   0.550000   0.000000   0.550000 (  0.553506)
#  #map/Hash Literal   0.370000   0.000000   0.370000 (  0.373649)
#  #inject             0.290000   0.000000   0.290000 (  0.295530)

Ranked by quickest performance:

1. #each

2. #inject (1.6x #each)

3. #map/Hash literal (2.1x #each)

4. #each.with_object (3.1x #each)

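The #each statement takes it! But I didn’t expect to incur such a performance penalty using #each.with_object. 3.1x! Ouch. The slick looking #inject maybe isn’t so slick, finishing in 1.6x what it took #each to complete. Maybe because #each is such a common method it has been highly optimized? I don’t have the answer. A couple of caveats, though: the #each version mutates toys in place, so after its second pass the hash is empty and every later iteration (in all four reports) runs against an empty hash, and the #each.with_object version skips the upcase.to_sym step that the others perform. So take these numbers with a grain of salt. If speed is your game maybe the obvious route is the best.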
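If you want to rerun the comparison on more even footing, something along these lines might be a starting point. It is a sketch under my own assumptions: the hash is frozen so no variant can change it, the in-place variant works on a dup (so the cost of the copy is counted against it), every variant does the same upcase.to_sym work, and the iteration count is cut down so it finishes quickly. I have not included timings, since they will vary by machine and Ruby version.

```ruby
require 'benchmark'

iterations = 100_000 # far fewer than the original 1_000_000, to keep the sketch quick
markup = 1.1
toys = {
  "rubber_ducky" => 5,
  "slinky"       => 3,
  "lawn_dart"    => 7
}.freeze # frozen so no variant can quietly empty it

variants = {
  '#each (in place, on a dup)' => lambda {
    copy = toys.dup # dup gives an unfrozen copy; its cost lands on this variant
    copy.keys.each do |name|
      copy[name.upcase.to_sym] = copy[name] * markup
      copy.delete(name)
    end
    copy
  },
  '#each.with_object' => lambda {
    toys.each.with_object({}) do |(name, price), newly_priced|
      newly_priced[name.upcase.to_sym] = price * markup
    end
  },
  '#map/Hash[]' => lambda {
    Hash[toys.map { |name, price| [name.upcase.to_sym, price * markup] }]
  },
  '#inject' => lambda {
    toys.inject({}) do |newly_priced, (name, price)|
      newly_priced.merge(name.upcase.to_sym => price * markup)
    end
  }
}

Benchmark.bmbm do |x|
  variants.each do |label, variant|
    x.report(label) { iterations.times { variant.call } }
  end
end
```

Wrapping each approach in a lambda adds the same small call overhead to all four, so the relative ordering should still be meaningful even if the absolute numbers are not.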

Sources:

Ruby Docs - Tap

Ruby Docs - Each with Object

Ruby Docs - Inject

“Ruby’s inject/reduce and each_with_object” - Keith R Bennet

“tap vs. each_with_object: tap is faster and less typing” - Gavin Kistner

“Inject vs. Each_With_Object” - Alex Wilkinson

“Ruby - #tap that!” - John Crepezzi (just glanced at this one but it looks worth a read)