Posts tagged with ruby

High-Low Testing

High-low testing has changed the way I build software. When I first started using Rails five years ago, the paradigm of choice was fat model, skinny controller. This guidance is well-intentioned, but the more I worked within its bounds, the more frustrated I became. In particular, with fat models. Why should anything in software be fat?

Fat and complex models spread their entanglements like the roots of an aspen. When models adopt every responsibility, they become intertwined and cumbersome to work with. Development slows and happiness dwindles. The code not only becomes more difficult to reason about, but also more arduous to test. Developer happiness should be our primary concern, but when code is slow and poorly factored, we loosen our grasp on that ideal.

So how can we change the way we build software to optimize for developer happiness? There are a number of valuable techniques, but I've coalesced around one. The immediate win is a significantly faster unit test suite, running at thousands of tests per second. As a guiding principle, the implementation should not suffer while striving to improve testability. Further, though I'll use Rails as an example, this technique should transfer to other domains, languages and frameworks.

The approach I'm describing is what I call high-low testing. The name says it all; the goal is to test at a high, acceptance level and the low, unit level. The terms "high" and "low" come from the levels of test abstraction in an application. See the diagram below. In a web app, a higher test abstraction executes code that drives browser interactions. Lower test abstractions work with the pieces of logic that eventually compose a functioning system.

This article outlines the practices and generalized approaches to implementing high-low testing in Rails apps of any size. We'll discuss the roadmap to fast tests and how this mindset promotes better object orientation. Hopefully after considering high-low testing, you'll understand how to increase your developer happiness.

Aside: this article is broken into two major sections: the primary content and the appendix. Feel free to ignore the appendix, though it contains many real-world discussion points.

Seeking a better workflow

Before explaining how to apply high-low testing, let's look at the driving forces behind it.

You may be working in a large legacy codebase. Worst case scenario, there are zero tests and the work you're doing needs to build on existing functionality. Since the tests are either slow or nonexistent, you would like a rapid feedback cycle while adding new features. Or, you're refactoring and need to protect against regressions. How do you improve the feedback cycle while working in a legacy codebase?

You may be working on a medium-sized app whose entire test suite could benefit from improved feedback cycles. You'll incrementally move towards a high-low approach and continually reap the benefits along the way. How do you introduce high-low testing with the goal of easy maintainability as the app ages?

If you're working with a greenfield app, you may not benefit immediately from high-low testing. It requires discipline and can potentially be detrimental to the code. Nonetheless, if you plan on maintaining the project for the long haul, you'll see near-constant test suite times and improved productivity. How do you apply high-low testing without inhibiting the software design?

Regardless of application size, our primary goal is to integrate a sane testing philosophy that doesn't constrain the feedback-driven workflow we practice daily. Fast tests are great for building and refactoring, but we also want additional feedback mechanisms that highlight smells in our code.

Dependencies are a necessary evil

Developer happiness is important. When I'm unhappy, it's likely due to intermingled dependencies or a deranged pair. I can't fix crazy. Anyone who's taken on a rescue project knows the first thing you do is upgrade dependencies (maybe the very first thing you do is write some tests). As a first step, it's also the most demoralizing, because you don't just "upgrade dependencies": you trudge through the spaghetti, ripping things out, rewriting others, and you crawl out the other side hoping everything still works. This is an extreme case of exactly what happens in many Rails applications.

Dependencies are evil. I mean both third-party dependencies and those you write yourself. Any time one piece of code depends on another, there's an immediate contingency on maintaining the relationship. Realistically, there's no way to remove all dependencies, not that you'd want to anyway. Building maintainable software is primarily about organizing dependencies so they don't pollute each other. They're a necessary and unavoidable evil, but left unchecked, they'll eventually erode the codebase.

With Rails in particular, it's almost brainless to introduce dependencies. Just drop a line in the Gemfile. Or, just reference a constant by name. When the app needs it, it's ready and available or is immediately autoloaded. Such dependency management propagates itself throughout the framework and the applications built on top. Most apps I've worked on buy into this convenience to maximize developer productivity, but this productivity boost early in a project becomes a hindrance as it comes of age.

Unsurprisingly, the problem with dependencies is far more pervasive than loading the world via a Gemfile. It works its way into every aspect of every piece of software. Rails unfortunately encourages dependency coupling through eager loading and reliance on the framework. The goal of high-low testing is to remove the discomfort of intermingled dependencies. The resulting code is often better factored, easier and faster to test, and more pleasurable to work with.

Getting started

High-low testing is the practice of working at extreme poles in your test suite. You'll work at the acceptance level to validate happy and questionable paths, and you'll work at the lowest level possible, with the units of your code. See the below diagram. Most Rails applications test across all layers. The integration layer (and acceptance layer) tends to be slow because it's evaluating many parts. When most of an application's business logic is tied to Rails, it must be tested at the integration layer. Depending on your style, you may continue to test at the integration layer, but high-low testing promotes doing so without depending on Rails.

The end result is a full test suite composed of two main characteristics: a speedy unit suite for rapid feedback, and a slow acceptance suite to guard against regressions. I've found the ratio of acceptance to unit tests to be largely weighted towards unit tests. The more tests that can be pushed to the unit level, the more rapid and rewarding the development cycle becomes. The most poignant benefits surface at the unit level, so that's where we'll focus our efforts.

Let's unpack high-low testing.

Testing high

Testing high means working at an acceptance level and driving the browser to verify most things are working as intended. You may cover around half of your application code with acceptance tests, but that's OK, since you'll also be writing unit tests to guard against the edge cases.

Acceptance testing can be done with any tool that drives the browser, but I prefer Capybara on Poltergeist.
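
As a rough sketch, here's what a high-level acceptance test might look like with Capybara and Poltergeist (the page, labels and helper file are illustrative, not from a real app):

# spec/features/login_spec.rb -- an illustrative acceptance test
require 'rails_helper' # or whichever helper boots Rails for the acceptance suite
require 'capybara/poltergeist'

Capybara.javascript_driver = :poltergeist

feature 'Logging in', js: true do
  scenario 'with valid credentials' do
    visit '/login'
    fill_in 'Email', with: 'mike@example.com'
    fill_in 'Password', with: 'secret'
    click_button 'Log in'

    # Drives a real (headless) browser through the entire stack
    expect(page).to have_content('Welcome back')
  end
end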

In any project, using high-low testing or otherwise, acceptance tests should not exhaustively cover all possible use cases. They should cover basic cases, with further emphasis on features that provide high business value or expose risk. For instance, login and registration should be tested thoroughly, but testing validations on every form is superfluous.

Acceptance tests will always be slow. It's the product of many layers of implementation working together to produce an anthropomorphic interaction with the software. While unit tests are much more akin to computer-computer interaction, acceptance tests give us an extra bit of confidence because they're driving a real browser.

This extra bit of confidence goes a long way to building trust in your test suite, which is the primary reason we test in the first place. We must ensure the software works, with the goal of safe deployment when all dots are green. Few other forms of testing can induce such confidence, which is why acceptance tests are invaluable.

High level tests also execute the entire stack. Everything from the HTTP request, through the router, controller, model and database, and back up to the browser's rendering engine. This is particularly important because acceptance tests can cover parts of the Rails stack that are painful to test. I'm looking at views and controllers.

A single Cucumber feature could cover multiple views and controllers. Testing high means covering pieces of the application that would otherwise burden a developer with excessive maintenance. Unit testing views and controllers introduces unnecessary test churn, so the bang for your buck with acceptance tests is enormous.

Really, high-low testing doesn't change the way you write acceptance tests. Life carries on as usual. You may add some additional acceptance tests to guard against things that would otherwise be tested at the integration layer, but the style is largely the same. The significant changes to a traditional Rails testing approach come in the form of unit tests.

Testing low

Testing low means working at a unit level. When most people think about unit testing Rails apps, it often involves testing methods of models. By "models," I'm referring to classes that inherit from ActiveRecord::Base. In a fat model/skinny controller setup, unit tests would definitely include models. But I avoid fat models like the plague. I want a fast testing environment that encourages better object orientation. To do this, we need to separate ourselves from Rails. It might seem extreme, but it's the most successful way I've scaled projects with high complexity.

This all stems from the nature of dependencies. They (should) always flow downwards. Higher abstractions depend on lower abstractions. As a result, the most dependency-free code resides at the bottom of the stack. The closer we can push code into this dependency-free zone, the easier it is to maintain.

By decoupling our business logic from the framework, we can independently test the logic without loading the framework into memory. The acceptance suite continues to depend on the framework, but by keeping the unit suite separate from the acceptance suite, we can keep our units fast.

Let's look at some code.

Consider an app that employed the fat model, skinny controller paradigm where all of our business logic resides in models inheriting from ActiveRecord::Base. Let's look at a simple User class that knows how to concatenate a first and last name into a full name:

class User < ActiveRecord::Base
  def full_name
    [first_name, last_name].join(' ')
  end
end

And the tests for this class:

describe User do
  describe '#full_name' do
    it 'concatenates the first and last name' do
      user = User.new(first_name: 'Mike', last_name: 'Pack')
      expect(user.full_name).to eq('Mike Pack')
    end
  end
end

What's painful about this? Primarily, it requires the entire Rails environment be loaded into memory before executing the tests. Sure, early in a project's lifetime loading the Rails environment is snappy, but this utopia fades quickly as dependencies are added. Additionally, tools like Spring are brash reactions to slow Rails boot times, and can introduce substantial complexity. The increasingly slow boot times are largely a factor of increasing dependencies.

High-low testing says to move methods out of classes that depend on Rails. What might traditionally fall into an ActiveRecord model can be moved into a decorator, for instance.

require 'delegate'

class UserDecorator < SimpleDelegator
  def full_name
    [first_name, last_name].join(' ')
  end
end

Notice how we've manually required the delegate library. We'll investigate the benefits of this practice later in the article. The body of the #full_name method is identical; we really didn't need to be anywhere close to Rails for this behavior.

The UserDecorator class would likely be instantiated within the controller, though it could be used anywhere in the application. Here, we allow the controller to prepare the User instance for the view by wrapping it with the decorator:

class UsersController < ApplicationController
  def show
    @user = UserDecorator.new(User.find(params[:id]))
  end
end

Our view can then call @user.full_name to concatenate the first name with the last name.
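
In the view, that's nothing more than (an illustrative snippet, not from the original app):

<%# app/views/users/show.html.erb %>
<h1><%= @user.full_name %></h1>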

The test for the UserDecorator class remains similar, with a couple differences. We'll mock the user object and refer to our new UserDecorator class:

require 'spec_helper'
require 'models/user_decorator'

describe UserDecorator do
  describe '#full_name' do
    it 'concatenates the first and last name' do
      user = double('User', first_name: 'Mike', last_name: 'Pack')
      decorator = UserDecorator.new(user)

      expect(decorator.full_name).to eq('Mike Pack')
    end
  end
end

There are a few important things to note here. Again, we're explicitly requiring our dependencies: the spec helper and the class under test. We're also mocking the user object (using a double). This creates a boundary between the framework and our application code. Essentially, any time our application integrates with a piece of Rails, the Rails portion is mocked. In large applications, mocking the framework is a relatively infrequent practice. It's also surprising how little the framework does for us. Compared to the business logic that makes our apps unique, the framework provides little more than helpers and conventions.

“Rails is a delivery mechanism, not an architecture.” – Uncle Bob Martin

It's worth noting that mocking can be a powerful feedback mechanism. In the ideal world, we wouldn't have to mock at all. But mocking in this instance tells us, "hey, there's an object I don't want to integrate with right now." This extra feedback can be a helpful indicator when striving to reduce the touchpoints with the framework.

Requiring spec_helper is likely not new to you, but the contents of our new spec_helper will be. Let's look at a single line from a default spec_helper, generated by rspec-rails:

require File.expand_path("../../config/environment", __FILE__)

This is the devil, holding your kitchen sink. This single line will load most of Rails, all your Gemfile dependencies, and all your eager loaded files. It's the "magic" behind being able to freely access any autoloaded class within your tests. It comes with a cost: speed. This is the style of dependency loading that must be avoided. Favor fine-grained loading when speed is of the essence.

Here is what a baseline spec_helper would look like in a high-low setup. We require the test support files (helpers for our tests), we add the app/ directory to the load path for convenience when requiring classes in our tests, and we configure RSpec to randomly order the tests:

Dir[File.expand_path("../support/**/*.rb", __FILE__)].each { |f| require f }

$:.unshift File.expand_path("../../app", __FILE__)

RSpec.configure do |config|
  config.order = "random"
end

Nothing more is needed to get started.

With this setup, you could be testing hundreds or thousands of classes and modules in sub-second suite times. As I've already expressed, the key is separating our tests from Rails. It turns out, most of the business logic in a typical Rails application is plain old Ruby. Rails provides some accessible helpers, but the bulk of logic can be brainlessly extracted into classes that are oblivious to the framework.

Isolating the framework

I want to be clear about something. For the purposes of this article, I'm referring to "integration tests" as those that utilize the framework. Integration means nothing more than two pieces working together, and integration testing is strongly encouraged. The style of integration testing that's discouraged is that which integrates with the framework. I am not advocating that you mock anything but the framework. Heavy mocking has known drawbacks, and should be avoided.

Mock only the pieces that rely on the framework. Integration between low-level business objects can often be safely tested in conjunction without mocking. What's important is isolating the framework from the business logic. The below diagram illustrates this separation.

The important thing to take away from this diagram is that below the mocking layer, you can test in whatever style is most comfortable. If you want to test multiple objects as they execute in symphony, introduce an integration test. In the cases where you're testing simple business rules that have no dependencies, unit test them normally. In either case, each file will explicitly require its dependencies.

If the mocking layer becomes cumbersome, additional tooling can be leveraged to more easily work with it. For instance, factory_girl can be used to encapsulate the data required of the mocks and separate the framework from the business logic. Notice this factory uses an OpenStruct instead of the implicit User class, which forms the moat needed to keep Rails out of the tests:

require 'ostruct'

FactoryGirl.define do
  factory :user, class: OpenStruct do
    first_name 'Mike'
  end
end

By using OpenStruct, instances created by FactoryGirl will be basic OpenStruct objects instead of objects relying on Rails. We can use these surrogate factories as we would a normal factory:

it 'has a first name' do
  expect(build(:user).first_name).to eq('Mike')
end

Example refactor to isolate the framework

This isolation from Rails encourages better object orientation. Let's refactor a fat model into the more ideal high-low testing structure. The example below is inspired by an actual class in a large client project. I've rewritten it here for anonymity and brevity. The class encapsulates a cooking recipe. It performs two main behaviors:

  • calculate the total calories in the recipe
  • persist a value when someone favorites the recipe (using Redis::Objects)

class Recipe < ActiveRecord::Base
  has_many :ingredients
  has_many :foods, through: :ingredients

  def total_calories
    foods.map(&:calories).inject(0, &:+)
  end

  def favorite
    Redis::Counter.new("favorites:recipes:#{id}").increment
  end
end

A naive programmer would consider this sufficient. It works, right? There's one argument I find compelling in favor of keeping it this way: "if I needed to find something about recipes, the Recipe model is the first place I'd look." But this doesn't scale: as the domain fragments, comprehension suffers. At some point, the model contains many thousands of lines of code and becomes very tiresome to work with. Thousands of small classes with hundreds of lines of code each are much more comprehensible than hundreds of classes with thousands of lines of code.

An experienced programmer will instantly spot the problems with this class. Namely, it performs disparate roles. In addition to the hundreds of methods Rails provides through ActiveRecord::Base, it can also calculate the total calories for the recipe and persist a tangential piece of information regarding user favorites.

If the programmer were cognizant of high-low testing, their instincts would tell them there's a better way to structure this code to enable faster and improved tests. As an important side effect, we also produce a more object-oriented structure. Let's pull the core business logic out of the framework-enabled Recipe model and into standalone classes:

class Recipe < ActiveRecord::Base
  has_many :ingredients
  has_many :foods, through: :ingredients

  def total_calories
    CalorieCalculator.new(foods).total
  end

  def favorite
    FavoritesTracker.new(self).favorite
  end
end
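
The extracted classes aren't shown in the original, but a minimal sketch might look like this, plain Ruby with no Rails in sight (the redis-objects gem provides Redis::Counter):

# app/models/calorie_calculator.rb
class CalorieCalculator
  def initialize(foods)
    @foods = foods
  end

  def total
    @foods.map(&:calories).inject(0, &:+)
  end
end

# app/models/favorites_tracker.rb
require 'redis-objects'

class FavoritesTracker
  def initialize(recipe)
    @recipe = recipe
  end

  def favorite
    Redis::Counter.new("favorites:recipes:#{@recipe.id}").increment
  end
end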

By making this minor change, we gain a slew of benefits:

  • The code is self-documenting and easier to reason about without having to understand the underlying implementation.
  • The separate responsibilities of calculating total calories and persisting favorites have been moved into discrete classes.
  • The newly introduced classes can be tested in isolation from the Recipe class and Rails.
  • Dependencies are made explicit through require statements and method inputs.

Before this improvement, to test the #total_calories method, we were required to load Rails. After the refactor, we can test this logic in isolation. The original implementation calculated total calories from foods fetched through its has_many :foods association. Under test, it would have been nasty (though not impossible) to calculate the total calories from something other than that association. In the code below, we see it's quite easy to test this logic without Rails:

require 'spec_helper'
require 'models/calorie_calculator'

describe CalorieCalculator do
  describe '#total' do
    it 'sums the total calories for all foods' do
      foods = [build(:food, calories: 50), build(:food, calories: 25)]
      calc = CalorieCalculator.new(foods)

      expect(calc.total).to eq(75)
    end
  end
end

The original code encouraged us to use the database to test its logic, which is unnecessary since the logic works purely in memory. The improved code breaks this coupling by making implicit dependencies explicit. Now, we can more easily test the logic without a database.

We're one step closer to removing Rails from the picture; let's realize this goal now. Everywhere that Recipe#total_calories was being called can be changed to refer to the new CalorieCalculator class. Similarly for the #favorite method. This refactor would likely be done in the controller, but maybe in a presenter object. Below, the new classes are used in the controller and removed from the model:

class RecipesController < ApplicationController
  before_filter :find_recipe

  def show
    @total_calories = CalorieCalculator.new(@recipe.foods).total
  end

  def favorite
    FavoritesTracker.new(@recipe).favorite

    redirect_to :show
  end
end

Now our Recipe class looks like this:

class Recipe < ActiveRecord::Base
  has_many :ingredients
  has_many :foods, through: :ingredients
end

An end-to-end test should be added to round out the coverage. This will ensure that the controller truly does call our newly created classes.
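
Such an end-to-end test might be sketched as a Cucumber scenario along these lines (the step wording and page copy are illustrative):

Scenario: Viewing a recipe's total calories
  Given a recipe with a 50 calorie food and a 25 calorie food
  When I view the recipe
  Then I should see "75 calories"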

As you can see from the above example, high-low testing encourages better object orientation by separating the business logic from the framework. Our first step was to pull the logic out of the model to make it easier and faster to test. This not only improved testability but drove us towards a design that was easier to reason about.

Our second step reintroduced the procedural nature of the program. Logic flows downwards in the same sense that computers execute instructions sequentially. Large Rails models often contain significant amounts of redirection as models talk amongst each other. By moving our code back into the controller, we've reduced this redirection and exposed its procedural nature, making it easier to understand. As we continue through the article, we'll look at even more ways high-low testing can improve the quality of a codebase.

Feedback, feedback, feedback

High-low testing is all about providing the right feedback at the right times. Do you need immediate feedback about application health as you work your way through a refactor? Keep your unit tests running on every change with Guard. Do you need a sanity check that everything is safe to deploy? Run your acceptance suite.

It's been said many times by others, but feedback cycles are essential to productive software engineering. The goal of high-low testing is to provide quick feedback where you need it, and defer slow feedback for sanity checks. The shortcoming of many current Rails setups is they don't delineate these two desires. Inevitably, everything falls into the slow feedback cycle and it becomes increasingly difficult to recuperate.

Let's look at how we can embrace short feedback cycles in applications of all sizes. Unless you're curious, feel free to skip sections that don't apply to your current projects. We'll talk about working in legacy, middle-aged and greenfield applications.

Working with legacy codebases

The truth about legacy codebases is they will always be legacy. No matter how aggressively we refactor, there will always be a sense of technical debt. The goal is to eliminate future debt.

It's easy to apply high-low testing principles to legacy codebases, but it's done in addition to existing tests. If acceptance tests are already in place, continue to acceptance test in the prescribed fashion. If they're not in place, begin adding them as new features are built. Backfilling tests for existing features is also a great step towards refactoring.

You won't be able to immediately affect existing unit tests. These will require refactoring. But you can introduce new functionality as a suite that runs independent from any existing tests. To do so, create a new thin spec_helper, as described above, and begin requiring that instead of your normal spec_helper that loads Rails. You will now have three test suites:

  • Your normal acceptance suite
  • Your existing unit suite that loads Rails
  • Your new unit suite that does not load Rails

So, how does it fit into your workflow? The way I've done it is to keep my Rails-less unit suite continuously running with Guard. New functionality will be tested in the new manner, and the rapid feedback cycles will be immediately beneficial.

Run your acceptance suite as you would normally, but break your unit tests into separate directories. my_app/spec/slow and my_app/spec/fast would be one way to accomplish this. You'll then run your unit suites independently:

rspec spec/slow
rspec spec/fast

The latter should be run with Guard. If you want to run all unit tests, just drop the subdirectory:

rspec spec

Of course, this will load the Rails environment and you'll lose the benefits of speedy tests, but it can be done as a sanity check before a deploy, for instance.
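
For the Guard piece, a minimal Guardfile might look something like this (a sketch assuming the guard-rspec gem and the spec/fast layout above):

# Guardfile
guard :rspec, cmd: 'bundle exec rspec' do
  # Re-run a fast spec when it changes
  watch(%r{^spec/fast/.+_spec\.rb$})
  # Re-run the matching fast spec when application code changes
  watch(%r{^app/(.+)\.rb$}) { |m| "spec/fast/#{m[1]}_spec.rb" }
  # Re-run the whole fast suite when the spec helper changes
  watch('spec/spec_helper.rb') { 'spec/fast' }
end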

The unit suite that requires Rails will continue to be slow, as will the acceptance tests. The slow unit suite will act as checks against regressions, much like the acceptance suite. The new, faster suite will keep you productive moving forward and act as a goalpost for refactorings. To maximize the benefits of the new suite, existing unit tests and code can be reengineered to use it. In the ideal sense, all units of code would be refactored away from Rails, but I won't blow smoke up your ass.

Legacy codebases aren't the most conducive to high-low testing but medium-sized codebases are a perfect fit. Let's look at those next.

Working with medium-sized codebases

High-low testing feels most natural while working in semi-mature codebases which exhibit continual growth yet don't feel bogged down by past decisions. The business domain within codebases of this size is well understood, which means the abstractions encouraged by high-low testing can be expressed easily. Further, while medium-sized codebases still progress regularly, there's often enough time and resources to thoroughly consider design decisions for the sake of longevity.

To be clear, when I use the terms "legacy" and "medium-sized" codebases, I mean for them to be mutually exclusive. A medium-sized codebase can often feel legacy, but I'm referring to medium-sized codebases that have either employed high-low testing from the get-go or have been refactored into a high-low testing paradigm.

As it pertains to medium-sized codebases, high-low testing exudes its worth ten times over. The ability to add functionality, refactor, and delete features becomes significantly faster than if a traditional fat model, skinny controller approach was taken. The primary benefit is, of course, the speed of the unit suite.

Consider the case where you're asked to share a feature amongst varying but not dissimilar parts of an application, say billing. If originally the billing code was constructed to work in only one part of the app and now needs to be shared across multiple features, this would generally be considered a serious effort. If the test suite loads Rails, you could be looking at a 3-5 second delay every time the suite is run. Likely, these tests are also slow as they depend on many things and utilize sluggish and unnecessary IO-bound operations.

With high-low testing, hundreds or thousands of specs can be run in under a second with little to no boot time. The feedback is immediate. This is achieved exclusively by mocking slow dependencies. Mocking the database, API responses, file reads, and otherwise slow procedures keeps the suite fast.
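
For example, here's the style in miniature, using a made-up InvoiceCharger class; the gateway is the slow, IO-bound collaborator and it never actually runs:

# InvoiceCharger and BillingGateway are hypothetical names, purely illustrative.
class InvoiceCharger
  def initialize(gateway)
    @gateway = gateway
  end

  def charge(invoice)
    @gateway.charge(invoice.total)
  end
end

describe InvoiceCharger do
  it 'charges the invoice total through the gateway' do
    # Doubles stand in for the database- and network-bound pieces
    gateway = double('BillingGateway')
    invoice = double('Invoice', total: 4200)

    expect(gateway).to receive(:charge).with(4200)

    InvoiceCharger.new(gateway).charge(invoice)
  end
end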

Throughout a refactor, the test suite should always be a measure of progress. Normally, you begin by ripping out existing functionality, breaking a ton of tests. You sort of eyeball it from there and begin constructing the necessary changes. You update the code and the tests, and there's a large degree of uncertainty. Don't worry about proceeding while the tests are red. You'll be red, and you'll remain red until you're done refactoring.

By continually running the test suite on every change (because it's so fast), you'll always have a sense of how far you've strayed. Are things breaking that you didn't expect? Is the whole suite failing or only the related segments? Continually running the tests is an excellent reminder of the progress of your refactor. If you bite off too much, your tests are there to tell you. This rapid workflow is unachievable with most suites I've seen.

While high-low testing is a perfect match for medium-sized apps, it can and should be applied throughout a project's lifetime. Let's next look at the implications at an app's inception.

Working with greenfield apps

Admittedly, high-low testing is not entirely natural in greenfield apps. It works best when the abstractions are well understood. In a greenfield app, the abstractions are often impossible to see. These apps are changing so rapidly that it would be more harmful to introduce an abstraction than to carry on blindly. Abstractions are generalizations of implementation. With rapid additions, the generalizations being made are full of folly.

But there is a way to apply high-low testing to such a situation, and it requires careful forethought and a bit of patience. The end result is almost always a better design and a stronger basis for refactoring. Let's look at a situation where it's difficult to see abstractions, and therefore difficult to decouple the framework from the business logic.

Say you're building a blog, and you need to list all the posts. Using the Rails way, you'd find yourself constructing a query in a controller action:

class PostsController < ApplicationController
  def index
    @posts = Post.published
  end
end

Surely you either want to test that .published was called on Post or that .published returns the correct set which excludes unpublished posts. It feels awkward to abstract this single line into its own class. If you were to abstract it, what would you call the class? PostFinder? ViewingPostsService? You don't yet have enough context to know where this belongs and what it will include in the future.

One might say you should extract it anyway under the assumption that it will shortly change. Since the unit tests are so fast, it will be an easy refactor. This is a fair assessment. I don't have a silver bullet, but I will say this: hold off on the abstraction. Don't create another class, and don't unit test this logic. Build a barebones acceptance test to cover it. A simple "the posts index page should have only published posts" acceptance test will often suffice. Once you better understand how this logic will change over time, extract it into its own class and write unit tests.
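
For reference, the barebones acceptance test mentioned above might be as simple as (step wording illustrative):

Scenario: Viewing the posts index
  Given a published post titled "Hello World"
  And an unpublished post titled "Secret Draft"
  When I visit the posts index
  Then I should see "Hello World"
  And I should not see "Secret Draft"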

I'm still experimenting in this area, but I think it's worth practicing to form a discipline and awareness of the implications. The goal, of course, is to find a healthy balance of introducing the right abstractions at the right time. With patience, high-low testing is an effective way of achieving that goal.

Ancillary feedback

Testing high-low exposes some fantastic ancillary benefits that sort of just fall into your lap. Foremost is the constant feedback from transparent dependency resolution.

In most Rails apps, you see a lot of code that utilizes autoloading. It's a load-the-world approach that we're trying to avoid. For instance, we may have one class that refers to another:

class User < ActiveRecord::Base
end

class WelcomeUserService
  def send_email(id)
    @user = User.find(id)
    # Send the user an email
  end
end

How did the WelcomeUserService class find the User class? Through conventions and a little setup, Ruby and Rails were able to autoload the User class on demand, as it was referenced. In short, Rails uses #const_missing to watch for when the User class is referenced, and dynamically loads it at runtime.

In the above example, there is no reason for the WelcomeUserService class to have to manually require the User class. This is a nice convenience, but imagine your class is growing over time and eventually autoloads 10 dependent classes. How will you know how many dependencies this class has? I'll say it again, dependencies are evil.

With high-low testing, you're removing yourself from Rails' autoload behavior. You'll no longer be able to benefit from autoloading, which is a good thing. If you can't live without autoloading, you could construct your own autoloading setup, but I would encourage you to consider the following.

Every time you build a class or test disjoint from Rails, you must explicitly enumerate all of your requirements. The WelcomeUserService class would look more like this:

require 'models/user'

class WelcomeUserService
  def send_email(id)
    @user = User.find(id)
    # Send the user an email
  end
end

Notice we're explicitly requiring the User class at the top. Any other dependencies would be required in a similar fashion. For our tests, we would also require the class under test:

require 'spec_helper'
require 'services/welcome_user_service'

describe WelcomeUserService do
  ...
end

This might appear to be a step backwards, but it's most certainly not. Manually requiring your dependencies is another feedback mechanism for code quality. If the number of dependencies grows large for any given class, not only is it an eyesore, but it's an immediate reminder that something may be awry. If a class depends on too many things, it will become increasingly difficult to set up for testing, and that may be an indicator of too many responsibilities. Manually requiring dependencies keeps this in check.

In general, the more feedback mechanisms you can create around dependency coupling, the higher the object cohesion. Such feedback mechanisms encourage better object-oriented design.

Know thy dependencies

Through practicing this technique, one thing has become overwhelmingly apparent: working at the poles produces more dependency revealing code. That's what creating fast tests is all about. If the dependencies are transparent, obvious and managed, building fast tests is simple. If a system's unit tests avoid wandering towards higher abstractions, the dependencies can be managed.

What we're really trying to do is wrangle dependencies in a way that doesn't inhibit future growth. There are two options: boot-the-world style eager loading or independently requiring where necessary. In the interest of fast tests and providing additional feedback mechanisms, high-low testing encourages the latter.

Finding a balance

Testing high means you can feel confident that the individual pieces of your software are glued together and functional. This confidence goes a long way when you're shipping to production multiple times per day. You'll feel more comfortable in your own skin and vehemently reassured that the work you're doing is productive.

Testing low means you can focus on the individual components that comprise the functional system. The rapid feedback mechanisms provided by high-low testing aid in forging well-crafted software. The speed of the unit suite is a precious asset when refactoring or quickly checking the health of the system. Fast tests that encourage better object orientation help release the endorphins needed to keep your team happy and stable.

Like everything else in software engineering, deciding whether to apply high-low testing is full of tradeoffs. When is the right time to create abstractions? How important are fast tests? How easy is it for the team to understand the goals? These are all highly relevant questions that each organization must discuss.

If a team decides to practice high-low testing, the benefits are invaluable. Near-instantaneous unit tests at scale, better object-oriented design, and a keen understanding of system dependencies are amongst the perks. Such perks can prevent code cruft, especially in larger codebases. Since the principles of high-low testing can be easily conveyed to others, it becomes a nice framework for discussion.

Even if the approaches discussed in this article don't fit your style, the core concepts can be the foundation for further enlightenment. It may seem academic, but I will attest that over the past four years of applying high-low testing, I've found my code to be better factored and easier to maintain. Your mileage may vary, but with enough persistence, I believe most teams will largely prosper both in productivity and happiness.

This concludes the explanation of high-low testing, so feel free to stop here. If you'd like to explore more real-world tradeoffs, continue reading the following sections.

The Appendix — Some unanswered questions

This article has mostly outlined some high level details about high-low testing. We haven't addressed many common issues and how to accomplish fast tests within the confines of a real project. Let's address that now.

High-low is not a prescription

What's being described here is not a prescription for success, but an outline for fast tests. Developer happiness is the primary goal. Applying these techniques to your codebase does not guarantee you will produce better object-oriented code. Though alone it will likely encourage better choices, additional architectural paradigms are required to create well-designed and maintainable code.

There are plenty such architectural paradigms that accomplish the goal of decoupling the application from the framework. Two such examples are DCI and hexagonal architecture. Other less-comprehensive examples can be found in this excellent article on decomposing ActiveRecord models. In this article, we've used decorators and service objects as examples of framework decoupling.

Why is testing Rails unique?

In just about all the apps I've worked on, I can say that I've felt happy testing only a handful. As a community growing out of its infancy, we're beginning to realize that the prescribed testing patterns passed down from previous generations are inadequately accommodating the problems we're facing today. Particularly, problems of scale. As compared to even five years ago, Rails has grown, dependencies have grown, apps have grown and teams have grown.

The same principles from the Java world we so vehemently defied are the only principles that will save this community. No, we don't want Rails to become J2EE. Introducing new and impressionable blood to the Ruby world *should* be done in such a way that encourages ease of use. Ease of use often comes from few abstractions, which is why we're here today.

Rails hates business abstractions. Ironically, framework abstractions are OK, like routing constraints. You're encouraged to use three containers: models, views and controllers. Within those containers, you're welcome to use concerns, which are merely a displacement of logic.

The problem is probably better described as frameworks that discourage abstractions. This is littered throughout Rails. Take, for example, custom validators. You'd think to yourself, "if this system is properly constructed, I'd be able to use standard validators in my custom validator." You'd be partially right.

Let me show you an example. Say I want to use a presence validator in my custom validator:

class MyCustomValidator < ActiveModel::Validator
  def validate(record)
    validator = ActiveModel::Validations::PresenceValidator.new(attributes: [:my_field])
    validator.validate(record)
  end
end

This works, if you expect your model to be intrinsically tied to your validations. The above code will properly validate the presence of :my_field, but when the validation fails, errors will be added to the model. What if your model doesn't have an errors field? The PresenceValidator assumes you're working with an ActiveModel model, as if no other types of models exist. Not only that, but your model also needs to respond to various other methods, like #read_attribute_for_validation.

I want something more akin to this API:

class MyCustomValidator < ActiveModel::Validator
  def validate(record)
    validator = ActiveModel::Validations::PresenceValidator.new
    validator.validate('A value') #=> true
    validator.validate(nil) #=> false
    validator.validate('') #=> false
  end
end

There is no need for the PresenceValidator to be tied to ActiveModel, but it is, and so are many other things in Rails. Everyone knows that if you don't follow The Rails Way, you'll likely experience some turbulence. Yes, there should be guidelines, but it should not be the framework's job to mandate how you architect your software.

Isn't the resulting code harder to understand?

An argument could certainly be made to illustrate the deficiencies of code produced as a result of high-low testing. The argument would likely include rhetoric around a mockist style of testing. Indeed, a mockist style can certainly encourage some odd approaches, including dependency injection and superclass inheritance mocking.

These arguments are valid and exhibit important tradeoffs to weigh. However, if the mocking boundaries are well-defined and controlled, the downsides can be minimized.

The most prominent mocking territory in high-low tested apps is the boundary between Rails and the application logic. In most large applications I've seen, the application logic comprises the majority of the system; the framework merely delivers that logic to the web. It's important to clearly define this boundary so that the framework does not bleed into the domain.

If this separation is properly maintained, the amount of additional mockist style techniques can be nonexistent, if desired. The application logic can be tested in a classical, mockist or other style, depending on the preferences of the team.

Don't fall down the slippery slope

You may be inclined to load a single Rails dependency because you're used to various APIs. ActiveSupport, for instance, contains some useful tools and you may feel inclined to allow your application to rely on such a library. Adding it seems harmless at first, but beware of the slippery slope of dependencies.

You might think to yourself, "adding ActiveSupport will add, what, 1 second to my test suite boot time?" And you may be correct. But when you consider that ActiveSupport comprises about 43K LOC, or 37% of the entire Rails codebase as of 4.1.1, and that you'll likely be using a very small fraction of the packaged code, the tradeoff feels far too imbalanced.

Instead, build your own micro-library to handle the conveniences you need. Maybe this request seems ludicrous, but I guarantee you'll be surprised by how few methods you actually need from ActiveSupport. The tradeoff for speed is compelling and warranted.
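
As a sketch of what such a micro-library might look like, here's a hypothetical helper standing in for the one ActiveSupport method this imaginary app actually needed:

# lib/core_ext/blank.rb -- hypothetical micro-helper
module CoreExt
  module_function

  # A minimal stand-in for the spirit of ActiveSupport's #blank?,
  # covering only the cases this app relies on.
  def blank?(value)
    value.respond_to?(:empty?) ? value.empty? : !value
  end
end

CoreExt.blank?('')    #=> true
CoreExt.blank?(nil)   #=> true
CoreExt.blank?('hi')  #=> false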

What about validations, associations and scopes?

You have a few options, none of which are ideal.

Firstly, you shouldn't be testing associations. Let's be serious: if your associations aren't in place, a lot is broken.

If you must test your validations because they're complex or you don't believe the single line it took to write it is enough of an assurance, one option is to test them at the acceptance level. This is a slow means of verifying validations are in place, but it is effective:

Scenario: User signs up with incomplete data
  Given I am on the registration page
  When I press "Submit"
  Then I should see "Username can't be blank"
  And I should see "Email can't be blank"

Slow and cumbersome, but effective.

If you're testing a complex validation, it's probably best to move the validation into a designated ActiveModel::Validator class. Since this class depends on Rails, you'll need to compose some objects so Rails isn't loaded within your tests. It works well to have a mixin that does the true validation, which gets included in your ActiveModel::Validator class:

module ComplexValidations
  def validate(record)
    record.errors[:base] = 'Failed validations'
  end
end

This would be mixed into the validator:

class ComplexValidator < ActiveModel::Validator
  include ComplexValidations
end

Testing the mixin doesn't require Rails:

require 'spec_helper'
require 'validations/complex_validations'

describe ComplexValidations do
  let(:validator) do
    Class.new.include(ComplexValidations).new
  end

  it 'validates complexities' do
    errors = {}
    record = double('AR Model', errors: errors)

    validator.validate(record)

    expect(errors[:base]).to eq('Failed validations')
  end
end

Another technique, and my preferred approach, is to introduce a third suite, similar to an existing unit suite in a legacy codebase. The third suite would exist exclusively for testing things of this nature and would be comprised of slow unit tests. Of course, this suite should not be continually run with Guard, but it will exist for situations that need it, like testing validations, associations and scopes. Be careful though: this introduces a slippery slope that makes it easy to neglect high-low testing and throw everything into the slow unit suite.

Scopes are tricky because they do contain valuable business logic that does warrant tests. Just like validations, verifying scopes can be moved to the acceptance level, but my recommended approach is to introduce a slow unit suite to test these.

If, like me, you cringe at the thought of introducing a slow unit suite, one option is to stub the scopes. Set up a mixin like the previous example and use normal Rails APIs:

module PostScopes
  def published
    where(published: true)
  end
end

When testing, just assert that the correct query methods are called:

describe PostScopes do
  subject do
    Class.new.extend(PostScopes)
  end

  describe '.published' do
    it 'queries for published posts' do
      expect(subject).to receive(:where).with(published: true)

      subject.published
    end
  end
end

More complex mocking is necessary if the scope performs multiple queries:

def latest
  where(published: true).order('published_at DESC').first
end

describe '.latest' do
  it 'queries for the last published post' do
    relation = double('AR::Relation')

    expect(subject).to receive(:where).with(published: true).and_return(relation)
    expect(relation).to receive(:order).with('published_at DESC').and_return(relation)
    expect(relation).to receive(:first)

    subject.latest
  end
end

This level of mocking is too aggressive for my taste, but it works and allows for all the convenient scoping behavior we're used to. It's a little brittle, but damn fast.

Posted by Mike Pack on 07/15/2014 at 10:45AM

Tags: testing, rails, ruby, high-low, rspec


The First Step to Applying Design Patterns: Don't

Design patterns are awesome. The more we build software with them in mind, the better off we'll be as a community. They can help us elegantly construct solutions which can be readily discussed with peers. They're common solutions to common problems. They're not just common solutions, however. They're battle-tested, proven, performant and generally considered "the best" solutions. Design patterns are the apotheosis, the epitome, of a solution.

In this article, I'll look at varying levels of design pattern application, starting from worse to better, and ultimately landing on what I would consider the utopia of software engineering. The ideas in this article are largely derived from what I've observed and reasoned about.

Working from Scratch

Personally, one of the most compelling exercises in software engineering is exploration. Just like in any other engineering field, the problem set expands indefinitely, and thus, so does our solution set. As businesses strive to keep a competitive edge, engineers must continue to solve problems which are both new and challenging. Through the process of solving new problems, we manage to come up with some not-so-pleasant-to-work-with solutions. Doing so is natural and healthy and is just about the only way we can continue to improve, especially when first learning. In fact, we're going to create one of those not-so-good solutions right now.

Take, for example, a simple arithmetic problem:

1 + 1 = 2

Imagine modern programming languages didn't have a + operator. Knowing the result is 2, how would you prove the left-hand side (1 + 1)? Well, let's briefly explore one option. For all intents and purposes, the following could be written in pseudocode. It's not the running code that matters; it's the exploration process which invokes an active mind.

Let's stick to Ruby idioms and reopen the Fixnum class with a + method:

class Fixnum
  def +(other)
    if self == 1 and other == 1
      2
    end
  end
end

Ruby's + operator aids in this process, but let's call the + method directly:

1.+(1) #=> should == 2

Nothing tricky going on here. What if we want to evaluate 1 + 2? The most obvious thing is to add some conditional branching to our + method:

class Fixnum
  def +(other)
    if self == 1 and other == 1
      2
    elsif self == 1 and other == 2
      3
    end
  end
end

This is where our solution starts to fall apart. While this would work with a minimal set of operands, as our set grows, our conditional logic grows linearly, if not exponentially. At this point in the exploration process, we probably want to reconsider our solution. It's easy to recognize this first iteration is heading down the wrong path. Given some background in computer science, you might try refactoring this solution to use binary instead.

Feel free to skip the following code; it's not the destination that matters, but the journey by which we got there. Here's our final binary addition code:

class BinaryPlus
  def initialize(first, second)
    @first, @second = first, second
    # to_s accepts a base to convert to. In this case, base 2.
    @first_bin  = @first.to_s(2)
    @second_bin = @second.to_s(2)
    normalize
  end

  def +
    carry = '0'
    result_bin = ''

    @max_size.times do |i|
      # We want to work in reverse, from the rightmost bit
      index = @max_size - i - 1
      first_bit, second_bit = @first_bin[index], @second_bin[index]

      if first_bit == '1' and second_bit == '1'
        result_bin << carry
        carry = '1'
      else
        if first_bit == '1' or second_bit == '1'
          if carry == '1'
            result_bin << '0'
            # carry remains 1
          else
            result_bin << '1'
            carry = '0'
          end
        else
          result_bin << carry
          carry = '0'
        end
      end
    end

    # Is there still a carry hangin' around?
    result_bin << '1' if carry == '1'

    result_bin.reverse.to_i(2)
  end

  private

  def normalize
    # We want both binary numbers to have the same length
    @max_size   = @first_bin.size < @second_bin.size ? @second_bin.size : @first_bin.size
    @first_bin  = @first_bin.rjust(@max_size, '0')
    @second_bin = @second_bin.rjust(@max_size, '0')
  end
end

We would invoke it with:

BinaryPlus.new(3, 4).+ #=> should == 7

At this point, we've managed to weave our way through a forest of solutions to land on one that doesn't require us to change the code to accommodate new operands. Aside from increasing the maintainability of the code, going through this process has likely taught us a few things about doing basic arithmetic in Ruby:

  • The underlying principles are not as simple as the syntax leads us to believe.
  • Considering potential operands is likely something we should do before writing code (TDD).
  • Ruby has built-in methods for base conversion.
  • (The list goes on depending on the explorer.)

This process is both fruitful and enlightening. It's one of beauty and purity. Only by actually solving a problem can we truly say we've conquered it. This is the sensation I seek every day. That of utter accomplishment. This is software engineering, and only through time can we become better at finding maintainable solutions.

There's a catch. I'm not the best problem solver in the world, and neither are you. Individually, we simply can't grasp the vast landscape of problems, much less solve them all enough times that we can confidently say we have the best solution. Collectively, we all strive for the best solutions and combine our results. It's called the Gang of Four, not the Gang of One.

Aside: For those interested in the actual source of Fixnum#+, check out the Ruby source. Also, more on binary arithmetic.

Applying Design Patterns

We live in a beautiful age where all problems are already solved. We can thank Leonhard Euler, Carl Gauss and Isaac Newton for advanced mathematics and forming the foundation of computer science. We can thank Ewald Christian von Kleist, Benjamin Franklin and Alessandro Volta for their work in electricity so we can program on the airplane. We can thank Alan Turing and Donald Knuth for modern computer science and Dennis Ritchie for C. We can thank Matz for Ruby. And we can thank the Gang of Four for design patterns.

Design patterns help us do one thing really well: think and speak in the abstract. Given a problem with input i1, i2 and i3, design patterns can help us elegantly solve such a problem by correct association of i1, i2 and i3. They're generic solutions to generic problems. By its very definition, a (software) pattern is a theme of recurring solutions. The primary benefit of using patterns is we can circumvent a large degree of work. We no longer have to reinvent the "undo" button, the command pattern has already been discussed and documented.

It's very easy to apply design patterns. All you have to do is know they exist. If I know the factory pattern exists and it's a tried-and-true technique for generating objects, all I have to do is research the pattern and follow the steps. With the resources available to us today, we can read about common problems and their resolutions with a single Google search. There's really no excuse for not applying design patterns at work every day. They can drastically simplify code, increase modularity, increase legibility, decrease duplication, improve translation to English, and the list goes on.
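
For instance, a bare-bones command object with an undo, the kind of thing the pattern hands us for free, takes only a few lines (names and behavior purely illustrative):

class AddTextCommand
  def initialize(document, text)
    @document, @text = document, text
  end

  def execute
    @document << @text
  end

  def undo
    @document.slice!(-@text.length, @text.length)
  end
end

document = ''
command = AddTextCommand.new(document, 'Hello')
command.execute #=> document == "Hello"
command.undo    #=> document == ""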

If I had to draw a conclusion right now, I would say "use design patterns." But I don't have to, so I would rather say something a little more robust.

Design patterns can be extremely helpful in crafting beautiful code, but the way in which they're applied often determines their usefulness. Applying a design pattern in the wrong scenario can push you into a corner, ultimately leading to more disarray than would have been present if it weren't for the design pattern. I'm going to pick on the singleton pattern a bit.

Singletons get a bad rep. In my opinion, rightfully so. Let's look at a situation where "applying a design pattern" can be discouraging.

We only have one file system, right? Naively, I'm thinking, "I know the singleton pattern, that would be a great fit here!" Let's create a file system singleton that writes some text to /dev/null:

require 'singleton'

class DevNullSingleton
  include Singleton

  def write(text)
    File.open('/dev/null', 'w') do |file|
      file.write text
    end
  end
end

We can use the singleton by referencing its instance:

DevNullSingleton.instance.write('Something to dev null')

Realistically, this has the same semantics as setting a global constant if we didn't want to use Ruby's singleton library:

DEV_NULL = DevNullSingleton.new
# ... later in the code ...
DEV_NULL.write('Something to dev null')

So, now our application grows, and we need another file system writer that outputs to /tmp. We're posed with a few options.

We can rename our singleton and allow the #write method to accept a path. The API looks like this:

FileSystemSingleton.instance.write('/dev/null', 'Something to dev null')
FileSystemSingleton.instance.write('/tmp', 'Something to tmp')

This is bad. If we want to write numerous things to /dev/null, we have a large degree of duplication:

FileSystemSingleton.instance.write('/dev/null', 'Something to dev null')
FileSystemSingleton.instance.write('/dev/null', 'Something else to dev null')
FileSystemSingleton.instance.write('/dev/null', 'Another thing to dev null')

Alternatively, we can create a new singleton class that writes to /tmp:

class TmpSingleton
  include Singleton

  def write(text)
    # ...
  end
end

But now, every time we want to write to a different location on the file system, we need to create a new singleton class. Not great, either.

Probably, the better option is to break ties with the singleton and start instantiating classes normally:

class FileSystem
  def initialize(path)
    @path = path
  end

  def write(text)
    File.open(@path, 'w') do |file|
      file.write text
    end
  end
end

Now, when we want to write multiple times to /dev/null, we instantiate only once and use it as we would any other class:

dev_null = FileSystem.new('/dev/null')
dev_null.write('Something to dev null')
dev_null.write('Something else to dev null')
dev_null.write('Another thing to dev null')

Here's My Gripe

I don't have anything against the singleton pattern, per se. I have issues with the process by which it's applied. In a number of cases, I've seen design patterns applied in the following steps:

  1. Read up on design patterns.
  2. Think in terms of design patterns.
  3. Apply design patterns.

It's really awesome to read as much as possible, but things start to fall apart around Step 2. If design patterns become the only lens by which you see your software, you'll inevitably end up pigeonholed like the singleton situation above.

Don't name your designs after patterns. This often happens because early in the design process you say, "I can use a singleton here!" So you go about defining classes as you're elaborating the design. Early, it makes sense to name something "FileSystemSingleton" so you can follow the design as it's being built. It acts as a form of documentation. However, it does that, and only that. "FileSystemSingleton" is no more descriptive or expressive than "FileSystem." In fact, it just adds noise. If you name something "BubbleSortStrategy" to denote the strategy pattern, but later compositionally apply subsequent "strategies", is it still technically a strategy? Is it a component of an overall strategy? Drop the "Strategy" and just call it "BubbleSort." That way, no matter whether your design is in fact the strategy pattern, a derivation thereof, or something completely different, it doesn't add clutter or confusion.

Don't design around patterns. Although it would be nice, we can't trust design patterns as the correct solution. For a majority of patterns, I would speculate that only a small number of problems fit directly in the mold. In the above example, the singleton is not what we ultimately needed. If we hadn't been thinking "singleton, singleton, singleton" early in the design process, we probably wouldn't have ended up with that design. If we had taken a TDD approach to building out the file system writer, we would have likely just ended up with a normal Ruby class, no singletons involved. As software grows and changes, don't get pigeonholed by a design pattern.

Learning from Design Patterns

In the previous section on Applying Design Patterns, I said that all problems have been solved. This is, of course, not true. One of my primary fuel sources is solving problems that I've neither solved myself nor seen solved. That's not to say they haven't been solved, however. My problems are not unique snowflakes. The difference between everyday problems and problems which can be readily solved with design patterns is a matter of exposure. We haven't been exposed to every problem, so we don't always have a ready-made solution. We must compose our own.

When I encounter new problems, I never think in terms of design patterns. I often think in terms of domain. My utopic engineering process consists of a boundless array of knowledge from which I compose my own solution. I don't rely on one tool, methodology, or process to drive my software. I consume the problem and attempt to make educated decisions. This is the "engineering" part of software engineering. It's not the languages you know, the frameworks you use, or how retina-enabled your computer is. It's your ability to become completely engulfed in a problem, enough to sense its anatomy.

Take, for example, a recent project of mine: Pipes. Pipes evolved organically through deep discussion around the domain. Why does the problem exist, what are the currently known solutions, and how can we derive the best possible outcome? The question of "what design pattern should we use" never arose. That's not to say design patterns were absent from the discussion, however. Some of the architectural motivation was taken from the pipeline processing pattern. Studying the pipeline processing pattern evoked new ideas from which to draw conclusions. Ultimately, it was the exploration process combined with studying the pipeline processing pattern that led to a solution I was happy to write home about.

Be a part of the exploration process. Discover how your solution fits into your domain, and your domain into your problem. It's more time consuming than jumping to a cookie-cutter solution, but it's lightyears more glorifying. The exploration process is what leads to interesting and eloquent implementations; ones that can be easily changed, apply to the domain, and have a dash of humanism. No matter how you program, being cognizant of design patterns is always desirable. Learning as much as possible and having varying perspectives is crucial.

Create your own design patterns. Solve problems how you would solve them, not how the Gang of Four solves them. Stay as well-informed as you can on known solutions and reflect on them regularly. Use design patterns as inspiration for better, more applicable solutions to your specific problem. Do not blindly apply them. Think first, then consider design patterns. This is software engineering.

Posted by Mike Pack on 10/02/2012 at 09:10AM

Tags: design patterns, ruby


DCI with Ruby Refinements

TL;DR - Have your cake and eat it too. Ruby refinements, currently in 2.0 trunk, can cleanly convey DCI role injection and perform right on par with #include-based composition. However, there are some serious caveats to using refinements over #extend.

Recently, refinements were added to Ruby trunk. If you aren't yet familiar with refinements, read Yehuda's positive opinion as well as Charles Nutter's negative opinion. The idea is simple:

module RefinedString
  refine String do
    def some_method
      puts "I'm a refined string!"
    end
  end
end

class User
  using RefinedString

  def to_s
    ''.some_method #=> "I'm a refined string!"
  end
end

''.some_method #=> NoMethodError: undefined method `some_method' for "":String

It's just a means of monkeypatching methods into a class, a (still) controversial topic. In the above example, the User class can access the #some_method method on strings, while this method is non-existent outside the lexical scope of User.

Using Refinements in DCI

Refinements can be used as a means of role-injection in DCI, amongst the many other techniques. I personally like this technique because the intention of the code is clear to the reader. However, it has some serious drawbacks which we'll address a bit later.

Let's say we want to add the method #run to all Users in a given context.

Our User class:

class User; end

Our refinement of the User class:

module Runner
  refine User do
    def run
      puts "I'm running!"
    end
  end
end

In the above refinement, we are adding the #run method to the User class. This method won't be available unless we specifically designate its presence.

Our DCI context:

class UserRunsContext
  using Runner

  def self.call
    User.new.run    
  end
end

Here, we're designating that we would like to use the refinement by saying using Runner. The #run method is then available for us to use within the context trigger, #call.

Pretty clear what's happening, yeah?

I wouldn't go as far as saying it carries the expressiveness of calling #extend on a user object, but it gets pretty darn close. To reiterate, the technique I'm referring to looks like the following, without using refinements:

user = User.new
user.extend Runner
user.run

Benchmarking Refinements

I'm actually pretty impressed on this front. Refinements perform quite well under test. Let's observe a few means of role injection: inclusion, refinement and extension.

I ran these benchmarks using Ruby 2.0dev (revision 35783) on a MacBook Pro - 2.2 GHz - 8 GB ram.

Check out the source for these benchmarks to see how the data was derived.
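
For a feel of how the numbers were gathered, here's a minimal sketch of what the refinement benchmark might look like. This is my own reconstruction, not the linked source; the work inside #run is an assumption borrowed from the other benchmarks in this series.

require 'benchmark'

class User; end

module Runner
  refine User do
    def run
      Math.tan(Math::PI / 4) # arbitrary work, same spirit as the other benchmarks
    end
  end
end

# The refinement is only active within this lexical scope.
class Context
  using Runner

  def self.call
    1_000_000.times { User.new.run }
  end
end

Benchmark.bm do |bench|
  3.times { bench.report('refine') { Context.call } }
end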

#include (source)

Example

class User
  include Runner
end

Benchmarks

> ruby include_bm.rb
         user       system     total       real
include  0.560000   0.000000   0.560000 (  0.564124)
include  0.570000   0.000000   0.570000 (  0.565348)
include  0.560000   0.000000   0.560000 (  0.563516)

#refine (source)

Example

class User; end
class Context
  using Runner
  ...
end

Benchmarks

> ruby refinement_bm.rb
        user       system     total       real
refine  0.570000   0.000000   0.570000 (  0.566701)
refine  0.580000   0.000000   0.580000 (  0.582464)
refine  0.570000   0.000000   0.570000 (  0.572335)

#extend (source)

Example

user = User.new
user.extend Runner

Benchmarks

> ruby dci_bm.rb
     user       system     total       real
DCI  2.740000   0.000000   2.740000 (  2.738293)
DCI  2.720000   0.000000   2.720000 (  2.721334)
DCI  2.720000   0.000000   2.720000 (  2.720715)

The take-home message here is simple: #refine performs just as well as #include and significantly better than #extend. To no surprise, #extend performs worse than both #refine and #include because it injects functionality into objects instead of classes, of which there are 1,000,000 and 1, respectively.

Note: You would never use #include in a DCI environment, namely because it's a class-oriented approach.

Separation of Data and Roles

What I enjoy most about the marriage of refinements and DCI is that we still keep the separation between data (User) and roles (Runner). A critical pillar of DCI is the delineation of data and roles, and refinements ensure the sanctity of this concern. The only component in our system that should know about both data and roles is the context. By calling using Runner from within our UserRunsContext, we've joined our data with its given role in that context.

An example of when we break this delineation can be expressed via a more compositional approach, using include:

class User
  include Runner
end

The problem with this approach is the timing in which the data is joined with its role. It gets defined during the class definition, and therefore breaks the runtime-only prescription mandated by DCI. Furthermore, the include-based approach is a class-oriented technique and can easily lead us down a road to fat models. Consider if a User class had all its possible roles defined right there inline:

class User
  include Runner
  include Jogger
  include Walker
  include Crawler
  ...SNIP...
end

It's easy to see how this could grow unwieldy.

Object-Level Interactions and Polymorphism

Another pillar of DCI is the object-level, runtime interactions. Put another way, a DCI system must exhibit object message passing in communication with other objects at runtime. Intrinsically, these objects change roles depending on the particular context invoked. A User might be a Runner in one context (late for work) and a Crawler in another (infant child).
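
To make that concrete, here's an illustrative snippet of my own, using the #extend flavor of role injection and made-up Runner and Crawler role modules:

class User; end

module Runner
  def run
    puts "I'm running!"
  end
end

module Crawler
  def crawl
    puts "I'm crawling!"
  end
end

user = User.new

user.extend Runner   # the "late for work" context
user.run             #=> I'm running!

user.extend Crawler  # the "infant child" context
user.crawl           #=> I'm crawling!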

The vision of James Coplien, co-inventor of DCI, is tightly aligned with Alan Kay's notion of object orientation:

“I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages.” - Alan Kay

 

So, as roles are injected into data objects, do refinements satisfy the object-level interactions required by DCI? Debatable.

With refinements, we're scoping our method definitions within the bounds of a class. With modules, we're scoping our methods within the abstract bounds of whatever consumes the module. By defining methods within a module, we're essentially saying, "I don't care who consumes my methods, as long as they conform to a specific interface." Further, in order to adhere to Alan Kay's vision of object orientation, our objects must be dynamically modified at runtime to accommodate for the context at hand. The use of modules and #extend ensures our data objects acquire the necessary role at runtime. Refinements, on the other hand, do not adhere to this mantra.

Along similar lines, let's look at how refinements affect polymorphism. Specifically, we want to guarantee that a role can be played by any data object conforming to the necessary interface. In statically-typed systems and formal implementations of DCI, this is particularly important because you would be defining "methodless roles", or interfaces, for which "methodful roles" would implement. These interfaces act as guards against the types of objects which can be passed around. When we work with refinements and class-specific declarations, we lose the polymorphism associated with the module-based approach. This can be conveyed in the following example:

module Runner
  def run
    puts "I have #{legs} and I'm running!"
  end
end

# The Runner role can be used by anyone who conforms to
# the interface. In this case, anyone who implements the
# #legs method, which is expected to return a number.
User.new.extend Runner
Cat.new.extend Runner
Dog.new.extend Runner

# When we use refinements, we lose polymorphism.
# Notice we have to redefine the run method multiple times for each
# possible data object.
module Runner
  refine User do
    def run
      puts "I have #{legs} and I'm running!"
    end
  end

  refine Cat do
    def run
      puts "I have #{legs} and I'm running!"
    end
  end

  refine Dog do
    def run
      puts "I have #{legs} and I'm running!"
    end
  end
end

The really unfortunate thing about refinements is we have to specify an individual class we wish to refine. We're not able to specify multiple classes to refine. So, we can't do this:

module Runner
  refine User, Cat, Dog do # Not possible.
    def run
      puts "I have #{legs} and I'm running!"
    end
  end
end

But even if we could supply multiple classes to refine, we're displacing polymorphism. Any time a new data object can play the role of a Runner (it implements #legs), the Runner role needs to be updated to include the newly defined data class. The point of polymorphism is that we don't really care what type of object we're working with, as long as it conforms to the desired API. With refinements, since we're specifically declaring the classes we wish to play the Runner role, we lose all polymorphism. That is to say, if some other type, say Bird, conforms to the interface expected of the Runner role, it can't be polymorphically substituted in place of a User.

Wrapping Up

Refinements are a unique approach to solving role injection in DCI. Let's look at some pros and cons of using refinements:

Pros

  • #refine provides a clean syntax for declaring data-role interactions.
  • Refinements perform around 500% better than #extend in DCI.
  • The data objects are clean after leaving a context. Since the refinements are lexically scoped to the context class, when the user object leaves the context, its #run method no longer exists.

Cons

  • We lose all polymorphism! Roles cannot be injected into API-conforming data objects at runtime. Data objects must be specifically declared as using a role.
  • We can't pass multiple classes into #refine, causing huge maintenance hurdles and a large degree of duplication.
  • We lose the object-level, cell-like interaction envisioned by Alan Kay in which objects can play multiple and sporadic roles throughout their lifecycle.
  • Testing. We didn't cover this, but in order to test refinements, you would need to apply the cleanroom approach with a bit of test setup (see the sketch just after this list). In my opinion, this isn't as nice as testing the result of a method after using #extend.
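
Here's a rough sketch of that cleanroom setup. It's my own illustration, not from this post, and the refined #run returns a string here so the spec has something easy to assert against:

class User; end

module Runner
  refine User do
    def run
      "I'm running!"
    end
  end
end

# The "cleanroom" is a throwaway class that activates the refinement,
# giving the spec a lexical scope in which User#run actually exists.
class RunnerCleanroom
  using Runner

  def run(user)
    user.run
  end
end

describe Runner do
  it 'refines User with #run inside the cleanroom' do
    RunnerCleanroom.new.run(User.new).should == "I'm running!"
  end
end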

While there's certainly some benefits to using refinements in DCI, I don't think I could see it in practice. There's too much overhead involved. More importantly, I feel it's critical to maintain Alan Kay's (and James Coplien's) vision of OO: long-lived, role-based objects performing variable actions within dynamic contexts.

After all this...maybe I should wait to see if refinements even make it into Ruby 2.0.

Happy refining!

Posted by Mike Pack on 08/22/2012 at 09:13AM

Tags: dci, ruby, refinements


SOA for the Little Guys

My article titled "SOA for the Little Guys" was just published on RubySource. It covers breaking apart a monolithic app into services with testing and Sinatra as the driving forces.

Give it a read!

SOA for the Little Guys

Posted by Mike Pack on 02/13/2012 at 10:04AM

Tags: architecture, soa, sinatra, ruby


The Right Way to Code DCI in Ruby

Many articles found in the Ruby community largely oversimplify the use of DCI. These articles, including my own, highlight how DCI injects Roles into objects at runtime, the essence of the DCI architecture. Many posts regard DCI in the following way:

class User; end # Data
module Runner # Role
  def run
    ...
  end
end

user = User.new # Context
user.extend Runner
user.run

There are a few flaws with oversimplified examples like this. First, it reads "this is how to do DCI". DCI is far more than just extending objects. Second, it highlights #extend as the go-to means of adding methods to objects at runtime. In this article, I would like to specifically address the former issue: DCI beyond just extending objects. A followup post will contain a comparison of techniques to inject Roles into objects using #extend and otherwise.

DCI (Data-Context-Interaction)

As stated previously, DCI is about much more than just extending objects at runtime. It's about capturing the end user's mental model and reconstructing that into maintainable code. It's an outside → in approach, similar to BDD, where we regard the user interaction first and the data model second. The outside → in approach is one of the reasons I love the architecture; it fits well into a BDD style, which further promotes testability.

The important thing to know about DCI is that it's about more than just code. It's about process and people. It starts with principles behind Agile and Lean and extends those into code. The real benefit of following DCI is that it plays nicely with Agile and Lean. It's about code maintainability, responding to change, and decoupling what the system does (its functionality) from what the system is (its data model).

I'll take a behavior-driven approach to implementing DCI within a Rails app, starting with the Interactions and moving to the Data model. For the most part, I'm going to write code first then test. Of course, once you have a solid understanding of the components behind DCI, you can write tests first. I just don't feel test-first is a great way of explaining concepts.

User Stories

User stories are an important feature of DCI although not unique to the architecture. They're the starting point of defining what the system does. One of the beauties of starting with user stories is that it fits well into an Agile process. Typically, we'll be given a story which defines our end-user feature. A simplified story might look like the following:

"As a user, I want to add a book to my cart."

At this point, we have a general idea of the feature we'll be implementing.

Aside: A more formal implementation of DCI would require turning a user story into a use case. The use case would then provide us with more clarification on the input, output, motivation, roles, etc.

Write Some Tests

We should have enough at this point to write an acceptance test for this feature. Let's use RSpec and Capybara:

spec/integration/add_to_cart_spec.rb

describe 'as a user' do
  it 'has a link to add the book to my cart' do
    @book = Book.new(:title => 'Lean Architecture')
    visit book_path(@book)
    page.should have_link('Add To Cart')
  end
end

In the spirit of BDD, we've started to identify how our domain model (our Data) will look. We know that Book will contain a title attribute. In the spirit of DCI, we've identified the Context in which this use case is enacted and the Actors which play key parts. The Context is adding a book to the cart. The Actor we've identified is the User.

Realistically, we would add more tests to further cover this feature but the above suits us well for now.

The "Roles"

Actors play Roles. For this specific feature, we really only have one Actor, the User. The User plays the Role of a customer looking to add an item to their cart. Roles describe the algorithms used to define what the system does.

Let's code it up:

app/roles/customer.rb

module Customer
  def add_to_cart(book)
    self.cart << book
  end
end

Creating our Customer Role has helped tease out more information about our Data model, the User. We now know we'll need a #cart method on any Data object which plays the Customer Role.

The Customer role defined above doesn't expose much about what #cart is. One design decision I made ahead of time, for the sake of simplicity, is to assume the cart will be stored in the database instead of the session. The #cart method defined on any Actor playing the Customer Role should not be an elaborate implementation of a cart. I merely assume a simple association.
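
For completeness, here's a minimal sketch of what that assumption could look like. The schema is my own guess, not part of the original feature; the only requirement the Customer Role places on its Actor is a #cart that responds to #<<, which a plain association satisfies.

class User < ActiveRecord::Base
  # Just enough persistence to satisfy the Customer Role's interface:
  # user.cart behaves like a collection, so `cart << book` simply adds a row.
  has_and_belongs_to_many :cart, :class_name => 'Book', :join_table => 'cart_items'
end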

Roles also play nicely with polymorphism. The Customer Role could be played by any object that responds to the #cart method. The Role itself never knows what type of object it will augment, leaving that decision up to the Context.

Write Some Tests

Let's jump back into testing mode and write some tests around our newly created Role.

spec/roles/customer_spec.rb

describe Customer do
  let(:user) { User.new }
  let(:book) { Book.new }

  before do
    user.extend Customer
  end

  describe '#add_to_cart' do
    it 'puts the book in the cart' do
      user.add_to_cart(book)
      user.cart.should include(book)
    end
  end
end

The above test code also expresses how we will be using this Role, the Customer, within a given Context, adding a book to the cart. This makes the segue into actually writing the Context dead simple.

The "Context"

In DCI, the Context is the environment for which Data objects execute their Roles. There is always at least one Context for every one user story. Depending on the complexity of the user story, there may be more than one Context, possibly necessitating a story break-down. The goal of the Context is to connect Roles (what the system does) to Data objects (what the system is).

At this point, we know the Role we'll be using, the Customer, and we have a strong idea about the Data object we'll be augmenting, the User.

Let's code it up:

app/contexts/add_to_cart_context.rb

class AddToCartContext
  attr_reader :user, :book

  def self.call(user, book)
    AddToCartContext.new(user, book).call
  end

  def initialize(user, book)
    @user, @book = user, book
    @user.extend Customer
  end

  def call
    @user.add_to_cart(@book)
  end
end

Update: Jim Coplien's implementation of Contexts uses AddToCartContext#execute as the context trigger. To support Ruby idioms, procs and lambdas, the examples have been changed to use AddToCartContext#call.

There's a few key points to note:

  • A Context is defined as a class. The act of instantiating the class and calling its #call method is known as triggering.
  • Having the class method AddToCartContext.call is simply a convenience method to aid in triggering.
  • The essence of DCI is in @user.extend Customer. Augmenting Data objects with Roles ad hoc is what allows for strong decoupling. There're a million ways to inject Roles into objects, #extend being one. In a followup article, I'll address other ways in which this can be accomplished.
  • Passing user and book objects to the Context can lead to naming collisions on Role methods. To help alleviate this, it would be acceptable to pass user_id and book_id into the Context and allow the Context to instantiate the associated objects, as sketched just after this list.
  • A Context should expose the Actors for which it is enabling. In this case, attr_reader is used to expose @user and @book. @book isn't an Actor in this Context, however it's exposed for completeness.
  • Most notably: You should rarely have to (impossibly) #unextend a Role from an object. A Data object will usually only play one Role at a time in a given Context. There should only be one Context per use case (emphasis: per use case, not user story). Therefore, we should rarely need to remove functionality or introduce naming collisions. In DCI, it is acceptable to inject multiple Roles into an object within a given Context. So the problem of naming collisions still resides but should rarely occur.
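
Here's what that id-based variant might look like. This is a sketch of the alternative mentioned above, not the canonical implementation; the Context does the lookups itself and then augments the Actor:

class AddToCartContext
  attr_reader :user, :book

  def self.call(user_id, book_id)
    new(user_id, book_id).call
  end

  def initialize(user_id, book_id)
    # The Context instantiates its own Actors, so callers never hand in
    # objects that might already carry colliding Role methods.
    @user = User.find(user_id)
    @book = Book.find(book_id)
    @user.extend Customer
  end

  def call
    @user.add_to_cart(@book)
  end
end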

Write Some Tests

I'm generally not a huge proponent of mocking and stubbing but I think it's appropriate in the case of Contexts because we've already tested running code in our Role specs. At this point we're just testing the integration.

spec/contexts/add_to_cart_context_spec.rb

describe AddToCartContext do
  let(:user) { User.new }
  let(:book) { Book.new }

  it 'adds the book to the users cart' do
    context = AddToCartContext.new(user, book)
    context.user.should_receive(:add_to_cart).with(context.book)
    context.call
  end
end

The main goal of the above code is to make sure we're calling the #add_to_cart method with the correct arguments. We do this by setting the expectation that the user Actor within the AddToCartContext should have its #add_to_cart method called with book as an argument.

There's not much more to DCI. We've covered the Interaction between objects and the Context for which they interact. The important code has already been written. The only thing left is the dumb Data.

The "Data"

Data should be slim. A good rule of thumb is to never define methods on your models, though that's not always attainable. Better put: "Data object interfaces are simple and minimal: just enough to capture the domain properties, but without operations that are unique to any particular scenario" (Lean Architecture). The Data should really only consist of persistence-level methods, never how the persisted data is used. Let's look at the Book model for which we've already teased out the basic attributes.

class Book < ActiveRecord::Base
  validates :title, :presence => true
end

No methods. Just class-level definitions of persistence, association and data validation. The ways in which Book is used should not be a concern of the Book model. We could write some tests around the model, and we probably should. Testing validations and associations is fairly standard and I won't cover them here.

Keep your Data dumb.

Fitting Into Rails

There's not a whole lot to be said about fitting the above code into Rails. Simply put, we trigger our Context within the Controller.

app/controllers/book_controller.rb

class BookController < ApplicationController
  def add_to_cart
    AddToCartContext.call(current_user, Book.find(params[:id]))
  end
end

Here's a diagram illustrating how DCI complements Rails MVC. The Context becomes a gateway between the user interface and the data model.

MVC + DCI

What We've Done

The following could warrant its own article, but I want to briefly look at some of the benefits of structuring code with DCI.

  • We've highly decoupled the functionality of the system from how the data is actually stored. This gives us the added benefit of compression and easy polymorphism.
  • We've created readable code. It's easy to reason about the code both by the filenames and the algorithms within. It's all very well organized. See Uncle Bob's gripe about file-level readability.
  • Our Data model, what the system is, can remain stable while we progress and refactor Roles, what the system does.
  • We've come closer to representing the end-user mental model. This is the primary goal of MVC, something that has been skewed over time.

Yes, we're adding yet another layer of complexity. We have to keep track of Contexts and Roles on top of our traditional MVC. Contexts, specifically, exhibit more code. We've introduced a little more overhead. However, with this overhead comes a large degree of prosperity. As a developer or team of developers, it's at your discretion whether these benefits could resolve your business and engineering ailments.

Final Words

Problems with DCI exist as well. First, it requires a large paradigm shift. It's designed to complement MVC (Model-View-Controller) so it fits well into Rails, but it requires you to move all your code outside the controller and model. As we all know, the Rails community has a fetish for putting code in models and controllers. The paradigm shift is large, something that would require a large refactor for some apps. However, DCI could probably be refactored in on a case-by-case basis allowing apps to gradually shift from "fat models, skinny controllers" to DCI. Second, it potentially carries performance degradations, due to the fact that objects are extended ad hoc.

The main benefit of DCI in relation to the Ruby community is that it provides a structure for which to discuss maintainable code. There's been a lot of recent discussion in the vein of "'fat models, skinny controllers is bad'; don't put code in your controller OR your model, put it elsewhere." The problem is we're lacking guidance for where our code should live and how it should be structured. We don't want it in the model, we don't want it in the controller, and we certainly don't want it in the view. For most, adhering to these requirements leads to confusion, overengineering, and a general lack of consistency. DCI gives us a blueprint to break the Rails mold and create maintainable, testable, decoupled code.

Aside: There's been other work in this area. Avdi Grimm has a phenomenal book called Objects on Rails which proposes alternative solutions.

Happy architecting!

This article is translated to Serbo-Croatian.

Further Reading

DCI: The King of the Open/Closed Principle
DCI With Ruby Refinements
DCI Role Injection in Ruby
Benchmarking DCI in Ruby

Posted by Mike Pack on 01/24/2012 at 12:58PM

Tags: architecture, rails, ruby, dci, testing


Benchmarking DCI in Ruby

I've recently become quite intrigued with the concepts behind DCI (Data, Context and Interaction). I won't go too in depth about what DCI is or why you might use it; that's been discussed many times elsewhere. In short, DCI is an architecture which allows us to delineate our domain objects from the actual functions they perform. It mixes Roles (functionality) into our Data component when and only when that functionality is needed: when in Context. Most of the value DCI brings to the table derives from the way it forces you to abstract out behavior into testable modules.

What I'd like to do is take a look at the performance implications of using DCI in Ruby applications.

I think it should be said upfront that this is purely academic and may have minimal bearing within the ecosystem of a complex app. For that reason, I won't try to draw any vast conclusions.

How to use DCI in Ruby

DCI can be used in Ruby by augmenting your objects with Roles at runtime so that the necessary interactions are available to that object.

class User
  ...
end

module Runner
  def run
    ...
  end
end
 
user = User.new
user.extend Runner
user.run 

In more traditional, idiomatic Ruby you would normally just include the module while defining the class:

class User
  include Runner
  ...
end

user = User.new
user.run 

Hypothesis

Since every call to #extend carries some memory and processing implications, lately I've been wondering whether, while using DCI, we could be incurring a performance hit when extending many objects ad hoc. I decided to profile this to understand if we could be blindly degrading performance and whether there are optimization techniques I should be aware of.

My process involves taking the most simplified example (shown in the above snippets) and benchmarking the traditional approach against the DCI-inclined approach.

I'm running these benchmarks on a MacBook Pro - 2.2 GHz - 8 GB memory.

The Runner Module

Here's the Runner module used in the following examples. It's just one method that does some arbitrary calculation.

runner.rb

module Runner
  def run
    Math.tan(Math::PI / 4)
  end
end

Ruby Benchmark

Using Ruby's Benchmark library, we can extrapolate the amount of time taken for these processes to execute. First, we'll benchmark the traditional, idiomatic Ruby way: using an include to augment the class.

include_bm.rb

require 'benchmark'
require './runner'

class IncludeUser
  include Runner
end

Benchmark.bm do |bench|
  3.times do
    bench.report('include') do
      1000000.times do
        user = IncludeUser.new
        user.run
      end
    end
  end
end
$ ruby include_bm.rb
         user       system     total       real
include  0.500000   0.000000   0.500000 (  0.497114)
include  0.500000   0.000000   0.500000 (  0.497363)
include  0.490000   0.000000   0.490000 (  0.497342)

The results of this benchmark tell us that executing 1 million "run" operations results in roughly 0.5 seconds.

Let's look at how this compares to the DCI implementation.

dci_bm.rb

require 'benchmark'
require './runner'

class DCIUser; end

Benchmark.bm do |bench|
  3.times do
    bench.report('DCI') do
      1000000.times do
        user = DCIUser.new
        user.extend Runner
        user.run
      end
    end
  end
end
$ ruby dci_bm.rb
     user       system     total       real
DCI  8.430000   0.000000   8.430000 (  8.429382)
DCI  8.490000   0.010000   8.500000 (  8.486804)
DCI  8.450000   0.010000   8.460000 (  8.447363)

Quite a difference! It's probably safe to say at this point that calling extend 1 million times is a lot less performant than including the module once as the class is defined. The reasoning is simple. Including the module once injects it into the class's lookup hierarchy, which every user object shares. When the run method is called, the hierarchy is traversed and the method is fetched. In the traditional (include) approach, the module never leaves or reenters the hierarchy after it's been defined. Conversely, in DCI, the module enters each object's hierarchy every time extend is called.
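
To see the difference, compare the two lookup chains. This is a quick illustration of my own, not part of the benchmark scripts, and assumes the IncludeUser, DCIUser and Runner definitions from above:

IncludeUser.ancestors
#=> [IncludeUser, Runner, Object, Kernel, BasicObject]

user = DCIUser.new
user.extend Runner
user.singleton_class.ancestors
#=> [Runner, DCIUser, Object, Kernel, BasicObject]
# (Ruby 2.1+ also lists the singleton class itself at the front.)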

Let's profile these two approaches and discover why they're so different.

perftools.rb

Assuming the same class/module structure as above, let's use perftools.rb to profile their execution. Using perftools.rb is a two-step process. First, generate the profile: a summary of where the code is spending its time. Second, display the profile in the designated format. To visualize the components, we'll generate graphs using the GIF format. You'll need the dot tool in order to generate graphs. Check out this presentation for more info on using perftools.rb.

Let's first observe the traditional approach:

include_profile.rb

require 'perftools'
require './runner'

class IncludeUser
  include Runner
end

PerfTools::CpuProfiler.start('/tmp/include_profile') do
  1000000.times do
    user = IncludeUser.new
    user.run
  end
end
$ ruby include_profile.rb
$ pprof.rb --gif /tmp/include_profile > include_profile.gif

include_profile.gif

The above graph tells us that most of the execution time is happening in the iteration of our test loop. Barely any time is spent creating the objects or executing the arbitrary math calculation. More info on reading the pprof.rb output can be found here.

Now let's take a look at the DCI approach:

dci_profile.rb

require 'perftools'
require './runner'

class DCIUser; end

PerfTools::CpuProfiler.start('/tmp/dci_profile') do
  1000000.times do
    user = DCIUser.new
    user.extend Runner
    user.run
  end
end
$ ruby dci_profile.rb
$ pprof.rb --gif /tmp/dci_profile > dci_profile.gif

dci_profile.gif

The above results tell us that almost half the time is spent extending objects at runtime through Module#extend_object. In this example, the time spent iterating over our test case is dwarfed against the time taken to extend objects. So, after profiling we can verify that extending the object is indeed taking up most of our time.

ObjectSpace.count_objects

Let's compare how the number of objects in memory stack up with the two implementations. Ruby 1.9 provides us with the ObjectSpace.count_objects method to inspect all objects currently initialized in memory. It's important to turn off garbage collection as it may be invoked mid-test, skewing the results. Here is the module used to inspect the number of objects currently in memory. It's a modified version of Aaron Patterson's implementation.

allocation.rb

module Allocation
  def self.count
    GC.disable
    before = ObjectSpace.count_objects
    yield
    after = ObjectSpace.count_objects
    after.each { |k,v| after[k] = v - before[k] }
    GC.enable
    after
  end
end

This method turns off the garbage collector, records the number of objects pre-benchmark, runs the benchmark, records the number of objects post-benchmark, then compiles the difference between the two. Let's gather more information by extracting object allocations.

include_space.rb

require './runner'
require './allocation'

class IncludeUser
  include Runner
end

p(Allocation.count do
  1000000.times do
    user = IncludeUser.new
    user.run
  end
end)
$ ruby include_space.rb
{:TOTAL=>2995684, :FREE=>-4344, :T_OBJECT=>1000000, :T_CLASS=>0, :T_MODULE=>0, :T_FLOAT=>2000000, :T_STRING=>27, :T_REGEXP=>0, :T_ARRAY=>0, :T_HASH=>1, :T_BIGNUM=>0, :T_FILE=>0, :T_DATA=>0, :T_COMPLEX=>0, :T_NODE=>0, :T_ICLASS=>0}

Most of the keys in the printed hash refer to Ruby types. The ones we're interested in are :TOTAL, :T_CLASS, :T_ICLASS. The meaning of these keys isn't very well documented but the Ruby source hints at them. Here's my understanding:

:TOTAL is the total number of objects present in memory.
:T_CLASS is the total number of classes that have been declared (in memory).
:T_ICLASS is the total number of internal use classes, or iClasses, used when modules are mixed in.

The DCI approach:

dci_space.rb

require './runner'
require './allocation'

class DCIUser; end

p(Allocation.count do
  1000000.times do
    user = DCIUser.new
    user.extend Runner
    user.run
  end
end)
$ ruby dci_space.rb
{:TOTAL=>4995536, :FREE=>-4492, :T_OBJECT=>1000000, :T_CLASS=>1000000, :T_MODULE=>0, :T_FLOAT=>2000000, :T_STRING=>27, :T_REGEXP=>0, :T_ARRAY=>0, :T_HASH=>1, :T_BIGNUM=>0, :T_FILE=>0, :T_DATA=>0, :T_COMPLEX=>0, :T_NODE=>0, :T_ICLASS=>1000000}

Let's look at the significant differences between the two:

# Include
{:TOTAL=>2995684, :FREE=>-4344, :T_CLASS=>0, :T_ICLASS=>0}

# DCI
{:TOTAL=>4995536, :FREE=>-4492, :T_CLASS=>1000000, :T_ICLASS=>1000000}

These results expose a few facts about the two implementations.

  • The DCI approach keeps roughly 2 million more objects in memory (about 67% more).
  • The DCI approach initializes many more classes and internal (mixed-in) classes.

Wrapping It Up

As I said at the beginning, it's quite hard to determine how, if at all, using DCI will affect the performance of a real world application. Certainly, a single web request will never invoke 1 million inclusions of a module at runtime.

What this does show us is that Ruby is not optimized for this architecture. In idiomatic Ruby, modules are usually included during the class definition. It's possible that languages, like Scala, with built-in tools to extend objects ad hoc perform better than Ruby. Scala's traits provide high-level support for this type of functionality and are optimized for use with DCI.

I'm still quite interested in DCI. Specifically, in optimizations for Ruby. I'm also quite interested in running these benchmarks against a production app, something that'll just have to wait.

All the code used here can be found on github.

Happy benchmarking!

Posted by Mike Pack on 01/17/2012 at 12:46PM

Tags: dci, ruby, rails, performance, benchmarking, profiling, architecture


Snowday Released

I just released an RSpec formatter to make you feel all warm inside. Like hot chocolate. It's called snowday.

Posted by Mike Pack on 11/20/2011 at 12:03PM

Tags: ruby, rspec, formatter


Tuesday Tricks - Regex Posix Shortcuts

Hate typing redundant regular expressions? Me too. How often have you typed the regex [a-zA-Z0-9]?

Posix character classes are here to save the day. You can replace a-zA-Z0-9 with [:alnum:]. [:alnum:] is the posix character class and there's a whole slew of them at your disposal. Use them in Ruby like so:

'-- I have 37 dollars --' =~ /[[:alnum:]]/ #=> 3
'-- I have 37 dollars --' =~ /[[:digit:]]/ #=> 10
'-- I have 37 dollars --' =~ /[[:space:]]/ #=> 2

Note: An expression with =~ returns the first position in the string which matches the regex.

Check out the full list of posix character classes and determine how you can prettify your expressions.
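
As a quick before-and-after of my own (not from the examples above), the hand-rolled character class from the intro and its posix equivalent behave identically:

'mike_pack_42'[/[a-zA-Z0-9]+/]   #=> "mike"
'mike_pack_42'[/[[:alnum:]]+/]   #=> "mike"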

This Tuesday's Trick

Posix character classes won't prevent global warming but they sure can help make your regular expressions more readable.

Posted by Mike Pack on 06/28/2011 at 01:20PM

Tags: regex, ruby, posix


Tuesday Tricks - Splatting

In Ruby, the * (asterisk) token is often referred to as the "splat operator". Its purpose is to turn a group of arguments into an array. This can be useful if you want to accept an enumeration in your method but don't care how it's formed. For example:

def note_tasks(*tasks)
  puts "[ ] #{tasks.join(' and ')}"
end

note_tasks('mow the lawn') #=> [ ] mow the lawn
note_tasks('take out the trash', 'walk the dog') #=> [ ] take out the trash and walk the dog
note_tasks(['feed yourself', 'get some sleep']) #=> [ ] feed yourself and get some sleep

Splatting in Ruby 1.9

In Ruby 1.8 you were restricted to using the splat operator on the last argument in a method signature. In Ruby 1.9, you can splat anywhere.

def note_task(name, *options, stream)
  $stdout = stream
  puts "#{options.first.to_s}#{name}#{options.last.to_s}"
end

note_task('mow the lawn', '[ ] ', 'ignored', '!!', $stdout)
#=> [ ] mow the lawn!!

The above method shouldn't normally be defined in such a way. It would make much more sense to define it with stream as the second argument and allow options to be a trailing hash.

def note_task(name, stream, options = {})
  $stdout = stream
  puts "#{options[:before].to_s}Make sure you #{name}#{options[:after].to_s}"
end

note_task('mow the lawn', File.new('/dev/null', 'w'),
          :before => '[ ] ',
          :ignore => 'this',
          :after => '!!')
#=> [ ] mow the lawn!! (to /dev/null)
note_task('mow the lawn', File.new('/dev/null', 'w'))
#=> mow the lawn (to /dev/null)

Ruby will automatically convert the trailing parameters into a hash. Thanks Ruby.

This Tuesday's Trick

Splatting is fun and useful, but be careful: it can sometimes decrease the integrity of your method signature. Mainly use splats when you have a trailing enumerable set that can be passed as a list of arguments.

Posted by Mike Pack on 05/10/2011 at 03:55PM

Tags: tuesday tricks, splat, ruby


Tuesday Tricks - Named Regex Groups

New in Ruby 1.9 is the ability to name capture groups so you don't have to use $1, $2...$n. First a demonstration:

Ruby Pre-1.9

regex = /(\w+),(\w+),(\w+)/
"Mike,Pack,Ruby".match regex
"First Name: #$1"
"Last Name: #$2"
"Favorite Language: #$3"

Ruby 1.9.2

regex = /(?<first_name>\w+),(?<last_name>\w+),(?<favorite_language>\w+)/
m = "Mike,Pack,Ruby".match regex
"First Name: #{m[:first_name]}"
"Last Name: #{m[:last_name]}"
"Favorite Language: #{m[:favorite_language]}"

Note: If you use named groups, Ruby won't process unnamed groups. So the following won't work:

regex = /(?<first_name>\w+),(?<last_name>\w+),(?<favorite_language>\w+),(\w+)/
m = "Mike,Pack,Ruby,Colorado".match regex
"First Name: #{m[:first_name]}"
"Last Name: #{m[:last_name]}"
"Favorite Language: #{m[:favorite_language]}"
"Location: #$4"

Note: Even though Ruby won't populate $4, it will still populate $1, $2 and $3.

This Tuesday's Trick

Perl had named regex groups, now Ruby has them. Naming your regex groups can be extremely helpful, especially when the regex becomes complex. Use 'em.

Posted by Mike Pack on 05/03/2011 at 12:58PM

Tags: tuesday tricks, regex, ruby


Dynamically Requesting Facebook Permissions with OmniAuth

One of the major benefits of dynamically requesting the Facebook permissions is the increased rate of users who will allow you to access their account. Facebook puts it nicely, "There is a strong inverse correlation between the number of permissions your app requests and the number of users that will allow those permissions."

This solution uses OmniAuth to handle the authentication. The concept is simple. Ask the user to allow your application access to their most basic information (or the bare minimum your app needs). When they perform an action that requires more than the permissions they have currently allowed, redirect them to Facebook and ask for more permissions.

If you haven't set up OmniAuth, follow Ryan Bates' Railscasts, Part 1 and Part 2.

Configuring OmniAuth

OmniAuth expects you to configure your authentication schemes within your initializers.

config/initializers/omniauth.rb

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET']
end

Now, when you visit /auth/facebook, you will be redirected to Facebook and asked for basic permissions.

In order to permit your app to dynamically change the OmniAuth Strategy, you'll need a controller which has access to your OmniAuth Strategy. OmniAuth provides a pre-authorization setup hook to handle this. Update your omniauth.rb initializer to look like the following:

config/initializers/omniauth.rb

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET'], :setup => true
end

Now when you visit /auth/facebook you'll be redirected to /auth/facebook/setup. You should add a route for this:

config/routes.rb

match '/auth/facebook/setup', :to => 'facebook#setup'

Your Facebook controller with the setup action should look as follows:

app/controllers/facebook_controller.rb

class FacebookController < ApplicationController
  def setup
    request.env['omniauth.strategy'].options[:scope] = session[:fb_permissions]
    render :text => "Setup complete.", :status => 404
  end
end

Note: OmniAuth says, "we render a response with a 404 status to let OmniAuth know that it should continue on with the authentication flow."

Once your FacebookController#setup action has completed, OmniAuth will take it from there and process your request through to Facebook.

Dynamically Setting the Permissions

The appropriate code can be used like so:

app/controllers/some_controller.rb

session[:fb_permissions] = 'user_events'
redirect_to '/auth/facebook'

session[:fb_permissions] is the interface between your two controller actions: the one that wants to request more permissions (some_controller.rb) and the one that wants to modify your OmniAuth Strategy (facebook_controller.rb).

For reference, here's a list of available Facebook permissions you can use; comma delimited.
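
For example, requesting several permissions at once might look like this (the permission names here are purely illustrative):

session[:fb_permissions] = 'user_events,user_likes,email'
redirect_to '/auth/facebook'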

That's it. Upon redirection, Facebook will ask to allow the new permissions, redirect back to your app, and you can now successfully make calls to the Facebook API (I use Koala to work with the Facebook API).

One Gotcha

One thing I ran into on OmniAuth 0.2.4 and Rails 3.0.7 is the OmniAuth Strategy which was available in request.env['omniauth.strategy']. If you have more than one provider in your OmniAuth::Builder DSL, request.env['omniauth.strategy'] will be set to the last entry in the DSL. If you have your initializer set up like the following:

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET'], :setup => true
  provider :twitter, ENV['TWITTER_CONSUMER_KEY'], ENV['TWITTER_CONSUMER_SECRET']
end

request.env['omniauth.strategy'] will be set to #<OmniAuth::Strategies::Twitter>, not exactly what you want. Your Facebook strategy needs to be the last entry in the DSL, like so:

Rails.application.config.middleware.use OmniAuth::Builder do
  provider :twitter, ENV['TWITTER_CONSUMER_KEY'], ENV['TWITTER_CONSUMER_SECRET']
  provider :facebook, ENV['FB_APP_ID'], ENV['FB_APP_SECRET'], :setup => true
end

Happy Facebooking!

Posted by Mike Pack on 04/27/2011 at 10:51AM

Tags: ruby, rails, omniauth, facebook