Conditional Indexing with Sunspot

One of my favorite new features of Sunspot 1.3 is the ability to conditionally index an instance of a model based on anything that returns a boolean.

Say we have a Post model that stores blog posts. We want to index only published posts so users aren't searching unpublished ones. The syntax looks like this:

class Post < ActiveRecord::Base
  attr_accessible :title, :content, :published, :external_source

  searchable :if => :published do
    text :title
    text :content
  end
end

Pretty nice. But let's elaborate a little. Say we want to index only published posts but we don't want to index posts where content comes from an external source.

searchable :if => :published, :unless => :external_source do
  ...
end

Let's flip things around. What if we want to index only published blog posts which come from external sources? Supply an array.

searchable :if => [:published, :external_source] do
  ...
end 

What if the conditions for indexing are more complex than a simple boolean method on our model? Supply a proc.

searchable :if => proc { |post| post.content.size > 0 } do
  ...
end 
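
To make the semantics of the options above concrete, here is a minimal plain-Ruby sketch of how symbol, array and proc conditions could be evaluated. It is not Sunspot's implementation: condition_passes?, indexable? and the Struct-based Post are hypothetical stand-ins, and the sketch assumes that every entry in an array must be truthy, as the array example above implies.

```ruby
# Hypothetical helpers, not Sunspot's code: decide whether a record should be
# indexed, given :if / :unless style options.
def condition_passes?(record, condition)
  case condition
  when Symbol then !!record.public_send(condition)
  when Array  then condition.all? { |inner| condition_passes?(record, inner) }
  else
    unless condition.respond_to?(:call)
      raise ArgumentError, "unsupported condition: #{condition.inspect}"
    end
    !!condition.call(record)
  end
end

def indexable?(record, options = {})
  if_ok     = !options.key?(:if)     || condition_passes?(record, options[:if])
  unless_ok = !options.key?(:unless) || !condition_passes?(record, options[:unless])
  if_ok && unless_ok
end

# Stand-in for the Post model, using a plain Struct instead of ActiveRecord.
Post = Struct.new(:title, :content, :published, :external_source)

draft    = Post.new('Draft', 'WIP', false, false)
native   = Post.new('Hello', 'Body', true, false)
imported = Post.new('Feed item', 'Body', true, true)
empty    = Post.new('Empty', '', true, false)

puts indexable?(native, :if => :published, :unless => :external_source)
puts indexable?(imported, :if => :published, :unless => :external_source)
```

Under these assumptions the first call prints true and the second prints false, because the imported post trips the :unless condition.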

Indexing with Sunspot just got a whole lot easier.

Happy indexing!

Posted by Mike Pack on 11/16/2011 at 02:32PM

Tags: sunspot, search, indexing, rails


Stubbing Internal Methods and Rails Associations with RSpec

Occasionally you'll want to unit test the inner workings of a method. As a disclaimer, I don't recommend it, because tests should generally be behavior driven. Tests should treat your methods as black boxes: you put something in, you get something out. How it works internally shouldn't really matter. However, if you would like to test the inner workings of your methods, there are a number of ways to do so with pure Ruby, including :send, :instance_variable_get and others. Testing the innards feels dirty no matter which way you spin it, but I like to at least do it with RSpec.

Why Test the Internals?

Let's say you have a method that does some expensive lookup:

class Library < ActiveRecord::Base
  include ExpensiveQueries

  attr_accessor :books
  def books
    @books ||= expensive_query # The expensive query takes 5 seconds
  end
end

The above example should look familiar: you plan to perform something that is expensive in Ruby or in the database, and you would like to cache the result in an instance variable so that all subsequent calls to the method draw from that instance variable.
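
The caching pattern described above can be watched in isolation with a small plain-Ruby sketch. MemoizedLibrary, its query_count counter and the book titles are hypothetical stand-ins for the ActiveRecord model and its slow lookup.

```ruby
# Plain Ruby stand-in for the Library model; no ActiveRecord required.
class MemoizedLibrary
  attr_reader :query_count

  def initialize
    @query_count = 0
  end

  def books
    # ||= only evaluates the right-hand side while @books is nil or false
    @books ||= expensive_query
  end

  private

  # Hypothetical stand-in for the slow lookup; counts how often it runs.
  def expensive_query
    @query_count += 1
    ['Dune', 'Emma']
  end
end

library = MemoizedLibrary.new
library.books
library.books
puts library.query_count # prints 1: the second call used the cached value
```

One caveat of ||= is that a nil or false result is never cached, so a query that legitimately returns nil would run on every call. An empty array is truthy in Ruby, so an empty result is still cached.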

If you were taking a TDD approach, you wouldn't have this class already written. In that case, you know you'll be performing something very expensive and you want to ensure that method caches the result. How do you test this without knowing the internals of the method?

Stubbing with RSpec

Let's say you're writing your tests before you write the above class. You could use RSpec's stubbing library to ensure your method is caching its result.

describe Library do
  describe '#books' do
    it 'caches the result' do
      # Assume some books get associated upon creation
      @library = Library.create!

      # 5 seconds for this call
      the_books = @library.books

      # Stub out the expensive_query method so it raises an error
      @library.stub(:expensive_query) { raise 'Should not execute' }

      # If the value was cached, expensive_query shouldn't be called
      lambda { @library.books }.should_not raise_error
    end
  end
end

The key component here is the @library.stub call. This is also where we break the black-box, behavior-driven test idiom: at this line we assume there will be an internal method call named expensive_query. This test is also brittle because if expensive_query is ever renamed to really_expensive_query, our test will break even though the functionality of our method remains the same.
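
For comparison, the same idea can be sketched in plain Ruby without RSpec by redefining the method on a single instance. StubbableLibrary and its book titles are hypothetical stand-ins, and define_singleton_method is used here as a hand-rolled substitute for the stub call, not as a description of how RSpec implements it.

```ruby
# Plain Ruby stand-in for the Library model (hypothetical, no ActiveRecord).
class StubbableLibrary
  def books
    @books ||= expensive_query
  end

  def expensive_query
    ['Dune', 'Emma']
  end
end

library = StubbableLibrary.new
first_result = library.books # runs the real expensive_query and caches it

# Replace expensive_query on this one instance so any further call fails.
library.define_singleton_method(:expensive_query) do
  raise 'Should not execute'
end

second_result = library.books # served from @books, so the stub never fires
puts second_result.equal?(first_result) # prints true: same cached object
```

The brittleness is the same as in the RSpec version: the sketch only works as long as the internal method is still called expensive_query.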

Stubbing Rails Associations

What if your expensive_query is really an ActiveRecord association? So, let's say your Library class looks more like the following:

class Library < ActiveRecord::Base
  has_many :books

  attr_accessor :authors
  def authors
    @authors ||= books.authors # The expensive query takes 5 seconds
  end
end

You could use the nifty stub_chain method provided by RSpec to stub the books.authors method and ensure it only gets called once.

describe Library do
  describe '#authors' do
    it 'caches the result' do
      # Assume some books and authors get associated upon creation
      @library = Library.create!

      # 5 seconds for this call
      the_authors = @library.authors

      # Stub out the books.authors association so it raises an error
      @library.stub_chain(:books, :authors) { raise 'Should not execute' }

      # If the value was cached, books.authors shouldn't be called
      lambda { @library.authors }.should_not raise_error
    end
  end
end

Arguments to stub_chain represent the associations used. stub_chain could also be used to stub out additional methods which get called within the chain.
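
Roughly speaking, stubbing a chain means replacing the first link with an object whose next link is stubbed. The plain-Ruby sketch below imitates that effect by hand. BookList, ChainedLibrary and the author names are hypothetical, and this is an illustration of the idea, not a description of RSpec's internals.

```ruby
# Plain Ruby stand-ins (hypothetical): books returns an object that responds
# to authors, mirroring the books.authors chain.
class BookList
  def authors
    ['Frank Herbert', 'Jane Austen']
  end
end

class ChainedLibrary
  def books
    BookList.new
  end

  def authors
    @authors ||= books.authors
  end
end

chained = ChainedLibrary.new
cached_authors = chained.authors # walks the real chain once

# Hand-rolled equivalent of stubbing the chain: books now returns a double
# whose authors method raises.
exploding_books = Object.new
exploding_books.define_singleton_method(:authors) do
  raise 'Should not execute'
end
chained.define_singleton_method(:books) { exploding_books }

puts chained.authors.equal?(cached_authors) # prints true: chain not walked again
```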

Happy stubbing!

Posted by Mike Pack on 10/07/2011 at 11:20AM

Tags: rails, rspec, stubbing


OmniAuth's Overzealous Approach to Facebook Auth

Call me a stickler, but I think there should be two pages that load quickest in any web app: The home page (for people not logged in) and the initial page you see once you are logged in, usually the dashboard.

The Good

Tackling the first of these two criteria, the guest home page, is fairly easy. This page could be as simple or as complex as you want. Facebook keeps it simple and static. Foursquare adds some flair. Whatever the approach may be, it's pretty easy to control the load time of the guest home page because you're likely building it from scratch.

OmniAuth had the revolutionary idea to consolidate third party methods of authentication, most using OAuth 1 or 2. But as consumers of libraries which take on such a burden, we have to be extra careful of the intricacies. For OmniAuth, one of those intricacies includes authentication with Facebook.

The Bad

When OmniAuth successfully authenticates with Facebook, something terrible happens: it makes a request to the Facebook Graph API...every...single...time. Not only the first time you log in with Facebook, but all subsequent times. This is because of the vast decoupling between OmniAuth and your app. OmniAuth knows nothing about your underlying data model so it can't reliably store the authenticated user's Facebook information (and know not to request it again). To provide the user's Facebook information within your success callback, OmniAuth makes a request to the Facebook API.
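
To illustrate what knowing not to request it again could look like when the app owns the data model, here is a minimal plain-Ruby sketch. ProfileStore, profile_for and fake_graph_api are hypothetical names: the hash stands in for a users table and the lambda stands in for a real Graph API request. None of this is OmniAuth or Facebook SDK code.

```ruby
# Hypothetical app-side lookup: fetch the Facebook profile only for unknown uids.
class ProfileStore
  attr_reader :fetch_count

  # fetcher is any callable that takes a uid and returns profile data;
  # in a real app it would wrap the Graph API request.
  def initialize(fetcher)
    @fetcher = fetcher
    @profiles = {} # stands in for the users table
    @fetch_count = 0
  end

  def profile_for(uid)
    @profiles.fetch(uid) do
      @fetch_count += 1
      @profiles[uid] = @fetcher.call(uid)
    end
  end
end

fake_graph_api = lambda { |uid| { :uid => uid, :name => "User #{uid}" } }
store = ProfileStore.new(fake_graph_api)

store.profile_for('42') # first login: one API request
store.profile_for('42') # later logins: served from the store
puts store.fetch_count  # prints 1
```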

It probably goes without saying, but this is really bad for usability. To name a few problems with this approach: consider that the Graph API is down. Or consider that it never responds at all. Or consider that you're over your Facebook API quota. Your users will be sitting in limbo at the most critical time they're using your app: the login process. Maybe this is their first time logging in. Making one API call could potentially make it their last. You want to impress users early, and with OmniAuth's Facebook integration you could be missing that chance.

The Fix

The solution is to roll your own authentication. Facebook's JavaScript SDK is awesome (most of the time) and you could probably integrate it in the same timeframe as OmniAuth, but with the added benefit of a much better user experience. Unlike similar solutions to the Facebook JS SDK (Twitter @Anywhere), Facebook provides you with everything OmniAuth does, including the API access token.

Sidenote: As of this posting, OmniAuth 1.0 is currently under active development and it doesn't look like this issue has leaked into the OmniAuth Facebook Extension yet. The official release is still at 0.2.6.

Happy Facebooking!

Posted by Mike Pack on 09/21/2011 at 01:49PM

Tags: facebook, omniauth, api