Cacheable HTTP search query results

I have worked on a number of web applications which required searching catalogs of data based on filtering criteria. The most common implementation I see involves issuing a GET request to a search service, providing the search criteria as part of the request’s query string.

http://example.com/search?category=music&subcategory=rock&page=7

This approach does not easily lend itself to static resource caching, one of the most effective ways to improve a web app’s performance. However well the application code is optimized, however finely the database queries are tuned, and even with something like memcached in place, a request that reaches the application server is unlikely to be served as efficiently as one handled entirely by a high performance HTTP server like Nginx.

By approaching search queries as RESTful HTTP resources, each uniquely identified by a URI, rather than as RPC based commands, we should be able to cache the results the first time they are processed following a search request.

http://example.com/search_results/someuniqueidentifier

The unique identifier part of the URI can take the form of a hash which, when deserialized, will provide the application with the filter criteria for the search. This assumes that the client and server share a common protocol, one which defines how the hash for the URI is constructed. For example, it is a good idea to agree on an expected order for the set of criteria. While searches for {category : music, subcategory : rock} and {subcategory : rock, category : music} will produce the same results, using both orderings will cause the resource to be cached twice under two separate URIs, resulting in a performance penalty.

A potential solution involves Base64 encoding and decoding a string that is constructed in a predefined format and comprises the filter criteria.

CGI.unescape(identifier).unpack('m')[0] # => "music,rock,,,,7,30"
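Both directions of such a protocol can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the three-field order in FIELD_ORDER and the helper names search_identifier and search_criteria are hypothetical, and URI’s form-component escaping stands in for the CGI call shown above.

```ruby
require "uri"

# Hypothetical field order; client and server must agree on it.
FIELD_ORDER = %i[category subcategory page].freeze

# Builds the identifier: values are laid out in the agreed order, so the
# order in which the caller supplies the criteria does not matter.
def search_identifier(criteria)
  canonical = FIELD_ORDER.map { |field| criteria[field].to_s }.join(",")
  URI.encode_www_form_component([canonical].pack("m0"))
end

# Reverses search_identifier; every value comes back as a string.
def search_criteria(identifier)
  decoded = URI.decode_www_form_component(identifier).unpack1("m")
  FIELD_ORDER.zip(decoded.split(",", -1)).to_h
end

a = search_identifier(category: "music", subcategory: "rock", page: 7)
b = search_identifier(page: 7, subcategory: "rock", category: "music")
a == b # => true, so both searches share one cacheable URI
```

A comma inside a value would break this simple format, so a real protocol would also need to define how values are escaped.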

This method will not be useful for plain HTML fronted websites. It requires a potent enough client with the ability to dynamically construct URIs based on filter criteria. JavaScript, ActionScript or generic web service consumer applications are all good candidates.

Testing web services with ActiveResource

ActiveResource can be a useful tool for abstracting away low level HTTP or data marshaling details when testing web services with an XML schema and URI patterns which respect the Rails protocol for REST.

Here’s a possible implementation for use in tests that exercise a service from the outside, a sort of black box web service testing approach, if you’d like.

def resource(name)
  class_name = name.to_s.camelize
  return class_name.constantize if Object.const_defined?(class_name.intern)
  rsrc = Class.new(ActiveResource::Base) do
    self.site = "http://localhost:4001/api"
    self.element_name = name.to_s
  end
  Object.const_set(class_name.intern, rsrc)
end

Let’s imagine an API call to http://localhost:4001/api/categories.xml which returns a list of product categories with their respective subcategories. The following is a potential response to a GET request to the aforementioned URI.

<?xml version="1.0" encoding="UTF-8"?>
<categories type="array">
  <category>
    <id type="integer">3</id>
    <name>Music</name>
    <subcategories type="array">
      <subcategory type="Category">
        <id type="integer">4</id>
        <name>Rock</name>
      </subcategory>
      <subcategory type="Category">
        <id type="integer">5</id>
        <name>Metal</name>
      </subcategory>
    </subcategories>
  </category>
</categories>

Invoking resource :category in the test will provide a Category class. Category is an ActiveResource::Base subclass which can be used to exercise the /categories endpoint of the API.

class ApiTest < Test::Unit::TestCase
  resource :category

  def test_categories
    categories = Category.find(:all)
    assert_equal(1, categories.size)
    assert_equal("Music", categories.first.name)
  end

  def test_subcategories
    subcategories = Category.find(:all).first.subcategories
    assert_equal(2, subcategories.size)
    assert_equal("Metal", subcategories[1].name)
  end

  def test_category_creation
    Category.create(:name => "Hacking")
    assert_equal(2, Category.find(:all).size) # the one existing category plus the new one
  end
end

Abstract resource

A large portion of the internet is governed by HTTP and the World Wide Web in particular is designed based on the REST architectural style. It makes sense to design web applications or web based services in a way that respects and harnesses the web’s underlying architecture.

When it comes to developing web applications, Model-View-Controller (MVC) is one of the dominant architectural patterns current web frameworks are based on. MVC is not restricted to building web apps. On the contrary, its history can be traced back to 1979 and Smalltalk, where it was originally applied to the development of applications with user interfaces.

The majority of Ruby web frameworks, especially the ones inspired by Rails, employ MVC and offer some sort of support for REST style application development, typically by defining resources which can be accessed through a URI and manipulated by making use of standard HTTP methods such as GET, PUT, POST, DELETE.

The above unveils an obvious similarity between the way HTTP resources can be manipulated - the four verbs can fundamentally constitute CRUD operations - and another common tier in web applications nowadays, databases.

web-db

Controllers in Merb, Rails and similar web frameworks, Ruby or otherwise, are a busy abstraction. A controller typically needs to dispatch to relevant actions, consolidate HTTP payloads, deal with sessions, sometimes caching, etc. These controllers are usually REST aware, meaning that by default they map routed combinations of URI and HTTP method to a standard set of actions, namely index, show, new, create, edit, update and destroy.
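The default mapping can be illustrated with a small lookup. This is a rough sketch rather than how Merb or Rails route requests: the REST_ACTIONS table, the rest_action helper and the /reservations paths are all made up for the example.

```ruby
# Hypothetical table: [HTTP method, does the URI name a single member?] => action.
REST_ACTIONS = {
  ["GET",    false] => :index,
  ["POST",   false] => :create,
  ["GET",    true]  => :show,
  ["PUT",    true]  => :update,
  ["DELETE", true]  => :destroy
}.freeze

# Resolves a request to an action name, assuming paths shaped like
# /reservations, /reservations/7, /reservations/new and /reservations/7/edit.
def rest_action(http_method, path)
  verb = http_method.upcase
  segments = path.split("/").reject(&:empty?)
  # The form-rendering actions are plain GETs marked by a trailing segment.
  return :new  if verb == "GET" && segments.last == "new"
  return :edit if verb == "GET" && segments.last == "edit"
  # Raises KeyError for combinations the table does not cover.
  REST_ACTIONS.fetch([verb, segments.size > 1])
end

rest_action("GET", "/reservations")      # => :index
rest_action("PUT", "/reservations/7")    # => :update
rest_action("DELETE", "/reservations/7") # => :destroy
```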

If we focus on our application exposing strictly REST resource based interfaces, and assume that these resources directly map to the application’s database schema, we can relieve controllers from some of the associated strain by abstracting away the discussed common functionality.

module CrudTemplate
  def resource
    raise "You must define a resource"
  end

  def index
    instance_variable_set(resource_sym_plural, resource.find(:all))
    render
  end

  def show
    assign_resource(resource.find(params[:id]))
    render
  end

  alias edit show
  alias delete show

  def new
    assign_resource(resource.new(resource_attrs))
    render
  end

  def create
    r = resource.new(resource_attrs)
    assign_resource(r)
    if r.save
      on_create_success(r)
    else
      on_create_failure(r)
    end
  end

  def on_create_success(r)
    redirect(resource_sym)
  end

  alias on_update_success on_create_success

  def on_create_failure(r)
    assign_resource(r)
    render(:new, :status => 400)
  end

  def update
    r = resource.find(params[:id])
    if r.update_attributes(resource_attrs)
      on_update_success(r)
    else
      on_update_failure(r)
    end
  end

  def on_update_failure(r)
    assign_resource(r)
    render(:edit)
  end

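  def destroy
    # The original passed an undefined local `r` to the callbacks; capture
    # whatever resource.destroy returns and hand that over instead.
    r = resource.destroy(params[:id])
    if r
      on_destroy_success(r)
    else
      on_destroy_failure(r)
    end
    redirect(resource_sym)
  end

  # No-op hooks by default; including controllers may override them.
  def on_destroy_success(r); end
  def on_destroy_failure(r); end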

  def self.included(controller)
    controller.show_action(*shown_actions)
  end

  protected

  def resource_attrs
    {}
  end

  def self.shown_actions
    [:index, :show, :create, :new, :edit, :update]
  end

  private

  def assign_resource(r)
    instance_variable_set(resource_sym, r)
  end

  def resource_sym
    @resource_sym ||= :"@#{resource.name.underscore.split("/").last}"
  end

  def resource_sym_plural
    @resource_sym_plural ||= :"@#{resource.name.underscore.split("/").last.pluralize}"
  end
end

By doing so, we can write controllers that look something like the following.

class Reservations < Application
  include CrudTemplate

  def resource
    Reservation
  end

  def on_create_success(r) # CrudTemplate#create passes the saved resource
    flash[:notice] = "Thank you"
    redirect("/")
  end

  protected

  def self.shown_actions
    [:new, :create]
  end

  def resource_attrs
    params[:reservation].merge(session[:member])
  end
end

Things are usually more complicated. The above model falls short for the majority of web applications I’ve worked on. Resources are rarely direct matches to database tables and there is usually good reason for them not to be. Applications involve complex business logic, extending well beyond what a set of CRUD operations is appropriate for. One might argue that business logic can be incorporated into Models (as in ORM classes), but I generally prefer to avoid keeping business logic near the persistence layer and opt for a database agnostic, rich domain tier.

This however doesn’t imply that controllers shouldn’t think in terms of resources. Controllers are close to the web, and the web works well with resources. It suffices for domain layer endpoints that intend to communicate with a controller to expose an interface the controller understands. If we define that interface so that it matches its database specific counterpart, we can achieve the best of both worlds.

web-domain-db

Controllers can transparently operate on plain Ruby components which include an AbstractResource module (interface) and choose to implement any of its methods, or directly on ORM models, such as ActiveRecord classes, where appropriate.

module AbstractResource
  attr_reader :params

  def initialize(params = {})
    @params = params
  end

  def save
    raise "Implement me"
  end

  def update_attributes(attrs = {})
    raise "Implement me"
  end

  def valid?
    raise "Implement me"
  end

  def errors
    raise "Implement me"
  end

  module ClassMethods
    def delete(id)
      raise "Implement me"
    end

    def find(id)
      raise "Implement me"
    end
  end

  def self.included(target)
    target.extend(ClassMethods)
  end
end
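As an illustration, here is a sketch of a plain Ruby component that implements part of that interface without touching a database. WaitingListEntry and its in-memory ENTRIES store are hypothetical, and the block repeats a condensed copy of AbstractResource only so that it runs on its own.

```ruby
# Condensed copy of the module above, trimmed to what this sketch uses.
module AbstractResource
  attr_reader :params

  def initialize(params = {})
    @params = params
  end

  def save
    raise "Implement me"
  end

  module ClassMethods
    def find(id)
      raise "Implement me"
    end
  end

  def self.included(target)
    target.extend(ClassMethods)
  end
end

# Hypothetical domain component: entries live in memory, keyed by email.
class WaitingListEntry
  include AbstractResource

  ENTRIES = {}

  # Overrides the "Implement me" stub from ClassMethods.
  def self.find(id)
    ENTRIES.fetch(id)
  end

  def valid?
    !params[:email].to_s.empty?
  end

  # Same contract a controller expects from an ORM model: true or false.
  def save
    return false unless valid?
    ENTRIES[params[:email]] = self
    true
  end
end

entry = WaitingListEntry.new(:email => "fan@example.com")
entry.save                               # => true
WaitingListEntry.find("fan@example.com") # returns the same entry
```

A controller written against find, new and save can work with this class in the same way it would with an ORM model.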

P.S. Credit due to Carlos Villela whose observations have been the core and inspiration behind the ideas in this article.

NWRUG Synthesis Talk

Stuart and I will be talking about Synthesis at this month’s North West Ruby User Group meet up in Manchester on Tuesday the 24th of June. Registration details and directions to the venue can be found on the event’s page at nwrug.org.

Testing Merb controllers

One of the features that attracted me to Merb was the ability to test controllers in an independent, lightweight manner. In essence, this involves instantiating a controller class, passing it a FakeRequest and calling methods (actions) on the controller object.

Let’s consider a controller which collaborates with a service.

class Foo < Merb::Controller
  def bar
    service = Service.new
    session[:metal] = service.metal
    @zz = service.rock
    render
  end
end

class Service
  def rock
    "zz top"
  end

  def metal
    "metallica"
  end
end

Testing the controller is as straightforward as creating an instance of Foo, setting it up, calling bar and interrogating it.

class FooTest < Test::Unit::TestCase
  def setup
    @foo = Foo.new(Merb::Test::RequestHelper::FakeRequest.new)
    @foo.request.session = {}
    @foo.bar
  end

  def test_puts_metallica_in_session
    assert_equal("metallica", @foo.session[:metal])
  end

  def test_assigns_zz_top
    assert_equal("zz top", @foo.assigns(:zz))
  end
end

I’m not sure why the controller’s session has to be explicitly initialized. Had it been present by default, testing would be slightly cleaner.

DataMapper without a database

DataMapper is fast becoming a credible contender in the Ruby ORM field. The first - and only at this early stage - thing that temporarily disappointed me was the following scenario.

class Foo
  include DataMapper::Resource

  property :id, Integer, :serial => true
  property :title, String
end

Running this produces ArgumentError: Unknown adapter name: default, suggesting that a database connection needs to be set up in order to use any objects that include the DataMapper::Resource module. This is something I would rather not have to do for my dependency neutral test suite, in which all calls to ORM objects are simulated using mocks.

I soon realized that DataMapper doesn’t require a database connection to be present, but needs to know which adapter to use. If we’re not interested in interacting with the database, using DataMapper::Adapters::AbstractAdapter does the trick.

DataMapper.setup(:default, "abstract::")

class Foo
  include DataMapper::Resource

  property :id, Integer, :serial => true
  property :title, String
end

Foo.new(:title => "metal").title # => "metal"

Synthesis visualizations

Synthesized testing is about accurately simulating object interactions and verifying that each end point of every interaction has been tested to work. The end result of a code base tested employing this strategy forms a specification of the application’s ecosystem in terms of object communication.

Danilo has recently been contributing some excellent work around visual representations of the above. The code is being developed on the Synthesis experimental branch on GitHub.

Consider the Synthesis test_project example.

class DataBrander
  BRAND = "METAL"

  def initialize(storage)
    @storage = storage
  end

  def save_branded(data)
    @storage.save "#{BRAND} - #{data}"
  end

  def dont_do_this
    @storage.ouch!
  end
end

class Storage
  def initialize(filename)
    @filename = filename
  end

  def save(val)
    File.open(@filename, 'w') {|f| f << val}
  end

  def ouch!
    raise Problem
  end
end

class Problem < Exception;end

Below are the complete specs for the above implementation.

describe DataBrander do
  it "should save branded to storage" do
    storage = Storage.new("")
    storage.should_receive(:save).with("METAL - rock")
    DataBrander.new(storage).save_branded("rock")
  end

  it "should delegate problem" do
    storage = Storage.new("")
    storage.should_receive(:ouch!).and_raise(Problem.new)
    proc {DataBrander.new(storage).dont_do_this}.should raise_error(Problem)
  end
end

describe Storage do
  it "should save to file" do
    begin
      Storage.new("test.txt").save("rock")
      File.read("test.txt").should == "rock"
    ensure
      FileUtils.rm_f("test.txt")
    end
  end

  it "should raise problem on ouch!" do
    proc { Storage.new("").ouch! }.should raise_error(Problem)
  end
end

A Synthesis run using the DOT formatter produces:

dot-synthesis-passing

Removing the "should save to file" spec will cause the Synthesis task to fail.

dot-synthesis-failing
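Conceptually, the failing run comes down to a set difference between the interactions that were simulated with mocks and those that were exercised against real objects. The sketch below only illustrates that idea and is not how Synthesis is implemented; the untested_interactions helper and the [class name, method] pair representation are assumptions made for the example.

```ruby
require "set"

# Returns the simulated interactions that no test verified for real.
def untested_interactions(simulated, verified)
  simulated - verified
end

# The two should_receive expectations from the DataBrander specs.
simulated = Set[["Storage", :save], ["Storage", :ouch!]]

# What the Storage specs cover once "should save to file" is removed.
verified = Set[["Storage", :ouch!]]

untested = untested_interactions(simulated, verified)
# untested now holds only ["Storage", :save]
```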

Below is what a real (relatively small) project looks like.

full-project

I find the ability to inspect our application modeling through such a representation a very appealing added benefit to the confidence in our system Synthesis provides us with. The DOT formatter will become part of the Synthesis gem as soon as we iron out the few remaining glitches.

Using Bazaar with RubyForge

Bazaar is a distributed version control system written in Python, similar to Git. Bazaar places particular focus on usability. It is easy and natural to use, especially for those visiting or migrating from the world of Git.

One of Bazaar’s striking features is the ability to publish branches with sftp, provided there is a web server available. RubyForge project accounts come with support for both, so publishing a Bazaar branch is as easy as:

bzr push --create-prefix sftp://you@rubyforge.org/var/www/gforge-projects/your-project/bzr

Developers can create their copy of the branch by:

bzr branch http://your-project.rubyforge.org/bzr

Erlang eval and dynamic dispatch

Ruby’s Object#send method offers an elegant way of invoking a method by name, translating a symbol into a method dispatch at runtime. I was looking for similar functionality in Erlang and here’s what I came up with.
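For reference, this is the Ruby behaviour in question. The Greeter class is a made-up example.

```ruby
# Hypothetical class used only to demonstrate Object#send.
class Greeter
  def hello(who = "world")
    "hello, #{who}"
  end
end

greeter = Greeter.new
command = :hello # the method name arrives as data, e.g. from a parsed command

greeter.send(command)         # => "hello, world"
greeter.send(command, "rock") # => "hello, rock"
```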

First, let’s see how we can achieve eval functionality in Erlang, i.e. evaluate strings as Erlang code at runtime.

-module (meta).
-export ([eval/2]).

eval(Code, Args) ->
  {ok, Scanned, _} = erl_scan:string(Code),
  {ok, Parsed} = erl_parse:parse_exprs(Scanned),
  Bindings = lists:foldl(fun ({Key, Val}, BindingsAccumulator) ->
    erl_eval:add_binding(Key, Val, BindingsAccumulator)
  end, erl_eval:new_bindings(), Args),
  {value, Result, _} = erl_eval:exprs(Parsed, Bindings),
  Result.

erl_scan is Erlang’s token scanner module. The string function tokenizes a list of characters. erl_parse is the Erlang parser module and the parse_exprs function parses a list of tokens as a sequence of expressions. It returns a list of the abstract forms of the parsed expressions, ready to be used with erl_eval, the Erlang meta interpreter. An arbitrary list of bindings can be provided alongside the parsed expressions to erl_eval:exprs.

With the meta:eval function in place, we can evaluate arbitrary strings of code at runtime.

Eshell V5.6  (abort with ^G)
1> c(meta).
{ok,meta}
2> meta:eval("20 + 30.", []).
50
3> meta:eval("A + B.", [{'A', 15}, {'B', 60}]).
75
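For comparison, a rough Ruby counterpart of meta:eval might look like the sketch below. The clean_binding and evaluate names are made up for the illustration, and variable names are lowercase because capitalized names are constants in Ruby.

```ruby
# Returns a fresh binding with no local variables of its own.
def clean_binding
  binding
end

# Evaluates a string of Ruby code with the given name => value bindings.
def evaluate(code, args = {})
  scope = clean_binding
  args.each { |name, value| scope.local_variable_set(name, value) }
  scope.eval(code)
end

evaluate("20 + 30")              # => 50
evaluate("a + b", a: 15, b: 60)  # => 75
```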

We can build on meta:eval to achieve Ruby-like dynamic dispatches.

send(MethodName, Args) ->
  ArgNames = lists:foldl(fun ({K, _}, Acc) -> lists:append([K], Acc) end, [], Args),
  Code = atom_to_list(MethodName) ++ "(" ++ atom_join(ArgNames, $,) ++ ").",
  eval(Code, Args).

The send function takes two arguments, an atom which is the name of the function to be dispatched and a list of tuples with key/value pairs for the arguments to be passed to the function. atom_join joins a list of atoms into one string using the supplied separator.

atom_join([], _Sep) -> [];
atom_join(Items, Sep) -> lists:flatten(atom_join1(Items, Sep, [])).
atom_join1([Head | []], _Sep, Acc) -> [atom_to_list(Head) | Acc];
atom_join1([Head | Tail], Sep, Acc) -> atom_join1(Tail, Sep, [Sep, atom_to_list(Head) | Acc]).

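Let’s add a couple of test functions to showcase what has been achieved. Because the generated code calls them through the module name, they need to be added to the module’s -export list, as does send/2.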

hello() -> "hello, world".
hello(Who) -> "hello, " ++ Who.

Back to Eshell…

Eshell V5.6  (abort with ^G)
1> c(meta).
{ok,meta}
2> meta:send('meta:hello', []).
"hello, world"
3> meta:send('meta:hello', [{'Name', "rock"}]).
"hello, rock"

JSynthesis

A big thank you to Chris Barrett who has been taking the time to port Synthesis to Java.

JSynthesis is registered as a GoogleCode project and it will surely be integral to my toolkit next time I work on a Java project.