Common Code Smells in OOP

Over years of reviewing Ruby code, the same things tend to come up over and over. In this post, I’d like to address some of the most common code smells I find when reviewing OOP code (and Ruby code in particular).

I personally think "code smell" is a great name. As all great names, it needs little or no explanation. It’s something which smells bad in code, which simply means "bad code".

Wikipedia provides us with a more formal definition:

In computer programming, a code smell is any characteristic in the source code of a program that possibly indicates a deeper problem. Determining what is and is not a code smell is subjective, and varies by language, developer, and development methodology.

Each developer, language and even ecosystem has its own code smells. Some are more general than others. In this post I’ll talk about very generic smells which apply to any object-oriented language, be it Ruby, JavaScript, C# or PHP.

Polluting Instance Data

This is one of the most frequent smells I find when reviewing code.

Object-oriented languages isolate data into objects, and only expose that data to the external world through messages — also called methods:

class Point
  attr_accessor :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  end

  def move(x, y)
    @x = x
    @y = y
  end
end

point = Point.new(10, 15)
point.x # => 10
point.y # => 15
# the only way to change the point's coordinates is with this mesage
point.move(0, 0)

In the example above, our actual data is the X and Y coordinates for our little point. We can’t access them directly, so we do that through our point object. Also, when we want to change them, we have to do that through the point’s move message.

Objects not only have data, but also behavior which is tightly coupled with that data. In the example above, it makes sense to have the move behaviour inside the Point.

When dealing with objects, we don’t care much about the internal data or state, instead we care about the methods, or messages that it exposes. That’s the only way we can communicate with objects.

We use messages to ask them questions — list.empty? — and make them do stuff — mailer.send. (See Command/Query Separation for more on this, as it escapes the scope of this post).

So far, so good, but sometimes we end up adding more data than that which the object semantically represents. A common example of polluting the object’s data is adding helpers for calculations:

class Point
  attr_accessor :x, :y
  def initialize(x, y)
    @x = x
    @y = y
  end

  # ...

  def distance(other)
    @x_diff = other.x - x
    @y_diff = other.y - y
    Math.sqrt((@x_diff * @x_diff) + (@y_diff * @y_diff))
  end
end

Here we are just adding a method to calculate the distance between two points.

The math is really not important, the thing to note here is that we are using instance variables where we could just get away with using regular variables.

Why are instance variables bad? The biggest issue is that they add actual data to the object which has little to do with its role. Why does a point care about the x_diff with some other point? After the calculation is over, this data is useless.

Another issue with this is that this calculation data won’t be cleared up by the garbage collector: it will be re-defined every time the method gets called. This makes our objects bigger, and if we are going to instantiate thousands of them, it’s easy to end up with bloated systems.

Yet another issue is that the more state we have, the more likely bugs are to emerge. We want to have as little state as possible, and ideally, we don’t want it to change (but that’s a subject for another post).

Another common example of misusing instance data is using it to collect intermediate results:

class Importer
  # ...

  def import
    import_categories
    import_people
  end

  private

  def import_categories
    @categories = @xls_file.each do |category|
      # ... import category
    end
  end

  def import_people
    @categories.each do |category|
      # ... do some work importing people
    end
  end
end

Can you spot the problem here? What happens when you call #import_people without having called #import_categories first? It breaks.

Not only are we adding data which is not relevant to what our object represents, but we are also making our code more fragile.

The solution is very simple:

class Importer
  # ...

  def import
    categories = import_categories
    import_people(categories)
  end

  private

  def import_categories
    @xls_file.each do |category|
      # ... import category
    end
  end

  def import_people(categories)
    categories.each do |category|
      # ... do some work importing people
    end
  end
end

By passing intermediate values through parameters, we don’t need to pollute the object’s internal state.

Leaky Logic

Another very common smell I encounter often is leaking logic. This means an object’s logic is leaking to the outside world.

Luckily, this one is very easy to identify, and it normally comes in the form of: a) conditionality and b) serial method calls.

Let’s see a conditionality example:

if coffee.any?
  coffee.drink
  coffee.refill
end

It’s not necessarily a bad thing to ask an object questions, and although it’s not ideal (Tell, Don’t Ask), sometimes it makes sense to do so.

But problems can arise when the actions taken after asking an object something also send messages to the same object.

In the example above, we are asking coffee a question, and then in the body of the conditional sending yet more messages to coffee.

This logic could just live in the coffee object:

class Coffee
  def try_drink
    return unless any?

    drink
    refill
  end
end

# outside the class
coffee.try_drink

The second example is about serial method calls. This applies when calling several methods in a particular order:

file = File.open('some-file.txt')
file.write('something')
file.close

In the example above, you will always call open first, write second, and close last. It’s not a big smell if it just happens once, but once you know this smell, it’s always a good idea to just refactor by defining a more specific method in your object:

File.with('some-file.txt') do |file|
  file.write('something')
end

# implementation
class File
  # ...

  def self.with(file)
    file = File.open(file)
    yield file
    file.close
  end

  # ...
end

This way you make sure the methods are always called in order and it’s not possible to forget to call a method. Grouping these methods into one makes sense, and it deserves to be its own message.

Note that it doesn’t always have to be like the example above–sometimes you just spot the same methods being called over and over again:

text.trim
text.escape_slashes
text.space_paragraphs

In this case, we could simply create a new method with a nice name:

text.format

# implementation
class Text
  # ...

  def format
    trim
    escape_slashes
    space_paragraphs
  end
end

Excessive Conditionals

This one could be its own blog post, but I’ll try to keep it concise this time. It’s about replacing conditionals with inheritance.

One of the most powerful features OOP offers is polymorphism: the ability to just send messages to objects and not care about their specific type at runtime.

In dynamic languages like Ruby, you don’t need any extra ceremony to do this. In statically-typed languages like Java, you need a common interface. Regardless of language, we want to use this power to abstract away conditionality.

Let me show you an example and let’s see all the advantages we get for free:

class Store
  def save
    puts 'saving' if debug_mode?
    # ... do some work
  end
end

The conditionality above might not seem like a big deal. Even if we have to repeat it in a few methods, it’s not the end of the world. But what if we have another requirement?

class Store
  def save
    if debug_mode?
      if development?
        puts 'saving'
      elsif production?
        Log.create('Saving')
      end
    end
    # ... do some work
  end
end

Now it’s starting to get messy. First of all, if we were to make this change, we’d have to make it in every single place where the logging was implemented, which could take considerable time.

We could extract it to its own method, but what if we want that functionality on other objects too?

Also, why should the store know about debug_mode?, development?, and production?? It even knows how to create a special log using the Log class, therefore introducing another dependency to our class.

It’s now clear that this doesn’t belong here. The first step is easy, move it to its own class:

class Store
  attr_accessor :logger
  def initialize(logger)
    @logger = logger
  end

  def save
    logger.log('saving')
    # ... do some work
  end
end

class Logger
  def log(message)
    if debug_mode?
      if development?
        puts message
      elsif production?
        Log.create(message)
      end
    end
  end

  # ...
end

logger = Logger.new
store = Store.new(logger)

Now, that is much better. At least we can test the logging functionality separately now. Still, because it has many conditionals, it’s a pain to test. Every conditional branches the code, and the amount of tests we have to write to cover all cases grows exponentially with each nested conditional.

Now for the interesting part:

class DevelopmentLogger
  def log(message)
    puts message
  end
end

class ProductionLogger
  def log(message)
    Log.create(message)
  end
end

class NullLogger
  def log(messsage)
    # NOOP
  end
end

class LoggerFactory
  def get
    return NullLogger.new unless debug_mode?
    return DevelopmentLogger.new if development?
    return ProductionLogger.new if production?
    raise 'Invalid environment'
  end
end

logger = LoggerFactory.get
store = Store.new(logger)

Whoa, what happened? We created many specific logger objects, and we move the responsibility of knowing which logger to use to a factory object.

Now we can test each logger individually, and the instantiation logic happens in a single place–the factory–which can also be easily tested and reused.

Bonus

Alternatively, sometimes you can get away with automating the factory and completely removing the conditionality using convention over configuration.

Consider this example:

class User
  # ...

  def greet
    if role == 'admin'
      'Hello, admin.'
    elsif role == 'user'
      'Hello, user.'
    elsif role == 'anonymous'
      'Please sign in.'
    end
  end
end

We could use a switch statement instead, but you get the idea. Now we can refactor this as follows:

class AdminGreeter
  def greet
    'Hello, admin.'
  end
end

class UserGreeter
  def greet
    'Hello, user.'
  end
end

class AnonymousGreeter
  def greet
    'Please sign in.'
  end
end

class GreeterFactory
  def greeter_for(user)
    "#{user.role.camelize}Greeter".constantize.new
  rescue NameError
    raise "Could not find greeter: `#{user.role.camelize}Greeter'."
  end
end

class User
  # ...

  def greeter
    @greeter ||= GreeterFactory.greeter_for(self)
  end

  def greet
    greeter.greet
  end
end

In this case, we can just try to load a class named <user_role>Greeter. If we can find one, great. If it fails, we raise a helpful exception. The nice thing about this is that if we add more roles, all we need to do is implement a new class. For example, if we were to add a moderator role, simply creating a ModeratorGreeter class is all we’d need. So even if we forgot to write the new greeter class, the error message would be quite descriptive and help us immediately figure out what’s wrong.

That’s it!

It was just three code-smells this time, but I don’t want to make this too long. If you guys are interested, make sure to leave comments and I will continue this series. 🙂

Cheers!

Federico — Senior Web Developer @ Beezwax

Leave a Reply