Over years of reviewing Ruby code, the same things tend to come up over and over. In this post, I’d like to address some of the most common code smells I find when reviewing OOP code (and Ruby code in particular).
I personally think "code smell" is a great name. As all great names, it needs little or no explanation. It’s something which smells bad in code, which simply means "bad code".
Wikipedia provides us with a more formal definition:
In computer programming, a code smell is any characteristic in the source code of a program that possibly indicates a deeper problem. Determining what is and is not a code smell is subjective, and varies by language, developer, and development methodology.
Each developer, language and even ecosystem has its own code smells. Some are more general than others. In this post I’ll talk about very generic smells which apply to any object-oriented language, be it Ruby, JavaScript, C# or PHP.
Polluting Instance Data
This is one of the most frequent smells I find when reviewing code.
Object-oriented languages isolate data into objects, and only expose that data to the external world through messages — also called methods:
class Point
attr_accessor :x, :y
def initialize(x, y)
@x = x
@y = y
end
def move(x, y)
@x = x
@y = y
end
end
point = Point.new(10, 15)
point.x # => 10
point.y # => 15
# the only way to change the point's coordinates is with this mesage
point.move(0, 0)
In the example above, our actual data is the X and Y coordinates for our little point. We can’t access them directly, so we do that through our point
object. Also, when we want to change them, we have to do that through the point’s move
message.
Objects not only have data, but also behavior which is tightly coupled with that data. In the example above, it makes sense to have the move
behaviour inside the Point
.
When dealing with objects, we don’t care much about the internal data or state, instead we care about the methods, or messages that it exposes. That’s the only way we can communicate with objects.
We use messages to ask them questions — list.empty?
— and make them do stuff — mailer.send
. (See Command/Query Separation for more on this, as it escapes the scope of this post).
So far, so good, but sometimes we end up adding more data than that which the object semantically represents. A common example of polluting the object’s data is adding helpers for calculations:
class Point
attr_accessor :x, :y
def initialize(x, y)
@x = x
@y = y
end
# ...
def distance(other)
@x_diff = other.x - x
@y_diff = other.y - y
Math.sqrt((@x_diff * @x_diff) + (@y_diff * @y_diff))
end
end
Here we are just adding a method to calculate the distance between two points.
The math is really not important, the thing to note here is that we are using instance variables where we could just get away with using regular variables.
Why are instance variables bad? The biggest issue is that they add actual data to the object which has little to do with its role. Why does a point care about the x_diff
with some other point? After the calculation is over, this data is useless.
Another issue with this is that this calculation data won’t be cleared up by the garbage collector: it will be re-defined every time the method gets called. This makes our objects bigger, and if we are going to instantiate thousands of them, it’s easy to end up with bloated systems.
Yet another issue is that the more state we have, the more likely bugs are to emerge. We want to have as little state as possible, and ideally, we don’t want it to change (but that’s a subject for another post).
Another common example of misusing instance data is using it to collect intermediate results:
class Importer
# ...
def import
import_categories
import_people
end
private
def import_categories
@categories = @xls_file.each do |category|
# ... import category
end
end
def import_people
@categories.each do |category|
# ... do some work importing people
end
end
end
Can you spot the problem here? What happens when you call #import_people
without having called #import_categories
first? It breaks.
Not only are we adding data which is not relevant to what our object represents, but we are also making our code more fragile.
The solution is very simple:
class Importer
# ...
def import
categories = import_categories
import_people(categories)
end
private
def import_categories
@xls_file.each do |category|
# ... import category
end
end
def import_people(categories)
categories.each do |category|
# ... do some work importing people
end
end
end
By passing intermediate values through parameters, we don’t need to pollute the object’s internal state.
Leaky Logic
Another very common smell I encounter often is leaking logic. This means an object’s logic is leaking to the outside world.
Luckily, this one is very easy to identify, and it normally comes in the form of: a) conditionality and b) serial method calls.
Let’s see a conditionality example:
if coffee.any?
coffee.drink
coffee.refill
end
It’s not necessarily a bad thing to ask an object questions, and although it’s not ideal (Tell, Don’t Ask), sometimes it makes sense to do so.
But problems can arise when the actions taken after asking an object something also send messages to the same object.
In the example above, we are asking coffee
a question, and then in the body of the conditional sending yet more messages to coffee
.
This logic could just live in the coffee
object:
class Coffee
def try_drink
return unless any?
drink
refill
end
end
# outside the class
coffee.try_drink
The second example is about serial method calls. This applies when calling several methods in a particular order:
file = File.open('some-file.txt')
file.write('something')
file.close
In the example above, you will always call open
first, write
second, and close
last. It’s not a big smell if it just happens once, but once you know this smell, it’s always a good idea to just refactor by defining a more specific method in your object:
File.with('some-file.txt') do |file|
file.write('something')
end
# implementation
class File
# ...
def self.with(file)
file = File.open(file)
yield file
file.close
end
# ...
end
This way you make sure the methods are always called in order and it’s not possible to forget to call a method. Grouping these methods into one makes sense, and it deserves to be its own message.
Note that it doesn’t always have to be like the example above–sometimes you just spot the same methods being called over and over again:
text.trim
text.escape_slashes
text.space_paragraphs
In this case, we could simply create a new method with a nice name:
text.format
# implementation
class Text
# ...
def format
trim
escape_slashes
space_paragraphs
end
end
Excessive Conditionals
This one could be its own blog post, but I’ll try to keep it concise this time. It’s about replacing conditionals with inheritance.
One of the most powerful features OOP offers is polymorphism: the ability to just send messages to objects and not care about their specific type at runtime.
In dynamic languages like Ruby, you don’t need any extra ceremony to do this. In statically-typed languages like Java, you need a common interface. Regardless of language, we want to use this power to abstract away conditionality.
Let me show you an example and let’s see all the advantages we get for free:
class Store
def save
puts 'saving' if debug_mode?
# ... do some work
end
end
The conditionality above might not seem like a big deal. Even if we have to repeat it in a few methods, it’s not the end of the world. But what if we have another requirement?
class Store
def save
if debug_mode?
if development?
puts 'saving'
elsif production?
Log.create('Saving')
end
end
# ... do some work
end
end
Now it’s starting to get messy. First of all, if we were to make this change, we’d have to make it in every single place where the logging was implemented, which could take considerable time.
We could extract it to its own method, but what if we want that functionality on other objects too?
Also, why should the store know about debug_mode?
, development?
, and production?
? It even knows how to create a special log using the Log
class, therefore introducing another dependency to our class.
It’s now clear that this doesn’t belong here. The first step is easy, move it to its own class:
class Store
attr_accessor :logger
def initialize(logger)
@logger = logger
end
def save
logger.log('saving')
# ... do some work
end
end
class Logger
def log(message)
if debug_mode?
if development?
puts message
elsif production?
Log.create(message)
end
end
end
# ...
end
logger = Logger.new
store = Store.new(logger)
Now, that is much better. At least we can test the logging functionality separately now. Still, because it has many conditionals, it’s a pain to test. Every conditional branches the code, and the amount of tests we have to write to cover all cases grows exponentially with each nested conditional.
Now for the interesting part:
class DevelopmentLogger
def log(message)
puts message
end
end
class ProductionLogger
def log(message)
Log.create(message)
end
end
class NullLogger
def log(messsage)
# NOOP
end
end
class LoggerFactory
def get
return NullLogger.new unless debug_mode?
return DevelopmentLogger.new if development?
return ProductionLogger.new if production?
raise 'Invalid environment'
end
end
logger = LoggerFactory.get
store = Store.new(logger)
Whoa, what happened? We created many specific logger objects, and we move the responsibility of knowing which logger to use to a factory object.
Now we can test each logger individually, and the instantiation logic happens in a single place–the factory–which can also be easily tested and reused.
Bonus
Alternatively, sometimes you can get away with automating the factory and completely removing the conditionality using convention over configuration.
Consider this example:
class User
# ...
def greet
if role == 'admin'
'Hello, admin.'
elsif role == 'user'
'Hello, user.'
elsif role == 'anonymous'
'Please sign in.'
end
end
end
We could use a switch statement instead, but you get the idea. Now we can refactor this as follows:
class AdminGreeter
def greet
'Hello, admin.'
end
end
class UserGreeter
def greet
'Hello, user.'
end
end
class AnonymousGreeter
def greet
'Please sign in.'
end
end
class GreeterFactory
def greeter_for(user)
"#{user.role.camelize}Greeter".constantize.new
rescue NameError
raise "Could not find greeter: `#{user.role.camelize}Greeter'."
end
end
class User
# ...
def greeter
@greeter ||= GreeterFactory.greeter_for(self)
end
def greet
greeter.greet
end
end
In this case, we can just try to load a class named <user_role>Greeter
. If we can find one, great. If it fails, we raise a helpful exception. The nice thing about this is that if we add more roles, all we need to do is implement a new class. For example, if we were to add a moderator
role, simply creating a ModeratorGreeter
class is all we’d need. So even if we forgot to write the new greeter class, the error message would be quite descriptive and help us immediately figure out what’s wrong.
That’s it!
It was just three code-smells this time, but I don’t want to make this too long. If you guys are interested, make sure to leave comments and I will continue this series. 🙂
Cheers!
Federico — Senior Web Developer @ Beezwax