Polished Ruby Programming

By Jeremy Evans

About this book

Most successful Ruby applications become difficult to maintain over time as the codebase grows in size. Polished Ruby Programming provides you with recommendations and advice for designing Ruby programs that are easy to maintain in the long term.

This book takes you through implementation approaches for many common programming situations, the trade-offs inherent in each approach, and why you may choose to use different approaches in different situations. You'll start by learning fundamental Ruby programming principles, such as correctly using core classes, class and method design, variable usage, error handling, and code formatting. Moving on, you'll learn higher-level programming principles, such as library design, use of metaprogramming and domain-specific languages, and refactoring. Finally, you'll learn principles specific to web application development, such as how to choose a database and web framework, and how to use advanced security features.

By the end of this Ruby programming book, you'll have gained the skills you need to design robust, high-performance, scalable, and maintainable Ruby applications.

While most code examples and principles discussed in the book apply to all Ruby versions, some examples and principles are specific to Ruby 3.0, the latest release at the time of publication.

Publication date:
July 2021
Publisher
Packt
Pages
434
ISBN
9781801072724

 

Chapter 1: Getting the Most out of Core Classes

Ruby is shipped with a rich library of core classes. Almost all Ruby programmers are familiar with the most common core classes, and one of the easiest ways to make your code intuitive to most Ruby programmers is to use these classes.

In the rest of this chapter, you'll learn more about commonly encountered core classes, as well as principles for how to best use each class. We will cover the following topics:

  • Learning when to use core classes
  • Best uses for true, false, and nil objects
  • Different numeric types for different needs
  • Understanding how symbols differ from strings
  • Learning how best to use arrays, hashes, and sets
  • Working with Struct – one of the underappreciated core classes

By the end of this chapter, you'll have a better understanding of many of Ruby's core classes, and how best to use each of them.

 

Technical requirements

In this chapter and all chapters of this book, code given in code blocks is designed to execute on Ruby 3.0. Many of the code examples will work on earlier versions of Ruby, but not all. You will find the code files on GitHub at https://github.com/PacktPublishing/Polished-Ruby-Programming/tree/main/Chapter01.

 

Learning when to use core classes

Let's consider the following Ruby code:

things = ["foo", "bar", "baz"]
things.each do |thing|
  puts thing
end

If you have come across this code, then you probably have an immediate understanding of what the code does. However, let's say you come across the following Ruby code:

things = ThingList.new("foo", "bar", "baz")
things.each do |thing|
  puts thing
end

You can probably guess what it does, but to be sure, you need to know about the ThingList class and how it is implemented. What does ThingList.new do? Does it use its arguments directly or does it wrap them in other objects? What does the ThingList#each method yield? Does it yield the same objects passed into the constructor, or other objects? When you come across code like this, your initial assumption may be that it would yield other objects and not the objects passed into the constructor, because why else would you have a class that duplicates the core Array class?

A good general principle is to only create custom classes when the benefits outweigh the costs. When deciding whether to use a core class or a custom class, you should understand the trade-off you are making. With core classes, your code is often more intuitive, and in general will perform better, since using core classes directly results in less indirection. With custom classes, you are able to encapsulate your logic, which can lead to more maintainable code in the long term, if you have to make changes. In many cases, you won't have to make changes in the future, and the benefits of encapsulation are not greater than the loss of intuition and performance. If you aren't sure whether to use a custom class or a core class, a good general principle is to start with the use of core classes, and only add a custom class when you see a clear advantage in doing so.

 

Best uses for true, false, and nil objects

The simplest Ruby objects are true and false. In general, if true and false will meet your needs, you should use them. true and false are the easiest objects to understand.

There are a few cases where you will want to return true or false and not other objects. Most Ruby methods ending with ? should return true or false. In general, the Ruby core methods use the following approach:

1.kind_of?(Integer)
# => true

Similarly, equality and inequality operator methods should return true or false:

1 > 2
# => false
1 == 1
# => true

A basic principle when writing Ruby is to use true or false whenever they will meet your needs, and only reach for more complex objects in other cases.

The nil object is conceptually more complex than either true or false. As a concept, nil represents the absence of information. nil should be used whenever there is no information available, or when something requested cannot be found. Ruby's core classes use nil extensively to convey the absence of information:

[].first
# => nil
{1=>2}[3]
# => nil

While true is the opposite of false and false is the opposite of true, nil is sort of the opposite of everything not true or false. This isn't literally true in Ruby, because NilClass#! returns true and BasicObject#! returns false:

!nil
# => true
!1
# => false

However, nil being the opposite of everything not true or false is true conceptually. In general, if you have a Ruby method that returns true in a successful case, it should return false in the unsuccessful case. If you have a Ruby method that returns an object that is not true or false in a successful case, it should return nil in the unsuccessful case (or raise an exception, but that's a discussion for Chapter 5, Handling Errors).
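
As a brief illustration of this convention using core methods (the sample array is made up for this example), a predicate method returns true or false, while a method that returns an object returns nil when nothing is found:

```ruby
numbers = [1, 2, 3]

# Predicate-style methods return true or false.
numbers.include?(2)   # => true
numbers.include?(5)   # => false

# Methods that return an object return nil when nothing matches.
found   = numbers.find { |n| n > 1 }   # => 2
missing = numbers.find { |n| n > 5 }   # => nil
```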

Ruby's core classes also use nil as a signal that a method that modifies the receiver did not make a modification:

"a".gsub!('b', '')
# => nil
[2, 4, 6].select!(&:even?)
# => nil
["a", "b", "c"].reject!(&:empty?)
# => nil

The reason for this behavior is optimization: if you only want to run code when the method modified the object, you can use a conditional:

string = "..."
if string.gsub!('a', 'b')
  # string was modified
end

The trade-off here is that you can no longer use these methods in method chaining, so the following code doesn't work:

string.
  gsub!('a', 'b').
  downcase!

Because gsub! can return nil, if the string doesn't contain "a", then it calls nil.downcase!, which raises a NoMethodError exception. So, Ruby chooses a trade-off that allows higher performance but sacrifices the ability to safely method chain. If you want to safely method chain, you need to use methods that return new objects, which are going to be slower as they allocate additional objects that need to be garbage collected. When you design your own methods, you'll also have to make similar decisions, which you will learn more about in Chapter 4, Methods and Their Arguments.
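
A minimal sketch of the safe alternative, assuming a sample string that does not contain "a": the non-mutating methods always return a string, so the chain works, while the mutating methods have to be called as separate statements:

```ruby
string = "Hello"

# Non-mutating methods always return a string, so chaining is safe
# even when nothing matches, at the cost of extra allocations.
result = string.gsub('a', 'b').downcase
# => "hello"

# The mutating versions can still be used as separate statements.
copy = string.dup
copy.gsub!('a', 'b')   # => nil, because there was nothing to replace
copy.downcase!
copy
# => "hello"
```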

One of the issues you should be aware of when using nil and false in Ruby is that you cannot use the simple approach of using the ||= operator for memoization. In most cases, if you can cache the result of an expression, you can use the following approach:

@cached_value ||= some_expression
# or
cache[:key] ||= some_expression

This works for most Ruby objects because the default value of @cached_value will be nil, and as long as some_expression returns a value that is not nil or false, it will only be called once. However, if some_expression returns a nil or false value, it will continue to be called until it returns a value that is not nil or false, which is unlikely to be the intended behavior. When you want to cache an expression that may return nil or false as a valid value, you need to use a different implementation approach.
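
The problem can be observed with a small sketch. Here, some_expression is a hypothetical lambda that counts its calls and returns nil as a valid result, and a local variable stands in for the instance variable:

```ruby
calls = 0
# Hypothetical expensive computation whose valid result is nil.
some_expression = -> { calls += 1; nil }

cached_value = nil
3.times { cached_value ||= some_expression.call }

calls
# => 3, the expression ran every time because nil is falsy
```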

If you are using a single instance variable for the cached value, it is simplest to switch to using defined?, although it does result in more verbose code:

if defined?(@cached_value)
  @cached_value
else
  @cached_value = some_expression
end
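
To see this in context, here is a sketch that places the defined? check inside a class. The Lookup class, its value method, and the call counter are assumptions made for the example:

```ruby
class Lookup
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Memoizes a result even when that result is nil or false.
  def value
    if defined?(@cached_value)
      @cached_value
    else
      @cached_value = some_expression
    end
  end

  private

  # Hypothetical expensive call that legitimately returns nil.
  def some_expression
    @calls += 1
    nil
  end
end

lookup = Lookup.new
3.times { lookup.value }
lookup.calls
# => 1
```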

If you are using a hash to store multiple cached values, it is simplest to switch to using fetch with a block:

  cache.fetch(:key){cache[:key] = some_expression}
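
The following sketch shows the fetch approach caching false. As before, some_expression is a hypothetical lambda that counts how often it runs:

```ruby
calls = 0
# Hypothetical computation; false is a valid result here.
some_expression = -> { calls += 1; false }

cache = {}
3.times do
  cache.fetch(:key) { cache[:key] = some_expression.call }
end

calls              # => 1
cache[:key]        # => false
cache.key?(:key)   # => true
```

The block only runs when the key is missing, so storing nil or false is enough to prevent later calls.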

One advantage of using true, false, and nil compared to most other objects in Ruby is that they are three of the immediate object types. Immediate objects in Ruby are objects that do not require memory allocation to create and memory indirection to access, and as such they are generally faster than non-immediate objects.

In this section, you learned about the simplest objects, true, false, and nil. In the next section, you'll learn about how best to use each of Ruby's numeric types.

 

Different numeric types for different needs

Ruby has multiple core numeric types, such as integers, floats, rationals, and BigDecimal, with integers being the simplest type. As a general principle when programming, it's best if you keep your design as simple as possible, and only add complexity when necessary. Applying the principle to Ruby, if you need to choose a numeric type, you should generally use an integer unless you need to deal with fractional numbers.

Note that while this chapter is supposed to discuss core classes, BigDecimal is not a core class, though it is commonly used. BigDecimal is in the standard library, and you need to add require 'bigdecimal' to your code before you can use it.

Integers are the simplest numeric types, but they are surprisingly powerful in Ruby compared to many other programming languages. One example of this is executing a block of code a certain number of times. In many other languages, this is either done with the equivalent of a for loop or using a range, but in Ruby, it is as simple as calling Integer#times:

10.times do
  # executed 10 times
end

One thing that trips up many new Ruby programmers is how division works when both the receiver and the argument are integers. Ruby is similar to C in how integer division is handled, returning only the quotient and dropping any remainder:

5 / 10
# => 0
7 / 3
# => 2

Any time you are considering using division in your code and both arguments could be integers, be aware of this issue and consider whether you would like to use integer division. If not, you should convert the numerator or denominator to a different numeric type so that the division operation will include the remainder:

5 / 10r # or Rational(5, 10) or 5 / 10.to_r
# => (1/2)
7.0 / 3
# => 2.3333333333333335
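
If you need both the quotient and the remainder, or a float result without converting an operand yourself, Integer#divmod and Integer#fdiv may help. The sketch below also shows one difference from C worth keeping in mind: with negative operands, Ruby rounds the quotient toward negative infinity instead of truncating toward zero:

```ruby
quotient, remainder = 7.divmod(3)
quotient    # => 2
remainder   # => 1

7.fdiv(2)   # => 3.5

# Ruby floors the result of integer division.
-7 / 2      # => -4
-7 % 2      # => 1
```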

In cases where your numeric type needs to include a fractional component, you have three main choices, floats, rationals, or BigDecimal, each with its own trade-offs. Floats are fastest but not exact in many cases, as shown in the earlier example. Rationals are exact but not as fast. BigDecimal is exact in most cases, and most useful when dealing with a fixed precision, such as two digits after the decimal point, but is generally the slowest.

Floats are the fastest and most common fractional numeric type, and they are the type Ruby uses for literal values such as 1.2. In most cases, it is fine to use a float, but you should make sure you understand that they are not an exact type. Repeated calculations on float values result in observable issues:

f = 1.1
v = 0.0
1000.times do
  v += f
end
v
# => 1100.0000000000086

Where did the .0000000000086 come from? This is the error in the calculation that accumulates because each Float#+ calculation is inexact. Note that this issue does not affect all floats:

f = 1.109375
v = 0.0
1000.times do
  v += f
end
v
# => 1109.375

This is slightly counter-intuitive to many programmers, because 1.1 looks like a much simpler number than 1.109375. The reason for this is due to the implementation of floats and the fact that computers operate in binary and not in decimal, and 0.109375 can be stored exactly in binary (it is 7/64ths of 1), but 1.1 cannot be stored exactly in binary.
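
One way to check this yourself is Float#to_r, which returns the exact value a float stores. A short sketch:

```ruby
# 1.109375 is 71/64, and 64 is a power of two, so it is stored exactly.
1.109375.to_r == Rational(71, 64)   # => true

# 1.1 is only approximated, so the stored value is not 11/10.
1.1.to_r == Rational(11, 10)        # => false

# The rational literal keeps the decimal value exactly.
1.1r == Rational(11, 10)            # => true
```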

Rationals are slower than floats, but since they are exact numbers, you don't need to worry about calculations introducing errors. Here's the first example using the r suffix to the number so that Ruby parses the number as a rational:

f = 1.1r
v = 0.0r
1000.times do
  v += f
end
v
# => (1100/1)

Here, we get 1100 exactly as a rational, showing there is no error. Let's use the same approach with the second example:

f = 1.109375r
v = 0.0r
1000.times do
  v += f
end
v
# => (8875/8)
v.to_f
# => 1109.375

As shown in the previous example, rationals are stored as an integer numerator and denominator, and inspecting the output reflects that. This can make debugging with them a little cumbersome, as you often need to convert them to floats for human-friendly decimal output.

While rationals are slower than floats, they are not orders of magnitude slower. They are about 2-6 times slower depending on what calculations you are doing. So, do not avoid the use of rationals on a performance basis unless you have profiled them and determined they are a bottleneck (you'll learn about that in Chapter 14, Optimizing Your Library).
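
If you want a rough feel for the difference on your own machine, a quick sketch using Benchmark.realtime from the standard library could look like this. The iteration count and the values added are arbitrary choices, and the timings will vary by Ruby version and hardware:

```ruby
require 'benchmark'

n = 100_000
float_sum = 0.0
rational_sum = 0r

float_time = Benchmark.realtime { n.times { float_sum += 1.5 } }
rational_time = Benchmark.realtime { n.times { rational_sum += 1.5r } }

puts "float:    #{float_time}"
puts "rational: #{rational_time}"
```

This is only a crude measurement and is no substitute for profiling real code.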

A good general principle is to use a rational whenever you need to do calculations with non-integer values and you need exact answers. For cases where exactness isn't important, or you are only doing comparisons between numbers and not calculations that result in an accumulated error, it is probably better to use floats.

BigDecimal is similar to rationals in that it is an exact type in most cases, but it is not exact when dealing with divisions that result in a repeating decimal:

v = BigDecimal(1)/3
v * 3
# => 0.999999999999999999e0

However, other than divisions involving repeating decimals and exponentiation, BigDecimal values are exact. Let's take the first example, but make both arguments BigDecimal instances:

f = BigDecimal(1.1, 2)
v = BigDecimal(0)
1000.times do
  v += f
end
v
# => 0.11e4
v.to_s('F')
# => "1100.0"

So, as you can see, no error is introduced when using repeated addition on BigDecimal, similar to rationals. You can also see that inspecting the output is less helpful since BigDecimal uses a scientific notation. BigDecimal does have the advantage that it can produce human-friendly decimal string output directly without converting the object to a float first.

If we try the same approach with the second example, we can see that it also produces exact results:

f = BigDecimal(1.109375, 7)
v = BigDecimal(0)
1000.times do
  v += f
end
v
# => 0.1109375e4
v.to_s('F')
# => "1109.375"

As both examples show, one issue with using a BigDecimal that is created from floats or rationals is that you need to manually specify the initial precision. It is more common to initialize BigDecimal values from integers or strings, to avoid the need to manually specify the precision.
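
For comparison, here is the first example again with the value initialized from a string, which needs no precision argument:

```ruby
require 'bigdecimal'

f = BigDecimal("1.1")   # no precision argument needed for a string
v = BigDecimal(0)
1000.times do
  v += f
end
v.to_s('F')
# => "1100.0"
```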

BigDecimal is significantly slower than floats and rationals for calculations. Due to the trade-offs inherent in BigDecimal, a good general principle is to use BigDecimal only when dealing with other systems that support similar types, such as fixed precision numeric types in many databases, or when dealing with other fixed precision areas such as monetary calculations. For most other cases, it's generally better to use a rational or float.

Of the numeric types, most integer and float values are immediate objects, which is one of the reasons why they are faster than other types. However, large integer and float values are too large to be immediate objects (which must fit in 8 bytes if using a 64-bit CPU). Rationals and BigDecimal are never immediate objects, which is one reason why they are slower.

In this section, you learned about Ruby's many numeric types and how best to use each. In the next section, you'll learn how symbols are very different from strings, and when to use each.

 

Understanding how symbols differ from strings

One of the most useful but misunderstood aspects of Ruby is the difference between symbols and strings. One reason for this is there are certain methods of Ruby that deal with symbols, but will still accept strings, or perform string-like operations on a symbol. Another reason is due to the popularity of Rails and its pervasive use of ActiveSupport::HashWithIndifferentAccess, which allows you to use either a string or a symbol for accessing the same data. However, symbols and strings are very different internally, and serve completely different purposes. The confusion arises because Ruby is focused on programmer happiness and productivity, so it will often automatically convert a string to a symbol if it needs a symbol, or a symbol to a string if it needs a string.

A string in Ruby is a series of characters or bytes, useful for storing text or binary data. Unless the string is frozen, you can append to it, modify existing characters in it, or replace it with a different string.

A symbol in Ruby is a number with an attached identifier that is a series of characters or bytes. Symbols in Ruby are an object wrapper for an internal type that Ruby calls ID, which is an integer type. When you use a symbol in Ruby code, Ruby looks up the number associated with that identifier. The reason for having an ID type internally is that it is much faster for computers to deal with integers instead of a series of characters or bytes. Ruby uses ID values to reference local variables, instance variables, class variables, constants, and method names.

Say you run Ruby code as follows:

foo.add(bar)

Ruby will parse this code, and for foo, add, and bar, it will look up whether it already has an ID associated with the identifier. If it already has an ID, it will use it; otherwise, it will create a new ID value and associate it with the identifier. This happens during parsing and the ID values are hardcoded into the VM instructions.

Say you run Ruby code as follows:

method = :add
foo.send(method, bar)

Ruby will parse this code, and for method, add, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, or create a new ID value to associate with the identifier if it does not exist. This approach is slightly slower as Ruby will create a local variable and there is additional indirection as send has to look up the method to call dynamically. However, there are no calls at runtime to look up an ID value.

Say you run Ruby code as follows:

method = "add"
foo.send(method, bar)

Ruby will parse this code, and for method, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, creating the ID if it doesn't exist. During parsing, though, Ruby does not create an ID value for add because it is a string and not a symbol. When send is called at runtime, method is a string value, and send needs a symbol. So, Ruby will dynamically look up and see whether there is an ID associated with the add identifier, raising a NoMethodError if it does not exist. This ID lookup will happen every time the send method is called, making this code even slower.

So, while it looks like symbols and strings are interchangeable as the method argument to send, this is only because Ruby tries to be friendly to the programmer and accept either. The send method needs to work with an ID, and it is better for performance to use a symbol, which is Ruby's representation of an ID, as opposed to a string, which Ruby must perform substantial work on to convert to an ID.

This not only affects Kernel#send but also affects most similar methods where identifiers are passed dynamically, such as Module#define_method, Kernel#instance_variable_get, and Module#const_get. The general principle when using these methods in Ruby code is always to pass symbols to them, since it results in better performance.
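
A short sketch of this principle follows. The Counter class and its increment method are hypothetical names chosen for the example, and every dynamically passed identifier is a symbol:

```ruby
class Counter
  # The method name is passed as a symbol.
  define_method(:increment) do
    @count = (@count || 0) + 1
  end
end

counter = Counter.new
counter.send(:increment)
counter.send(:increment)

counter.instance_variable_get(:@count)
# => 2
Object.const_get(:Counter)
# => Counter
```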

The previous examples show that when Ruby needs a symbol, it will often accept a string and convert it for the programmer's convenience. This allows strings to be treated as symbols in certain cases. There are opposite cases, where Ruby allows symbols to be treated as strings for the programmer's convenience.

For example, while symbols represent integers attached to a series of characters or bytes, Ruby allows you to perform operations on symbols such as <, >, and <=>, as if they were strings, where the result does not depend on the symbol's integer value, but on the string value of the name attached to the symbol. Again, this is Ruby doing so for the programmer's convenience. For example, consider the following line of code:

object.methods.sort

This results in a list sorted by the name of the method, since that is the most useful for the programmer. In this case, Ruby needs to operate on the string value of the symbol, which has similar performance issues as when Ruby needs to convert a string to a symbol internally.

There are many other methods on Symbol that operate on the internal string associated with the symbol. Some methods, such as downcase, upcase, and capitalize, return a symbol by internally operating on the string associated with the symbol, and then converting the resulting value back to a symbol. For example, symbol.downcase basically does symbol.to_s.downcase.to_sym. Other methods, such as [], size, and match, operate on the string associated with the symbol, such as symbol.size being shorthand for symbol.to_s.size.
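
A few of these string-like operations, shown on made-up symbols:

```ruby
:Foo.downcase          # => :foo
:foo.size              # => 3
:foo[0]                # => "f"
:b <=> :a              # => 1, compared by name, not by internal ID
[:pear, :apple].sort   # => [:apple, :pear]
```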

In all of these cases, it is possible to determine what Ruby natively wants. If Ruby needs an internal identifier, it will natively want a symbol, and only accept a string by converting it. If Ruby needs to operate on text, it will natively want a string, and only accept a symbol by converting it.

So, how does the difference between a symbol and string affect your code? The general principle is to be like Ruby, and use symbols when you need an identifier in your code, and strings when you need text or data. For example, if you need to accept a configuration value that can only be one of three options, it's probably best to use a symbol:

def switch(value)
  case value
  when :foo
    # foo
  when :bar
    # bar
  when :baz
    # baz
  end
end

However, if you are dealing with text or data, you should accept a string and not a symbol:

def append2(value)
  value.gsub(/foo/, "bar")
end

You should consider whether you want to be as flexible as many Ruby core methods, and automatically convert a string to a symbol or vice versa. If you are internally treating symbols and strings differently, you should definitely not perform automatic conversion. However, if you are only dealing with one of the types, then you have to decide how to handle it. Automatically converting the type is worse for performance, and results in less flexible internals, since you need to keep supporting both types for backward compatibility. Not automatically converting the type is better for performance, and results in more flexible internals, since you are not obligated to support both types. However, it means that users of your code will probably get errors if they pass in a type that is not expected. Therefore, it is important to understand the trade-off inherent in the decision of whether to convert both types. If you aren't sure which trade-off is better, start by not automatically converting, since you can always add automatic conversion later if needed.
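
To make the trade-off concrete, here is a sketch of both choices. The method names switch_strict and switch_converting, and the values they return, are hypothetical:

```ruby
# Strict version: only accepts the documented type.
def switch_strict(value)
  case value
  when :foo then 1
  when :bar then 2
  else raise ArgumentError, "unsupported value: #{value.inspect}"
  end
end

# Converting version: also accepts strings, at the cost of a conversion
# on each call and a commitment to keep supporting both types.
def switch_converting(value)
  switch_strict(value.to_sym)
end

switch_strict(:bar)        # => 2
switch_converting("foo")   # => 1
```

With the strict version, passing "foo" raises an ArgumentError, which is the kind of error users may see when a type is not converted automatically.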

In this section, you learned the important difference between symbols and strings, and when it is best to use each. In the next section, you'll learn how best to use Ruby's core collection classes.

 

Learning how best to use arrays, hashes, and sets

Ruby's collection classes are one of the reasons why it is such a joy to program in Ruby. In most cases, the choice of collection class to use is fairly straightforward. If you need a simple list of values that you are iterating over, or using the collection as a queue or a stack, you generally use an array. If you need a mapping of one or more objects to one or more objects, then you generally use a hash. If you have a large list of objects and want to see whether a given object is contained in it, you generally use a set.

In some cases, it's fine to use either an array or a hash. Often, when iterating over a small list, you could use the array approach:

[[:foo, 1], [:bar, 3], [:baz, 7]].each do |sym, i|
  # ...
end

Or, you could use the hash approach:

{foo: 1, bar: 3, baz: 7}.each do |sym, i|
  # ...
end

Since you are not indexing into the collection, the simpler approach from a design perspective is to use an array. However, because the hash approach is syntactically simpler, the idiomatic way to handle this in Ruby is to use a hash.

For more complex mapping cases, you often want to use a hash, but you may need to decide how to structure the hash. This is especially true when you are using complex keys. Let's take a deeper look at the differences between arrays, hashes, and sets by working through an example that implements an in-memory database.

Implementing an in-memory database

While many programmers often use a SQL database for data storage, there are many cases when you need to build a small, in-memory database using arrays, hashes, and sets. Often, even when you have the main data stored in a SQL database, it is faster to query the SQL database to retrieve the information, and use that to build an in-memory database for the specific class or method you are designing. This allows you to query the in-memory database with similar speed as a hash or array lookup, orders of magnitude faster than a SQL database query.

Let's say you have a list of album names, track numbers, and artist names, where you can have multiple artists for the same album and track. You want to design a simple lookup system so that given an album name, you can find all artists who worked on any track of the album, and given an album name and track number, you can find the artists who worked on that particular track.

In the following examples, you should assume that album_infos is an arbitrary object that has an each method that yields the album name, track number, and artist. If you would like some sample data to work with, you can use the following:

album_infos = 100.times.flat_map do |i|
  10.times.map do |j|
    ["Album #{i}", j, "Artist #{j}"]
  end
end

One approach for handling this is to populate two hashes, one keyed by album name, and one keyed by an array of the album name and track number. Populating these two hashes is straightforward, by setting the value for the key to an empty array if the key doesn't exist, and then appending the artist name. Then you need to make sure the artist values are unique for the hash keyed just by album name:

album_artists = {}
album_track_artists = {}
album_infos.each do |album, track, artist|
  (album_artists[album] ||= []) << artist
  (album_track_artists[[album, track]] ||= []) << artist
end
album_artists.each_value(&:uniq!)

With this approach, looking up values is fairly straightforward, and just involves looking in the appropriate hash with the appropriate key:

lookup = ->(album, track=nil) do
  if track
    album_track_artists[[album, track]]
  else
    album_artists[album]
  end
end
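
Putting the pieces together, here is a self-contained sketch of the two-hash approach with a few lookups. The sample data is a smaller, made-up variant of the data shown earlier:

```ruby
# Hypothetical sample data: 2 albums with tracks 1 to 3.
album_infos = 2.times.flat_map do |i|
  (1..3).map do |j|
    ["Album #{i}", j, "Artist #{j}"]
  end
end

album_artists = {}
album_track_artists = {}
album_infos.each do |album, track, artist|
  (album_artists[album] ||= []) << artist
  (album_track_artists[[album, track]] ||= []) << artist
end
album_artists.each_value(&:uniq!)

lookup = ->(album, track=nil) do
  if track
    album_track_artists[[album, track]]
  else
    album_artists[album]
  end
end

lookup.("Album 0")      # => ["Artist 1", "Artist 2", "Artist 3"]
lookup.("Album 1", 2)   # => ["Artist 2"]
lookup.("Missing")      # => nil
```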

An alternative approach would be to use a nested hash approach, with each album having a hash of tracks:

albums = {}
album_infos.each do |album, track, artist|
  ((albums[album] ||= {})[track] ||= []) << artist
end

With this approach, looking up values is more complex, especially in the case where a track number is not provided, and you have to dynamically create the list:

lookup = ->(album, track=nil) do
  if track
    albums.dig(album, track)
  else
    a = albums[album].each_value.to_a
    a.flatten!
    a.uniq!
    a
  end
end
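
Note that, as written, the album-only branch calls each_value on nil when the album is unknown. The sketch below is one possible variation, not the original code: it uses Hash#fetch with an empty hash as the default so that an unknown album returns an empty array. The sample data is made up for the example:

```ruby
album_infos = [
  ["Album 0", 1, "Artist 1"],
  ["Album 0", 2, "Artist 2"],
  ["Album 0", 2, "Artist 1"],
]

albums = {}
album_infos.each do |album, track, artist|
  ((albums[album] ||= {})[track] ||= []) << artist
end

lookup = ->(album, track=nil) do
  if track
    albums.dig(album, track)
  else
    # Variation: fall back to an empty hash for unknown albums.
    a = albums.fetch(album, {}).each_value.to_a
    a.flatten!
    a.uniq!
    a
  end
end

lookup.("Album 0", 2)   # => ["Artist 2", "Artist 1"]
lookup.("Album 0")      # => ["Artist 1", "Artist 2"]
lookup.("Missing")      # => []
lookup.("Missing", 1)   # => nil
```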

In general, the first approach using multiple hashes is going to take significantly more memory than the second approach if there is a large number of albums, but it will have a much better lookup performance for albums. The first approach will also take much more time to populate the data structure. The second approach is much lighter on memory and has better lookup performance for albums with tracks as it avoids an array allocation, but will exhibit far worse performance for albums.

Neither of these approaches depends on the types of objects that album_infos.each yields. You probably made the reasonable assumption that album and artist would be strings, and track would be a number. Let's say you knew in advance that the track number was an integer between 1 and 99. You could use that information to design a different approach. You could still have a single hash keyed by album name, with a value being an array containing arrays of artist names for each track. Since tracks only go from 1 to 99, you could use the 0 index in the array to store all artist names for the album. Populating this combination of hash and array of arrays isn't too difficult:

albums = {}
album_infos.each do |album, track, artist|
  album_array = albums[album] ||= [[]]
  album_array[0] << artist
  (album_array[track] ||= []) << artist
end
albums.each_value do |array|
  array[0].uniq!
end

This approach is more memory-efficient than either of the previous approaches, and looking up values is very simple and never allocates an object:

lookup = ->(album, track=0) do
  albums.dig(album, track)
end
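
Here is a self-contained sketch of this approach with a few lookups. One caveat: the sample data shown earlier numbers tracks from 0, which would collide with the index reserved for the per-album list, so this sketch assumes made-up data with tracks numbered from 1:

```ruby
# Hypothetical sample data: 2 albums with tracks 1 to 3.
album_infos = 2.times.flat_map do |i|
  (1..3).map do |j|
    ["Album #{i}", j, "Artist #{j}"]
  end
end

albums = {}
album_infos.each do |album, track, artist|
  album_array = albums[album] ||= [[]]
  album_array[0] << artist
  (album_array[track] ||= []) << artist
end
albums.each_value do |array|
  array[0].uniq!
end

lookup = ->(album, track=0) do
  albums.dig(album, track)
end

lookup.("Album 0")      # => ["Artist 1", "Artist 2", "Artist 3"]
lookup.("Album 0", 2)   # => ["Artist 2"]
lookup.("Album 0", 9)   # => nil
lookup.("Missing")      # => nil
```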

Compared to the previous two approaches, this approach uses about the same amount of memory as the nested hash approach. It takes slightly more time to populate compared to the nested hash approach. It is almost as fast as the two hash approach in terms of lookup performance for albums, and is the fastest approach for lookup performance by albums with tracks.

Maybe the needs of your application change, and now you need a feature that allows users to enter a list of artist names, and will return an array with only the artist names that the application knows are on one of the albums. One way to handle this is to store the artists in an array:

album_artists = album_infos.flat_map(&:last)
album_artists.uniq!

The lookup can use an array intersection to determine the values:

lookup = ->(artists) do
  album_artists & artists 
end

The problem with this approach is that Array#& uses a linear search of the array, so this approach is very slow for a large number of artists.

A better performing approach would use a hash, keyed by the artist name:

album_artists = {}
album_infos.each do |_, _, artist|
  album_artists[artist] ||= true
end

The lookup can use the hash to filter the values in the submitted array:

lookup = ->(artists) do
  artists.select do |artist|
    album_artists[artist]
  end
end

This approach performs much better. The code isn't as simple, though it isn't too bad. However, it would be nicer to have simpler code that performed as well. Thankfully, the Ruby Set class can meet this need. Like BigDecimal, Set is not currently a core Ruby class. Set is in the standard library, and you can load it via require 'set'. However, Set may be moved from the standard library to a core class in a future version of Ruby. Using a set is pretty much as simple as using an array in terms of populating the data structure:

album_artists = Set.new(album_infos.flat_map(&:last))

You don't need to manually make the array unique, because the set automatically ignores duplicate values. The lookup code can stay exactly the same as the array case:

lookup = ->(artists) do
  album_artists & artists 
end

Of the three approaches, the hash approach is the fastest to populate and the fastest to look up. The Set approach is much faster to look up than the array approach, but still significantly slower than the hash approach. Set is actually implemented using a hash internally, so in general, it will perform worse than using a hash directly. As a general rule, you should only use a set in code that isn't performance-sensitive and where you would like a nicer API. For any performance-sensitive code, you should prefer using a hash directly.
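
As a rough sketch, the three approaches can be placed side by side to confirm that they select the same artists. The sample data, the variable names, and the submitted list here are made up for illustration; only the [album, track, artist] shape is taken from the code above. Note that the Set version returns a Set, so it is converted with to_a if an array is required:

```ruby
require 'set'

# Hypothetical sample data in the [album, track, artist] shape used above.
album_infos = [
  ['Album 1', 1, 'Artist A'],
  ['Album 1', 2, 'Artist B'],
  ['Album 2', 1, 'Artist A'],
  ['Album 3', 1, 'Artist C']
]

# Array approach: unique artist names, intersected on every lookup.
artists_array = album_infos.flat_map(&:last).uniq
array_lookup = ->(artists) { artists_array & artists }

# Hash approach: artist names as keys, filtered with select.
artists_hash = {}
album_infos.each { |_, _, artist| artists_hash[artist] ||= true }
hash_lookup = ->(artists) { artists.select { |artist| artists_hash[artist] } }

# Set approach: duplicates are ignored automatically; Set#& returns a Set.
artists_set = Set.new(album_infos.flat_map(&:last))
set_lookup = ->(artists) { (artists_set & artists).to_a }

submitted = ['Artist C', 'Unknown Artist', 'Artist A']

array_result = array_lookup.call(submitted) # => ["Artist A", "Artist C"]
hash_result = hash_lookup.call(submitted)   # => ["Artist C", "Artist A"]
set_result = set_lookup.call(submitted)     # same artists as the other two
```

The results contain the same names, though the ordering differs between approaches: the array intersection follows the order of the stored array, while the hash filter follows the order of the submitted list. If ordering matters to your application, that may influence which approach you choose.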

In this section, you learned about Ruby's core collection classes: arrays, hashes, and sets. In the next section, you'll learn about Struct, one of Ruby's underappreciated core classes.

 

Working with Struct – one of the underappreciated core classes

The Struct class is one of the underappreciated Ruby core classes. It allows you to create classes with one or more fields, with accessors automatically created for each field. So, say you have the following:

class Artist
  attr_accessor :name, :albums
  def initialize(name, albums)
    @name = name
    @albums = albums
  end
end

Instead of that, you can write a small amount of Ruby code, and have the initializer and accessor automatically created:

Artist = Struct.new(:name, :albums)

In general, a Struct class is a little lighter on memory than a regular class, but has slower accessor methods. Struct used to be faster in terms of both initialization and reader methods in older versions of Ruby, but regular classes and attr_accessor methods have gotten faster at a greater rate than Struct has. Therefore, for maximum performance, you may want to consider using regular classes and attr_accessor methods instead of Struct classes.
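
To make the generated behavior concrete, here is a short sketch of what a Struct-created class gives you beyond the initializer and accessors. The constant name ArtistStruct and the sample values are chosen only for this illustration:

```ruby
# ArtistStruct is a hypothetical name used to avoid clashing with Artist above.
ArtistStruct = Struct.new(:name, :albums)

artist = ArtistStruct.new('Artist A', ['Album 1'])

# Reader and writer methods are generated for every member.
artist.name                 # => "Artist A"
artist.name = 'Artist B'
artist.albums << 'Album 2'

# Struct also provides member introspection, conversion, and equality.
ArtistStruct.members        # => [:name, :albums]
artist.to_a                 # => ["Artist B", ["Album 1", "Album 2"]]
same = artist == ArtistStruct.new('Artist B', ['Album 1', 'Album 2'])

# Members that are not passed to new default to nil.
partial = ArtistStruct.new('Only a name')
partial.albums              # => nil
```

The equality and conversion methods are something the handwritten Artist class would need extra code to support, which is part of the trade-off to weigh against the accessor performance described above.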

One of the more interesting aspects of Struct is how it works internally. For example, unlike the new method for most other classes, Struct.new does not return a Struct instance; it returns a Struct subclass:

Struct.new(:a, :b).class
# => Class

However, the new method on the subclass creates instances of the subclass; it doesn't create further subclasses. Additionally, if you provide a string and not a symbol as the first argument, Struct will automatically create the class using that name nested under its own namespace:

Struct.new('A', :a, :b).new(1, 2).class
# => Struct::A
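
The relationships described above can be checked directly. This is a small sketch; the name ExamplePair is hypothetical and used only to avoid redefining Struct::A:

```ruby
# Struct.new returns a class, and that class is a subclass of Struct.
klass = Struct.new(:a, :b)
klass.class          # => Class
klass.superclass     # => Struct
klass.name           # => nil, because it has not been assigned to a constant

# Calling new on the returned class creates an instance, not another class.
obj = klass.new(1, 2)
obj.class.equal?(klass) # => true
obj.is_a?(Struct)       # => true

# A string as the first argument names the class under the Struct namespace.
named = Struct.new('ExamplePair', :a, :b)
named.name                          # => "Struct::ExamplePair"
Struct.const_defined?(:ExamplePair) # => true
```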

A simplified version of the default Struct.new method is similar to the following. This example is a bit larger, so we'll break it into sections. If a string is given as the first argument, it is used to set the class in the namespace of the receiver; otherwise, it is added to the list of fields:

def Struct.new(name, *fields)
  unless name.is_a?(String)
    fields.unshift(name)
    name = nil
  end

Next, a subclass is created. If a class name was given, it is set as a constant in the namespace of the receiver:

  subclass = Class.new(self)
  if name
    const_set(name, subclass)
  end

Then, some internal code is run to set up the storage for the members of the subclass. Then, the new, allocate, [], members, and inspect singleton methods are defined on the subclass. Finally, some internal code is run to set up accessor instance methods for each member of the subclass:

  # Internal magic to setup fields/storage for subclass
  def subclass.new(*values)
    obj = allocate
    obj.send(:initialize, *values) # initialize is private, so it cannot be called with an explicit receiver
    obj
  end
  # Similar for allocate, [], members, inspect
  # Internal magic to setup accessor instance methods
  subclass
end
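
To experiment with this structure without redefining Struct.new itself, the same steps can be sketched on a standalone class. MiniStruct is a hypothetical class invented for this illustration, and the "internal magic" is replaced with instance variables and attr_accessor, which is an assumption and not how Struct actually stores its members:

```ruby
# MiniStruct is a hypothetical stand-in that mimics the shape of Struct.new.
class MiniStruct
  def self.new(name, *fields)
    # A non-string first argument is treated as the first field.
    unless name.is_a?(String)
      fields.unshift(name)
      name = nil
    end

    # Create the subclass and optionally name it under the receiver.
    subclass = Class.new(self)
    const_set(name, subclass) if name

    # Stand-in for the internal setup: remember the fields, define accessors.
    subclass.define_singleton_method(:members) { fields.dup }
    subclass.class_eval { attr_accessor(*fields) }

    # Override new on the subclass so that it creates instances.
    def subclass.new(*values)
      obj = allocate
      obj.send(:initialize, *values)
      obj
    end

    subclass
  end

  # Assumption: members are stored in instance variables for simplicity.
  def initialize(*values)
    self.class.members.zip(values) do |field, value|
      instance_variable_set(:"@#{field}", value)
    end
  end
end

MiniPair = MiniStruct.new(:a, :b)
pair = MiniPair.new(1, 2)
pair.a                # => 1
pair.b = 3
MiniPair.members      # => [:a, :b]

named_class = MiniStruct.new('Named', :x)
named_class.name      # => "MiniStruct::Named"
```

The key design point carried over from the simplified Struct.new is that new is overridden twice: once on the base class to build subclasses, and once on each subclass to restore the usual allocate-then-initialize behavior.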

Interestingly, you can still create Struct subclasses the normal way:

class SubStruct < Struct
end

Struct subclasses created the normal way operate like Struct itself, not like Struct subclasses created via Struct.new. You can call new on such a subclass to create a subclass of that subclass, and the resulting class is set up like a Struct subclass created via Struct.new:

SubStruct.new('A', :a, :b).new(1, 2).class
# => SubStruct::A
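
One practical use of this, sketched below, is sharing behavior across several struct classes by defining it once on the intermediate subclass. DescribedStruct, DescribedPoint, and the describe method are hypothetical names for this example:

```ruby
# A normal Struct subclass that adds a method shared by all of its structs.
class DescribedStruct < Struct
  def describe
    members.zip(to_a).map { |member, value| "#{member}=#{value}" }.join(', ')
  end
end

# Calling new on it behaves like Struct.new, but the result inherits describe.
DescribedPoint = DescribedStruct.new(:x, :y)
DescribedPoint.superclass          # => DescribedStruct

point = DescribedPoint.new(1, 2)
point.describe                     # => "x=1, y=2"
point.is_a?(Struct)                # => true
```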

In general, Struct is good for creating simple classes that are designed for storing data. One issue with Struct is that the design encourages the use of mutable data and discourages a functional approach, by defaulting to creating setter methods for every member. However, it is possible to easily force the use of immutable structs by freezing the object in initialize:

A = Struct.new(:a, :b) do
  def initialize(...)
    super
    freeze
  end
end

There have been feature requests submitted on the Ruby issue tracker to create immutable Struct subclasses using a keyword argument to Struct.new or via the addition of a separate Struct::Value class. However, as of Ruby 3, neither feature request has been accepted. It is possible that a future version of Ruby will include them, but in the meantime, freezing the receiver in initialize is the best approach.
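
The effect of freezing in initialize can be seen in a brief sketch. ImmutablePair is a hypothetical name for this example, and it assumes a Ruby version that supports the (...) argument forwarding syntax used above. Keep in mind that freeze is shallow, so objects stored in the members can still be mutated unless they are frozen too:

```ruby
ImmutablePair = Struct.new(:a, :b) do
  def initialize(...)
    super
    freeze
  end
end

pair = ImmutablePair.new(1, [2])
pair.frozen?              # => true

# The setter methods still exist, but calling one raises FrozenError.
setter_error = nil
begin
  pair.a = 3
rescue FrozenError => e
  setter_error = e
end

# freeze is shallow: the array held in b is not frozen by the struct.
pair.b << 3
pair.b                    # => [2, 3]

# A changed copy is created by building a new instance.
changed = ImmutablePair.new(10, pair.b)
changed.a                 # => 10
```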

 

Summary

In this chapter, you've learned about the core classes. You've learned about issues with true, false, and nil, and how best to use Ruby's numeric types. You've learned why the difference between symbols and strings is important. You've learned how best to use arrays, hashes, and sets, and when it makes sense to use your own custom structs.

In the next chapter, you'll build on this knowledge of the core classes and learn about constructing your own custom classes.

 

Questions

  1. How are nil and false different from all other objects?
  2. Are all standard arithmetic operations using two BigDecimal objects exact?
  3. Would it make sense for Ruby to combine symbols and strings?
  4. Which uses less memory for the same data: a hash or a Set?
  5. What are the only two core methods that return a new instance of Class?
 

Further reading

These books will also be applicable to all other chapters in this book, but are only listed in this chapter to reduce duplication:

About the Author

  • Jeremy Evans

    Jeremy Evans is a Ruby committer who focuses on fixing bugs in Ruby, as well as improving the implementation of Ruby. He is the maintainer of many popular Ruby libraries, including the fastest web framework (Roda) and fastest database library (Sequel). His libraries are known not just for their performance, but also for their code quality, understandability, documentation, and how quickly any bugs found are fixed. For his contributions to Ruby and the Ruby community, he has received multiple awards, such as receiving the prestigious RubyPrize in 2020 and being chosen as a Ruby Hero in 2015. He has given presentations at over 20 Ruby conferences. In addition to working on Ruby, he is also a committer for the OpenBSD operating system.
