Chapter 1: Getting the Most out of Core Classes
Ruby is shipped with a rich library of core classes. Almost all Ruby programmers are familiar with the most common core classes, and one of the easiest ways to make your code intuitive to most Ruby programmers is to use these classes.
In the rest of this chapter, you'll learn more about commonly encountered core classes, as well as principles for how to best use each class. We will cover the following topics:
- Learning when to use core classes
- Best uses for true, false, and nil objects
- Different numeric types for different needs
- Understanding how symbols differ from strings
- Learning how best to use arrays, hashes, and sets
- Working with Struct – one of the underappreciated core classes
By the end of this chapter, you'll have a better understanding of many of Ruby's core classes, and how best to use each of them.
Technical requirements
In this chapter and all chapters of this book, code given in code blocks is designed to execute on Ruby 3.0. Many of the code examples will work on earlier versions of Ruby, but not all. You will find the code files on GitHub at https://github.com/PacktPublishing/Polished-Ruby-Programming/tree/main/Chapter01.
Learning when to use core classes
Let's consider the following Ruby code:
things = ["foo", "bar", "baz"]
things.each do |thing|
  puts thing
end
If you have come across this code, then you probably have an immediate understanding of what the code does. However, let's say you come across the following Ruby code:
things = ThingList.new("foo", "bar", "baz")
things.each do |thing|
  puts thing
end
You can probably guess what it does, but to be sure, you need to know about the ThingList class and how it is implemented. What does ThingList.new do? Does it use its arguments directly or does it wrap them in other objects? What does the ThingList#each method yield? Does it yield the same objects passed into the constructor, or other objects? When you come across code like this, your initial assumption may be that it would yield other objects and not the objects passed into the constructor, because why else would you have a class that duplicates the core Array class?
A good general principle is to only create custom classes when the benefits outweigh the costs. When deciding whether to use a core class or a custom class, you should understand the trade-off you are making. With core classes, your code is often more intuitive, and in general will perform better, since using core classes directly results in less indirection. With custom classes, you are able to encapsulate your logic, which can lead to more maintainable code in the long term, if you have to make changes. In many cases, you won't have to make changes in the future, and the benefits of encapsulation are not greater than the loss of intuition and performance. If you aren't sure whether to use a custom class or a core class, a good general principle is to start with the use of core classes, and only add a custom class when you see a clear advantage in doing so.
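To make the cost of a custom class concrete, here is a minimal sketch of what a ThingList could look like. The class body is purely hypothetical (the text above never defines it), and it is written to show that the calling code gives no hint of what happens to the arguments:

```ruby
# Hypothetical implementation: nothing in the calling code reveals
# that each argument is stripped and frozen before being stored.
class ThingList
  include Enumerable

  def initialize(*things)
    @things = things.map { |thing| thing.strip.freeze }
  end

  # Yields the stored copies, not the objects passed to the constructor.
  def each(&block)
    @things.each(&block)
    self
  end
end

original = " baz"
list = ThingList.new("foo", "bar", original)
array = ["foo", "bar", original]

list.to_a.last                  # => "baz"
list.to_a.last.equal?(original) # => false
array.last.equal?(original)     # => true
```

With the core Array, the last element is the very object you passed in. With this sketch of ThingList, it is a modified copy, and you can only find that out by reading the class.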
Best uses for true, false, and nil objects
The simplest Ruby objects are true and false. In general, if true and false will meet your needs, you should use them. true and false are the easiest objects to understand.
There are a few cases where you will want to return true or false and not other objects. Most Ruby methods ending with ? should return true or false. In general, the Ruby core methods use the following approach:
1.kind_of?(Integer) # => true
Similarly, equality and inequality operator methods should return true or false:
1 > 2 # => false
1 == 1 # => true
A basic principle when writing Ruby is to use true or false whenever they will meet your needs, and only reach for more complex objects in other cases.
The nil object is conceptually more complex than either true or false. As a concept, nil represents the absence of information. nil should be used whenever there is no information available, or when something requested cannot be found. Ruby's core classes use nil extensively to convey the absence of information:
[].first # => nil
{1=>2}[3] # => nil
While true is the opposite of false and false is the opposite of true, nil is sort of the opposite of everything not true or false. This isn't literally true in Ruby, because NilClass#! returns true and BasicObject#! returns false:
!nil # => true
!1 # => false
However, nil being the opposite of everything not true or false is true conceptually. In general, if you have a Ruby method that returns true in a successful case, it should return false in the unsuccessful case. If you have a Ruby method that returns an object that is not true or false in a successful case, it should return nil in the unsuccessful case (or raise an exception, but that's a discussion for Chapter 5, Handling Errors).
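As a small illustration of this convention, the following sketch pairs a predicate method with a lookup method. The USERS hash and both method names are hypothetical and exist only for this example:

```ruby
# Hypothetical in-memory data used only for this illustration.
USERS = { 1 => "alice", 2 => "bob" }.freeze

# Predicate method: returns true in the successful case, false otherwise.
def known_user?(id)
  USERS.key?(id)
end

# Lookup method: returns the found object, or nil when nothing is found.
def user_name(id)
  USERS[id]
end

known_user?(1) # => true
known_user?(3) # => false
user_name(1)   # => "alice"
user_name(3)   # => nil
```

A caller can rely on the method name alone to know which kind of value to expect in the unsuccessful case.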
Ruby's core classes also use nil as a signal that a method that modifies the receiver did not make a modification:
"a".gsub!('b', '') # => nil
[2, 4, 6].select!(&:even?) # => nil
["a", "b", "c"].reject!(&:empty?) # => nil
The reason for this behavior is optimization, so if you only want to run code if the method modified the object, you can use a conditional:
string = "..."
if string.gsub!('a', 'b')
  # string was modified
end
The trade-off here is that you can no longer use these methods in method chaining, so the following code doesn't work:
string.
  gsub!('a', 'b').
  downcase!
Because gsub! can return nil, if the string doesn't contain "a", then it calls nil.downcase!, which raises a NoMethodError exception. So, Ruby chooses a trade-off that allows higher performance but sacrifices the ability to safely method chain. If you want to safely method chain, you need to use methods that return new objects, which are going to be slower as they allocate additional objects that need to be garbage collected. When you design your own methods, you'll also have to make similar decisions, which you will learn more about in Chapter 4, Methods and Their Arguments.
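The following sketch puts the two styles side by side. The variable names are arbitrary, and the unary + is only there to guarantee an unfrozen string:

```ruby
# Mutating methods: each step must be a separate statement,
# because either one may return nil.
mutated = +"Ruby"
mutated.gsub!('a', 'b') # => nil, since "Ruby" contains no "a"
mutated.downcase!       # => "ruby"

# Non-mutating methods always return a string, so chaining is safe,
# at the cost of allocating a new string at each step.
chained = "Ruby".gsub('a', 'b').downcase # => "ruby"

# Chaining the mutating versions fails when the first step changes nothing.
error = begin
  (+"Ruby").gsub!('a', 'b').downcase!
rescue NoMethodError => e
  e
end
error.class # => NoMethodError
```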
One of the issues you should be aware of when using nil and false in Ruby is that you cannot use the simple approach of using the ||= operator for memoization. In most cases, if you can cache the result of an expression, you can use the following approach:
@cached_value ||= some_expression
# or
cache[:key] ||= some_expression
This works for most Ruby objects because the default value of @cached_value will be nil, and as long as some_expression returns a value that is not nil or false, it will only be called once. However, if some_expression returns a nil or false value, it will continue to be called until it returns a value that is not nil or false, which is unlikely to be the intended behavior. When you want to cache an expression that may return nil or false as a valid value, you need to use a different implementation approach.
If you are using a single instance variable for the cached value, it is simplest to switch to using defined?, although it does result in more verbose code:
if defined?(@cached_value)
  @cached_value
else
  @cached_value = some_expression
end
If you are using a hash to store multiple cached values, it is simplest to switch to using fetch with a block:
cache.fetch(:key){cache[:key] = some_expression}
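To see both techniques work with a nil result, consider this sketch. The Settings class, its method names, and the call counter are all hypothetical; expensive_lookup stands in for any expression that may legitimately return nil or false:

```ruby
# Hypothetical class for illustration only.
class Settings
  attr_reader :calls

  def initialize
    @calls = 0
    @cache = {}
  end

  # Single cached value: defined? distinguishes "never set" from "set to nil".
  def feature_flag
    if defined?(@feature_flag)
      @feature_flag
    else
      @feature_flag = expensive_lookup
    end
  end

  # Multiple cached values: fetch only runs the block when the key is missing.
  def flag_for(key)
    @cache.fetch(key) { @cache[key] = expensive_lookup }
  end

  private

  # Counts how often it runs and always returns nil.
  def expensive_lookup
    @calls += 1
    nil
  end
end

settings = Settings.new
3.times { settings.feature_flag }
settings.calls # => 1
3.times { settings.flag_for(:beta) }
settings.calls # => 2
```

With ||= in place of either technique, expensive_lookup would have run on every one of the six calls.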
One advantage of using true, false, and nil compared to most other objects in Ruby is that they are three of the immediate object types. Immediate objects in Ruby are objects that do not require memory allocation to create and memory indirection to access, and as such they are generally faster than non-immediate objects.
In this section, you learned about the simplest objects, true, false, and nil. In the next section, you'll learn about how best to use each of Ruby's numeric types.
Different numeric types for different needs
Ruby has multiple core numeric types, such as integers, floats, rationals, and BigDecimal, with integers being the simplest type. As a general principle when programming, it's best if you keep your design as simple as possible, and only add complexity when necessary. Applying the principle to Ruby, if you need to choose a numeric type, you should generally use an integer unless you need to deal with fractional numbers.
Note that while this chapter is supposed to discuss core classes, BigDecimal is not a core class, though it is commonly used. BigDecimal is in the standard library, and you need to add require 'bigdecimal' to your code before you can use it.
Integers are the simplest numeric types, but they are surprisingly powerful in Ruby compared to many other programming languages. One example of this is executing a block of code a certain number of times. In many other languages, this is either done with the equivalent of a for loop or using a range, but in Ruby, it is as simple as calling Integer#times:
10.times do
  # executed 10 times
end
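One thing that trips up many new Ruby programmers is how division works when both the receiver and the argument are integers. Ruby is similar to C in how integer division is handled, returning only the quotient and dropping any remainder (one difference is that when the operands have different signs, Ruby rounds toward negative infinity, whereas C truncates toward zero):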
5 / 10 # => 0
7 / 3 # => 2
Any time you are considering using division in your code and both arguments could be integers, be aware of this issue and consider whether you would like to use integer division. If not, you should convert the numerator or denominator to a different numeric type so that the division operation will include the remainder:
5 / 10r # or Rational(5, 10) or 5 / 10.to_r
# => (1/2)
7.0 / 3
# => 2.3333333333333335
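Besides converting an operand, Integer offers a few methods that make the intended kind of division explicit. The following is a brief sketch of those alternatives, not an exhaustive list:

```ruby
7 / 3        # => 2, integer division
7 % 3        # => 1, the remainder that integer division drops
7.divmod(3)  # => [2, 1], quotient and remainder together
7.fdiv(3)    # => 2.3333333333333335, float division
7.quo(3)     # => (7/3), exact rational division

# Integer division rounds toward negative infinity for negative results.
-7 / 2       # => -4
```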
In cases where your numeric type needs to include a fractional component, you have three main choices: floats, rationals, or BigDecimal, each with its own trade-offs. Floats are fastest but not exact in many cases, as shown in the earlier example. Rationals are exact but not as fast. BigDecimal is exact in most cases, and most useful when dealing with a fixed precision, such as two digits after the decimal point, but is generally the slowest.
Floats are the fastest and most common fractional numeric type, and they are the type Ruby uses for literal values such as 1.2. In most cases, it is fine to use a float, but you should make sure you understand that they are not an exact type. Repeated calculations on float values result in observable issues:
f = 1.1
v = 0.0
1000.times do
  v += f
end
v
# => 1100.0000000000086
Where did the .0000000000086 come from? This is the error in the calculation that accumulates because each Float#+ calculation is inexact. Note that this issue does not affect all floats:
f = 1.109375
v = 0.0
1000.times do
  v += f
end
v
# => 1109.375
This is slightly counter-intuitive to many programmers, because 1.1 looks like a much simpler number than 1.109375. The reason is that floats are stored in binary and not in decimal: 0.109375 can be stored exactly in binary (it is 7/64ths of 1), but 1.1 cannot be stored exactly in binary.
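One way to check this for yourself is Float#to_r, which returns the exact value a float stores as a rational. The sketch below uses a small hypothetical power_of_two lambda to show that both stored values have power-of-two denominators, but only one of them equals the decimal you typed:

```ruby
exact = 1.109375.to_r
inexact = 1.1.to_r

exact == Rational(71, 64)    # => true, 1.109375 is stored exactly
inexact == Rational(11, 10)  # => false, 1.1 is only approximated

# Hypothetical helper: true when n is a positive power of two.
power_of_two = ->(n) { n > 0 && (n & (n - 1)) == 0 }
power_of_two.call(exact.denominator)   # => true
power_of_two.call(inexact.denominator) # => true

# The same effect in its most familiar form.
0.1 + 0.2 == 0.3 # => false
```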
Rationals are slower than floats, but since they are exact numbers, you don't need to worry about calculations introducing errors. Here's the first example using the r suffix to the number so that Ruby parses the number as a rational:
f = 1.1r
v = 0.0r
1000.times do
  v += f
end
v
# => (1100/1)
Here, we get 1100 exactly as a rational, showing there is no error. Let's use the same approach with the second example:
f = 1.109375r
v = 0.0r
1000.times do
  v += f
end
v
# => (8875/8)
v.to_f
# => 1109.375
As shown in the previous example, rationals are stored as an integer numerator and denominator, and inspecting the output reflects that. This can make debugging with them a little cumbersome, as you often need to convert them to floats for human-friendly decimal output.
While rationals are slower than floats, they are not orders of magnitude slower. They are about 2-6 times slower depending on what calculations you are doing. So, do not avoid the use of rationals on a performance basis unless you have profiled them and determined they are a bottleneck (you'll learn about that in Chapter 14, Optimizing Your Library).
A good general principle is to use a rational whenever you need to do calculations with non-integer values and you need exact answers. For cases where exactness isn't important, or you are only doing comparisons between numbers and not calculations that result in an accumulated error, it is probably better to use floats.
BigDecimal is similar to rationals in that it is an exact type in most cases, but it is not exact when dealing with divisions that result in a repeating decimal:
require 'bigdecimal'

v = BigDecimal(1)/3
v * 3
# => 0.999999999999999999e0
However, other than divisions involving repeating decimals and exponentiation, BigDecimal values are exact. Let's take the first example, but make both arguments BigDecimal instances:
f = BigDecimal(1.1, 2)
v = BigDecimal(0)
1000.times do
  v += f
end
v
# => 0.11e4
v.to_s('F')
# => "1100.0"
So, as you can see, no error is introduced when using repeated addition on BigDecimal, similar to rationals. You can also see that inspecting the output is less helpful since BigDecimal uses a scientific notation. BigDecimal does have the advantage that it can produce human-friendly decimal string output directly without converting the object to a float first.
If we try the same approach with the second example, we can see that it also produces exact results:
f = BigDecimal(1.109375, 7)
v = BigDecimal(0)
1000.times do
  v += f
end
v
# => 0.1109375e4
v.to_s('F')
# => "1109.375"
As both examples show, one issue with using a BigDecimal that is created from floats or rationals is that you need to manually specify the initial precision. It is more common to initialize BigDecimal values from integers or strings, to avoid the need to manually specify the precision.
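For instance, initializing from a string needs no precision argument. The sketch below uses a hypothetical price and compares the result with the equivalent float calculation:

```ruby
require 'bigdecimal'

price = BigDecimal("1.10") # no precision argument needed for strings
total = price * 3

total.to_s('F')             # => "3.3"
total == BigDecimal("3.3")  # => true

# The same calculation with floats is not exact.
1.1 * 3 == 3.3              # => false
```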
BigDecimal is significantly slower than floats and rationals for calculations. Due to the trade-offs inherent in BigDecimal, a good general principle is to use BigDecimal only when dealing with other systems that support similar types, such as fixed precision numeric types in many databases, or when dealing with other fixed precision areas such as monetary calculations. For most other cases, it's generally better to use a rational or float.
Of the numeric types, most integer and float values are immediate objects, which is one of the reasons why they are faster than other types. However, large integer and float values are too large to be immediate objects (which must fit in 8 bytes if using a 64-bit CPU). Rationals and BigDecimal are never immediate objects, which is one reason why they are slower.
In this section, you learned about Ruby's many numeric types and how best to use each. In the next section, you'll learn how symbols are very different from strings, and when to use each.
Understanding how symbols differ from strings
One of the most useful but misunderstood aspects of Ruby is the difference between symbols and strings. One reason for this is there are certain methods of Ruby that deal with symbols, but will still accept strings, or perform string-like operations on a symbol. Another reason is the popularity of Rails and its pervasive use of ActiveSupport::HashWithIndifferentAccess, which allows you to use either a string or a symbol for accessing the same data. However, symbols and strings are very different internally, and serve completely different purposes. The distinction is easy to miss because Ruby is focused on programmer happiness and productivity, so it will often automatically convert a string to a symbol if it needs a symbol, or a symbol to a string if it needs a string.
A string in Ruby is a series of characters or bytes, useful for storing text or binary data. Unless the string is frozen, you can append to it, modify existing characters in it, or replace it with a different string.
A symbol in Ruby is a number with an attached identifier that is a series of characters or bytes. Symbols in Ruby are an object wrapper for an internal type that Ruby calls ID, which is an integer type. When you use a symbol in Ruby code, Ruby looks up the number associated with that identifier. The reason for having an ID type internally is that it is much faster for computers to deal with integers instead of a series of characters or bytes. Ruby uses ID values to reference local variables, instance variables, class variables, constants, and method names.
Say you run Ruby code as follows:
foo.add(bar)
Ruby will parse this code, and for foo, add, and bar, it will look up whether it already has an ID associated with the identifier. If it already has an ID, it will use it; otherwise, it will create a new ID value and associate it with the identifier. This happens during parsing and the ID values are hardcoded into the VM instructions.
Say you run Ruby code as follows:
method = :add
foo.send(method, bar)
Ruby will parse this code, and for method, add, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, or create a new ID value to associate with the identifier if it does not exist. This approach is slightly slower as Ruby will create a local variable and there is additional indirection as send has to look up the method to call dynamically. However, there are no calls at runtime to look up an ID value.
Say you run Ruby code as follows:
method = "add"
foo.send(method, bar)
Ruby will parse this code, and for method, foo, send, and bar, Ruby will also look up whether it already has an ID associated with the identifier, also creating the ID if it doesn't exist. During parsing, though, Ruby does not create an ID value for add because it is a string and not a symbol. When send is called at runtime, method is a string value, and send needs a symbol. So, Ruby will dynamically look up and see whether there is an ID associated with the add identifier, raising a NoMethodError if it does not exist. This ID lookup will happen every time the send method is called, making this code even slower.
So, while it looks like symbols and strings are interchangeable as the method argument to send, this is only because Ruby tries to be friendly to the programmer and accept either. The send method needs to work with an ID, and it is better for performance to use a symbol, which is Ruby's representation of an ID, as opposed to a string, which Ruby must perform substantial work on to convert to an ID.
This not only affects Kernel#send but also affects most similar methods where identifiers are passed dynamically, such as Module#define_method, Kernel#instance_variable_get, and Module#const_get. The general principle when using these methods in Ruby code is always to pass symbols to them, since it results in better performance.
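Here is a short sketch of that principle applied to each of the methods just mentioned. The Counter class and its LIMIT constant are hypothetical and only serve as something to call the methods on:

```ruby
# Hypothetical class used only to demonstrate the dynamic methods.
class Counter
  LIMIT = 10

  def initialize
    @count = 0
  end

  # A symbol is passed to define_method for each generated method.
  [:increment, :decrement].each do |name|
    step = name == :increment ? 1 : -1
    define_method(name) do
      @count += step
    end
  end
end

counter = Counter.new
counter.send(:increment)               # => 1
counter.public_send(:increment)        # => 2
counter.instance_variable_get(:@count) # => 2
Counter.const_get(:LIMIT)              # => 10
```

Every identifier here is written as a symbol literal, so Ruby resolves each ID once, at parse time.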
The previous examples show that when Ruby needs a symbol, it will often accept a string and convert it for the programmer's convenience. This allows strings to be treated as symbols in certain cases. There are opposite cases, where Ruby allows symbols to be treated as strings for the programmer's convenience.
For example, while symbols represent integers attached to a series of characters or bytes, Ruby allows you to perform operations on symbols such as <, >, and <=>, as if they were strings, where the result does not depend on the symbol's integer value, but on the string value of the name attached to the symbol. Again, this is Ruby doing so for the programmer's convenience. For example, consider the following line of code:
object.methods.sort
This results in a list sorted by the name of the method, since that is the most useful for the programmer. In this case, Ruby needs to operate on the string value of the symbol, which has similar performance issues as when Ruby needs to convert a string to a symbol internally.
There are many other methods on Symbol that operate on the internal string associated with the symbol. Some methods, such as downcase, upcase, and capitalize, return a symbol by internally operating on the string associated with the symbol, and then converting the resulting value back to a symbol. For example, symbol.downcase basically does symbol.to_s.downcase.to_sym. Other methods, such as [], size, and match, operate on the string associated with the symbol, such as symbol.size being shorthand for symbol.to_s.size.
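A few of these string-like operations on symbols are collected in the following sketch, using arbitrary example symbols:

```ruby
:Foo.downcase           # => :foo
:foo.size               # => 3
:foo[0]                 # => "f"
:foo <=> :bar           # => 1
[:baz, :foo, :bar].sort # => [:bar, :baz, :foo]

# The shorthand and the longhand give the same result.
:Foo.downcase == :Foo.to_s.downcase.to_sym # => true
```

Note that Symbol#[] returns a string and not a symbol, which is a reminder that the operation happens on the symbol's name.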
In all of these cases, it is possible to determine what Ruby natively wants. If Ruby needs an internal identifier, it will natively want a symbol, and only accept a string by converting it. If Ruby needs to operate on text, it will natively want a string, and only accept a symbol by converting it.
So, how does the difference between a symbol and string affect your code? The general principle is to be like Ruby, and use symbols when you need an identifier in your code, and strings when you need text or data. For example, if you need to accept a configuration value that can only be one of three options, it's probably best to use a symbol:
def switch(value)
  case value
  when :foo
    # foo
  when :bar
    # bar
  when :baz
    # baz
  end
end
However, if you are dealing with text or data, you should accept a string and not a symbol:
def append2(value)
  value.gsub(/foo/, "bar")
end
You should consider whether you want to be as flexible as many Ruby core methods, and automatically convert a string to a symbol or vice versa. If you are internally treating symbols and strings differently, you should definitely not perform automatic conversion. However, if you are only dealing with one of the types, then you have to decide how to handle it. Automatically converting the type is worse for performance, and results in less flexible internals, since you need to keep supporting both types for backward compatibility. Not automatically converting the type is better for performance, and results in more flexible internals, since you are not obligated to support both types. However, it means that users of your code will probably get errors if they pass in a type that is not expected. Therefore, it is important to understand the trade-off inherent in the decision of whether to convert both types. If you aren't sure which trade-off is better, start by not automatically converting, since you can always add automatic conversion later if needed.
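The two options might look like the following sketch. Both method names and the :fast and :safe options are hypothetical:

```ruby
# Strict version: only symbols are accepted, anything else is an error.
def strict_mode(value)
  case value
  when :fast, :safe
    value
  else
    raise ArgumentError, "unsupported mode: #{value.inspect}"
  end
end

# Lenient version: converts strings to symbols first, which means both
# types now have to be supported for backward compatibility.
def lenient_mode(value)
  strict_mode(value.to_sym)
end

strict_mode(:fast)   # => :fast
lenient_mode("safe") # => :safe
```

Calling strict_mode("fast") raises ArgumentError, which is the kind of error users will see if you choose not to convert.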
In this section, you learned the important difference between symbols and strings, and when it is best to use each. In the next section, you'll learn how best to use Ruby's core collection classes.
Learning how best to use arrays, hashes, and sets
Ruby's collection classes are one of the reasons why it is such a joy to program in Ruby. In most cases, the choice of collection class to use is fairly straightforward. If you need a simple list of values that you are iterating over, or using the collection as a queue or a stack, you generally use an array. If you need a mapping of one or more objects to one or more objects, then you generally use a hash. If you have a large list of objects and want to see whether a given object is contained in it, you generally use a set.
In some cases, it's fine to use either an array or a hash. Often, when iterating over a small list, you could use the array approach:
[[:foo, 1], [:bar, 3], [:baz, 7]].each do |sym, i|
  # ...
end
Or, you could use the hash approach:
{foo: 1, bar: 3, baz: 7}.each do |sym, i|
  # ...
end
Since you are not indexing into the collection, the simpler approach from a design perspective is to use an array. However, because the hash approach is syntactically simpler, the idiomatic way to handle this in Ruby is to use a hash.
For more complex mapping cases, you often want to use a hash, but you may need to decide how to structure the hash. This is especially true when you are using complex keys. Let's take a deeper look at the differences between arrays, hashes, and sets by working through an example that implements an in-memory database.
Implementing an in-memory database
While many programmers often use a SQL database for data storage, there are many cases when you need to build a small, in-memory database using arrays, hashes, and sets. Often, even when you have the main data stored in a SQL database, it is faster to query the SQL database to retrieve the information, and use that to build an in-memory database for the specific class or method you are designing. This allows you to query the in-memory database with similar speed as a hash or array lookup, orders of magnitude faster than a SQL database query.
Let's say you have a list of album names, track numbers, and artist names, where you can have multiple artists for the same album and track. You want to design a simple lookup system so that given an album name, you can find all artists who worked on any track of the album, and given an album name and track number, you can find the artists who worked on that particular track.
In the following examples, you should assume that album_infos is an arbitrary object that has an each method that yields the album name, track number, and artist. If you would like some sample data to work with, you can use the following:
album_infos = 100.times.flat_map do |i|
  10.times.map do |j|
    ["Album #{i}", j, "Artist #{j}"]
  end
end
One approach for handling this is to populate two hashes, one keyed by album name, and one keyed by an array of the album name and track number. Populating these two hashes is straightforward, by setting the value for the key to an empty array if the key doesn't exist, and then appending the artist name. Then you need to make sure the artist values are unique for the hash keyed just by album name:
album_artists = {}
album_track_artists = {}
album_infos.each do |album, track, artist|
  (album_artists[album] ||= []) << artist
  (album_track_artists[[album, track]] ||= []) << artist
end
album_artists.each_value(&:uniq!)
With this approach, looking up values is fairly straightforward, and just involves looking in the appropriate hash with the appropriate key:
lookup = ->(album, track=nil) do
  if track
    album_track_artists[[album, track]]
  else
    album_artists[album]
  end
end
An alternative approach would be to use a nested hash approach, with each album having a hash of tracks:
albums = {}
album_infos.each do |album, track, artist|
  ((albums[album] ||= {})[track] ||= []) << artist
end
With this approach, looking up values is more complex, especially in the case where a track number is not provided, and you have to dynamically create the list:
lookup = ->(album, track=nil) do
  if track
    albums.dig(album, track)
  else
    a = albums[album].each_value.to_a
    a.flatten!
    a.uniq!
    a
  end
end
In general, the first approach using multiple hashes is going to take significantly more memory than the second approach if there is a large number of albums, but it will have a much better lookup performance for albums. The first approach will also take much more time to populate the data structure. The second approach is much lighter on memory and has better lookup performance for albums with tracks as it avoids an array allocation, but will exhibit far worse lookup performance for albums alone, since the artist list has to be built on every call.
None of these approaches depends on the types of objects that album_infos.each yields. You probably made the reasonable assumption that album and artist would be strings, and track would be a number. Let's say you knew in advance that the track number was an integer between 1 and 99. You could use that information to design a different approach. You could still have a single hash keyed by album name, with each value being an array containing arrays of artist names for each track. Since tracks only go from 1 to 99, you could use the 0 index in the array to store all artist names for the album. Populating this combination of hash and array of arrays isn't too difficult:
albums = {}
album_infos.each do |album, track, artist|
  album_array = albums[album] ||= [[]]
  album_array[0] << artist
  (album_array[track] ||= []) << artist
end
albums.each_value do |array|
  array[0].uniq!
end
This approach is more memory-efficient than either of the previous approaches, and looking up values is very simple and never allocates an object:
lookup = ->(album, track=0) do
  albums.dig(album, track)
end
Compared to the previous two approaches, this approach uses about the same amount of memory as the nested hash approach. It takes slightly more time to populate compared to the nested hash approach. It is almost as fast as the two hash approach in terms of lookup performance for albums, and is the fastest approach for lookup performance by albums with tracks.
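To check that the designs really are interchangeable from the caller's point of view, the following sketch builds the two-hash structure and the hash-of-arrays structure from the same data and compares their lookups. The sample data and the two_hash and hash_array names are hypothetical; tracks start at 1 here so that index 0 is free for the per-album artist list:

```ruby
# Hypothetical sample data with 1-based track numbers.
album_infos = [
  ["Album A", 1, "Artist X"],
  ["Album A", 1, "Artist Y"],
  ["Album A", 2, "Artist X"],
  ["Album B", 1, "Artist Z"],
]

# Two-hash approach.
album_artists = {}
album_track_artists = {}
album_infos.each do |album, track, artist|
  (album_artists[album] ||= []) << artist
  (album_track_artists[[album, track]] ||= []) << artist
end
album_artists.each_value(&:uniq!)

# Hash-of-arrays approach, with index 0 holding all artists.
albums = {}
album_infos.each do |album, track, artist|
  album_array = albums[album] ||= [[]]
  album_array[0] << artist
  (album_array[track] ||= []) << artist
end
albums.each_value { |array| array[0].uniq! }

two_hash = ->(album, track=nil) do
  track ? album_track_artists[[album, track]] : album_artists[album]
end
hash_array = ->(album, track=0) { albums.dig(album, track) }

two_hash.call("Album A")      # => ["Artist X", "Artist Y"]
hash_array.call("Album A")    # => ["Artist X", "Artist Y"]
two_hash.call("Album A", 2)   # => ["Artist X"]
hash_array.call("Album A", 2) # => ["Artist X"]
hash_array.call("Album C")    # => nil
```

Because dig returns nil as soon as any step is missing, the hash-of-arrays lookup also handles unknown albums without extra code.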
Maybe the needs of your application change, and now you need a feature that allows users to enter a list of artist names, and will return an array with only the artist names that the application knows are on one of the albums. One way to handle this is to store the artists in an array:
album_artists = album_infos.flat_map(&:last)
album_artists.uniq!
The lookup can use an array intersection to determine the values:
lookup = ->(artists) do
  album_artists & artists
end
The problem with this approach is that Array#& uses a linear search of the array, so this approach is very slow for a large number of artists.
A better performing approach would use a hash, keyed by the artist name:
album_artists = {}
album_infos.each do |_, _, artist|
  album_artists[artist] ||= true
end
The lookup can use the hash to filter the values in the submitted array:
lookup = ->(artists) do
  artists.select do |artist|
    album_artists[artist]
  end
end
This approach performs much better. The code isn't as simple, though it isn't too bad. However, it would be nicer to have simpler code that performed as well. Thankfully, the Ruby Set class can meet this need. Like BigDecimal, Set is not currently a core Ruby class. Set is in the standard library, and you can load it via require 'set'. However, Set may be moved from the standard library to a core class in a future version of Ruby. Using a set is pretty much as simple as using an array in terms of populating the data structure:
album_artists = Set.new(album_infos.flat_map(&:last))
You don't need to manually make the array unique, because the set automatically ignores duplicate values. The lookup code can stay exactly the same as the array case:
lookup = ->(artists) do
  album_artists & artists
end
Of the three approaches, the hash approach is the fastest to populate and the fastest to look up. The Set approach is much faster to look up than the array approach, but still significantly slower than hash. Set is actually implemented using a hash internally, so in general, it will perform worse than using a hash directly. As a general rule, you should only use a set for code that isn't performance-sensitive and you would like to use a nicer API. For any performance-sensitive code, you should prefer using a hash directly.
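The following sketch runs the set and hash lookups side by side on a small, hypothetical data set. One detail worth knowing is that Set#& returns a Set and not an array, so callers that need an array have to convert the result:

```ruby
require 'set'

# Hypothetical sample data in the same shape as album_infos.
album_infos = [
  ["Album A", 1, "Artist X"],
  ["Album A", 2, "Artist X"],
  ["Album B", 1, "Artist Z"],
]

artist_set = Set.new(album_infos.flat_map(&:last))
artist_hash = album_infos.to_h { |_, _, artist| [artist, true] }

submitted = ["Artist Z", "Artist Q", "Artist X"]

from_set = artist_set & submitted
from_hash = submitted.select { |artist| artist_hash[artist] }

from_set.class     # => Set
from_set.to_a.sort # => ["Artist X", "Artist Z"]
from_hash          # => ["Artist Z", "Artist X"]
```

The hash version preserves the order of the submitted array and returns an array directly, while the set version trades that for the shorter lookup code.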
In this section, you learned about Ruby's core collection classes: arrays, hashes, and sets. In the next section, you'll learn about Struct, one of Ruby's underappreciated core classes.
Working with Struct – one of the underappreciated core classes
The Struct class is one of the underappreciated Ruby core classes. It allows you to create classes with one or more fields, with accessors automatically created for each field. So, say you have the following:
class Artist
  attr_accessor :name, :albums

  def initialize(name, albums)
    @name = name
    @albums = albums
  end
end
Instead of that, you can write a small amount of Ruby code, and have the initializer and accessor automatically created:
Artist = Struct.new(:name, :albums)
In general, a Struct class is a little lighter on memory than a regular class, but has slower accessor methods. Struct used to be faster in terms of both initialization and reader methods in older versions of Ruby, but regular classes and attr_accessor methods have gotten faster at a greater rate than Struct has. Therefore, for maximum performance, you may want to consider using regular classes and attr_accessor methods instead of Struct classes.
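Performance aside, a Struct class behaves differently from the hand-written class in a few ways that are worth knowing before you switch between them. The following sketch shows some of them, using hypothetical artist data:

```ruby
Artist = Struct.new(:name, :albums)

artist = Artist.new("Hypothetical Artist", ["First Album"])
artist.name # => "Hypothetical Artist"
artist.albums << "Second Album"
artist.to_a # => ["Hypothetical Artist", ["First Album", "Second Album"]]
Artist.members # => [:name, :albums]

# Struct instances compare by value; a plain class compares by identity
# unless you define == yourself.
same = Artist.new("Hypothetical Artist", ["First Album", "Second Album"])
artist == same # => true

# Missing arguments default to nil instead of raising ArgumentError.
Artist.new("Only A Name").albums # => nil
```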
One of the more interesting aspects of Struct is how it works internally. For example, unlike the new method for most other classes, Struct.new does not return a Struct instance; it returns a Struct subclass:
Struct.new(:a, :b).class # => Class
However, the new method on the subclass creates instances of the subclass; it doesn't create further subclasses. Additionally, if you provide a string rather than a symbol as the first argument, Struct will automatically create the class using that name nested under its own namespace:
Struct.new('A', :a, :b).new(1, 2).class # => Struct::A
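The relationships described above can be checked directly. In this sketch, the constant name `PolishedExample` is a hypothetical name chosen to avoid clashing with anything else defined under Struct:

```ruby
# Struct.new returns a new class whose superclass is Struct.
anonymous = Struct.new(:a, :b)
anonymous_class = anonymous.class            # Class
anonymous_superclass = anonymous.superclass  # Struct
anonymous_name = anonymous.name              # nil until assigned to a constant

# Calling new on that class creates an instance, not another class.
instance = anonymous.new(1, 2)
instance_is_struct = instance.is_a?(Struct)

# A leading string registers the class as a constant under Struct.
named = Struct.new('PolishedExample', :a, :b)
named_name = named.name
same_object = named.equal?(Struct::PolishedExample)
```

Because the string form defines a constant under Struct itself, it shares one global namespace with every other library doing the same, which is one reason assigning the result to your own constant is the more common style.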
A simplified version of the default Struct.new method is similar to the following. This example is a bit larger, so we'll break it into sections. If a string is given as the first argument, it is used as the name of the class in the namespace of the receiver; otherwise, it is added to the list of fields:
def Struct.new(name, *fields)
  unless name.is_a?(String)
    fields.unshift(name)
    name = nil
  end
Next, a subclass is created. If a class name was given, it is set as a constant in the receiver's namespace:
  subclass = Class.new(self)

  if name
    const_set(name, subclass)
  end
Then, some internal code is run to set up the storage for the members of the subclass. Next, the new, allocate, [], members, and inspect singleton methods are defined on the subclass. Finally, some internal code is run to set up accessor instance methods for each member of the subclass:
  # Internal magic to set up fields/storage for subclass

  def subclass.new(*values)
    obj = allocate
    # initialize is private, so it has to be called via send
    obj.send(:initialize, *values)
    obj
  end
  # Similar for allocate, [], members, inspect

  # Internal magic to set up accessor instance methods

  subclass
end
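The sections above can be assembled into a runnable pure-Ruby imitation. This is only a sketch: `MiniStruct` is a hypothetical class, it stores members in instance variables rather than in Struct's internal storage, and it omits most of what the real Struct (which is implemented in C) provides:

```ruby
# Hypothetical pure-Ruby imitation of the Struct.new flow described above.
class MiniStruct
  def self.new(name, *fields)
    # A leading String names the subclass; anything else is a field.
    unless name.is_a?(String)
      fields.unshift(name)
      name = nil
    end

    subclass = Class.new(self)
    const_set(name, subclass) if name

    # Restore normal instantiation on the generated subclass.
    subclass.define_singleton_method(:new) do |*values|
      obj = allocate
      obj.send(:initialize, *values)
      obj
    end
    subclass.define_singleton_method(:members) { fields.dup }

    # Stand-in for the internal accessor setup: one reader and writer per field.
    subclass.send(:attr_accessor, *fields)
    subclass
  end

  # Assigns each value to the instance variable for the matching member.
  def initialize(*values)
    self.class.members.zip(values) do |field, value|
      instance_variable_set(:"@#{field}", value)
    end
  end
end

Point = MiniStruct.new(:x, :y)
point = Point.new(1, 2)
point.x = 5
```

The key trick is the same as in the outline: the class-level new builds a subclass, and that subclass gets its own singleton new so that calling it produces instances instead of yet another subclass.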
Interestingly, you can still create Struct subclasses the normal way:
class SubStruct < Struct
end
Struct subclasses created the normal way operate like Struct itself, not like Struct subclasses created via Struct.new. You can then call new on the Struct subclass to create a subclass of that subclass, and the result is set up like a Struct subclass created via Struct.new:
SubStruct.new('A', :a, :b).new(1, 2).class # => SubStruct::A
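One practical consequence is that instance methods defined on such a subclass are inherited by every class it generates. The following sketch shows this, with `field_count` as a hypothetical helper method added purely for illustration:

```ruby
class SubStruct < Struct
  # Hypothetical helper, inherited by every class that SubStruct.new generates.
  def field_count
    members.size
  end
end

# SubStruct.new behaves like Struct.new, but subclasses SubStruct.
generated = SubStruct.new(:a, :b)
generated_superclass = generated.superclass
inherits_struct = generated.ancestors.include?(Struct)
count = generated.new(1, 2).field_count

# A leading string nests the constant under SubStruct, not Struct.
named = SubStruct.new('A', :a, :b)
named_name = named.name
first_value = named.new(1, 2).a
```

This can be a convenient way to share behavior across a family of related struct classes without reopening Struct itself.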
In general, Struct is good for creating simple classes that are designed for storing data. One issue with Struct is that its design encourages the use of mutable data and discourages a functional approach, by defaulting to creating setter methods for every member. However, it is easy to force the use of immutable structs by freezing the object in initialize:
A = Struct.new(:a, :b) do
  def initialize(...)
    super
    freeze
  end
end
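The effect of freezing in initialize can be seen in a short sketch. The name `FrozenPoint` and the sample values are hypothetical:

```ruby
FrozenPoint = Struct.new(:x, :y) do
  def initialize(...)
    super
    freeze
  end
end

point = FrozenPoint.new(1, [2])

# The setter still exists, but calling it on a frozen struct raises.
begin
  point.x = 10
rescue FrozenError => e
  puts "setter rejected: #{e.class}"
end

# Freezing is shallow: objects stored in the members are not frozen themselves.
point.y << 3
```

Because freeze does not recurse into the members, a struct that must be deeply immutable also needs its member values to be frozen or otherwise immutable.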
There have been feature requests submitted on the Ruby issue tracker to create immutable Struct subclasses using a keyword argument to Struct.new or via the addition of a separate Struct::Value class. However, as of Ruby 3.0, neither feature request has been accepted. It is possible that a future version of Ruby will include them, but in the meantime, freezing the receiver in initialize is the best approach.
Summary
In this chapter, you've learned about the core classes. You've learned about issues with true, false, and nil, and how best to use Ruby's numeric types. You've learned why the difference between symbols and strings is important. You've learned how best to use arrays, hashes, and sets, and when it makes sense to use your own custom structs.
In the next chapter, you'll build on this knowledge of the core classes and learn about constructing your own custom classes.
Questions
- How are nil and false different from all other objects?
- Are all standard arithmetic operations using two BigDecimal objects exact?
- Would it make sense for Ruby to combine symbols and strings?
- Which uses less memory for the same data: hash, or Set?
- What are the only two core methods that return a new instance of Class?
Further reading
These books will also be applicable to all other chapters in this book, but are only listed in this chapter to reduce duplication:
- Comprehensive Ruby Programming: https://www.packtpub.com/product/comprehensive-ruby-programming/9781787280649
- The Ruby Workshop: https://www.packtpub.com/product/the-ruby-workshop/9781838642365