Hashes

Introduction

As we have seen, an array is an ordered list of scalar values, which we index by number. The key word here is ordered; this lets us use foreach, push, shift, etc. in a predictable manner. Although this is a good thing, it’s not ideal -- the only way we can access an array is through its numeric index. Let’s say we want to have a list of names and ages. We can create two separate lists:

@names = ("Vinko", "Fred", "Thai", "Esmeralda");
@ages =  (28, 34, 27, 20);

The drawback here is that we have two lists to maintain; if we insert a new name, we must also insert an age. An alternate approach is to make a single array with alternating names and ages:

@info_array = ("Vinko", 28, "Fred", 34, "Thai", 27, "Esmeralda", 20);

This is much more convenient as there’s only one array to maintain. We just have to be careful to step through the array two items at a time. With either approach, though, we have difficulty answering the question "How old is Esmeralda?" We have to go through the array until we find the correct entry, and then print out the result. Here’s the code for the single array:

print "Find age for person: ";
chomp($person=<STDIN>);
for ($pos=0; $pos < scalar(@info_array); $pos += 2)
{
    if ($person eq $info_array[$pos])
    {
        last;
    }
}
if ($pos < scalar(@info_array))
{
    print "$person is $info_array[$pos+1] years old\n";
}
else
{
    print "$person is not in the list.\n";
}

Wouldn’t it be great if there were some way to have a "lookup table" where we could have the name directly associated with the age, so that we could just say, "Give us Fred’s age," and the computer would do all the heavy lifting for us? Well, there is exactly such a thing, and it’s called a hash. Unlike an array, a hash has no particular order. Instead of using a numeric index to access an entry, you use a scalar value, called the key, to look up the corresponding information.

Just as a scalar is preceded by a $ and an array by an @, hashes have their own symbol: %. Here’s how we’d set up the name and age information as a hash:

%info = ("Vinko" => 28, "Fred" => 34, "Thai" => 27, "Esmeralda" => 20);

You can read the => symbol as "is associated with."

Accessing a Hash

Now it’s easy to access the information. Instead of using a number in square brackets (the array method), we use the index key inside of curly braces. So, if we want to assign Vinko’s age to a scalar, we say this:

$age = $info{"Vinko"};

Notice that we put a $ before the word info because the individual value that we’re getting out of the hash is a scalar!

If you try to read a value from a hash with a key that doesn’t exist, you get the undefined value, undef. We can’t detect this by comparing the result to the null string or to zero, because undef is automatically converted to "" or 0 in a comparison, so it looks the same as a genuine empty string or zero stored in the hash. Instead, we use the defined function to ask whether the value is defined. (We will see another way to do this later on.)

print "Find age for person: ";
chomp($person=<STDIN>);
$age = $info{$person};
if (defined($age))
{
    print "$person is $age years old\n";
}
else
{
    print "$person not in hash\n";
}
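To see why comparing against zero is not enough, consider a hash in which a stored value really is 0. The following is only a sketch: the %stock hash and its contents are made up for illustration, and it uses "use strict" with "my" declarations, which the examples above do not.

```perl
use strict;
use warnings;

# Hypothetical data: "pears" is in the hash, but its count is 0.
my %stock = ("apples" => 12, "pears" => 0);

my @report;
foreach my $fruit ("apples", "pears", "kiwis")
{
    my $count = $stock{$fruit};      # undef when the key is missing
    if (defined($count))
    {
        push(@report, "$fruit: $count in stock");
    }
    else
    {
        push(@report, "$fruit: not in hash");
    }
}
print "$_\n" foreach @report;
```

A test such as $count == 0 would treat "pears" (present, with a count of zero) and "kiwis" (absent) the same way; defined tells them apart.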

Hashes are Really Lists

The reason we used the => symbol is because it implies a relationship between a key and a value. In reality, the list of items between the parentheses is just a plain old list. We could have just as well used commas, but the meaning wouldn’t have been as clear.

%info = ("Vinko", 28, "Fred", 34, "Thai", 27, "Esmeralda", 20);

When you assign a list to a hash, the list is automatically taken to mean a series of key/value pairs. This means, of course, that you should have an even number of items in the list. If you have an odd number of items, the last item is considered to be a key with an undefined value (and Perl will warn you about it if warnings are turned on).
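The conversion works in both directions, which may make the "plain old list" idea more concrete. This is a minimal sketch; the names @pairs, $n_items, and %copy are chosen here for illustration.

```perl
use strict;
use warnings;

my %info = ("Vinko", 28, "Fred", 34);

# Assigned to an array, a hash flattens back into key/value pairs.
# The pairs stay together, but the order of the pairs is not fixed.
my @pairs = %info;
my $n_items = scalar(@pairs);    # two keys plus two values

# Building a hash from that list restores the same associations.
my %copy = @pairs;
print "Vinko is $copy{'Vinko'}, Fred is $copy{'Fred'}\n";
```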

Stepping Through a Hash

To step through an array, we went to each index number in turn and extracted its value. To step through a hash, we will go to each key in turn and extract the corresponding value. You use the keys function to do this; it produces a list of all the keys in the hash. So, the following code produces the following output:

foreach $item ( keys(%info) )
{
    print "$item is $info{$item} years old.\n";
}

Fred is 34 years old.
Vinko is 28 years old.
Esmeralda is 20 years old.
Thai is 27 years old.

Surprise! Although an array’s indices are in order, a hash’s keys are not stored in the order that you enter them. Instead, Perl distributes them throughout memory in an order that makes the keys easy to find. In this case, the keys(%info) function produced the list in the order ("Fred", "Vinko", "Esmeralda", "Thai"); you may well see a different order, and in recent versions of Perl the order can even change from one run of the program to the next. If you want to get the information from a hash in a predictable order, you can call the sort function on the list that keys returns:

foreach $item ( sort(keys(%info)) )
{
    print "$item is $info{$item} years old.\n";
}

Esmeralda is 20 years old.
Fred is 34 years old.
Thai is 27 years old.
Vinko is 28 years old.

There is also a function called values, which returns the values from the hash, again in no particular set order. This isn’t as useful as the keys function, since a value gives you no way to get back to its key.
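One case where values is enough on its own is when the keys don't matter, such as adding up all the ages. The sketch below assumes the %info hash from earlier; $total and $average are names introduced just for this example.

```perl
use strict;
use warnings;

my %info = ("Vinko" => 28, "Fred" => 34, "Thai" => 27, "Esmeralda" => 20);

# Add up every age; the order in which values arrive does not matter.
my $total = 0;
foreach my $age (values(%info))
{
    $total += $age;
}

# scalar(keys(%info)) gives the number of entries in the hash.
my $average = $total / scalar(keys(%info));
print "Total age: $total, average age: $average\n";
```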

Stepping through a Hash with each

There’s another function called each, which returns a list consisting of the next key/value pair from a hash. You call each repeatedly, usually in a while loop. When it runs out of entries in the hash, each returns an empty list, which makes the list assignment false and ends the loop. We could rewrite our non-ordered output as follows:

while (($person, $age) = each(%info))
{
    print "$person is $age years old.\n";
}

Hash Miscellanea

If you assign two values to the same key, the second value overwrites the first value.
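A short sketch of the overwriting behavior, both in the initial list and in a later assignment to a single element. The ages used here are invented for the example.

```perl
use strict;
use warnings;

# "Fred" appears twice in the list; the later pair wins.
my %info = ("Fred" => 34, "Thai" => 27, "Fred" => 35);
my $first_age = $info{"Fred"};

# Assigning to an existing key replaces its value;
# assigning to a new key adds an entry.
$info{"Fred"} = 36;
$info{"Jody"} = 41;

my $n_people = scalar(keys(%info));   # Fred, Thai, and Jody
print "Fred was $first_age, is now $info{'Fred'}; $n_people people in all\n";
```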

Although many examples show a hash with a word as a key and a number as a value, it’s not a law. You can have a hash with numbers as keys and words as values. The following hash wouldn’t be a good candidate for an array, since there would be roughly 94,000 unused entries at the beginning of the array.

%california_zipcodes = (94111 => "San Francisco", 
    95129 => "San Jose", 95472 => "Sebastopol");

And here’s a word-to-word hash:

%colors = ("gray" => "gris", "blue" => "azul", "green" => "verde");

You can take advantage of the fact that hashes are really lists; if you don’t have any duplicate values, you can use the reverse function to get a hash that “goes the other direction”:

%to_spanish = ("gray" => "gris", "blue" => "azul", "green" => "verde");
%to_english = reverse(%to_spanish);
#
# same as this assignment - order may vary; this is a hash
#
%to_english = ("gris" => "gray", "azul" => "blue", "verde" => "green");
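The "no duplicate values" condition matters because each value becomes a key in the reversed hash, and a key can appear only once. The sketch below uses a made-up %to_spanish in which two English spellings map to the same Spanish word.

```perl
use strict;
use warnings;

# Hypothetical data: "gray" and "grey" share the value "gris".
my %to_spanish = ("gray" => "gris", "grey" => "gris", "blue" => "azul");
my %to_english = reverse(%to_spanish);

my $n_spanish = scalar(keys(%to_spanish));   # three English words
my $n_english = scalar(keys(%to_english));   # only two Spanish keys

print "$n_spanish entries before reversing, $n_english after\n";
# Either "gray" or "grey" survives; which one depends on hash order.
print "gris => $to_english{'gris'}\n";
```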

You can delete an entry from a hash with the aptly named delete function; delete($info{"Vinko"}) will get rid of his name and age. If you want to get rid of all the entries in a hash, keys and values alike, you say undef( %hashname ).
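As a rough illustration of both operations on the %info hash from earlier (the arrays @remaining and @after_undef exist only to show what is left):

```perl
use strict;
use warnings;

my %info = ("Vinko" => 28, "Fred" => 34, "Thai" => 27, "Esmeralda" => 20);

# Remove one entry: both the key "Vinko" and its value disappear.
delete($info{"Vinko"});
my @remaining = sort(keys(%info));
print "Still in the hash: @remaining\n";

# Remove everything that is left.
undef(%info);
my @after_undef = keys(%info);
print "Entries after undef: ", scalar(@after_undef), "\n";
```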

As mentioned earlier, values( %hashname ) gives you all the values from a hash instead of the keys. It is needed far less often than the keys function.

If you assign a hash to a scalar, what you get depends on your version of Perl. In versions before 5.26, this assignment:

$usage = %info;

does not give you the number of entries, as it would for an array. Instead, it returns a fraction-like string such as 3/8 that tells how well the keys are distributed throughout memory, which is of little practical value. From version 5.26 on, it returns the number of keys. If you want to know how many keys are in a hash in any version, do this:

$n_keys = keys(%info);

This is an exception; keys returns a list, and ordinarily, assigning a list to a scalar would give you the last value. Rather than give you this essentially useless information, keys gives you the list length when used in a scalar context.
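The difference between the two contexts can be sketched as follows, again assuming the %info hash from earlier; @names and $also_count are names introduced here.

```perl
use strict;
use warnings;

my %info = ("Vinko" => 28, "Fred" => 34, "Thai" => 27, "Esmeralda" => 20);

my @names  = keys(%info);   # list context: the keys themselves
my $n_keys = keys(%info);   # scalar context: how many keys there are

# scalar() forces scalar context where a list would otherwise be expected.
my $also_count = scalar(keys(%info));

print "There are $n_keys people in the hash.\n";
```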

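The exists function returns true if a hash element exists; thus exists($info{"Fred"}) will return true, but exists($info{"Jody"}) will return false. This is the other way to check for a missing entry that we promised earlier. Unlike defined, exists looks only at the key, so it returns true even when the value stored under that key is undefined.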
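The following sketch contrasts exists with defined. The entry for "Pat" with an undefined value is a made-up case, added only to show where the two functions disagree.

```perl
use strict;
use warnings;

# Hypothetical data: Pat is in the hash, but no age has been recorded.
my %info = ("Fred" => 34, "Pat" => undef);

my @results;
foreach my $person ("Fred", "Pat", "Jody")
{
    if (!exists($info{$person}))
    {
        push(@results, "$person: no such key");
    }
    elsif (!defined($info{$person}))
    {
        push(@results, "$person: key exists, value undefined");
    }
    else
    {
        push(@results, "$person: $info{$person}");
    }
}
print "$_\n" foreach @results;
```

With defined alone, "Pat" and "Jody" would be indistinguishable; exists separates a key with no value from a key that was never there.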