Saving packed_trie_map with custom value type

In the previos example, you didn’t have to explicitly specify the serializer type to the save_state() and load_state() methods, even though these two methods require the serializer type as their template arguments. That’s because the library provides default serializer types for

  • numeric value types i.e. integers, float and double,

  • std::string, and

  • the standard sequence types, such as std::vector, whose elements are of numeric value types,

and the previous example used int as the value type.

In this section, we are going to illustrate how you can write your own custom serializer to allow serialization of your own custom value type. In this example, we are going to use the list of presidents of the United States, with the names of the presidents as the keys, and their years of inauguration and political affiliations as the values.

We will use the following structure to store the values:

enum affiliated_party_t : uint8_t
{
    unaffiliated = 0,
    federalist,
    democratic_republican,
    democratic,
    whig,
    republican,
    national_union,
    republican_national_union,
};

struct us_president
{
    uint16_t year;
    affiliated_party_t party;
};

Each entry stores the year as a 16-bit integer and the affiliated party as an enum value of 8-bit width.

Next, let’s define the container type:

using map_type = mdds::packed_trie_map<std::string, us_president>;

As with the previous example, the first step is to define the entries that are sorted by the keys, which in this case are the president’s names:

std::vector<map_type::entry> entries =
{
    { MDDS_ASCII("Abraham Lincoln"),        { 1861, republican_national_union } },
    { MDDS_ASCII("Andrew Jackson"),         { 1829, democratic                } },
    { MDDS_ASCII("Andrew Johnson"),         { 1865, national_union            } },
    { MDDS_ASCII("Barack Obama"),           { 2009, democratic                } },
    { MDDS_ASCII("Benjamin Harrison"),      { 1889, republican                } },
    { MDDS_ASCII("Bill Clinton"),           { 1993, democratic                } },
    { MDDS_ASCII("Calvin Coolidge"),        { 1923, republican                } },
    { MDDS_ASCII("Chester A. Arthur"),      { 1881, republican                } },
    { MDDS_ASCII("Donald Trump"),           { 2017, republican                } },
    { MDDS_ASCII("Dwight D. Eisenhower"),   { 1953, republican                } },
    { MDDS_ASCII("Franklin D. Roosevelt"),  { 1933, democratic                } },
    { MDDS_ASCII("Franklin Pierce"),        { 1853, democratic                } },
    { MDDS_ASCII("George H. W. Bush"),      { 1989, republican                } },
    { MDDS_ASCII("George W. Bush"),         { 2001, republican                } },
    { MDDS_ASCII("George Washington"),      { 1789, unaffiliated              } },
    { MDDS_ASCII("Gerald Ford"),            { 1974, republican                } },
    { MDDS_ASCII("Grover Cleveland 1"),     { 1885, democratic                } },
    { MDDS_ASCII("Grover Cleveland 2"),     { 1893, democratic                } },
    { MDDS_ASCII("Harry S. Truman"),        { 1945, democratic                } },
    { MDDS_ASCII("Herbert Hoover"),         { 1929, republican                } },
    { MDDS_ASCII("James A. Garfield"),      { 1881, republican                } },
    { MDDS_ASCII("James Buchanan"),         { 1857, democratic                } },
    { MDDS_ASCII("James K. Polk"),          { 1845, democratic                } },
    { MDDS_ASCII("James Madison"),          { 1809, democratic_republican     } },
    { MDDS_ASCII("James Monroe"),           { 1817, democratic_republican     } },
    { MDDS_ASCII("Jimmy Carter"),           { 1977, democratic                } },
    { MDDS_ASCII("John Adams"),             { 1797, federalist                } },
    { MDDS_ASCII("John F. Kennedy"),        { 1961, democratic                } },
    { MDDS_ASCII("John Quincy Adams"),      { 1825, democratic_republican     } },
    { MDDS_ASCII("John Tyler"),             { 1841, whig                      } },
    { MDDS_ASCII("Lyndon B. Johnson"),      { 1963, democratic                } },
    { MDDS_ASCII("Martin Van Buren"),       { 1837, democratic                } },
    { MDDS_ASCII("Millard Fillmore"),       { 1850, whig                      } },
    { MDDS_ASCII("Richard Nixon"),          { 1969, republican                } },
    { MDDS_ASCII("Ronald Reagan"),          { 1981, republican                } },
    { MDDS_ASCII("Rutherford B. Hayes"),    { 1877, republican                } },
    { MDDS_ASCII("Theodore Roosevelt"),     { 1901, republican                } },
    { MDDS_ASCII("Thomas Jefferson"),       { 1801, democratic_republican     } },
    { MDDS_ASCII("Ulysses S. Grant"),       { 1869, republican                } },
    { MDDS_ASCII("Warren G. Harding"),      { 1921, republican                } },
    { MDDS_ASCII("William Henry Harrison"), { 1841, whig                      } },
    { MDDS_ASCII("William Howard Taft"),    { 1909, republican                } },
    { MDDS_ASCII("William McKinley"),       { 1897, republican                } },
    { MDDS_ASCII("Woodrow Wilson"),         { 1913, democratic                } },
    { MDDS_ASCII("Zachary Taylor"),         { 1849, whig                      } },
};

Note that we need to add numeric suffixes to the entries for Grover Cleveland, who became president twice in two separate periods, in order to make the keys for his entries unique.

Now, proceed to create an instance of packed_trie_map:

map_type us_presidents(entries.data(), entries.size());

and inspect its size to make sure it is instantiated properly:

std::cout << "Number of entries: " << us_presidents.size() << std::endl;

You should see the following output:

Number of entries: 45

Before we proceed to save the state of this instance, let’s define the custom serializer type first:

struct us_president_serializer
{
    union bin_buffer
    {
        char buffer[2];
        uint16_t i16;
        affiliated_party_t party;
    };

    static constexpr bool variable_size = false;
    static constexpr std::size_t value_size = 3;

    static void write(std::ostream& os, const us_president& v)
    {
        bin_buffer buf;

        // Write the year value first.
        buf.i16 = v.year;
        os.write(buf.buffer, 2);

        // Write the affiliated party value.
        buf.party = v.party;
        os.write(buf.buffer, 1);
    }

    static void read(std::istream& is, std::size_t n, us_president& v)
    {
        // For a fixed-size value type, this should equal the defined value size.
        assert(n == 3);

        bin_buffer buf;

        // Read the year value.
        is.read(buf.buffer, 2);
        v.year = buf.i16;

        // Read the affiliated party value.
        is.read(buf.buffer, 1);
        v.party = buf.party;
    }
};

A custom value type can be either variable-size or fixed-size. For a variable-size value type, each value segment is preceded by the byte length of that segment. For a fixed-size value type, the byte length of all of the value segments is written only once up-front, followed by one or more value segments of equal byte length.

Since the value type in this example is fixed-size, we set the value of the variable_size static constant to false, and define the size of the value to 3 (bytes) as the value_size static constant. Keep in mind that you need to define the value_size constant only for fixed-size value types; if your value type is variable-size, you can leave it out.

Additionally, you need to define two static methods - one for writing to the output stream, and one for reading from the input stream. The write method must have the following signature:

static void write(std::ostream& os, const T& v);

where the T is the value type. In the body of this method you write to the output stream the bytes that represent the value. The length of the bytes you write must match the size specified by the value_size constant.

The read method must have the following signature:

static void read(std::istream& is, size_t n, T& v);

where the T is the value type, and the n specifies the length of the bytes you need to read for the value. For a fixed-size value type, the value of n should equal the value_size constant. Your job is to read the bytes off of the input stream for the length specified by the n, and populate the value instance passed to the method as the third argument.

Now that we have defined the custom serializer type, let’s proceed to save the state to a file:

std::ofstream outfile("us-presidents.bin", std::ios::binary);
us_presidents.save_state<us_president_serializer>(outfile);

This time around, we are specifying the serializer type explicitly as the template argument to the save_state() method. Otherwise it is no different than what we did in the previous example.

Let’s create another instance of packed_trie_map and restore the state back from the file we just created:

map_type us_presidents_loaded;

std::ifstream infile("us-presidents.bin", std::ios::binary);
us_presidents_loaded.load_state<us_president_serializer>(infile);

Once again, aside from explicitly specifying the serializer type as the template argument to the load_state() method, it is identical to the way we did in the previous example.

Let’s compare the new instance with the old one to see if the two are equal:

std::cout << "Equal to the original? " << std::boolalpha << (us_presidents == us_presidents_loaded) << std::endl;

The output says:

Equal to the original? true

They are. While we are at it, let’s run a simple prefix search to find out all the US presidents whose first name is ‘John’:

std::cout << "Presidents whose first name is 'John':" << std::endl;
auto results = us_presidents_loaded.prefix_search("John");
for (const auto& entry : results)
    std::cout << "  * " << entry.first << " (" << entry.second.year << "; " << entry.second.party << ")" << std::endl;

Here is the output:

Presidents whose first name is 'John':
  * John Adams (1797; Federalist)
  * John F. Kennedy (1961; Democratic)
  * John Quincy Adams (1825; Democratic Republican)
  * John Tyler (1841; Whig)

This looks like the correct results!

You can find the complete source code for this example here.