Maps

In this chapter, you will work with maps (not to be confused with the map function, though you can use map on a map). Also, the études are designed to run on the server side with Node.js®, so you may want to see how to set that up in ClojureScript on the Server.

Étude 4-1: Condiments

If you spend some time going through open datasets such as those from data.gov, you will find some fairly, shall we say, esoteric data. Among them is MyPyramid Food Raw Data from the Food and Nutrition Service of the United States Department of Agriculture.

One of the files is Foods_Needing_Condiments_Table.xml, which gives a list of foods and condiments that go with them. Here is what part of the file looks like, indented and edited to eliminate unnecessary elements, and placed in a file named test.xml.

<Foods_Needing_Condiments_Table>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>51208000</Survey_Food_Code>
    <display_name>100% Whole Wheat Bagel</display_name>
    <cond_1_name>Butter</cond_1_name>
    <cond_2_name>Tub margarine</cond_2_name>
    <cond_3_name>Reduced calorie spread (margarine type)</cond_3_name>
    <cond_4_name>Cream cheese (regular)</cond_4_name>
    <cond_5_name>Low fat cream cheese</cond_5_name>
  </Foods_Needing_Condiments_Row>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>58100100</Survey_Food_Code>
    <display_name>Beef burrito (no beans):</display_name>
    <cond_1_name>Sour cream</cond_1_name>
    <cond_2_name>Guacamole</cond_2_name>
    <cond_3_name>Salsa</cond_3_name>
  </Foods_Needing_Condiments_Row>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>58104740</Survey_Food_Code>
    <display_name>Chicken &amp; cheese quesadilla:</display_name>
    <cond_1_name>Sour cream</cond_1_name>
    <cond_2_name>Guacamole</cond_2_name>
    <cond_3_name>Salsa</cond_3_name>
  </Foods_Needing_Condiments_Row>
</Foods_Needing_Condiments_Table>

Your task, in this étude, is to take this XML file and build a ClojureScript map whose keys are the condiments and whose values are vectors of foods that go with those condiments. Thus, running the program from the command line on the sample file would produce this map (formatted and quotemarked for ease of reading):

[etudes@localhost nodetest]$ node condiments.js test.xml
{"Butter" ["100% Whole Wheat Bagel"],
"Tub margarine" ["100% Whole Wheat Bagel"],
"Reduced calorie spread (margarine type)" ["100% Whole Wheat Bagel"],
"Cream cheese (regular)" ["100% Whole Wheat Bagel"],
"Low fat cream cheese" ["100% Whole Wheat Bagel"],
"Sour cream" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"],
"Guacamole" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"],
"Salsa" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"]}
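The central map operation in this étude is appending a food to the vector stored under each of its condiments, and creating that vector the first time a condiment appears. The following is one possible sketch of that step using update and fnil. The function name add-food and its argument order are illustrative choices and are not taken from Solution 4-1.

```clojure
(defn add-food
  "Return condiment-map with food appended to the vector stored under
  each name in condiments. fnil supplies an empty vector the first
  time a condiment is seen."
  [condiment-map food condiments]
  (reduce (fn [acc condiment]
            (update acc condiment (fnil conj []) food))
          condiment-map
          condiments))

;; Usage with two of the foods from test.xml:
(-> {}
    (add-food "Beef burrito (no beans):" ["Sour cream" "Guacamole" "Salsa"])
    (add-food "Chicken & cheese quesadilla:" ["Sour cream" "Guacamole" "Salsa"]))
```

Because each call returns a new map, you can fold every row of the file into one map with reduce.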

Parsing XML

How do you parse XML using Node.js? Install the node-xml-lite module:

[etudes@localhost ~]$ npm install node-xml-lite
npm http GET https://registry.npmjs.org/node-xml-lite
npm http 304 https://registry.npmjs.org/node-xml-lite
npm http GET https://registry.npmjs.org/iconv-lite
npm http 304 https://registry.npmjs.org/iconv-lite
node-xml-lite@0.0.3 node_modules/node-xml-lite
└── iconv-lite@0.4.8

Bring the XML parsing module into your core.cljs file:

(def xml (js/require "node-xml-lite"))

The following code will parse an XML file and return a JavaScript object:

(.parseFileSync xml "test.xml")

And here is the object that it produces, shown in ClojureScript notation for readability:

  {:name "Foods_Needing_Condiments_Table", :childs [
    {:name "Foods_Needing_Condiments_Row", :childs [
      {:name "Survey_Food_Code", :childs ["51208000"]}
      {:name "display_name", :childs ["100% Whole Wheat Bagel"]}
      {:name "cond_1_name", :childs ["Butter"]}
      {:name "cond_2_name", :childs ["Tub margarine"]}
      {:name "cond_3_name", :childs ["Reduced calorie spread (margarine type)"]}
      {:name "cond_4_name", :childs ["Cream cheese (regular)"]}
      {:name "cond_5_name", :childs ["Low fat cream cheese"]}
    ]}
    {:name "Foods_Needing_Condiments_Row", :childs [
      {:name "Survey_Food_Code", :childs ["58100100"]}
      {:name "display_name", :childs ["Beef burrito (no beans):"]}
      {:name "cond_1_name", :childs ["Sour cream"]}
      {:name "cond_2_name", :childs ["Guacamole"]}
      {:name "cond_3_name", :childs ["Salsa"]}
    ]}
    {:name "Foods_Needing_Condiments_Row", :childs [
      {:name "Survey_Food_Code", :childs ["58104740"]}
      {:name "display_name", :childs ["Chicken & cheese quesadilla:"]}
      {:name "cond_1_name", :childs ["Sour cream"]}
      {:name "cond_2_name", :childs ["Guacamole"]}
      {:name "cond_3_name", :childs ["Salsa"]}
    ]}
  ]}
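If you first convert the parsed object to ClojureScript data, each row becomes an ordinary map. This sketch assumes you do that with js->clj and :keywordize-keys true, which gives the :name and :childs keys shown above. You can then look up an element's text by its tag name. The helper child-text below is a hypothetical name, not part of node-xml-lite.

```clojure
;; Assumes the parse result was converted with something like
;; (js->clj (.parseFileSync xml "test.xml") :keywordize-keys true)
(defn child-text
  "Return the text of the first child of element whose :name is
  tag-name, or nil if there is no such child."
  [element tag-name]
  (some (fn [child]
          (when (and (map? child) (= (:name child) tag-name))
            (first (:childs child))))
        (:childs element)))

(def sample-row
  {:name "Foods_Needing_Condiments_Row"
   :childs [{:name "Survey_Food_Code" :childs ["58100100"]}
            {:name "display_name" :childs ["Beef burrito (no beans):"]}
            {:name "cond_1_name" :childs ["Sour cream"]}]})

(child-text sample-row "display_name")
```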

Command Line Arguments

While you can hard-code the XML file name into your program, it makes the program less flexible. It would be much nicer if (as in the description of the étude) you could specify the file name to process on the command line.

To get command line arguments, use the argv property of the global js/process variable. It is an array: element 0 is "node", element 1 is the name of the JavaScript file, and element 2 is where your command line arguments begin. Thus, you can get the file name with:

(nth (.-argv js/process) 2)
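If the user forgets the file name, calling nth with an index that is out of range throws an error. A slightly more defensive sketch passes a not-found value to nth. The name file-arg is hypothetical. Taking argv as a parameter, instead of reading js/process inside the function, also makes it easy to try in the REPL.

```clojure
(defn file-arg
  "Return the first user-supplied command line argument (index 2 of
  argv), or nil when none was given."
  [argv]
  (nth argv 2 nil))

;; In the Node.js program you would call it as:
;; (file-arg (.-argv js/process))
(file-arg ["node" "condiments.js" "test.xml"])
```

Your main function can then print a usage message when file-arg returns nil.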

Mutually Recursive Functions

While writing my solution, I had two functions: process-children, which iterated through all the childs, calling process-child for each of them. However, a child element could itself have children, so process-child had to be able to call process-children. Functions that call each other like this are called mutually recursive functions. Here’s the problem: ClojureScript requires you to define a function before you can use it, so you would think that you can’t have mutually recursive functions. Luckily, the inventor of Clojure foresaw this sort of situation and created the declare form, which lets you declare a symbol that you will define later. Thus, I was able to write code like this:

(declare process-child)
  
(defn process-children [...]
   (process-child ...))

(defn process-child [...]
   (process-children ...))
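Here is a small, runnable instance of the same declare pattern. It works on the converted data shown earlier and assumes keyword keys. This is not the solution's code. The pair simply collects every text string in a tree, in document order. process-children handles a sequence of children, and process-child handles a single child.

```clojure
(declare process-child)

(defn process-children
  "Collect the text strings found in a sequence of children."
  [children]
  (mapcat process-child children))

(defn process-child
  "A text child is returned as a one-element vector; an element
  recurses into its own :childs via process-children."
  [child]
  (if (string? child)
    [child]
    (process-children (:childs child))))

(process-child {:name "Foods_Needing_Condiments_Row"
                :childs [{:name "display_name" :childs ["Beef burrito (no beans):"]}
                         {:name "cond_1_name" :childs ["Salsa"]}]})
```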

Just because I used mutually recursive functions to solve the problem doesn’t mean you have to. If you can find a way to do it with a single recursive function, go for it. I was following the philosophy of “the first way you think of doing it that works is the right way.”

There’s a lot of explanation in this étude, and you are probably thinking this is going to be a huge program. It sure seemed that way to me while I was writing it, but it turned out that was mostly because I was doing lots of tests in the REPL and looking things up in documentation. When I looked at the resulting program, it was only 45 lines. Here it is: Solution 4-1.

Étude 4-2: Condiment Server

Now that you have the map from the previous étude, what can you do with it? Well, how many times have you been staring at that jar of mustard and asking yourself “What food would go well with this?” This étude will cure that indecision once and for all. You will write a server using Express, which, as the web site says, is a “minimalist web framework for Node.js.” This article about using ClojureScript and Express was very helpful when I was first learning about the subject; I strongly suggest you read it.

Let’s set up a simple server that you can use as a basis for this étude. The server presents a form with an input field for the user's name. When the user clicks the submit button, the data is submitted back to the server and it echoes back the form and a message: “Pleased to meet you, username.”

Setting up Express

You will need to do the following:

  • Add [express "4.11.1"] to the :node-dependencies in your project.clj file.
  • Add [cljs.nodejs :as nodejs] to the (:require...) clause of the namespace declaration at the beginning of core.cljs.
  • Add (def express (nodejs/require "express")) to your core.cljs file.
  • Make your main function look like this:
    (defn -main []
      (let [app (express)]
        (.get app "/" generate-page!)
        (.listen app 3000
                 (fn []
                   (println "Server started on port 3000")))))

    This starts a server on port 3000 that calls the generate-page! function whenever it receives a GET request for the server root (/). (You can also set up the server to accept POST requests and to route requests for URLs other than the server root, but that is beyond the scope of this book.)

Generating HTML from ClojureScript

To generate the HTML dynamically, you will use the html macro from the hiccups library. Its argument is a vector whose first element is a keyword naming the HTML element. This is followed by an optional map of attributes and values, and then by the element's content. Here are some examples:

HTML                                                      Hiccup
<h1>Heading</h1>                                          (html [:h1 "Heading"])
<p id="intro">test</p>                                    (html [:p {:id "intro"} "test"])
<p>Click to <a href="page2.html">go to page two</a>.</p>  (html [:p "Click to " [:a {:href "page2.html"} "go to page two"] "."])

Add [hiccups "0.3.0"] to your project.clj dependencies and modify your core.cljs file to require hiccups:

(ns servertest.core
  (:require-macros [hiccups.core :as hiccups])
  (:require [cljs.nodejs :as nodejs]
            [hiccups.runtime :as hiccupsrt]))

You are now ready to write the generate-page! function, which has two parameters: the HTTP request that the server received, and the HTTP response that you will send back to the client. The property (.-query request) is a JavaScript object with the form names as its properties. Thus, if you have a form entry like this:

<input type="text" name="userName"/>

You would access the value via (.-userName (.-query request)).

The generate-page! function creates the HTML page as a string to send back to the client; you send it back by calling (.send response html-string). The HTML page will contain a form whose action URL is the server root (/). The form will have an input area for the user name and a submit button. This will be followed by a paragraph that has the text “Pleased to meet you, user name.” (or an empty paragraph if there's no user name). You can either figure out this code on your own or see a suggested solution. I’m giving you the code here because the purpose of this étude is to process the condiment map in the web page context rather than setting up the web page in the first place. (Of course, I strongly encourage you to figure it out on your own; you will learn a lot, as I certainly did!)
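If you want a starting point, one approach is to build the page as plain Hiccup data in its own function and have generate-page! render and send it. This sketch assumes a GET form and a submit button label of your choosing, and greeting-page is a hypothetical name. Because the function returns data rather than a string, you can inspect its result in the REPL. The commented generate-page! assumes that the hiccups html macro renders a non-literal argument at runtime, as the Clojure Hiccup library it is ported from does.

```clojure
(defn greeting-page
  "Return Hiccup data for the greeting form. user-name may be nil or
  empty, in which case the paragraph is left empty."
  [user-name]
  [:div
   [:form {:action "/" :method "get"}
    [:input {:type "text" :name "userName"}]
    [:input {:type "submit" :value "Submit"}]]
   [:p (when (seq user-name)
         (str "Pleased to meet you, " user-name "."))]])

;; A generate-page! built on it might look like this (sketch):
;; (defn generate-page! [request response]
;;   (.send response
;;          (hiccups/html (greeting-page (.-userName (.-query request))))))
```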

Putting the Étude Together

Your program will use the previous étude’s code to build the map of condiments and compatible foods from the XML file. Then use the same framework that was developed in Generating HTML from ClojureScript, with the generated page containing:

  • A form with a <select> menu that gives the condiment names (the keys of the map). You may want to add an entry with the text “Choose a condiment” at the beginning of the menu to indicate “no choice yet.” When you create the menu, remember to set the selected="selected" attribute on the current menu choice.
  • A submit button for the form
  • An unordered list that gives the matching foods for that condiment (the value from the map), or an empty list if no condiment has been chosen.
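The menu item in the list above can be sketched as a function that returns Hiccup data. It uses for to generate the option elements and marks the current choice. The form field name "condiment" and the function name condiment-menu are assumptions.

```clojure
(defn condiment-menu
  "Return Hiccup data for a <select> of condiment names. The option
  equal to chosen gets selected=\"selected\"."
  [condiment-names chosen]
  (into [:select {:name "condiment"}
         [:option {:value ""} "Choose a condiment"]]
        (for [condiment condiment-names]
          [:option (if (= condiment chosen)
                     {:value condiment :selected "selected"}
                     {:value condiment})
           condiment])))
```

When nothing has been chosen yet, no option carries the attribute, so the browser shows the first entry, “Choose a condiment.”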

Your code should alphabetize the condiment names and compatible foods. Some of the foods begin with capital letters and others with lowercase letters, so you will want to do a case-insensitive sort. (Hint: use the form of sort that takes a comparison function.)
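As an illustration of the hint, here is a comparison function that lowercases both strings before calling compare. The namespace and function names are hypothetical.

```clojure
(ns condiments.sorting
  (:require [clojure.string :as str]))

(defn sort-ignoring-case
  "Sort strings alphabetically without regard to letter case, using
  the two-argument form of sort with a comparison function."
  [names]
  (sort (fn [a b]
          (compare (str/lower-case a) (str/lower-case b)))
        names))

(sort-ignoring-case ["salsa" "Butter" "guacamole"])
```

(sort-by str/lower-case names) gives the same ordering, if you prefer supplying a key function to writing a comparator.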

See a suggested solution: Solution 4-2B. To make the program easier to read, I put the code for creating the map into a separate file with its own namespace.

Étude 4-3: Maps—Frequency Table

This étude uses an excerpt of the Montgomery County, Maryland (USA) traffic violation database, which you may find at this URL. I have taken only the violations for July 2014, removed several of the columns of the data, and put the result into a TAB-separated value file named traffic_july_2014_edited.csv, which you may find in the GitHub repository. (Yes, I know CSV should be comma-separated, but using TAB makes life much easier.)

Here are the column headings:

As you can see, you have a treasure trove of data here. For example, one reason I chose July is that I was interested in seeing if the number of traffic violations was greater around the July 4th holiday (in the United States) than during the rest of the month.

If you look at the data, you will notice the “Make” (vehicle manufacturer) column would need some cleaning up to be truly useful. For example, there are entries such as TOYOTA, TOYT, TOYO, and TOUOTA. Various other creative spellings and abbreviations abound in that column. Also, the Scion is listed as both a make and a model. Go figure.

In this étude, you are going to write a Node.js project named frequency. It will contain a function that reads the CSV file and creates a data structure from it (I suggest a vector of maps, one map per row). For example:

[{:date "07/31/2014", :time "22:08:00" ... :gender "F", :driver-state "MD"},
  {:date "07/31/2014", :time "21:27:00" ... :gender "F", :driver-state "MD"}, ...]

Hints:

You will then write a function named frequency-table with two parameters:

  1. The data structure from the CSV file
  2. A column specifier

You can take advantage of ClojureScript’s higher-order functions here. The specifier is a function that takes one entry (a “row”) in the data structure and returns a value. So, if you wanted a frequency table to figure out how many violations there are in each hour of the day, you would write code like this:

(defn hour [csv-row]
  (.substr (csv-row :time) 0 2))

(defn frequency-table [all-data col-spec]
  ;; your code here
)
  
;; now you do a call like this:
(frequency-table traffic-data hour)

Note that, because keyword access to maps works like a function, you could get the frequency of genders by doing this call:

(frequency-table traffic-data :gender)

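The return value from frequency-table will be a vector that consists of:

  1. A vector of the labels (the distinct values returned by the column specifier), in sorted order
  2. A vector of the frequency of each label
  3. The total number of entries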

The return value from the call for gender looks like this: [["F" "M" "U"] [6732 12776 7] 19515]. Hint: Build a map whose keys are labels and whose values are their frequency, then use seq.
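One way to follow that hint is sketched below. It builds the label-to-count map with reduce, update, and fnil, then sorts its entries by label and splits them into two vectors. The test data here is made up for illustration. This is a sketch of the technique, not Solution 4-3.

```clojure
(defn frequency-table
  "Return [labels counts total]: the sorted labels produced by col-spec,
  the count for each label, and the number of rows."
  [all-data col-spec]
  (let [freq-map (reduce (fn [acc row]
                           (update acc (col-spec row) (fnil inc 0)))
                         {}
                         all-data)
        pairs (sort-by first (seq freq-map))]
    [(mapv first pairs)
     (mapv second pairs)
     (count all-data)]))

(frequency-table [{:gender "M"} {:gender "F"} {:gender "M"}] :gender)
```

ClojureScript’s built-in frequencies function can build the same map in one step: (frequencies (map col-spec all-data)). Writing the reduce yourself, however, is good practice with maps.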

Here are some frequency tables that might be interesting: Color of car—which car colors are most likely to have a violation? Year of car manufacture—are older cars more likely to have a violation? (To be sure, there are other factors at work here. Car colors are not equally common, and there are fewer cars on the road that were manufactured in 1987 than were made last year. This étude is meant to teach you to use maps, not to make rigorous, research-ready hypotheses.)

Reading the CSV File

Reading a file one line at a time from Node.js is a non-trivial matter. Luckily for you and me, Jonathan Boston (twitter/github: bostonou), author of the ClojureScript Made Easy blog, posted a wonderful solution just days before I wrote this étude. He has kindly given me permission to use the code, which you can get at this GitHub gist. Follow the instructions in the gist, and separate the Clojure and ClojureScript code. Your src directory will look like this:

src
├── cljs_made_easy
│   ├── line_seq.clj
│   └── line_seq.cljs
└── traffic
    └── core.cljs

Inside the core.cljs file, you will have these requirements:

(ns traffic.core
  (:require [cljs.nodejs :as nodejs]
            [clojure.string :as str]
            [cljs-made-easy.line-seq :as cme]))
 
(def filesystem (js/require "fs")) ;;require nodejs lib

You can then read a file like this, using with-open and line-seq very much as they are used in Clojure. In the following code, the call to .openSync has three arguments: the filesystem defined earlier, the file name, and the file mode, with "r" for reading.

(defn example [filename]
  (cme/with-open [file-descriptor (.openSync filesystem filename "r")]
             (println (cme/line-seq file-descriptor))))
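Each line from line-seq is a plain string. Turning it into one of the maps shown earlier comes down to splitting on TAB and pairing the fields with keywords. The namespace traffic.parse and the four-column key list below are hypothetical. Substitute the real column headings from the file, in order. The sketch skips the heading line with rest.

```clojure
(ns traffic.parse
  (:require [clojure.string :as str]))

;; Hypothetical subset of the column headings, in file order.
(def column-keys [:date :time :gender :driver-state])

(defn line->map
  "Split one TAB-separated line and pair its fields with column-keys."
  [line]
  (zipmap column-keys (str/split line #"\t")))

(defn lines->maps
  "Convert every data line (skipping the header line) into a map."
  [lines]
  (mapv line->map (rest lines)))
```

Note that clojure.string/split drops trailing empty fields. A row whose last columns are blank will therefore simply lack those keys in its map.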

Note: You may want to use a smaller version of the file for testing. The code repository contains a file named small_sample.csv with 14 entries.

See a suggested solution: Solution 4-3.

Étude 4-4: Complex Maps—Cross-tabulation

Add to the previous étude by writing a function named cross-tab; it creates frequency cross-tabulations. It has these parameters:

  1. The data structure from the CSV file
  2. A row specifier
  3. A column specifier

Again, the row and column specifiers are functions. So, if you wanted a cross-tabulation with hour of day as the rows and gender as the columns, you might write code like this:

(defn hour [csv-row]
  (.substr (csv-row :time) 0 2))
  
(defn cross-tab [all-data row-spec col-spec]
  ;; your code here
  )
  
;; now you do a call like this:
(cross-tab traffic-data hour :gender)

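The return value from cross-tab will be a vector that consists of:

  1. A vector of the row labels, in sorted order
  2. A vector of the column labels, in sorted order
  3. A vector of vectors holding the counts, with one inner vector per row label and one count per column label
  4. A vector of the row totals
  5. A vector of the column totals
  6. The grand total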

The previous call on the full data set returns this result, reformatted to avoid excessively long lines:

(cross-tab traffic-data hour :gender)
[["00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
"13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23"] ["F" "M" "U"]
[[335 719 0] [165 590 0] [141 380 0] [96 249 0] [73 201 0] [63 119 0]
[129 214 2] [380 625 0] [564 743 1] [481 704 0] [439 713 1] [331 527 0]
[243 456 0] [280 525 0] [344 515 0] [276 407 0] [307 514 1] [317 553 0]
[237 434 1] [181 461 0] [204 553 1] [289 657 0] [424 961 0] [433 956 0]]
[1054 755 521 345 274 182 345 1005 1308 1185 1153 858 699 805 859 683
822 870 672 642 758 946 1385 1389] [6732 12776 7] 19515]
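A nested map makes a convenient intermediate form for a result like this one. Each row label maps to a map of column labels and counts, and update-in with fnil creates the inner maps as needed. Once you have sorted the labels, for example with (sort (keys counts)) for the rows, you can read the matrix out with get-in, using 0 for missing combinations. The names cross-counts and counts->matrix are hypothetical, and this is a sketch of the technique rather than Solution 4-4.

```clojure
(defn cross-counts
  "Count rows into a nested map {row-label {col-label count}}."
  [all-data row-spec col-spec]
  (reduce (fn [acc row]
            (update-in acc [(row-spec row) (col-spec row)] (fnil inc 0)))
          {}
          all-data))

(defn counts->matrix
  "Turn the nested map into a vector of vectors in label order, using
  0 for combinations that never occur."
  [counts row-labels col-labels]
  (mapv (fn [r]
          (mapv (fn [c] (get-in counts [r c] 0)) col-labels))
        row-labels))
```

The row totals are then (mapv #(reduce + %) matrix), and the column totals can be computed the same way after transposing with (apply mapv vector matrix).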

Here are some of the cross-tabulations that might be interesting:

Bonus points: write the code such that if you give cross-tab a nil for the column specifier, it will still work, returning only the totals for the row specifier. Then, re-implement frequency-table by calling cross-tab with nil for the column specifier. Hint: You will have to take the vector of vectors for the “cross-tabulation” totals and make it a simple vector. Either map or flatten will be useful here.

See a suggested solution: Solution 4-4.

Étude 4-5: Cross-Tabulation Server

Well, as you can see, the output from the previous étude is ugly to the point of being nearly unreadable. This rather open-ended étude aims to fix that. Your mission, should you decide to accept it, is to set up the code in an Express server to deliver the results in a nice, readable HTML table. Here are some of the things I found out while coming up with a solution, a screenshot of which appears in Figure 4-1.

Figure 4-1. Screenshot of Traffic Cross-Tabulation Table

See a suggested solution (which I put in a project named traffic): Solution 4-5.