In this chapter, you will work with maps (not to be confused with the map function, though you can use map on a map). Also, the études are designed to run on the server side with Node.js®, so you may want to see how to set that up in ClojureScript on the Server.
If you spend some time going through open datasets such as those from data.gov, you will find some fairly, shall we say, esoteric data. Among them is MyPyramid Food Raw Data from the Food and Nutrition Service of the United States Department of Agriculture.
One of the files is Foods_Needing_Condiments_Table.xml, which gives a list of foods and condiments that go with them. Here is what part of the file looks like, indented and edited to eliminate unnecessary elements, and placed in a file named test.xml.
<Foods_Needing_Condiments_Table>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>51208000</Survey_Food_Code>
    <display_name>100% Whole Wheat Bagel</display_name>
    <cond_1_name>Butter</cond_1_name>
    <cond_2_name>Tub margarine</cond_2_name>
    <cond_3_name>Reduced calorie spread (margarine type)</cond_3_name>
    <cond_4_name>Cream cheese (regular)</cond_4_name>
    <cond_5_name>Low fat cream cheese</cond_5_name>
  </Foods_Needing_Condiments_Row>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>58100100</Survey_Food_Code>
    <display_name>"Beef burrito (no beans):"</display_name>
    <cond_1_name>Sour cream</cond_1_name>
    <cond_2_name>Guacamole</cond_2_name>
    <cond_3_name>Salsa</cond_3_name>
  </Foods_Needing_Condiments_Row>
  <Foods_Needing_Condiments_Row>
    <Survey_Food_Code>58104740</Survey_Food_Code>
    <display_name>Chicken &amp; cheese quesadilla:</display_name>
    <cond_1_name>Sour cream</cond_1_name>
    <cond_2_name>Guacamole</cond_2_name>
    <cond_3_name>Salsa</cond_3_name>
  </Foods_Needing_Condiments_Row>
</Foods_Needing_Condiments_Table>
Your task, in this étude, is to take this XML file and build a ClojureScript map whose keys are the condiments and whose values are vectors of foods that go with those condiments. Thus, for the sample file, running the program from the command line would produce this map (formatted, with quotation marks added for ease of reading):
[etudes@localhost nodetest]$ node condiments.js test.xml
{"Butter" ["100% Whole Wheat Bagel"],
 "Tub margarine" ["100% Whole Wheat Bagel"],
 "Reduced calorie spread (margarine type)" ["100% Whole Wheat Bagel"],
 "Cream cheese (regular)" ["100% Whole Wheat Bagel"],
 "Low fat cream cheese" ["100% Whole Wheat Bagel"],
 "Sour cream" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"],
 "Guacamole" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"],
 "Salsa" ["Beef burrito (no beans):" "Chicken & cheese quesadilla:"]}
How do you parse XML using Node.js? Install the node-xml-lite module:
[etudes@localhost ~]$ npm install node-xml-lite
npm http GET https://registry.npmjs.org/node-xml-lite
npm http 304 https://registry.npmjs.org/node-xml-lite
npm http GET https://registry.npmjs.org/iconv-lite
npm http 304 https://registry.npmjs.org/iconv-lite
node-xml-lite@0.0.3 node_modules/node-xml-lite
└── iconv-lite@0.4.8
Bring the XML parsing module into your core.cljs file:
(def xml (js/require "node-xml-lite"))
The following code will parse an XML file and return a JavaScript object:
(.parseFileSync xml "test.xml")
And here is the JavaScript object that it produces, shown in ClojureScript notation for readability:
{:name "Foods_Needing_Condiments_Table",
 :childs [{:name "Foods_Needing_Condiments_Row",
           :childs [{:name "Survey_Food_Code", :childs ["51208000"]}
                    {:name "display_name", :childs ["100% Whole Wheat Bagel"]}
                    {:name "cond_1_name", :childs ["Butter"]}
                    {:name "cond_2_name", :childs ["Tub margarine"]}
                    {:name "cond_3_name", :childs ["Reduced calorie spread (margarine type)"]}
                    {:name "cond_4_name", :childs ["Cream cheese (regular)"]}
                    {:name "cond_5_name", :childs ["Low fat cream cheese"]}]}
          {:name "Foods_Needing_Condiments_Row",
           :childs [{:name "Survey_Food_Code", :childs ["58100100"]}
                    {:name "display_name", :childs ["Beef burrito (no beans):"]}
                    {:name "cond_1_name", :childs ["Sour cream"]}
                    {:name "cond_2_name", :childs ["Guacamole"]}
                    {:name "cond_3_name", :childs ["Salsa"]}]}
          {:name "Foods_Needing_Condiments_Row",
           :childs [{:name "Survey_Food_Code", :childs ["58104740"]}
                    {:name "display_name", :childs ["Chicken & cheese quesadilla:"]}
                    {:name "cond_1_name", :childs ["Sour cream"]}
                    {:name "cond_2_name", :childs ["Guacamole"]}
                    {:name "cond_3_name", :childs ["Salsa"]}]}]}
While you can hard-code the XML file name into your program, it makes the program less flexible. It would be much nicer if (as in the description of the étude) you could specify the file name to process on the command line.
To get command line arguments, use the argv property of the global js/process variable. Element 0 is "node", element 1 is the name of the JavaScript file, and element 2 is where your command line arguments begin. Thus, you can get the file name with:
(nth (.-argv js/process) 2)
While writing my solution, I had two separate functions: process-children and process-child. process-children iterated through all the childs, calling process-child for each of them. However, a child element could itself have children, so process-child had to be able to call process-children. The term for this sort of situation is mutually recursive functions. Here’s the problem: ClojureScript requires you to define a function before you can use it, so you would think that you can’t have mutually recursive functions. Luckily, the inventor of Clojure foresaw this sort of situation and created the declare form, which lets you declare a symbol that you will define later. Thus, I was able to write code like this:
(declare process-child)

(defn process-children [...]
  (process-child ...))

(defn process-child [...]
  (process-children ...))
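To see declare at work on data shaped like the parsed XML, here is a small, self-contained sketch. The two functions are hypothetical stand-ins, not the author's solution: they simply collect every text leaf in the tree, with process-children calling process-child and process-child calling process-children. Without the declare line, the compiler would reject process-children because process-child is not yet defined.

```clojure
;; Announce the name now; the real definition comes later.
(declare process-child)

(defn process-children
  "Hypothetical helper: collects the text leaves of every child in childs."
  [childs]
  (mapcat process-child childs))

(defn process-child
  "Hypothetical helper: a string is a leaf; an element recurses into its :childs."
  [child]
  (if (string? child)
    [child]
    (process-children (:childs child))))

;; (process-child {:name "display_name" :childs ["Salsa"]})
;; => ("Salsa")
```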
Just because I used mutually recursive functions to solve the problem doesn’t mean you have to. If you can find a way to do it with a single recursive function, go for it. I was following the philosophy of “the first way you think of doing it that works is the right way.”
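For comparison, here is one possible sketch that uses no recursion at all. It assumes the parsed object has already been turned into ClojureScript data with keyword keys, exactly as displayed above (for instance with (js->clj parsed :keywordize-keys true)), and it relies on the table always being three levels deep (table, rows, fields), so plain reduce is enough. The names text-of, row->food-and-condiments, and condiment-map are made up for illustration.

```clojure
(ns condiments.sketch
  (:require [clojure.string :as str]))

(defn- text-of
  "Returns the text inside a leaf element such as {:name \"cond_1_name\", :childs [\"Butter\"]}."
  [element]
  (first (:childs element)))

(defn- row->food-and-condiments
  "Given one Foods_Needing_Condiments_Row, returns [food-name condiment-names]."
  [row]
  (let [fields (:childs row)
        food (some #(when (= "display_name" (:name %)) (text-of %)) fields)
        condiments (for [f fields
                         :when (and (map? f) (str/starts-with? (:name f) "cond_"))]
                     (text-of f))]
    [food condiments]))

(defn condiment-map
  "Builds {condiment [food ...]} from the whole parsed table."
  [table]
  (reduce (fn [acc row]
            (let [[food condiments] (row->food-and-condiments row)]
              ;; Add this row's food to the vector of every condiment it lists.
              (reduce (fn [m condiment]
                        (update m condiment (fnil conj []) food))
                      acc
                      condiments)))
          {}
          (:childs table)))

;; (condiment-map {:name "T" :childs [{:name "R" :childs [{:name "display_name" :childs ["Taco"]}
;;                                                        {:name "cond_1_name" :childs ["Salsa"]}]}]})
;; => {"Salsa" ["Taco"]}
```

Using fnil with conj starts each condiment's entry as an empty vector the first time it appears, so foods accumulate in file order.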
There’s a lot of explanation in this étude, and you are probably thinking this is going to be a huge program. It sure seemed that way to me while I was writing it, but it turned out that was mostly because I was doing lots of tests in the REPL and looking things up in documentation. When I looked at the resulting program, it was only 45 lines. Here it is: Solution 4-1.
Now that you have the map from the previous étude, what can you do with it? Well, how many times have you been staring at that jar of mustard and asking yourself “What food would go well with this?” This étude will cure that indecision once and for all. You will write a server using Express, which, as the web site says, is a “minimalist web framework for Node.js.” This article about using ClojureScript and Express was very helpful when I was first learning about the subject; I strongly suggest you read it.
Let’s set up a simple server that you can use as a basis for this étude. The server presents a form with an input field for the user's name. When the user clicks the submit button, the data is submitted back to the server and it echoes back the form and a message: “Pleased to meet you, username.”
You will need to do the following:

- Add [express "4.11.1"] to the :node-dependencies in your project.clj file.
- Add [cljs.nodejs :as nodejs] to the (:require ...) clause of the namespace declaration at the beginning of core.cljs.
- Add (def express (nodejs/require "express")) to your core.cljs file.
- Make your -main function look like this:
(defn -main []
  (let [app (express)]
    (.get app "/" generate-page!)
    (.listen app 3000
             (fn [] (println "Server started on port 3000")))))
This starts a server on port 3000, and when it receives a get request, calls the generate-page! function. (You can also set up the server to accept post requests and route them to URLs other than the server root, but that is beyond the scope of this book.)
To generate the HTML dynamically, you will use the html function of the hiccups library. The function takes as its argument a vector that has a keyword as an element name, an optional map of attributes and values, and the element content. Here are some examples:
HTML | Hiccup |
---|---|
<h1>Heading</h1> | (html [:h1 "Heading"]) |
<p id="intro">test</p> | (html [:p {:id "intro"} "test"]) |
<p>Click to <a href="page2.html">go to page two</a>.</p> | (html [:p "Click to " [:a {:href "page2.html"} "go to page two"] "."]) |
Add [hiccups "0.3.0"] to your project.clj dependencies and modify your core.cljs file to require hiccups:
(ns servertest.core
  (:require-macros [hiccups.core :as hiccups])
  (:require [cljs.nodejs :as nodejs]
            [hiccups.runtime :as hiccupsrt]))
You are now ready to write the generate-page! function, which has two parameters: the HTTP request that the server received, and the HTTP response that you will send back to the client. The property (.-query request) is a JavaScript object with the form field names as its properties. Thus, if you have a form entry like this:
<input type="text" name="userName"/>
you would access the value via (.-userName (.-query request)).
The generate-page! function creates the HTML page as a string to send back to the client; you send it back by calling (.send response html-string). The HTML page will contain a form whose action URL is the server root (/). The form will have an input area for the user name and a submit button. This will be followed by a paragraph that has the text “Pleased to meet you, user name.” (or an empty paragraph if there's no user name). You can either figure out this code on your own or see a suggested solution. I’m giving you the code here because the purpose of this étude is to process the condiment map in the web page context rather than to set up the web page in the first place. (Of course, I strongly encourage you to figure it out on your own; you will learn a lot. I certainly did!)
Your program will use the previous étude’s code to build the map of condiments and compatible foods from the XML file. Then use the same framework that was developed in Generating HTML from ClojureScript, with the generated page containing:

- A <select> menu that gives the condiment names (the keys of the map). You may want to add an entry with the text “Choose a condiment” at the beginning of the menu to indicate “no choice yet.” When you create the menu, remember to set the selected="selected" attribute for the current menu choice.
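Because Hiccup elements are ordinary vectors, the menu can be built as plain data before it is handed to html. This sketch assumes a form field named condiment (a hypothetical name) and adds selected="selected" only to the option that matches the current choice.

```clojure
(defn condiment-menu
  "Builds a Hiccup <select>; condiments should already be sorted."
  [condiments current]
  (into [:select {:name "condiment"}              ; hypothetical field name
         [:option {:value ""} "Choose a condiment"]]
        (for [c condiments]
          [:option (if (= c current)
                     {:value c :selected "selected"}
                     {:value c})
           c])))

;; (condiment-menu ["Butter" "Salsa"] "Salsa")
;; => [:select {:name "condiment"}
;;     [:option {:value ""} "Choose a condiment"]
;;     [:option {:value "Butter"} "Butter"]
;;     [:option {:value "Salsa" :selected "selected"} "Salsa"]]
```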
Your code should alphabetize the condiment names and compatible foods. Some of the foods begin with capital letters; others with lowercase letters. You will want to do a case-insensitive sort. (Hint: use the form of sort that takes a comparison function.)
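One way to follow the hint is to compare lowercased copies of the strings. This sketch uses clojure.string/lower-case, which is available in both Clojure and ClojureScript; the function name sort-ignoring-case is hypothetical.

```clojure
(ns condiments.sorting
  (:require [clojure.string :as str]))

(defn sort-ignoring-case
  "Sorts strings alphabetically, treating upper and lower case alike."
  [coll]
  (sort (fn [a b] (compare (str/lower-case a) (str/lower-case b))) coll))

;; (sort-ignoring-case ["salsa" "Butter" "guacamole" "Sour cream"])
;; => ("Butter" "guacamole" "salsa" "Sour cream")
```

(sort-by str/lower-case coll) gives the same ordering; the comparator form shown here is the one the hint asks for.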
See a suggested solution: Solution 4-2B. To make the program easier to read, I put the code for creating the map into a separate file with its own namespace.
This étude uses an excerpt of the Montgomery County, Maryland (USA) traffic violation database, which you may find at this URL. I have taken only the violations for July 2014, removed several of the columns of the data, and put the result into a TAB-separated value file named traffic_july_2014_edited.csv, which you may find in the GitHub repository. (Yes, I know CSV should be comma-separated, but using TAB makes life much easier.)
Here are the column headings:
As you can see, you have a treasure trove of data here. For example, one reason I chose July is that I was interested in seeing if the number of traffic violations was greater around the July 4th holiday (in the United States) than during the rest of the month.
If you look at the data, you will notice the “Make” (vehicle manufacturer) column would need some cleaning up to be truly useful. For example, there are entries such as TOYOTA, TOYT, TOYO, and TOUOTA. Various other creative spellings and abbreviations abound in that column. Also, the Scion is listed as both a make and a model. Go figure.
In this étude, you are going to write a Node.js project named frequency. It will contain a function that reads the CSV file and creates a data structure (I suggest a vector of maps) with one entry for each row. For example:
[{:date "07/31/2014", :time "22:08:00" ... :gender "F", :driver-state "MD"},
 {:date "07/31/2014", :time "21:27:00" ... :gender "F", :driver-state "MD"},
 ...]
Hints:

- Define a vector of keywords for the column headings. If there are columns you don’t want or need in the map, enter nil in the vector:
(def headings [:date :time ... :gender :driver-state])
- Use zipmap to make it easy to construct a map for each row. You will have to get rid of the nil entry; dissoc is your friend here.
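Putting the two hints together, a row can become a map in one expression. This is a sketch under assumptions: the heading vector below is a shortened, hypothetical version (the real file has more columns), and the line is assumed to have already been read from the file. Because zipmap keeps only one value per key, any number of nil headings collapse into a single nil entry, so one dissoc removes them all.

```clojure
(ns traffic.rows
  (:require [clojure.string :as str]))

;; Hypothetical, shortened heading list: nil marks a column to throw away.
(def headings [:date :time nil :gender :driver-state])

(defn row->map
  "Turns one TAB-separated line into a map keyed by headings."
  [line]
  (-> (zipmap headings (str/split line #"\t"))
      (dissoc nil)))

;; (row->map "07/31/2014\t22:08:00\tunused\tF\tMD")
;; => {:date "07/31/2014", :time "22:08:00", :gender "F", :driver-state "MD"}
```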
You will then write a function named frequency-table with two parameters:

- the data structure (the vector of maps)
- a column specifier

You can take advantage of ClojureScript’s higher-order functions here. The specifier is a function that takes one entry (a “row”) in the data structure and returns a value. So, if you wanted a frequency table to figure out how many violations there are in each hour of the day, you would write code like this:
(defn hour [csv-row]
  (.substr (csv-row :time) 0 2))

(defn frequency-table [all-data col-spec]
  ;; your code here
  )

;; now you do a call like this:
(frequency-table traffic-data hour)
Note that, because keyword access to maps works like a function, you could get the frequency of genders by doing this call:
(frequency-table traffic-data :gender)
The return value from frequency-table will be a vector that consists of:

- a vector of the labels
- a vector of the frequency of each label
- the total number of entries

The return value from the call for gender looks like this: [["F" "M" "U"] [6732 12776 7] 19515].
Hint: Build a map whose keys are labels and whose values are their frequencies, then use seq.
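Following that hint, one possible sketch lets the core function frequencies build the label-to-count map and then sorts its entries by label. Sorting is an assumption here: the text does not require it, though the sample output lists the labels in order. The sketch uses only functions common to Clojure and ClojureScript.

```clojure
(defn frequency-table
  "Sketch: returns [labels counts total] for the values col-spec extracts."
  [all-data col-spec]
  (let [label->count (frequencies (map col-spec all-data)) ; {label count}
        pairs (sort-by first label->count)]                 ; seq of [label count]
    [(mapv first pairs) (mapv second pairs) (count all-data)]))

;; (frequency-table [{:gender "M"} {:gender "F"} {:gender "M"}] :gender)
;; => [["F" "M"] [1 2] 3]
```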
Here are some frequency tables that might be interesting: Color of car—which car colors are most likely to have a violation? Year of car manufacture—are older cars more likely to have a violation? (To be sure, there are other factors at work here. Car colors are not equally common, and there are fewer cars on the road that were manufactured in 1987 than were made last year. This étude is meant to teach you to use maps, not to make rigorous, research-ready hypotheses.)
Reading a file one line at a time from Node.js is a non-trivial matter. Luckily for you and me, Jonathan Boston (twitter/github: bostonou), author of the ClojureScript Made Easy blog, posted a wonderful solution just days before I wrote this étude. He has kindly given me permission to use the code, which you can get at this GitHub gist. Follow the instructions in the gist, and separate the Clojure and ClojureScript code. Your src directory will look like this:
src
├── cljs_made_easy
│   ├── line_seq.clj
│   └── line_seq.cljs
└── traffic
    └── core.cljs
Inside the core.cljs file, you will have these requirements:
(ns traffic.core
  (:require [cljs.nodejs :as nodejs]
            [clojure.string :as str]
            [cljs-made-easy.line-seq :as cme]))

(def filesystem (js/require "fs")) ;;require nodejs lib
You can then read a file like this, using with-open and line-seq very much as they are used in Clojure. In the following code, .openSync is called on the filesystem object defined earlier with two arguments: the file name and the file mode, with "r" for reading.
(defn example [filename]
  (cme/with-open [file-descriptor (.openSync filesystem filename "r")]
    (println (cme/line-seq file-descriptor))))
Note: You may want to use a smaller version of the file for testing. The code repository contains a file named small_sample.csv with 14 entries.
See a suggested solution: Solution 4-3.
Add to the previous étude by writing a function named cross-tab; it creates frequency cross-tabulations. It has these parameters:

- the data structure (the vector of maps)
- a row specifier
- a column specifier
Again, the row and column specifiers are functions. So, if you wanted a cross-tabulation with hour of day as the rows and gender as the columns, you might write code like this:
(defn hour [csv-row]
  (.substr (csv-row :time) 0 2))

(defn cross-tab [all-data row-spec col-spec]
  ;; your code here
  )

;; now you do a call like this:
(cross-tab traffic-data hour :gender)
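The return value from cross-tab will be a vector that consists of:

- a vector of the row labels
- a vector of the column labels
- a vector of vectors holding the count for each row and column combination
- a vector of the row totals
- a vector of the column totals
- the grand total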
The previous call on the full data set returns this result, reformatted to avoid excessively long lines:
(cross-tab traffic-data hour :gender)
[["00" "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11"
  "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23"]
 ["F" "M" "U"]
 [[335 719 0] [165 590 0] [141 380 0] [96 249 0] [73 201 0] [63 119 0]
  [129 214 2] [380 625 0] [564 743 1] [481 704 0] [439 713 1] [331 527 0]
  [243 456 0] [280 525 0] [344 515 0] [276 407 0] [307 514 1] [317 553 0]
  [237 434 1] [181 461 0] [204 553 1] [289 657 0] [424 961 0] [433 956 0]]
 [1054 755 521 345 274 182 345 1005 1308 1185 1153 858
  699 805 859 683 822 870 672 642 758 946 1385 1389]
 [6732 12776 7]
 19515]
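One possible sketch of cross-tab that produces a result in this shape counts [row-label column-label] pairs with frequencies, then reads the counts back out in sorted label order, using 0 for combinations that never occur. It does not handle the nil column specifier from the bonus exercise, and the tiny data set and the subs-based hour function are assumptions made so the example runs outside Node.js.

```clojure
(defn cross-tab
  "Sketch: returns [row-labels col-labels counts row-totals col-totals total]."
  [all-data row-spec col-spec]
  (let [pair->count (frequencies (map (juxt row-spec col-spec) all-data))
        row-labels (vec (sort (distinct (map row-spec all-data))))
        col-labels (vec (sort (distinct (map col-spec all-data))))
        ;; One vector per row label, one count per column label (0 if absent).
        counts (mapv (fn [r] (mapv (fn [c] (get pair->count [r c] 0)) col-labels))
                     row-labels)
        row-totals (mapv #(reduce + %) counts)
        ;; Summing the row vectors element-wise gives the column totals.
        col-totals (if (seq counts) (apply mapv + counts) [])
        total (reduce + row-totals)]
    [row-labels col-labels counts row-totals col-totals total]))

;; Usage with made-up data:
(defn hour [csv-row] (subs (:time csv-row) 0 2))

(def sample-data [{:time "00:10:00" :gender "F"}
                  {:time "00:20:00" :gender "M"}
                  {:time "01:05:00" :gender "M"}])

;; (cross-tab sample-data hour :gender)
;; => [["00" "01"] ["F" "M"] [[1 1] [0 1]] [2 1] [1 2] 3]
```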
Here are some of the cross-tabulations that might be interesting:
Bonus points: write the code such that if you give cross-tab a nil for the column specifier, it will still work, returning only the totals for the row specifier. Then, re-implement frequency-table by calling cross-tab with nil for the column specifier. Hint: You will have to take the vector of vectors for the “cross-tabulation” totals and make it a simple vector. Either map or flatten will be useful here.
See a suggested solution: Solution 4-4.
Well, as you can see, the output from the previous étude is ugly to the point of being nearly unreadable. This rather open-ended étude aims to fix that. Your mission, should you decide to accept it, is to set up the code in an Express server to deliver the results in a nice, readable HTML table. Here are some of the things I found out while coming up with a solution, a screenshot of which appears in Figure 4-1.
I wanted to use as much of the code from Étude 4-2: Condiment Server as possible, so I decided on drop-down menus to choose the fields. However, a map was not a good choice for generating the menu. In the condiment server, it made sense to alphabetize the keys of the food map. In this étude, the field names are listed by conceptual groups; it doesn't make sense to alphabetize them, and the keys of a map are inherently unordered. Thus, I ended up making a vector of vectors.
I used map-indexed to create the option menu such that each option had a numeric value. However, when the server reads the value from the request, it gets a string, and 5 is not equal to "5". The fix was easy, but I lost a few minutes figuring out why my selected item wasn’t coming up when I came back from a request.
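The text does not show the fix, so here is one possible approach as a sketch: convert the index to a string before comparing it with the value from the request. The name field-options and the sample fields are hypothetical; fields is a vector of [key label] pairs, in the spirit of the vector of vectors described above.

```clojure
(defn field-options
  "Hypothetical: one [:option ...] per [key label] pair in fields.
  selected is the raw string from the request, such as \"1\", or nil."
  [fields selected]
  (map-indexed
   (fn [index [_key label]]
     ;; (str index) turns 1 into "1", so it can equal the request's string.
     (let [attrs {:value index}]
       [:option (if (= (str index) selected)
                  (assoc attrs :selected "selected")
                  attrs)
        label]))
   fields))

;; (field-options [[:color "Color"] [:year "Vehicle year"]] "1")
;; => ([:option {:value 0} "Color"]
;;     [:option {:value 1 :selected "selected"} "Vehicle year"])
```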
The source file felt like it was getting too big, so I put the cross tabulation code into a separate file named crosstab.cljs in the src/traffic directory.
I wanted to include a CSS file, so I put the specification in the header of the hiccups code. However, to make it work, I had to tell Express how to serve static files, using "." for the root directory in:
(.use app (.static express "path/to/root/directory"))
Having the REPL is really great for testing.
I finished the program late at night. Again, “the first way you think of doing it that works is the right way,” but I am unhappy with the solution. I would really like to unify the cases of one-dimensional and two-dimensional tables, and there seems to be a dreadful amount of unnecessary duplication. To paraphrase Don Marquis, my solution “isn’t moral, but it might be expedient.”
See a suggested solution (which I put in a project named traffic): Solution 4-5.