How HTTP and CGI Work

HTTP

HTTP stands for Hypertext Transfer Protocol; it’s a contract between the client and server that lets them communicate with each other once a connection between them has been established. The client’s contract is very simple:

  1. I will establish a connection with the server.
  2. I will tell the server which file I want.
  3. I will receive any output the server sends me until it closes the connection.
  4. The output will tell me what kind of data has been sent, and I will interpret and display the information appropriately.
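The four steps of the client's contract can be sketched in Perl. This is a minimal illustration, not how a real browser works: the sub names (build_request, split_response, content_type, fetch) are made up for this example, it assumes plain HTTP/1.0 on port 80, and the host name in the final comment is a placeholder.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Step 2 as text: a minimal HTTP/1.0 request for one file.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

# The server's output is a block of header lines, a blank line,
# and then the data itself.
sub split_response {
    my ($response) = @_;
    my ($headers, $body) = split /\r?\n\r?\n/, $response, 2;
    return ($headers, $body);
}

# Step 4: find out what kind of data has been sent.
sub content_type {
    my ($headers) = @_;
    return $headers =~ /^Content-type:\s*([^;\s]+)/mi ? $1 : undef;
}

sub fetch {
    my ($host, $path) = @_;

    # Step 1: establish a connection with the server.
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "Cannot connect to $host: $!";

    # Step 2: tell the server which file I want.
    print {$socket} build_request($host, $path);

    # Step 3: receive output until the server closes the connection.
    my $response = do { local $/; <$socket> };
    $response = '' unless defined $response;
    close $socket;

    return split_response($response);
}

# Example call (needs network access; the host name is a placeholder):
# my ($headers, $body) = fetch("www.example.com", "/index.html");
# print content_type($headers), "\n";
```

Notice that nothing in fetch knows or cares how the server produced the bytes it reads.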

Step three is the important one here: the client is just waiting for the server to send output. It doesn’t know, and doesn’t care, how that data came to be. It could have been generated by a trained ape typing at a terminal connected to the server, and the client would be just as happy. Bytes are bytes. Of course, most servers are not staffed by trained apes at terminals (no system administrator jokes, please). Servers are software programs which can be as simple as this pseudocode:

  1. I will wait for incoming data.
  2. I will analyze the client’s data as a request for a file.
  3. If I find the file the client requested, I will:
    1. Tell the client what kind of information the file contains.
    2. Read the file one line at a time and print it out to the client.
  4. If I can’t find the file, I will print an error message to the client.
  5. I will close the connection.
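The server pseudocode above translates almost line for line into Perl. The sketch below is a teaching toy under several assumptions: the sub names (respond_to, serve), the small table of content types, and the refusal of any path containing ".." are choices made for this example, and it serves one client at a time.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# A small, incomplete table of file types (illustrative only).
my %TYPE_FOR = (
    html => 'text/html',
    txt  => 'text/plain',
    gif  => 'image/gif',
    jpg  => 'image/jpeg',
);

# Steps 2 through 4: turn one request line into the complete response.
sub respond_to {
    my ($request_line, $docroot) = @_;

    # Step 2: analyze the client's data as a request for a file.
    my ($path) = $request_line =~ m{^GET\s+(/\S*)};

    # Step 4: no such file (or a suspicious path), so send an error.
    if (!defined $path || $path =~ /\.\./ || !-f "$docroot$path") {
        return "HTTP/1.0 404 Not Found\r\nContent-type: text/plain\r\n\r\n"
             . "File not found.\n";
    }

    # Step 3.1: tell the client what kind of information the file contains.
    my $type = 'application/octet-stream';
    if ($path =~ /\.(\w+)$/ && exists $TYPE_FOR{lc $1}) {
        $type = $TYPE_FOR{lc $1};
    }
    my $response = "HTTP/1.0 200 OK\r\nContent-type: $type\r\n\r\n";

    # Step 3.2: read the file one line at a time and send it along.
    open(my $file, '<:raw', "$docroot$path")
        or return "HTTP/1.0 500 Internal Server Error\r\n\r\n";
    while (my $line = <$file>) {
        $response .= $line;
    }
    close $file;
    return $response;
}

# Steps 1 and 5: wait for incoming data, answer, close the connection.
sub serve {
    my ($port, $docroot) = @_;
    my $listener = IO::Socket::INET->new(
        LocalPort => $port,
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "Cannot listen on port $port: $!";

    while (my $client = $listener->accept) {
        my $request_line = <$client>;
        $request_line = '' unless defined $request_line;
        print {$client} respond_to($request_line, $docroot);
        close $client;
    }
}

# To try it (port and directory are placeholders):
# serve(8080, "/path/to/your/html/files");
```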

Notice that the server doesn’t care who the client is; it can be a browser or another trained ape. It doesn’t care how the data gets interpreted once the client receives it. The server’s purpose in life is to send those bytes. Period.

Dynamic Web Pages and CGI

This arrangement is fine as long as you only want static web pages that never, ever change. But let’s say you wanted a web page that would tell you what time of day it is on the server. Here is just such a page, which you may access by copying and pasting the URL http://evc-cit.info/cgi-bin/cit042/timeofday.cgi into the browser’s location bar.

To make this page work, we have to change the way the server handles client requests:

  1. I will wait for incoming data.
  2. I will analyze the client’s data as a request for a file.
  3. If I find the file and it is just a plain file (HTML or image), I will proceed as before, finding the file, sending its type information, and printing the file out to the client.
  4. If I find the file and it is executable and ends with .cgi, I will run that program and send its output (whatever it is) to the client.
  5. If I can’t find the file, I will print an error message to the client.
  6. I will close the connection.
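The new decision the server makes in steps three through five can be sketched with Perl's file tests. The sub names choose_action and run_cgi are hypothetical; a real server does more work here, such as setting up the program's environment and checking its configuration.

```perl
use strict;
use warnings;

# Decide what to do with a requested file:
#   'error' - step 5, the file cannot be found
#   'run'   - step 4, the file is executable and ends with .cgi
#   'send'  - step 3, a plain file to be sent as before
sub choose_action {
    my ($file) = @_;
    return 'error' unless -f $file;
    return 'run'   if $file =~ /\.cgi$/ && -x $file;
    return 'send';
}

# Step 4 in full: run the program and collect whatever it prints,
# so that the output can be passed on to the client.
sub run_cgi {
    my ($file) = @_;
    open(my $program, '-|', $file) or die "Cannot run $file: $!";
    my $output = do { local $/; <$program> };
    close $program;
    return $output;
}
```

In this sketch a file that ends with .cgi but is not executable falls through to 'send'; that is a simplification of this example, not a statement about how any particular server behaves.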

When you clicked the link for the time of day program, the server used step four and ran this program:

#!/usr/bin/perl

#	Get time of day in a nicely formatted string
my $time_of_day = scalar localtime;

#
#	Tell the client what kind of information follows
#
print "Content-type: text/html\n\n";

#
#	Here is the information
#
print "<html>\n";
print "<head>\n";
print "<title>Time of Day</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>It is now $time_of_day on the server.</h2>\n";
print "</body>\n";
print "</html>\n";

The CGI Environment

When the server starts your program running in step four, it makes a set of environment variables available to the program. In Perl, these variables are accessed through the %ENV hash. If you copy and paste the URL http://evc-cit.info/cgi-bin/cit042/timeofday_env.cgi into the location bar, you will see the name of the server software as well as the time of day. Here’s the code:

#!/usr/bin/perl

#
#	Get time of day in a nicely formatted string
my $time_of_day = scalar localtime;

#
#	Get name of server software
my $software = $ENV{"SERVER_SOFTWARE"};

#
#	Tell the client what kind of information follows
#
print "Content-type: text/html\n\n";

#
#	Here is the information
#
print <<"HTMLPAGE";
<html>
<head>
<title>Time of Day</title>
</head>
<body>
<h2>It is now $time_of_day on the server.</h2>
<h2>This server is running $software</h2>
</body>
</html>
HTMLPAGE

Notice that, rather than using a series of print statements, this program uses a “here” document to make it easier to read.
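SERVER_SOFTWARE is only one entry in %ENV. To see everything the server hands to your program, a script along these lines can list the whole hash. It is a sketch: the sub name env_listing is made up for this example, and the output is sent as plain text so that nothing has to be converted to HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build one "NAME = value" line per variable, sorted by name.
sub env_listing {
    my (%env) = @_;
    my $text = '';
    foreach my $name (sort keys %env) {
        $text .= "$name = $env{$name}\n";
    }
    return $text;
}

# Tell the client that plain text follows, then send the listing.
print "Content-type: text/plain\n\n";
print env_listing(%ENV);
```

Run under a web server, the listing normally includes entries such as REQUEST_METHOD and QUERY_STRING alongside SERVER_SOFTWARE; run from the command line, it shows your shell's environment instead.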

Two-Way Communication

You can now make a program that generates dynamic content, but there’s one missing piece: processing user input.

User Interaction - the Client Side

Users provide information via an HTML form. The <form> tag’s action attribute gives the URL of the program on the server that will process the form’s data. Here’s what happens on the client side when you click the send button on a form:

  1. The browser goes through your form field by field and creates a string with the field names and values.
  2. The browser then makes an HTTP request for the URL given in the action attribute, sending along the string created in step one.

For example, presume that this form is on an HTML page:

<form action="http://evc-cit.info/cgi-bin/cit042/age.cgi"
  method="get">
<p>
Name: <input type="text" name="theName" />
Age: <input type="text" name="theAge" /><br />
<input type="submit" value="Send Data" />
</p>
</form>

If you fill in the first field with the name Fred and the second field with 20, the browser will create the string theName=Fred&theAge=20. Because the form uses method="get", the browser appends that string to the action URL after a question mark and requests the URL http://evc-cit.info/cgi-bin/cit042/age.cgi?theName=Fred&theAge=20 from the server.
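What the browser does in step one can be imitated in a few lines of Perl. This is a sketch under assumptions: the sub names encode_field and build_query_string are made up for this example, and the input is assumed to be plain ASCII text. Blanks become plus signs, and characters other than letters, digits, and a few safe punctuation marks become a percent sign followed by two hexadecimal digits.

```perl
use strict;
use warnings;

# Encode one field name or value for use in a form data string.
sub encode_field {
    my ($text) = @_;
    # Everything except letters, digits, _ . * - and the blank
    # becomes %XX, where XX is the character code in hexadecimal.
    $text =~ s/([^A-Za-z0-9_.*\- ])/sprintf("%%%02X", ord($1))/ge;
    # Blanks become plus signs.
    $text =~ tr/ /+/;
    return $text;
}

# Join [name, value] pairs, in form order, into one string.
sub build_query_string {
    my @fields = @_;
    my @pairs;
    foreach my $field (@fields) {
        my ($name, $value) = @$field;
        push @pairs, encode_field($name) . '=' . encode_field($value);
    }
    return join '&', @pairs;
}

my $query = build_query_string(['theName', 'Fred'], ['theAge', '20']);
print "$query\n";    # prints theName=Fred&theAge=20
```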

User Interaction - the Server Side

Once again, the server receives the request, finds that the file is a program, and runs it. The string that the browser created is in the $ENV{"QUERY_STRING"} variable (because we used method="get" in the form). We could use the split operator to separate out the field names and values into a hash...

# Split the query string into name=value pairs and store them in a hash
my %data = ();
my @pairs = split(/&/, $ENV{"QUERY_STRING"});
foreach my $pair (@pairs)
{
   my ($key, $value) = split(/=/, $pair, 2);
   $data{$key} = $value;
}

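...and then we could access the information from the form as $data{"theName"} and $data{"theAge"}. Unfortunately, this simplistic code won’t work if the input has blanks or certain punctuation in it: the browser sends a blank as a plus sign and most punctuation as a percent sign followed by two hexadecimal digits, so the values arrive in encoded form and would have to be decoded.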
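For illustration only, here is a sketch of the decoding step that the split-based code lacks, assuming the usual form encoding (a plus sign for a blank, %XX for other characters). The sub names decode_field and parse_query_string are hypothetical.

```perl
use strict;
use warnings;

# Undo the browser's encoding of one field name or value.
sub decode_field {
    my ($text) = @_;
    $text =~ tr/+/ /;                                # plus sign -> blank
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX -> character
    return $text;
}

# Split a query string into a hash of decoded names and values.
sub parse_query_string {
    my ($query) = @_;
    my %data;
    foreach my $pair (split /&/, $query) {
        next if $pair eq '';
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $data{ decode_field($key) } = decode_field($value);
    }
    return %data;
}

my %data = parse_query_string('theName=J.+Edgar+Froufrou&theAge=20');
print "$data{theName} is $data{theAge}\n";    # prints J. Edgar Froufrou is 20
```

The pairs are split apart before decoding, so an encoded & or = inside a value cannot be mistaken for a separator.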

Also, this code won’t handle data sent via method="post". Furthermore, writing all this code would be re-inventing the wheel. Instead, we will use the CGI module (CGI.pm), which shipped with Perl through version 5.20 and is available from CPAN for later versions. Put this code at the top of your CGI file:

#!/usr/bin/perl
use CGI qw(:standard -debug);

This will automagically extract the information string that the client sent, taking care of any punctuation problems, and make it available via calls to the param() function.
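To give a rough idea of the work this saves you, here is a sketch of how a script could fetch the raw information string for either method. It is a simplified picture, not the CGI module's actual code, and the sub name raw_form_data is hypothetical. With method="get" the string is in QUERY_STRING; with method="post" it arrives on standard input, and CONTENT_LENGTH says how many bytes to read.

```perl
use strict;
use warnings;

# Return the still-encoded form data for a GET or POST request.
# $env is a reference to the environment hash; $input is a filehandle
# from which POSTed data can be read.
sub raw_form_data {
    my ($env, $input) = @_;
    my $method = uc($env->{REQUEST_METHOD} || 'GET');

    if ($method eq 'POST') {
        # POST: read exactly CONTENT_LENGTH bytes from the input.
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        read($input, $data, $length) if $length > 0;
        return $data;
    }

    # GET: the data is whatever followed the question mark in the URL.
    my $query = $env->{QUERY_STRING};
    return defined $query ? $query : '';
}

# In a real CGI script the call would look like this:
# my $raw = raw_form_data(\%ENV, \*STDIN);
```

The string returned here would still have to be split apart and decoded, which is exactly the part that param() does for you.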

Herewith the Perl code that runs on the server side to process the form shown earlier.

#!/usr/bin/perl
use CGI qw(:standard -debug);

#
#	Retrieve Parameters
my $name = param("theName");
my $age = param("theAge");
my $days;

#
#	Tell the client what kind of information follows
#
print "Content-type: text/html\n\n";

#
#	Here is the information
#
print <<"FIRSTPART";
<html>
<head>
<title>Your age in days</title>
</head>
<body>
FIRSTPART

if ($name eq "")
{
	$name = "Mystery Guest";
}
if ($age == 0)
{
	$age = "??";
	$days = "unknown number of";
}
else
{
	$days = $age * 365;
}

print "<p>Hello, $name!</p>\n";
print "<p>Your age of $age is $days days.</p>\n";
print "</body></html>\n";

FAQ about CGI

Does the HTML file have to be on the server?

No. Your HTML file can reside on your local hard disk. The only advantage of putting the HTML file on the server is that your action attribute can use a relative path name instead of an absolute path name.

How come my HTML works when I open it using File/Open in the browser, but my CGI doesn’t?

Since HTML is static content, the browser is equally capable of getting the file from your hard disk as from the server. The browser just opens the file and reads it line by line. Your CGI script is a program, and has to be run; opening the CGI script in the browser doesn’t run it—it just opens the file and reads it line by line. That’s why, ultimately, your script has to be on a server as an executable program.

How can I test my CGI script without a server?

That’s why we are using -debug when we use CGI. This will let you run your program from the command line. When the CGI module needs information, it will ask you to enter name and value pairs from the keyboard. If a value has blanks in it, enclose the value in double quotes (they will not show up as part of the value). When you finish entering name and value pairs, press CTRL-Z on Windows, or CTRL-D on Linux. As an example, here’s what happened when I ran age.cgi from the command line; the user typed the two name=value lines and then pressed CTRL-Z. Note that the CTRL-Z shows up as ^Z in Windows. Do not type an accent-circumflex (^) followed by a Z!

F:\web\cit042>perl age.cgi
(offline mode: enter name=value pairs on standard input)
theName="J. Edgar Froufrou"
theAge=20
^Z
Content-type: text/html

<html>
<head>
<title>Your age in days</title>
</head>
<body>
<p>Hello, J. Edgar Froufrou!</p>
<p>Your age of 20 is 7300 days.</p>
</body></html>

F:\web\cit042>

Why do I get an Internal Server Error?

Here are the usual suspects:

How do I get more details about server errors?

Start your CGI script as follows. The use CGI::Carp sends messages from fatal errors, including many compile errors, to the browser instead of the system log file.

#!/usr/bin/perl
use CGI qw(:standard -debug);
use CGI::Carp qw(fatalsToBrowser);

Why does the resulting HTML page look wrong?

Your code is generating HTML that isn’t what you want. Right-click the resulting HTML page and choose View Page Source in Mozilla; View Source in Internet Explorer. This lets you see the HTML that your script produced, which may give you a clue as to what is going wrong. You will not see your Perl code when you view source. If you think you will, then you haven’t yet fully understood how CGI works.

I’m using some-company-name as my ISP. Can I use CGI with them?

Ask them. Personal web sites from yahoo.com and aol.com almost certainly cannot run CGI. If your ISP does let you run CGI, follow their directions for where to put the files and what permissions to give them. Most ISPs will tell you to put scripts into a directory named cgi-bin. The server you’re using has been set up to allow you to put scripts in your public_html folder. For security reasons, this is usually not a good place to put executable files, but I wanted to make the upload process as simple as possible.