Common Gateway Interface

AdaCL.CGI: Another Ada 95 Binding to CGI

First, I would like to thank David A. Wheeler (dwheeler@dwheeler.com / dwheeler@ida.org) for AdaCGI, from which AdaCL.CGI (including this documentation) has been derived. However, AdaCGI hasn't been updated for over 2 years, so I took the liberty of adopting the package into AdaCL.

What is AdaCL.CGI

AdaCL.CGI (formerly called AdaCGI and "Package CGI") is an Ada 95 interface to the "Common Gateway Interface" (CGI). AdaCL.CGI makes it easier to create Ada programs that can be invoked by World-Wide-Web (WWW) HTTP servers using the standard CGI interface. Using it, you can create Ada programs that perform queries or other processing by request from a WWW user. Such programs are often called "web applications" or simply "web apps." If you don't already know the advantages and disadvantages of using CGI and Ada, you might want to look at the later section titled Advantages and Disadvantages: CGI, Ada, AdaCL.CGI.

This documentation assumes that you are already familiar with HTML. To use this package you'll need to learn a little about Ada 95; the Lovelace Ada 95 tutorial provides one good way to do so. It would help if you understood the Common Gateway Interface (CGI), though hopefully for straightforward web applications you won't need to.

Data Access

This Ada package provides two data access approaches for CGI-related data:

  1. As an associative array; simply provide the key name (as a string) and the value associated with that key will be returned. Since CGI keys can duplicate, you can even ask for "the Nth value of this key."
  2. As a sequence of key-value pairs, indexed from 1 to Argument_Count. This access approach is similar to the Ada library Ada.Command_Line.

As usual, AdaCL packages are object-oriented. The base access class can be found in AdaCL.CGI.Abstract_Data.

Trying out a sample program

To actually use this library, you need to write a program (this is, after all, only a library!), test the program, and then install it so it will be invoked by your web server. Included are some sample programs so you can try things out and see how the library works.

Impatient to do something? Well, compile the demonstration program using make astro_release. This will compile my wife's astrology program. Then run the "Astro" program by typing:

 cd www
 md \var\log\AdaCL\
 ln -s ../Linux_Release/Astro
 ./Astro

The output will look like this:

 Content-type: text/html

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
 <HTML>
 <HEAD>
    <META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-15">
    <TITLE>Astro Test Input</TITLE>
    <META NAME="GENERATOR" CONTENT="OpenOffice.org 1.1.0  (Win32)">
    <META NAME="AUTHOR" CONTENT="Martin Krischik">
    <META NAME="CREATED" CONTENT="20040102;13402900">
    <META NAME="CHANGEDBY" CONTENT="Martin Krischik">
    <META NAME="CHANGED" CONTENT="20040301;15584325">
    <META NAME="CLASSIFICATION" CONTENT="Astro Test Input">
    <META NAME="DESCRIPTION" CONTENT="Astro Test Input">
    <META NAME="KEYWORDS" CONTENT="Astro Test Input">
    <LINK REL=STYLESHEET HREF="Astro.css" TYPE="text/css">
    <LINK REL=STYLESHEET HREF="Astro.css" TYPE="text/css">
    <!-- LINK REL=STYLESHEET HREF="Astro.css" TYPE="text/css" -->
    <STYLE>

The first line means that the program is returning an HTML file (the common case). The second (blank) line means that there is no more meta-information about the data to be returned to the user (such as cookies to be set). The rest of the lines are the content of the astro_in.html HTML file.

Notice that no web server is required here; we can just invoke the program directly. That's really handy in debugging, because sometimes when you're interacting with a buggy server program it's hard to determine why things are failing unless you can see EXACTLY what is being sent back.
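When debugging from the command line like this, it can help to look at the header lines and the content separately. The following is a minimal sketch in POSIX shell and is not part of AdaCL; the function names cgi_headers and cgi_body are made up for illustration. It assumes the response separates headers from content with the first blank line, as described above.

```shell
# Hypothetical helpers, not part of AdaCL: split a CGI response read
# from standard input at the first blank line.

# Print only the header lines (everything before the first blank line).
cgi_headers() {
    tr -d '\r' | sed '/^$/,$d'
}

# Print only the content (everything after the first blank line).
cgi_body() {
    tr -d '\r' | sed '1,/^$/d'
}
```

With these defined, something like ./Astro | cgi_headers should show just the Content-type line, and ./Astro | cgi_body just the HTML.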

Another way to debug is to look at the log file \var\log\AdaCL\Astro-Main.log.

So, how can we send data to this script? The answer is by setting the REQUEST_METHOD and QUERY_STRING environment variables. Setting the REQUEST_METHOD variable to GET causes the library to get its data from the QUERY_STRING variable. This QUERY_STRING environment variable is of the form "fieldname=value"; if there's more than one field, the entries should be separated by ampersands (&). If you want the values "=", "&", " ", or "%" in the value, write them as "%3D", "%26", "%20", and "%25" respectively using the standard URL escaping mechanism (i.e., characters are replaced with % followed by 2 hexadecimal digits). On Unix-compatible systems running an sh-compatible shell (including Linux running bash), just do the following:

 REQUEST_METHOD="GET"
 export REQUEST_METHOD
 QUERY_STRING="name=David%20Wheeler&email=dwheeler@dwheeler.com"
 export QUERY_STRING

Web-Server

So, how could you get this running through a web server? Well, you'll need to copy this program into an area that the web server accepts for executables. By implication, that means that your web server will have to be configured to run programs in certain directories and that you have permission to write to such a directory. On a typical Linux system running the Apache web server, such a directory is probably called "/srv/www/cgi-bin", and is writeable by root, so on such a configuration you'd do this to make the program "Astro" available to all:

 su
 cp Linux_Release/Astro /srv/www/cgi-bin
 cp www/* /srv/www/cgi-bin

Assuming that your web server will run programs and you've installed your server in an appropriate place, now you need to try it out. Start up a web browser and open up:

 http://localhost/cgi-bin/Astro

replacing "localhost" with the name of the machine the web server is at if it's not your local machine. You should see a request to enter your name, a text box for entry, and a submit button. You can provide preset values to it by opening the URL using a format like:

 http://localhost/cgi-bin/minimal?name=David%20Wheeler&email=dwheeler@dwheeler.com
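
Writing the %XX escapes by hand gets tedious. Here is a small sketch of a helper that applies the standard URL escaping to one value; it is not part of AdaCL, the name urlencode is hypothetical, and it assumes ASCII input. It leaves letters, digits, ".", "_", "~" and "-" alone and escapes everything else.

```shell
# Hypothetical helper, not part of AdaCL: percent-encode one value.
# Assumes ASCII input. The body runs in a subshell so that setting
# LC_ALL does not leak into the calling shell.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;   # % plus two hex digits
        esac
        s=$rest
    done
    printf '%s\n' "$out"
)
```

A query string can then be assembled as, say, "name=$(urlencode 'David Wheeler')". Note that this sketch also escapes "@" as %40, which decodes to the same value.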

You can also see the screenshot showing the Astro.adb program in action.

Details on Ada 95 Binding to CGI

Now, let's talk about how to write your own programs using this library.

To use package CGI, "with AdaCL.CGI.Parameter" in your Ada program and instantiate an AdaCL.CGI.Parameter.Object object. CGI handles both "GET" and "POST" forms of CGI data automatically. The form information from a GET or POST form is loaded into a sequence of variables; each variable has a Key and a Value. Package CGI transforms "Isindex" queries into a form with a single key (named "isindex"), and the key's value is the query value.
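
To make the GET/POST distinction concrete, here is a rough shell sketch of where a CGI program finds its raw form data. This is not how AdaCL.CGI is implemented; the function name cgi_input is hypothetical. It relies only on the standard CGI convention that GET data arrives in QUERY_STRING and POST data arrives on standard input, with its length in CONTENT_LENGTH.

```shell
# Hypothetical helper, not part of AdaCL: print the raw form data of
# the current request, wherever the CGI convention puts it.
cgi_input() {
    case ${REQUEST_METHOD:-GET} in
        POST)
            # POST: read exactly CONTENT_LENGTH bytes from standard input.
            dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null
            ;;
        *)
            # GET (and anything else): the data is in QUERY_STRING.
            printf '%s' "${QUERY_STRING:-}"
            ;;
    esac
}
```

The library does this step for you; the sketch is only meant to show why the same program can be driven either from a URL or from a submitted form.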

Once the main Ada program starts, it can make various calls to the CGI subprograms to get information or to send information back out.

A typical program using package CGI would first call "AdaCL.CGI.Result.Put_CGI_Header", which tells the calling HTTP server what kind of information will be returned. Usually the CGI header is a reply saying "I will reply with a generated HTML document", so that is the default of Put_CGI_Header, but you could reply with something else (for example, a Location: header to automatically redirect someone to a different URL; see the CGI specification for information about references which allow you to redirect browsers elsewhere).

Most CGI programs handle various types of forms, and most should automatically reply with a blank form if the user hasn't provided a filled-in form. Thus, your program will probably call AdaCL.CGI.Parameter.Input_Received, which returns True if input has been received (and otherwise it returns False). You should reply with a blank form if AdaCL.CGI.Parameter.Input_Received is False.

You can then use various routines to query what data values were sent. You can query either by the name of a variable (what is the value of 'name'?) or by position (what was the first variable's key name and value sent?):

  • To query by variable name, use the "Value" function with the variable name as its parameter. Normally if the variable wasn't sent you'll just get an empty string back, but you can call Value with Required=>True to cause the exception Constraint_Error to be raised if the variable was not sent. To determine if a given key was sent, you can also use Key_Exists, which will return True if the given key was sent (and False otherwise). A given key can have more than one value; you can use the "Index" parameter to return the Nth value of a given key (the default is to return the first value). Function Key_Count will return how many values there are for a given key.
  • To query by position, use the "Value" function with a Positive to get that value and the "Key" function to get the variable name. For example, Value(1) is the value of the first variable and Key(1) is the name of the first variable. The number of values sent is stored in CGI.Argument_Count.
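
The two access styles can be illustrated outside Ada with a few lines of shell. This is only a sketch of the data model (an ordered sequence of pairs in which keys may repeat), not the library's code; the names query_pairs and query_value are hypothetical, and the values are returned still URL-escaped.

```shell
# Hypothetical helpers, not part of AdaCL.

# Access by position: print the pairs of a query string one per line,
# in the order they were sent.
query_pairs() {
    printf '%s\n' "$1" | tr '&' '\n'
}

# Access by key: print the Nth value of a key (default: the first).
# Prints nothing if the key was not sent that many times.
query_value() {
    query_pairs "$1" | awk -F= -v key="$2" -v n="${3:-1}" '
        ($1 "") == (key "") && ++count == n {
            sub(/^[^=]*=?/, "")   # drop the key and the "=" sign
            print
            exit
        }'
}
```

For instance, query_value "$QUERY_STRING" tag 2 asks for the second value sent under the key "tag", much like passing an Index of 2 to Value.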

There are also a number of useful output functions:

  • Procedure Put_Variables is useful while debugging; it will cause all form values sent to be printed (in HTML format) to the Current_Output.
  • Procedure Put_HTML_Head will put out an HTML header. It is given a title and an optional Mail_To email address.
  • Procedure Put_HTML_Heading will put out an HTML heading (such as <H1>Title</H1>).
  • Procedure HTML_Encode will encode characters special to HTML so that they'll pass through unharmed. When generating HTML, always pass variable data through HTML_Encode if it isn't already HTML formatted. If you don't do this, data with the characters &, <, >, and " may become garbled (since these characters have special meaning in HTML).
  • Procedure Put_HTML_Tail will put out an HTML tail (</BODY></HTML>).
  • Procedure Put_Error_Message will put an error message (including an HTML_Head, an HTML_Heading, and an HTML_Tail). Call "Put_CGI_Header" before calling this.

Function Get_Environment simply calls the underlying operating system and requests the value of the given operating system variable; this may be useful for acquiring less-often-used CGI values. If the variable does not exist, function Get_Environment replies with a null string ("").

Cookies

Cookies are supported by this package. A "cookie" is simply a value sent by a web server to a web browser; from then on, the web browser will respond with that value when reconnecting with that server. You should be aware that cookies can be used to reduce user anonymity, so some users intentionally disable cookies (see the references below for more about cookie controversies). Also, cookie data is intended to be small; a web browser might not store more than 20 cookies per server, cookies larger than 4K, or 300 cookies total. If you need more, just store an ID with the cookie and store the rest of the data on the server.

To set a cookie's value in a remote browser, call Set_Cookie with the appropriate parameters. Note that Set_Cookie must be called before Put_CGI_Header. The expires attribute specifies when the cookie will no longer be stored; the domain attribute determines which host names will cause the cookie to be sent (at least two domain levels); the path attribute specifies the subset of URLs in a domain where the cookie will be returned; and if the cookie is marked secure, the cookie will only be sent over encrypted channels (e.g., SSL, using https://).
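
The ordering requirement follows from the shape of a CGI reply: cookies travel as header lines, and the blank line ends the headers. The shell sketch below shows the kind of output involved. It is an illustration only, not what Set_Cookie emits byte for byte; the function name put_cookie_response and the fixed Path=/ attribute are assumptions made for the example.

```shell
# Hypothetical illustration, not part of AdaCL: a complete CGI reply
# that sets one cookie. The Set-Cookie line has to come before the
# blank line that ends the headers.
put_cookie_response() {
    printf 'Set-Cookie: %s=%s; Path=/\n' "$1" "$2"
    printf 'Content-type: text/html\n'
    printf '\n'                                   # end of headers
    printf '<HTML><BODY>Cookie set.</BODY></HTML>\n'
}
```

Anything printed after the blank line is treated as document content, which is why a cookie sent too late would simply show up as text in the page.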

Cookie values are automatically loaded by this package on initialization. You can retrieve their values by calling Cookie_Value. You can retrieve cookies by their key value or by simple index. Just like CGI form fields, a key can occur multiple times. Information other than the key and value (such as expiration time, domain, path, and secure setting) is not available through AdaCL.CGI, because this information is not sent in the underlying protocol from the user to the web server.

You can send cookie values to your program without using a web server by setting the environment variable HTTP_COOKIE. This has the format "key=value", separated by a semicolon, using URL escapes. The distribution includes a sample program, cookie_test, that prints the "first" cookie and the first value of the cookie named "problem". Here's how to try it out on a Unix-like machine using an sh-like command shell (e.g., a typical Linux system):

 HTTP_COOKIE="first_cookie=first_value;problem=my%20problem"
 export HTTP_COOKIE
 ./test_cookie
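
If you want to check what such an HTTP_COOKIE value contains before handing it to your program, a few lines of shell will do. This is a sketch under the format described above (key=value pairs separated by semicolons); the name cookie_value is hypothetical, it is unrelated to the Ada function Cookie_Value, and the value is printed still URL-escaped.

```shell
# Hypothetical helper, not part of AdaCL: print the first value of the
# named cookie from the HTTP_COOKIE environment variable.
cookie_value() {
    printf '%s\n' "${HTTP_COOKIE:-}" | tr ';' '\n' | sed 's/^ *//' |
        awk -v key="$1" '{
            i = index($0, "=")                    # position of the first "="
            if (i && substr($0, 1, i - 1) == key) {
                print substr($0, i + 1)
                exit
            }
        }'
}
```

With the variable set as above, cookie_value problem prints my%20problem.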

Going Further

Many CGI applications display a number of forms, data, and so on. For larger applications, I suggest drawing a sort of "state diagram" showing the different displays as the nodes and showing the expected transitions between displays. For each display, identify what information is needed for it; make sure that all the ways to reach that display will provide that information. In many cases I find it useful to have some CGI variable indicate the form desired (say "command"). Remember that HTTP is essentially stateless; if you need some data later, you'll need to send it back to the user to store, or at least store some sort of identifier so that you can determine which user's data to use. Also, users can hop directly into any point by bookmarking things or just writing their own URLs, so don't depend on users only going through an "expected path."
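
As a rough sketch of the "command" idea, the dispatch step might look like the following in shell. The command names and display names here are invented for illustration; the point is only that a missing command falls back to the starting form and an unexpected one is rejected rather than trusted.

```shell
# Hypothetical dispatch on a "command" form variable. All names are
# made up for this example.
next_display() {
    case ${1:-} in
        "")      echo form ;;         # no command sent: show the blank form
        search)  echo results ;;
        details) echo detail_page ;;
        *)       echo error ;;        # bookmarked or hand-written URL: reject
    esac
}
```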

Limitations

This package has the following known limitations:

  1. It only interfaces using the standard CGI interface. It doesn't support FastCGI or other interfaces.
  2. It doesn't support generation of forms with widgets automatically set to existing values. This is easily solved by creating a higher-level package that uses this package as an interface. That way, users can access capabilities more directly or not, their choice.
  3. The way it handles String and Unbounded_String is at times awkward; perhaps requiring users to use the "+" convention or explicit type changes would be better.

Security

As with all CGI programs, there are security ramifications. In general, ALWAYS check any values sent to you, and be conservative: identify the list of acceptable values, and reject anything that doesn't meet that list. Thus for strings, identify the legal characters and maximum length you'll accept, and reject anything that doesn't meet those requirements. Don't do the reverse and identify "characters you'll prohibit," because you'll probably forget an important case. You may need to escape shell characters. For numbers, identify minimum and maximum values. Be very cautious about filenames; beware of filenames with ".." or "/" in them (it's best not to accept them at all, if you can).
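
The "list what you accept" rule is short to write down. Below is a sketch in shell of such a check for a simple token; the name is_safe_token, the 64-character limit, and the particular set of allowed characters are assumptions chosen for the example, so pick your own to fit the field being checked.

```shell
# Hypothetical validator, not part of AdaCL: accept a value only if it
# is 1 to 64 characters long, uses only letters, digits, ".", "_" and
# "-", and contains no ".." sequence. Runs in a subshell so that
# setting LC_ALL does not leak into the caller.
is_safe_token() (
    LC_ALL=C
    v=$1
    [ "${#v}" -ge 1 ] && [ "${#v}" -le 64 ] || exit 1
    case $v in
        *[!A-Za-z0-9._-]*) exit 1 ;;   # a character outside the accepted set
        *..*)              exit 1 ;;   # no parent-directory sequences
    esac
    exit 0
)
```

A value such as report-2004.txt passes, while ../etc/passwd, an empty string, or anything containing shell characters is rejected.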

Since the user may not be the actual source for some variables and/or data, when generating HTML always send variable data through HTML_Encode first. That way, the presence of the special reserved characters "&", "<", ">", or """ won't cause problems for the user's browser.
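
For readers who want to see what this kind of encoding amounts to, here is a shell sketch of the same idea. It is not the library's HTML_Encode, and the name html_encode is hypothetical; it replaces the four reserved characters with the standard HTML entities, handling "&" first so that the ampersands of the other entities are not encoded twice.

```shell
# Hypothetical filter, not part of AdaCL: encode the characters that
# are special in HTML. Reads standard input, writes standard output.
html_encode() {
    sed -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g'
}
```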

It's worth noting that Ada (and AdaCL.CGI) can easily handle the "NUL" character (ASCII 0, represented as %00 in URLs). However, many system functions called by an Ada program assume that NUL is the end of a string, so calling system functions with such values may cause surprises. This isn't really unique to Ada; Perl can also handle NUL and has the same issues.

If you don't already know them, examine the extant literature on the subject of CGI security. Useful resources about CGI security include Gundavaram's Perl CGI FAQ (particularly the security information), Kim's CGI book, Phillips' safe CGI material, Stein's WWW Security FAQ, and Webber's web security tips. You might find my document "Secure Programming for Linux HOWTO" useful; I include a number of CGI-relevant tips. You could also search altavista for "CGI" and "security": http://www.altavista.com/cgi-bin/query?q=%2BCGI+%2Bsecurity.

Advantages and Disadvantages: CGI, Ada, AdaCL.CGI

No tool is perfect, or appropriate for all circumstances. Here are some advantages and disadvantages of CGI and Ada for web applications; look at these and other information to determine if they're a good match for your application.

CGI is the standard interface between a web server and a web application, in the same way that HTTP is the standard interface between a web client (browser) and a web server. When using CGI there are 3 active components: the web client, the web server, and the web application (you're writing the web application). The client sends a request to the web server, the web server starts up the web application, the web server sends the request data to the web application using the CGI interface, the web application replies with data using the CGI interface, and the web server sends that data on to the web client. The web application then exits; the next client request will start a separate copy of the web application. If several clients make requests simultaneously, then several copies of the web application will run at the same time (each copy serves exactly one request).

First, let's cover alternatives to CGI for interfacing to the web, listing their advantages and disadvantages compared to CGI:

  1. FastCGI: This is an alternative interface that, instead of stopping and restarting an application on each request, keeps the web application alive and sets up a permanent communication path between web server and web application.
    1. Advantages: FastCGI has better performance than CGI on start-up (since start-up time is eliminated). This "better performance" isn't as big a strength as you might think:
      1. Modern systems can start new processes rather quickly, so optimizing for process start-up isn't as helpful.
      2. Interpreters and just-in-time compilers do have an additional start-up time, but again, this is not as helpful as you'd think. Most such systems can cache previous analyses, eliminating a lot of this time.
      3. Fully compiled languages (such as typical C, C++, and Ada implementations) essentially eliminate start-up costs imposed by a language interpreter. They don't need to start up an interpreter; they just have a tiny run-time-library start-up overhead.
    2. Another advantage of FastCGI is that inter-application locking is potentially eliminated.
    3. Disadvantages: FastCGI requires more work to develop the application (because you must correctly reset all state), the result is likely to be less robust (because the application must survive many requests), and with FastCGI it is slightly more difficult to take advantage of potential concurrency (because you have to explicitly multi-thread your server).
  2. Server-specific (proprietary) APIs
    1. Advantages: These APIs can potentially provide even better performance by eliminating startup and inter-process communication costs.
    2. Disadvantages: These APIs lock you into a particular web server, and your application is likely to be even less robust (because an error may take down the entire web server).
  3. Specially-implemented HTTP server. This is lots of work.
  4. Separate protocol. This is lots of work, and now you need to distribute a client.
  5. Web application servers. These are intended for "big jobs" - large-scale dynamic web systems with a vast number of pages driven by many databases and programs. For smaller jobs, they're often too much. If you need something like this, take a look at Zope.
  6. HTML-embedded scripting language. For smaller jobs, or jobs where you have mostly static information with small snippets of dynamic information being inserted into it, a hypertext preprocessor is useful. This enables you to stick commands into an HTML document, and have the web server run those commands. A common one is PHP. If you've used Microsoft's proprietary ASP language, you can switch to PHP using asp2php. You can use the hypertext processor commands to run an Ada program.

For lots of people, CGI is the way to go to implement web applications. So, assuming that you've evaluated your options and decided to use CGI, let's move on.

The next question is, why use Ada? Well, here are some advantages of using Ada for web applications:

  1. Excellent Run-Time Performance: Ada is typically compiled to machine code, so its performance is generally much better than interpreters. Ada typically runs at the same speed as C and C++; sometimes faster (since it has more information) and sometimes slower (since by default it performs lots of safety checking not built into C/C++). Thus, it's typically much faster than Perl, and faster than Java if Ada is not compiled to a typical Java Virtual Machine (JVM). Note that CGI is itself a low performance interface, so this is primarily relevant only for compute-bound processes (such as graphics generation, mathematical processes such as some cryptography applications, etc.).
  2. Excellent compile-time checking: Ada is well-known for its tight compile-time checks, which try to eliminate many errors before the first execution. The theory here is that it's cheaper to let a machine find problems than make a human find them.
  3. Highly readable: Ada is designed to be easy to read; this is especially obvious when comparing it to Perl, but many C and C++ programs belong in an obfuscated code contest. Inscrutable code and clear code can be written in any language, but it's less work to make things clear in Ada.
  4. Increased security over C/C++: Ada by default does bounds checking on all arrays (and has the information necessary to do this quickly); C and C++ have to simulate this or use special libraries which are easily bypassed.
  5. Prefer Ada: you may prefer Ada for other reasons. Ada's the only widely-used language that combines the ability to compile to machine code, platform-independent threading, enumerations, generics, buffer overflow protection, and object-orientation in a single internationally-standardized language. Java omits enumerations, generics, and compilation to machine code (some JVMs try to get close, but there's a limiting necessary overhead); C++ omits platform-independent threading and buffer overflow protection. C lacks the C++ capabilities, and it also lacks object-orientation and generics. Whether or not having all of these capabilities simultaneously available is important depends on your application, of course.
  6. Have existing Ada applications. If you have an existing Ada application, and you want to move it to becoming a web application, using AdaCL.CGI is a natural approach.

Sounds like you should always use Ada, right?

Nonsense - no engineering decision is ever that simple. Here are some weaknesses of Ada for building web applications:

  1. Wordiness. Ada is wordy, which is especially inconvenient for short scripts. As programs get large, I find this isn't as big a disadvantage; the wordiness is for readability and modularity, which is more important for larger programs. You may disagree.
  2. Less convenient string handling. Ada originally came with type String, which stays at a fixed length once initialized and is inconvenient for many situations. The Ada type Unbounded_String automatically handles resizing, etc., but it is used essentially always through functions and procedures. The lack of syntactic sugar for Unbounded_String makes simple operations especially wordy and can obscure what's happening. This is a problem for web scripts, because string handling often takes up a great deal of the program.
  3. No built-in regular expression system. You could use GNAT's regular expression library, which is portable to other Ada compilers, but it's not as rich as Perl's library (for example) nor can Ada compilers optimize it the way Perl can optimize regular expressions.
  4. No garbage collector necessarily built in. Ada doesn't guarantee that a garbage collector will be available, and in fact most Ada implementations don't include one. For CGI, this is almost never an issue, since the CGI program will exit after servicing one request anyway (and clean up then).
  5. Fewer web-centric libraries and/or need to reuse code in another language. If there's an existing library in another language that you need, that might be an excellent reason to use that language instead. You could use external language bindings; C and Fortran bindings in particular are built into the specification of Ada. If you have to interface to a lot of such modules, though, you might be much better off using that other language instead.
  6. Compilation time delays deployment. Scripts can simply be edited and used, while for Ada a compilation step is required before use. Since compilations are quite fast nowadays, this is not a real disadvantage.
  7. Complications from installing a dynamically linked library or static code. Since Ada is usually implemented as a compiled language, it generally requires a dynamically linked library to run. That means you'll have to get that library installed (or made available to your scripts). Alternatively, you could just generate statically linked programs, but that makes the individual programs larger. If you control the web server, this is a non-issue, but it might be a minor issue if you really want the run-time library installed in the "global" location of your system.

At one time, Ada compilers were extremely costly ($20,000 or more), which was a serious disadvantage. Nowadays, there's an open-source no-cost high-quality implementation (GNAT) and several other inexpensive implementations, so that's no longer a relevant disadvantage.

As always, base your decision on the engineering trade-off.

Finally, even if you're using CGI and Ada, you needn't use this library. See the resource section for un-CGI and WebAda CGI. However, neither of those supports cookies. As for specific weaknesses, WebAda CGI has buggy encoders, and un-CGI is both slower than AdaCL.CGI and introduces data ambiguity when handling data with multiple keys. In short, I believe that if you're using Ada and CGI, AdaCL.CGI is the library you want to use. If it isn't, please let me know why so that it can be that way again :-).

Related Information Sources


Ada programming, © 2005,2006 the Authors, Content is available under GNU Free Documentation License.