   AUTOMATED TESTING -- WHY AND HOW

Everyone  knows  how  important  testing is, and,  with luck, everyone
actually does test the software that they release. But do they really?
Can  they?  Even  a  simple program often  has many different possible
behaviors,  some of which only take  place in rather unusual (and hard
to  duplicate)  circumstances.  Even  if  every possible  behavior was
tested  when  the program was first released  to the users, what about
the  second release, or even a "minor" modification? The feature being
modified  will probably be re-tested,  but what about other, seemingly
unrelated,  features  that  may have been  inadvertently broken by the
modification?  Will  every unusual test case  from the first release's
testing  be  remembered,  much  less  retried,  for  the  new release,
especially  if  retrying  the test would require  a lot of preliminary
work (e.g. adding appropriate test records to the database)?

This  problem  arose for us several years  ago, when we found that our
software  was  getting  so complicated that  testing everything before
release  was  a  real  chore, and a good many  bugs (some of them very
obvious)  were getting out into the field. What's more, I found that I
was  actually  afraid  to add new features,  concerned that they might
break  the rest of the software. It  was this last problem that really
drove  home to me the importance of  making it possible to quickly and
easily test all the features of all our products.

AUTOMATED TESTING

The  principle of automated testing is  that there is a program (which
could  be a job stream) that runs the program being tested, feeding it
the  proper input, and checking the output against the output that was
expected.  Once  the  test suite is written,  no human intervention is
needed,  either to run the program or to look to see if it worked; the
test  suite  does  all  that,  and somehow indicates  (say, by a :TELL
message  and  a  results  file)  whether  the program's  output was as
expected.  We, for instance, have over two hundred test suites, all of
which  can  be  run  overnight by executing  one job stream submission
command;  after  they run, another command  can show which test suites
succeeded and which failed.

These test suites can help in many ways:

   * As discussed above, the test suites should always be run before a
     new  version is released, no matter how trivial the modifications
     to the program.

   *   If   the   software   is  internally  different  for  different
     environments  (e.g.  MPE/V vs. MPE/XL), but  should have the same
     external  behavior,  the  test  suites  should  be  run  on  both
     environments.

   *  As you're making serious changes to the software, you might want
     to  run  the test suites even before  the release, since they can
     tell you what still needs to be fixed.

   *  If you have the discipline to --  believe it or not -- write the
     test  suite before you've written your  program, you can even use
     the test suite to do the initial testing of your code. After all,
     you'd  have to initially test the  code anyway; you might as well
     use  your  test suites to do that  initial testing as well as all
     subsequent tests.

Note  also  that the test suites not only  run the program, but set up
the  proper environment for the program;  this might mean filling up a
test database, building necessary files, etc.

WRITING TEST SUITES

Let's  switch  for  a moment to a  concrete example -- a date-handling
package,  something that, unfortunately, many people have had to write
on  their  own,  from  scratch.  Say that one of  the routines in your
package  is DATEADD, which adds a given  number of days to a date, and
returns  the new date. Here's the code that you might write to test it
(the dates are represented as YYYYMMDD 32-bit integers):

   IF DATEADD (19901031, 7) <> 19901107 THEN
     BEGIN
     WRITELN ('Error: DATEADD (19901031, 7) <> 19901107');
     GOT_ERROR:=TRUE;
     END;
   IF DATEADD (19901220, 20) <> 19910109 THEN
     BEGIN
     WRITELN ('Error: DATEADD (19901220, 20) <> 19910109');
     GOT_ERROR:=TRUE;
     END;
   ...

As you see, the code calls DATEADD several times, and each time checks
the result against the expected result; if the result is incorrect, it
prints  an  error  message  and sets GOT_ERROR to  TRUE. After all the
tests  are done, the program can check if GOT_ERROR is TRUE, and if it
is, say, build a special "got error" file, or write an error record to
some  special  log  file.  This  way,  the  test  suites can be truly
automatic -- you can run many test suites in the background, and after
they're done, find out if all went well by just checking one file, not
looking through many large spool files for error messages.
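
As a rough modern illustration of the same pattern (in Python rather
than the Pascal used above), the sketch below runs a few DATEADD
checks, remembers whether any of them failed, and writes a one-word
status file that an overnight job could inspect. The names date_add
and run_suite, the status file, and its contents are assumptions made
for this example only; date_add is a stand-in for DATEADD.

```python
from datetime import date, timedelta
from pathlib import Path


def date_add(yyyymmdd: int, days: int) -> int:
    """Stand-in for DATEADD: add days to a YYYYMMDD integer date."""
    d = date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100)
    d += timedelta(days=days)
    return d.year * 10000 + d.month * 100 + d.day


def run_suite(status_file: Path) -> bool:
    """Run every check, then record one overall verdict in status_file."""
    got_error = False
    cases = [
        (19901031, 7, 19901107),
        (19901220, 20, 19910109),
    ]
    for start, days, expected in cases:
        actual = date_add(start, days)
        if actual != expected:
            print(f"Error: date_add({start}, {days}) = {actual}, "
                  f"expected {expected}")
            got_error = True
    # One small file per suite: checking it replaces reading spool files.
    status_file.write_text("FAILED\n" if got_error else "OK\n")
    return not got_error
```

A nightly driver would then only need to read each suite's status file
to list the suites that failed.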

The  first thing that you might notice  is that the DATEADD test suite
can  easily grow to be much  larger than the DATEADD procedure itself!
No  doubt  about  it  --  writing  test  suites  is  a  very expensive
proposition.   Our  test  suites  for  MPEX/3000,  SECURITY/3000,  and
VEAUDIT/3000  take  up  almost  30,000 lines,  not counting supporting
files  and  supporting  code in the actual  programs; the total source
code of our products is less than 100,000 lines. Often, writing a test
suite  for  a  feature  takes  as  long or almost  as long as actually
implementing the feature. Sometimes, instead of being reluctant to add
a  new  feature for fear of breaking  something, I am now reluctant to
add  a new feature because I don't want to bother writing a test suite
for it.

Fortunately,  the  often  dramatic  costs  of writing  test suites are
recouped  not just by the decrease in  the number of bugs, but also by
the  fact that test suites, once written,  save a lot of testing time.
It's much easier for someone to run an already-written test suite than
to execute by hand even a fraction of the tests included in the suite,
especially if they require complicated set-up. Since a typical program
will actually have to be tested several times before it finally works,
the  costs of writing a test suite  (assuming that it's written at the
same  time  as  the code, or even earlier)  can be recouped before the
program is ever released.

Also,  test suites tend to have longer  lives than code. A program can
be dramatically changed -- even re-written in another language -- and,
assuming  that it was intended to behave  the same as before, the test
suite will work every bit as well. Once the substantial up-front costs
of  writing  test  suites  have  been  paid, the pay-offs  can be very
substantial.

But  even  though we should be willing  to invest time and effort into
writing test suites, there's no reason to invest more than we have to.
In  fact,  precisely  because test suites at  first glance seem like a
luxury, and people are thus not very willing to work on them, creating
test  suites  should  be  as easy as possible. What  can we do to make
writing test suites simpler and more efficient?

One  goal that I try to shoot for is to make it as easy as possible to
add  new test cases, even if this  means doing some additional work up
front.  I  try to make every new test case fit on one line, if at all
possible.  The  reason  is  quite  simple:  I  want to have as little
disincentive  as  possible  to add new test  cases. A really fine test
suite  would  have  tests for many  different situations, including as
many obscure boundary conditions and exceptions as possible; also, any
time a new bug is found, a test should be added to the test suite that
would  have  caught  the  bug,  just  in  case the  bug re-surfaces (a
remarkably  frequent  event).  If  we  grit  our teeth  and write some
convenient  testing  tools  up  front,  we can make  it much easier to
create a full test suite.

Here, for instance, is one example:

   PROCEDURE TESTDATEADD (DATE, NUMDAYS, EXPECTEDRESULT: INTEGER);
   BEGIN
   IF DATEADD (DATE, NUMDAYS) <> EXPECTEDRESULT THEN
     BEGIN
     WRITELN ('Error: DATEADD (', DATE, ', ', NUMDAYS, ') <> ',
              EXPECTEDRESULT);
     GOT_ERROR:=TRUE;
     END;
   END;
   ...
   TESTDATEADD (19901031, 10, 19901110);
   TESTDATEADD (19901220, 20, 19910109);
   TESTDATEADD (19920301, -2, 19920228);
   ...

By  this  model,  each  procedure  that  you  test  would have  a test
procedure  like  this  one written for it; then  the main body of your
test program would just be calls to these test procedures.

This  is  especially  useful for procedures  that require some special
processing before or after being called; for instance, they might have
reference parameters that need to be put into variables before they're
passed,   record   structure   parameters   to   be  filled,  multiple
by-reference  output  parameters that all need  to be compared against
expected values, and so on.

You  can make up other, even  more general-purpose testing tools, such
as the following procedure:

   PROCEDURE MUSTBE (TAG: stringtype; RESULT, EXPECTEDRESULT: INTEGER);
   BEGIN
   IF RESULT<>EXPECTEDRESULT THEN
     BEGIN
     WRITELN ('Error: ', TAG, ': ', RESULT, ' <> ', EXPECTEDRESULT);
     (* error handling code *)
     END;
   END;

This  procedure  can be used to check  the result of any function that
returns an integer value, e.g.

   MUSTBE ('DATEADD #1', DATEADD (19901031, 10), 19901110);
   MUSTBE ('DATEADD #2', DATEADD (19901220, 20), 19910109);
   MUSTBE ('DATEADD #3', DATEADD (19920301, -2), 19920228);

Other,  similar,  procedures  might be written  to help test functions
that return other types (REALs, STRINGs, etc.). On the other hand, for
functions  that can't easily be called  in one statement (because they
take  by-reference or specially-formatted  parameters), you might want
to consider writing a special test procedure.
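
For comparison, here is a hedged Python sketch of a MUSTBE-style
helper that is not tied to integers. The failures list, the tolerance
parameter for real numbers, and the sample checks are all assumptions
of this sketch, not part of the procedure shown above.

```python
import math

failures = []  # tags of failed checks, used instead of a GOT_ERROR flag


def must_be(tag, result, expected, tolerance=0.0):
    """Record a failure unless result matches expected.

    Floats are compared within an absolute tolerance; every other type
    is compared with ==.
    """
    if isinstance(result, float) or isinstance(expected, float):
        ok = math.isclose(result, expected, rel_tol=0.0, abs_tol=tolerance)
    else:
        ok = result == expected
    if not ok:
        print(f"Error: {tag}: {result!r} <> {expected!r}")
        failures.append(tag)
    return ok


# One line per test case, whatever the result type.
must_be("UPPER #1", "plugh".upper(), "PLUGH")
must_be("SQRT #1", math.sqrt(2.0), 1.41421, tolerance=1e-5)
must_be("DIVMOD #1", divmod(17, 5), (3, 2))
```

Collecting the tags of failed checks, rather than only setting a flag,
makes it easy to report which cases need attention.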

Finally,  one other alternative (which I personally prefer) is writing
a  special  "shell"  program  that  asks  for  a  procedure  name, its
parameters,  and the expected result,  calls the procedure, and checks
the result:

   PROGRAM TESTSHELL ...
   ...
   READLN (PROCNAME, P1, P2, EXPECTEDRESULT);
   WHILE PROCNAME<>'EXIT' DO
     BEGIN
     IF PROCNAME='DATEADD' THEN RESULT:=DATEADD (P1, P2)
     ELSE IF PROCNAME='DATEDIFF' THEN RESULT:=DATEDIFF (P1, P2)
     ELSE IF PROCNAME='DATEYEAR' THEN RESULT:=DATEYEAR (P1)
     ...
     IF RESULT <> EXPECTEDRESULT THEN
       ... output error ...
     READLN (PROCNAME, P1, P2, EXPECTEDRESULT);
     END;
   ...

This  way, your actual test suite could  be a job stream, to which you
can  add  as many test cases as you like  -- one line per test case --
without having to recompile anything:

   !JOB TESTDATE, ...
   !RUN TESTSHEL
   DATEADD 19901031 10 19901110
   DATEADD 19920228 2  19920301
   DATEYEAR 19920228 0 1992
   ...
   !EOJ

Whenever  you  make  a  change to your procedures,  you just rerun the
TESTDATE  job,  and  you'll  either  find  some bugs  or be reasonably
confident  (though, of course, never 100% confident) that the software
works.
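
The shell idea carries over to other languages almost unchanged. The
Python sketch below reads "NAME P1 P2 EXPECTED" lines and dispatches
through a table instead of an IF chain. The three date routines are
stand-ins written for this example, and the argument order of DATEDIFF
(first date minus second) is an assumption.

```python
from datetime import date, timedelta


def _to_date(n):
    return date(n // 10000, n // 100 % 100, n % 100)


def _to_int(d):
    return d.year * 10000 + d.month * 100 + d.day


# Dispatch table: procedure name -> callable taking (p1, p2).
PROCEDURES = {
    "DATEADD": lambda p1, p2: _to_int(_to_date(p1) + timedelta(days=p2)),
    "DATEDIFF": lambda p1, p2: (_to_date(p1) - _to_date(p2)).days,
    "DATEYEAR": lambda p1, p2: p1 // 10000,  # p2 is ignored
}


def run_shell(lines):
    """Run 'NAME P1 P2 EXPECTED' lines; return a list of error messages."""
    errors = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "EXIT":
            break
        name = fields[0]
        p1, p2, expected = (int(f) for f in fields[1:4])
        proc = PROCEDURES.get(name)
        if proc is None:
            errors.append(f"line {lineno}: unknown procedure {name}")
            continue
        result = proc(p1, p2)
        if result != expected:
            errors.append(f"line {lineno}: {name}({p1}, {p2}) = {result}, "
                          f"expected {expected}")
    return errors
```

Passing sys.stdin as lines would let the test cases sit in the job
stream itself, as in the TESTDATE job above.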

TESTING PROGRAMS THAT DO I/O

It's  rather  easy  to  test  a  procedure  whose only  inputs are its
parameters and whose only output is its result (or even a by-reference
parameter). The more places a program derives its input from, or sends
its output to, the harder it becomes to test.

Let's  take a simple I/O program, one which reads a file, reformats it
in some way, and writes the result to another file. Obviously, to test
it, we should fill up the input file, run the program, and compare the
output  file against the expected output  file. As we discussed in the
previous  section, it would be nice if  we could build a program -- it
might  be a 3GL or 4GL program, or even an MPE or MPEX command file --
that  takes as parameters the input data and the expected output data,
so that we can easily add new test cases.

A first try on this might be a job stream like the following:

   :PURGE TESTIN
   :FCOPY FROM;TO=TESTIN;NEW
   LINE ONE
   LINE TWO
   LINE THREE

   :FILE MYPROGI=TESTIN
   :PURGE TESTOUT
   :FILE MYPROGO=TESTOUT
   :RUN MYPROG

   :PURGE TESTCOMP
   :FCOPY FROM;TO=TESTCOMP;NEW
   PROCESSED LINE A
   PROCESSED LINE B

   :SETJCW JCW=0
   :CONTINUE
   :FCOPY FROM=TESTCOMP;TO=TESTOUT;COMPARE=1
   :IF JCW<>0 THEN
   :  handle error
   :ENDIF

or, if the commands are put into a separate command file or UDC,

   :TESTCMDS
   LINE ONE
   LINE TWO
   LINE THREE
   :EOD
   PROCESSED LINE A
   PROCESSED LINE B
   :EOD

(the  data  would  go  as input to the  :FCOPY commands in the command
file).

Note  how the :FILE equations come  in handy to redirect the program's
input and output files. Not only does this avoid the need to overwrite
the  production  input and output files, but  it makes it possible for
several  test  suites  which test programs that  normally use the same
files  (e.g.  this  program,  the program that  created this program's
input  file, and the one that reads  this one's output file) to run at
once.  If  for  some reason your programs  don't allow :FILE equations
(e.g.  they issue their own :FILE  equations to refer to these files),
try  to  change  them  so they do, or at  least so they have a special
"test"  mode that will read :FILE-equatable  files. Note also that the
job  stream  regenerates the input and  comparison files every time it
runs.  I recommend this, since then each job stream would be a more or
less  self-contained  unit (if it uses a  special command file that no
other  test job uses, I suggest that  you build even this command file
inside  the  test job). It is easier to  move or maintain, and is less
likely to suffer from "software rot" (a condition that causes software
that's  been  left  on  the  shelf  too long to  stop working, largely
because some outside things that it depends on have changed).
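
The same build-run-compare cycle can be sketched in Python, with a
temporary directory playing the role that the :FILE equations play
above. The function names and the toy upcase_program are assumptions
of this sketch; a real test would run the actual program.

```python
import tempfile
from pathlib import Path


def run_file_test(program, input_lines, expected_lines):
    """Build a fresh input file, run program(in_path, out_path), compare.

    Everything lives in a temporary directory that is rebuilt on every
    run, so production files are never touched and several tests can
    run at the same time.
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_path = Path(workdir) / "TESTIN"
        out_path = Path(workdir) / "TESTOUT"
        in_path.write_text("".join(line + "\n" for line in input_lines))
        program(in_path, out_path)
        actual = out_path.read_text().splitlines()
    return actual == list(expected_lines)


def upcase_program(in_path, out_path):
    """Hypothetical program under test: copies its input in upper case."""
    out_path.write_text(in_path.read_text().upper())
```

Because the input and the expected output are both parameters, each
new test case is one more call.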

Back  to our example. One problem with  it is that :FCOPY ;COMPARE= is
rather  finicky  about the files it's  comparing -- for instance, they
must  both  have  exactly the same record  size. TESTCOMP, built by an
:FCOPY  FROM;TO=TESTCOMP,  would normally have the same record size as
the  job  input  device,  so  you might need a  :FILE equation to work
around this.

A more serious problem is that :FCOPY FROM;TO= can only be easily used
for  creating  files  that  contain  ASCII  data. What if  some of the
columns of the file need to contain binary data?

Here is where I think you ought to grit your teeth and write a special
program  (unless, of course, you have a 4GL that can do this for you).
Yes, I know that it seems like a pain to write code that will never be
run  in  production,  but is only needed to  test other code, but this
rather  simple program could, if designed  right, prove to be a highly
reusable building block.

The  program would first prompt for some  sort of "layout" of the file
--  a  list of the starting column  numbers, lengths, and datatypes of
each  field in the file. Then, it  would prompt for each record in the
file,  specified  as  a list of fields,  separated by, say, commas; it
would  format  the fields into the file  record, and write it into the
file. Thus, you'd say:

   :RUN BLDFILE
   S,1,8, I,9,2, S,11,10, P,21,8  << string, integer, string, packed >>
   SMITH, 100, XYZZY, 1234567
   JONES, 55, PLUGH, 554927
   ...

Once  you write this program, incidentally, you might find that it has
other  uses,  say,  to  do manual testing of  your program once you've
already  found that it has a bug and are trying to isolate it. And, of
course,  if you make it general enough,  it should be usable in all of
your test suites.
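
To make the layout idea concrete, here is a small Python sketch of the
record-formatting core of such a program. The encodings are
assumptions: "S" is space-padded ASCII, "I" is a big-endian signed
binary integer, and "P" is packed decimal with one digit per nibble
and the sign in the last nibble (hex C for positive, D for negative).
Column numbers start at 1, as in the layout line above.

```python
def parse_layout(text):
    """Turn 'S,1,8, I,9,2' into [(type, start_column, length), ...]."""
    parts = [p.strip() for p in text.split(",") if p.strip()]
    return [(parts[i], int(parts[i + 1]), int(parts[i + 2]))
            for i in range(0, len(parts), 3)]


def pack_decimal(value, length):
    """Packed decimal in `length` bytes (assumed encoding, see above)."""
    digits = str(abs(value)).rjust(2 * length - 1, "0")
    if len(digits) > 2 * length - 1:
        raise ValueError(f"{value} does not fit in {length} bytes")
    nibbles = [int(ch) for ch in digits] + [0xD if value < 0 else 0xC]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def build_record(layout, line):
    """Format one comma-separated input line into a binary record."""
    values = [v.strip() for v in line.split(",")]
    size = max(start + length - 1 for _, start, length in layout)
    record = bytearray(b" " * size)
    for (ftype, start, length), value in zip(layout, values):
        if ftype == "S":
            data = value.encode("ascii").ljust(length)[:length]
        elif ftype == "I":
            data = int(value).to_bytes(length, "big", signed=True)
        elif ftype == "P":
            data = pack_decimal(int(value), length)
        else:
            raise ValueError(f"unknown field type {ftype!r}")
        record[start - 1:start - 1 + length] = data
    return bytes(record)
```

A full BLDFILE would wrap this in a loop that reads the layout line
first, then one record per line, and writes each result to the file.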

Also,  your  input file had to have  been created by some program, and
your  output file must be intended  as input to some other program;
there's  nothing  that  says that you can't  run those programs in the
test  job  to  create  the input file from  data you've input and then
format the output file into readable text. The problems come in if the
other programs are too hard to run in batch (e.g. require block mode),
or  if you'd like to be able  to test each program separately from the
others,  perhaps  because  you want to see  how your program reacts to
illegal data in its input file, data that shouldn't normally appear in
the input generated by the other program.

What  if  your program reads and writes an IMAGE database? This is in
some  ways  simpler to test and in  other ways more difficult. You can
use  QUERY  to  fill the input sets and  create output (using >LIST or
>REPORT)  that  :FCOPY ;COMPARE= can use. Be sure, though, that you
sort any master sets that you dump using >REPORT -- since the order of
the  entries in the master set depends on the hashing algorithm, which
depends on the capacity, unsorted output will make the test suite find
an "error" every time you change the capacity.

However,  the  setup  of  the IMAGE database might  also be a bit more
cumbersome,  largely since you probably want  to have your own special
test  database  built  by the job (for  the reasons discussed above --
independence  from the production data,  from other test suites' data,
and  self-containedness). You might want to create a simple program or
command  file that takes a schema  file, lowers the dataset capacities
on  it,  runs  DBSCHEMA, and then does  a DBUTIL,CREATE -- you'll find
that a lot of your test suites can use it.

ADJUSTING FOR ENVIRONMENT INFORMATION

Our   pass-input-and-compare-output-against-expected-result   strategy
works just fine if the same input is always supposed to yield the same
output, but what if the output can vary? The most common variables are
based   on  current  date  and  time  --  reports  that  contain  this
information   in   headers,   output   files   that  have  each  value
date-stamped,   a   date-handling   procedure   that  returns  today's
day-of-week,  and so on. Another related problem is with programs that
check  whether they're being run online  or in batch, and do different
things  in these cases -- how can your batch test suite make sure that
the online features work properly?

What  we really have here is a different sort of input, input not from
a  file or a database, but from the system clock or the WHO intrinsic.
There  are a few ways of handling  this; for example, instead of doing
an FCOPY ;COMPARE=, which demands exact matches, you can have your own
comparison program that lets you specify that some particular field --
e.g.,  the  date  on  a  report header -- will  not get compared. Even
better,  your comparison program can let you specify that a particular
field  should  be  equal  to,  say,  the current year,  month, or day,
calculated at the time the comparison program runs.
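
One possible shape for such a comparison routine, sketched in Python:
the expected line is a template in which "?" matches any single
character, and the markers {YEAR}, {MONTH}, and {DAY} are replaced by
the date at the time of comparison. The template syntax is invented
for this sketch.

```python
from datetime import date


def lines_match(actual, expected, today=None):
    """Compare one output line against an expected-line template.

    '?' in the template matches any single character; {YEAR}, {MONTH}
    and {DAY} stand for the current date (or `today`, if given).
    """
    today = today or date.today()
    template = (expected.replace("{YEAR}", f"{today.year:04d}")
                        .replace("{MONTH}", f"{today.month:02d}")
                        .replace("{DAY}", f"{today.day:02d}"))
    if len(actual) != len(template):
        return False
    return all(t == "?" or a == t for a, t in zip(actual, template))
```

The "?" form simply skips a field; the date markers are stricter,
since they still check that the program printed the right date.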

However,  more  flexible  still  --  and  necessary  for  things  like
pretending  you're  online  rather  than  in  batch -- you  can try to
redirect  this  "input"  from the environment,  just as you redirected
input from files and databases using :FILE equations.

Now how are you going to do this redirection? Believe it or not, after
having  the  gall to ask you to write  test suites that are as long as
your  source  code,  I'm  suggesting that you  change your programs to
accommodate  testing  requirements.  Instead  of  calling  CALENDAR or
DATELINE,  for  instance  -- or using  whatever language construct may
give you this information -- you might write your own procedure:

   FUNCTION MYCALENDAR: SHORTINT;
   BEGIN
   get the value of the "PRETENDCALENDAR" jcw;
   IF the value is 0 THEN
     MYCALENDAR:=CALENDAR
   ELSE
     MYCALENDAR:=value of jcw;
   END;

This  way,  your program would normally get  the current date from the
CALENDAR intrinsic, but when the PRETENDCALENDAR JCW is non-zero, will
use  that value instead. You might, for efficiency's sake, want to get
the  JCW value only once, and then save it somewhere; for ease of use,
you   might  want  to  look  at  the  PRETENDYEAR,  PRETENDMONTH,  and
PRETENDDAY  JCWs,  and  assemble  the CALENDAR-format  value from them
(possibly using the date-handling package that we so thoroughly tested
a few pages ago).
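
In Python terms the same wrapper might look like the sketch below,
with environment variables standing in for JCWs (an assumption of this
sketch; the variable names are taken from the text). The environ
parameter exists only so that the function can be exercised without
changing the real environment.

```python
import os
from datetime import date


def my_today(environ=os.environ):
    """Return the pretend date if all three variables are set, else today."""
    try:
        return date(int(environ["PRETENDYEAR"]),
                    int(environ["PRETENDMONTH"]),
                    int(environ["PRETENDDAY"]))
    except KeyError:
        # No override in effect: fall through to the real system clock.
        return date.today()
```

The rest of the program would call my_today() wherever it would
otherwise ask the system for the current date.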

A  similar procedure might be written to determine whether the program
is  running  online or in batch --  it'll check the PRETENDONLINE JCW,
and if it doesn't exist, or is set to some default value, will call the
WHO  intrinsic. If your program does different things depending on the
user's  capabilities  or  logon  id,  you  might want  to have similar
procedures  for  them, too (wrapped around  the WHO intrinsic call) --
although  it's  possible  for  your test suite  to actually be several
jobs,  each  of  which logs on under  a different user, with different
capabilities,  it  may be more convenient for  you if one job can make
itself  look  like each one of these users  in turn. In fact, it might
even  be convenient for your own  manual debugging (say, when you want
to  duplicate the program's behavior as a  particular user id, or on a
particular  date,  but  don't  want  to re-logon or  change the system
clock).

Of  course, the drawback to this  approach is that you're not actually
testing  the  program  as it really behaves,  but rather as it behaves
with  the testing flag set; the  code you're executing in testing mode
is somewhat different from what is normally executed, and if, say, there's
a  bug  in  the  CALENDAR call or the WHO  call, your test suite won't
catch  it, since in testing mode  the intrinsics aren't called at all.
Unfortunately, this seems to be a necessary evil; the only solution is
to  minimize the amount of code  whose execution depends on whether or
not you're in testing mode.

One  thing  that you might do -- if you  want to be really fancy -- is
create  a  library  of  procedures called CALENDAR,  CLOCK, WHO, etc.,
which  would,  depending  on  some testing flag,  either call the real
CALENDAR,  CLOCK, or WHO, or return "pretend" values; you can then put
these  procedures  into an RL, SL, or XL,  and not have to change your
source  file. Once you debug your  library procedures, you should have
more confidence that your testing in test mode actually simulates how
the  program  will  really  behave in production. One thing that you
may have to do, however, is intercept not just the intrinsics that you
call  directly,  but  also  whatever  procedures  might  be  called by
language  constructs  (like  COBOL's  facility  for  returning today's
date).
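
Languages that allow run-time patching offer a close analogue of this
interception library. The Python sketch below uses the standard
unittest.mock.patch to replace time.time for the duration of one call,
so the code under test needs no testing hook of its own. The functions
report_header and header_with_pretend_clock are hypothetical examples.

```python
import time
from unittest import mock


def report_header(title):
    """Code under test: calls time.time() directly, with no testing hook."""
    stamp = time.strftime("%Y-%m-%d", time.gmtime(time.time()))
    return f"{title}  {stamp}"


def header_with_pretend_clock(title, pretend_seconds):
    """Run report_header with time.time temporarily replaced."""
    with mock.patch("time.time", return_value=pretend_seconds):
        return report_header(title)
```

The caveat from the text still applies: only the calls you intercept
are redirected, so a date obtained some other way slips through.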

WHAT TEST CASES SHOULD YOU USE?

So  far, we've talked a lot about how to write tools that make it easy
for you to add test cases to your test suites, but not much about what
your  test  cases  should  be.  Say  that  you're  testing  a  DATEADD
procedure, one that returns a date that is X days after date Y. (Let's
assume  that X could be negative -- X = -5 means a date that is 5 days
before  date  Y.)  What  test  cases should you  use? Think about this
before reading the answers!

Well, it seems to me that there are quite a few:

  * Add days so that it stays in the same month (e.g. 1990/05/10+7).

  * Add days so that it changes months (e.g. 1990/05/10+30).

  * Add days so that it changes years (e.g. 1990/05/10+300).

  *  Add days so that it changes months or years over February 28th in
    a non-leap year (e.g. 1990/02/10+30).

  *  Add days so that it changes months or years over February 29th in
    a leap year (e.g. 1992/02/10+30).

  *  Handle years that are divisible by  100 but not by 400 (like 1900
    or 2100), which are not leap years (did you know this?).

  * Add 0 days.

  * Add days so that it goes outside of your accepted date range (e.g.
    beyond 1999, or whatever other date is your limit).

  * Add to an invalid date -- one with an invalid year, month, or day.

  * All the above, but with subtracting days.

Wow!  That's  a lot of work. But, you'll  have to admit, all the above
are  things  that  you  really  should  test  for (unless  they're not
relevant  to  your particular implementation, e.g.  if your date range
doesn't  extend to 1900 or 2100,  or if you've consciously decided not
to check for certain error conditions), manually if not automatically;
it's  especially important to test  for "boundary conditions" (did you
know  that, in the DATELINE intrinsic, the next day after DEC 31, 1999
is  JAN 1, 19:0?), for cases  that require special handling (like leap
year), and for proper handling of errors.
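
Written as a table, the list above might look like the following
Python sketch. Here date_add is a stand-in for DATEADD built on the
standard datetime module, which rejects invalid dates with ValueError;
the "outside your accepted date range" case is left out because the
limit depends on your own package.

```python
from datetime import date, timedelta


def date_add(yyyymmdd, days):
    """Stand-in DATEADD; raises ValueError for an invalid start date."""
    start = date(yyyymmdd // 10000, yyyymmdd // 100 % 100, yyyymmdd % 100)
    result = start + timedelta(days=days)
    return result.year * 10000 + result.month * 100 + result.day


CASES = [
    (19900510, 7, 19900517),    # stays in the same month
    (19900510, 30, 19900609),   # changes months
    (19900510, 300, 19910306),  # changes years
    (19900210, 30, 19900312),   # crosses Feb 28 in a non-leap year
    (19920210, 30, 19920311),   # crosses Feb 29 in a leap year
    (19000228, 1, 19000301),    # 1900: divisible by 100 but not by 400
    (20000228, 1, 20000229),    # 2000: divisible by 400, so a leap year
    (19900510, 0, 19900510),    # add zero days
    (19900312, -30, 19900210),  # subtract across Feb 28
    (19910101, -1, 19901231),   # subtract across a year boundary
]
INVALID = [19901301, 19900230, 19900000]  # bad month, bad day, both zero


def failing_cases():
    """Return every case that does not behave as the table says."""
    failed = [c for c in CASES if date_add(c[0], c[1]) != c[2]]
    for bad in INVALID:
        try:
            date_add(bad, 1)
        except ValueError:
            continue
        failed.append((bad, 1, "ValueError expected"))
    return failed
```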

These  are  the obvious tests -- tests  for bugs that you expect might
happen.  As  other bugs come up, however,  you ought to add test cases
that  would have caught these bugs:  firstly, you'll have to test your
fix anyway, and if you add the test case before implementing the fix, it
won't  cost  you anything extra; secondly, the  same bug (or a similar
one) may come up later, but this time will get caught.

Still,  there's  no  need  to get extreme  about things; shortcuts are
still  possible.  Say, for instance, that  DATEADD works by converting
the  date  into  a  "century  date"  format  (number  of days  since a
particular  base date), adding the number of days, and then converting
back  into a year/month/day format -- if  you're sure that this is all
it  does, you might just have one test case (preferably one that seems
to  exercise  as  much of the internal logic  as possible, such as one
that  changes  months  and  years).  Of  course,  you'd still  have to
properly test the date conversion routines.

In  general,  what  you  test  should  depend on how  your code works.
Whenever  you  know  your  code  treats  two different  types of input
differently,  you  should  test both. If you're  fairly certain that a
single  test will test many features, you  can just use that one test;
if,  for  instance,  you know that testing  one module or routine will
also  adequately  test the modules or routines that it calls, you can
make  do  with just testing the top-level  one. However, try to resist
this  temptation;  firstly,  the  top-level  module  probably  doesn't
exercise all the functions of the bottom-level one, and secondly, it's
very  convenient  to have a test suite  for the bottom-level module --
that  way, if you're making substantial changes to your system and you
know   the  top-level  module  is  broken,  you  can  still  test  the
bottom-level one independently.

Finally,  an  obvious  point, but one that  is often neglected -- it's
better  to test a little than not at all. If you find something that's
hard  to test in all possible ways, test it in at least a few; if, for
instance,  its results are hard to automatically verify, at least make
sure  that they're in the right  format, or even that they're returned
at  all (i.e. that the program  doesn't just abort). There's really no
90-10  rule  in  testing -- 10% of the  effort won't get you much more
than 10% of the benefit -- but you can at least avoid some of the more
obvious  (and  more  embarrassing) bugs. Then,  once the groundwork is
laid,  you might try to get back to it periodically, adding a new test
case here or there. Don't let perfectionism get in the way of doing at
least something.

VERIFYING DATA STRUCTURES

Most  sufficiently  complicated data structures  -- anything from your
data stored in an IMAGE database to your own linked lists, hash files,
or  B-trees,  if  you  write  such  things  yourself --  have internal
consistency  requirements.  Certain fields in  your databases may only
contain  particular  values;  other  fields  must  have  corresponding
records  in  other  datasets  or  in other databases.  If any of these
internal consistency requirements are not met, you know you have a bug
somewhere.

You can get a lot of benefit out of writing a verification routine for
each such data structure that checks it for internal consistency. This
is  somewhat different from the test suites we discussed before, which
check  for  the validity of the ultimate  results, but it can still be
very  useful,  since internal inconsistency  must have, by definition,
been  caused  by a program error, and  is likely to eventually lead to
incorrect  results (incorrect results that  your test suites might not
otherwise check for).

You  should  call  this  verification routine at the  end of each test
suite to verify the consistency of the structures (again, usually data
in  the database) that the program  being tested built; you might even
run  it after each step in the test suite, to isolate exactly where an
error  might be sneaking in. You may also want to run the verification
routine  against  your production database every  night, to check your
programs  as they run in the real  world, not just the controlled test
environment;  and, you can run it  whenever you suspect that something
may be wrong (either in testing or in production), to figure out if an
internal inconsistency might be causing it.

The  verification routine shouldn't be hard to write; if you can do it
using  a 4GL or some other tool (like Robelle's fast SUPRTOOL -- speed
is  important, since you want to make it as quick and easy as possible
to verify your data), all the better. Simply put, check all the fields
for  which at least one possible value would be invalid, whether it is
because  it's not one of a list of allowable values for this field, or
because  it's  out  of  range, or because  it's inconsistent with some
other   values   in   this   record,  or  some  other  values  in  the
dataset/database. Possible checks include:

   * Flag fields may contain only certain allowable values.

   *  Numbers, like salaries or prices,  must be within certain ranges
     (e.g. non-negative, below a certain amount, etc.).

   * Dates must be valid (valid year, month, day).

   * Strings must at least not include non-alphanumeric characters.

   *  Some  fields  must  have  corresponding  entries in  a different
     dataset  or  database  (do a DBGET mode  7, for instance, to make
     sure they're there).

   *  Some  fields  are  calculated from other  fields, and must match
     (e.g.  a total price field in an  invoice that must be the sum of
     the price fields in the line items).

Not only can this check for bugs in your programs, but it can also check
for invalid production data that your programs might not have detected
(e.g.  garbage  characters  in string fields,  bad states, state codes
that  don't  match  phone numbers, etc.). And,  again, if written as a
QUERY  >XEQ file, or as a 4GL  program, it can be very easily created,
and used over and over again.
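
As an illustration of how little code such a routine needs, here is a
Python sketch that applies several of the checks listed above to
invoice records held in memory. The record layout, the status codes,
and the function name are all invented for this example; a real
routine would read the datasets themselves.

```python
from datetime import date


def verify_invoices(invoices, customers):
    """Return a list of inconsistency messages (empty if all is well).

    invoices:  list of dicts with keys number, status, customer, date
               (YYYYMMDD), total, and lines (line-item prices).
    customers: set of valid customer ids (the "other dataset").
    """
    problems = []
    for inv in invoices:
        tag = f"invoice {inv['number']}"
        if inv["status"] not in ("O", "P", "C"):       # flag field
            problems.append(f"{tag}: bad status {inv['status']!r}")
        if inv["total"] < 0:                           # range check
            problems.append(f"{tag}: negative total")
        ymd = inv["date"]
        try:
            date(ymd // 10000, ymd // 100 % 100, ymd % 100)  # valid date?
        except ValueError:
            problems.append(f"{tag}: invalid date {ymd}")
        if inv["customer"] not in customers:           # cross-reference
            problems.append(f"{tag}: unknown customer {inv['customer']!r}")
        if inv["total"] != sum(inv["lines"]):          # derived field
            problems.append(f"{tag}: total does not match line items")
    return problems
```

Returning every problem found, rather than stopping at the first,
suits a nightly run against production data.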

TESTING COMMAND-DRIVEN AND CHARACTER-MODE INTERACTIVE PROGRAMS

As  we discussed before, the key to successful automated testing is to
have  the  proper  tools  that  make  adding  test cases  easy. One in
particular  -- which takes some work to construct but can make writing
test suites much simpler -- is very much worth discussing.

This  test-bench lets you run another  program under its control, with
the son program's $STDIN and $STDLIST redirected to message files. The
test-bench  can let you specify input to be passed to the son program,
and  the  expected  output  that  the son program  should display; for
instance, a typical test suite might look like:

   :RUN TESTBENC

   RUN SONPROG          << command to start son process >>
   I CALC 10+20         << command for son to execute >>
   O 30                 << expected result >>
   DOIT                 << tells test-bench to compare the results >>
   I SQUARES 3          << command for son to execute >>
   O 1                  << expected result >>
   O 4                  << expected result >>
   O 9                  << expected result >>
   DOIT                 << tells test-bench to compare the results >>
   ...

What are the advantages of this approach?

   *  It  lets you specify the expected  output right after the input;
     this  makes  the  test  suite much easier to  write (and read and
     maintain)  than if you had to specify  all the input up front and
     all the expected output (from all the commands) at the end.

   *  It lets you compare the output and the expected output much more
     flexibly  than  a simple :FCOPY ;COMPARE=  would; you can specify
     special  commands that indicate, say,  that the output needn't be
     exactly  as you specified, but might include some variations here
     and there (e.g. date- or environment-dependent information).

   *  It tells you exactly which commands got errors, rather than just
     telling you an error was found.
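
The second and third advantages can be sketched roughly as follows. This is not the comparison logic of any real test-bench, just one way it might be done in Python, assuming that "*" and "?" in an expected line act as wildcards; the function name compare_output is made up for the example.

```python
from fnmatch import fnmatchcase

def compare_output(command, expected, actual):
    """Compare one command's output with its expected output, line by line.

    '*' and '?' in an expected line are wildcards (fnmatch rules, so '['
    also starts a character set).  Every message names the command, so a
    failure can be traced to the input that caused it.
    """
    errors = []
    for i in range(max(len(expected), len(actual))):
        want = expected[i] if i < len(expected) else None
        got = actual[i] if i < len(actual) else None
        if want is None:
            errors.append(f"{command}: unexpected extra line {got!r}")
        elif got is None:
            errors.append(f"{command}: missing line, expected {want!r}")
        elif not fnmatchcase(got, want):
            errors.append(f"{command}: expected {want!r}, got {got!r}")
    return errors
```

For instance, an expected line of "TODAY IS *" would accept whatever date the son program prints, while a plain "30" must match exactly.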

Note,  however,  that all this test-bench does  is feed input into the
son  process'  $STDIN  and  read output from  its $STDLIST; what about
other  output,  say  output  to  files,  databases,  JCWs,  or  MPE/XL
variables, and input from the same places?

Fortunately, output to one of those places can easily be converted
into output to $STDLIST simply by executing an MPE command, like
:PRINT (or :FCOPY on MPE/V), :SHOWJCW, :RUN QUERY, etc. If our
test-bench can not only feed input to the son process and check its
output, but also execute MPE commands, feed them input, and check
their output, our problems will be solved.

Let's  say  your program is supposed to build  a file, and you want to
make  sure  that  the  file is built with  the right structure and the
right contents. Then, your test suite might look like this:

   :RUN TESTBENC

   RUN SONPROG          << command to start son process >>
   I BUILDFILE XXX      << command to test >>
   O File was built.    << expected output to $STDLIST >>
   DOIT
   MPE :LISTF XXX,2     << MPE command to execute >>
   O ...
   O XXX  123  32W ...  << expected :LISTF output >>
   O ...
   DOIT
   MPE :PRINT XXX       << MPE command to execute >>
   O ...                << expected :PRINT xxx output >>
   DOIT

How  does this program work? Well, as  we said before, it runs the son
program  with  $STDIN  and  $STDLIST redirected to  message files; "I"
commands write stuff to the input message file, "O" commands write the
expected  output  records  to  a  special  temporary file,  and "DOIT"
commands  read  the  output  message  file and compare  it against the
O-command temporary file.
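
To make the I/O/DOIT mechanics concrete, here is a much-simplified sketch in Python. It is not the actual TESTBENC: instead of a son process with message files, the "son" is assumed to be an ordinary function that turns one input line into a list of output lines, and the RUN and MPE commands are left out. The names run_script and toy_son are invented for the example.

```python
def run_script(script, son):
    """Run an I/O/DOIT script against `son` and return the mismatches.

    `son` is a callable taking one input line and returning output lines.
    Each mismatch is a tuple (inputs, expected_lines, actual_lines).
    """
    pending_inputs, expected, failures = [], [], []
    for raw in script.splitlines():
        # Drop << comments >>.  Simplification: an expected line that itself
        # contains "<<" cannot be expressed in this sketch.
        line = raw.split("<<")[0].strip()
        if not line:
            continue
        if line == "DOIT":
            actual = []
            for command in pending_inputs:
                actual.extend(son(command))
            if actual != expected:
                failures.append((pending_inputs, expected, actual))
            pending_inputs, expected = [], []
            continue
        verb, _, rest = line.partition(" ")
        if verb == "I":
            pending_inputs.append(rest)      # input for the son
        elif verb == "O":
            expected.append(rest)            # expected output record
        else:
            raise ValueError(f"unknown test-bench command: {line!r}")
    return failures

def toy_son(command):
    """Stand-in for SONPROG: understands CALC a+b and SQUARES n."""
    verb, _, arg = command.partition(" ")
    if verb == "CALC":
        a, b = arg.split("+")
        return [str(int(a) + int(b))]
    if verb == "SQUARES":
        return [str(n * n) for n in range(1, int(arg) + 1)]
    return ["Unknown command"]

SCRIPT = """
I CALC 10+20         << command for son to execute >>
O 30                 << expected result >>
DOIT
I SQUARES 3
O 1
O 4
O 9
DOIT
"""
```

Calling run_script(SCRIPT, toy_son) returns an empty list when every DOIT block matches; each failure carries the inputs, the expected output, and the actual output, which is what you would want written to a log.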

One  problem is how "DOIT" will recognize that the son program is done
with  its  output  and  has  issued  another input prompt.  If the son
program  always  uses the same prompt (or  one of a few prompts), DOIT
can check for this; if, however, the son program's prompt isn't easily
distinguishable  from  normal output, you should  make the son program
print  a special line (e.g. "***INPUT***") before doing any input when
it's in "testing mode"; so long as this happens before any input, DOIT
can recognize these lines and realize that the output is done.
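
The special-line approach might look something like this in Python, with pipes standing in for the message files. The toy son program, the SENTINEL constant, and the helper names are assumptions for illustration only; the point is that the reader stops as soon as it sees the agreed-upon line, and never has to guess whether more output is coming.

```python
import subprocess
import sys

SENTINEL = "***INPUT***"

# A toy son program, run as a separate process.  It is always in "testing
# mode": it prints the sentinel line before every read from its input.
SON_SOURCE = r'''
import sys
while True:
    print("***INPUT***", flush=True)
    line = sys.stdin.readline()
    if not line or line.strip() == "EXIT":
        break
    for i in range(1, int(line) + 1):
        print(i * i, flush=True)
'''

def read_until_sentinel(stream):
    """Collect output lines until the son announces it wants input."""
    lines = []
    while True:
        line = stream.readline()
        if line == "":
            raise EOFError("son ended without asking for input")
        line = line.rstrip("\n")
        if line == SENTINEL:
            return lines
        lines.append(line)

def demo():
    """Start the son, send it one command, and return what it printed."""
    son = subprocess.Popen([sys.executable, "-c", SON_SOURCE],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                           text=True)
    try:
        banner = read_until_sentinel(son.stdout)  # output before 1st prompt
        son.stdin.write("3\n")
        son.stdin.flush()
        result = read_until_sentinel(son.stdout)  # output of the command
        son.stdin.write("EXIT\n")
        son.stdin.flush()
        return banner, result
    finally:
        son.stdin.close()
        son.stdout.close()
        son.wait()
```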

What  about  the MPE commands that can  be used to "convert" output to
files, databases, etc. into output to $STDLIST? When we test MPEX (the
test-bench  I'm describing is essentially the  one that we use to test
all  of our software), this is no problem, since we can just pass MPEX
an MPE command as input, and MPEX will execute it. You might do the
same yourself -- make sure that the program you're testing can execute
MPE commands -- or you can give TESTBENC two son processes, one
the program being tested, and the other a simple program that prompts
for an MPE command and executes it. The only other problem that you'll
face  here  is executing :FCOPY or :RUN  commands on MPE/V (where they
can't  be done with the COMMAND intrinsic); however, if you're an MPEX
customer,  you can actually use MPEX as this MPE-command-executing son
process -- MPEX can execute :FCOPYs, :PRINTs, :RUNs, etc.

This  test-bench $STDIN-and-$STDLIST-redirection solution, it seems to
me,  would  work  quite  well  for  any command-driven  or interactive
character-mode  programs.  If  you want to use  it to test procedures,
you'll  have  to  write a simple shell  program that prompts for input
parameters, calls the procedure, and prints the output parameters, and
run  this  program  as  a  son  of the  test-bench. Testing block-mode
programs,  I  suspect,  would be much more  difficult; I'll have a few
words about it later, but it's still an unsolved problem as far as I'm
concerned.

Of course, the more complicated your test-bench is, the more important
it  is  to  write  a test suite for it!  (A bug in the test-bench that
keeps  it from properly checking  things could be almost unnoticeable,
since it will falsely tell you that your test suite ran fine.) We test
our test-bench by feeding it a lot of test commands, some of which are
calculated to produce errors and others to succeed, and checking the
results of these operations (not using the test-bench itself, of
course) to see if they're as expected.

AUTOMATIC TEST SUITE GENERATION

No  matter  how  sophisticated  your test-bench, you'll  still have to
write  your  test  cases.  For simple  one-line-input, one-line-output
operations,  there's little that you need  to do beyond specifying the
input  and the expected output; however, for things that require a lot
of set-up, or are actually conversations, with many output prompts and
many inputs, you'd like a better way.

One idea that some automated testing people like is having you run the
program  once,  specifying all the right  inputs, and making sure that
all  the  outputs  are correct; these inputs  and outputs will then be
saved,  ready to be "re-played" by the test-bench, which will resubmit
exactly  the same inputs, and expect  to get exactly the same outputs.
In  effect,  then,  the  test  suite  will  check that  all subsequent
executions  of the program behave exactly  the same way as the initial
one  (which  was  presumably  correct).  The  way you'd do  this is by
modifying the test-bench program we discussed above (what? you mean to
say  you haven't written it yet?)  to have a special "data-collection"
mode  that will accept user inputs,  pass them to the program, collect
the  outputs,  and create a file that can  later be used by the normal
mode of the test-bench.
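
A data-collection mode could be sketched along these lines. As before, this is only an illustration under simplifying assumptions: the son is an ordinary Python function rather than a process, and the names record_session and toy_son are hypothetical.

```python
def record_session(inputs, son):
    """Run each input through `son` once and write a replayable script.

    The result uses the same I / O / DOIT lines that the normal mode of
    the test-bench reads, so it can be edited by hand afterwards.
    """
    lines = []
    for command in inputs:
        lines.append(f"I {command}")
        lines.extend(f"O {out}" for out in son(command))
        lines.append("DOIT")
    return "\n".join(lines) + "\n"

def toy_son(command):
    """Stand-in for the program being recorded: understands CALC a+b."""
    verb, _, arg = command.partition(" ")
    if verb == "CALC":
        a, b = arg.split("+")
        return [str(int(a) + int(b))]
    return ["Unknown command"]
```

Note that a mistyped command is recorded just as faithfully as a correct one ("Unknown command" becomes the expected output), which is exactly why the generated file has to be easy to edit.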

Now,  there  are  a few problems with this,  which lead me to conclude
that,  even if this data-collect feature  is used, the test suite that
it  generates must be easy to modify. Firstly, the user will doubtless
make errors while entering the original inputs; since the data-collect
feature  doesn't  know what's a user error  and what should be part of
the  test  suite,  there  needs  to  be some way  of editing out these
errors.  (Technically,  you  need not do this,  since exactly the same
inputs  should yield exactly the same error outputs in the future, but
if you don't edit them out, the test suite will be very unreadable and
unmaintainable.)  Secondly,  the future output  probably won't exactly
match  the  current output -- dates  and other environment information
(version  numbers, etc.) will doubtless change. There needs to be some
way  to  edit the generated test suite  to replace the expected output
with  some sort of "wildcard" characters that tell the test-bench that
any output would be acceptable in this case.
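
Some of this editing can itself be automated. The sketch below, in Python, rewrites the expected-output lines of a recorded script so that dates and version numbers become a wildcard token. The token "@@@" and the two patterns are assumptions made for the example; a real test-bench would use whatever wildcard its comparison step understands, and whatever formats its programs actually print.

```python
import re

WILDCARD = "@@@"   # hypothetical token meaning "any output is acceptable here"

# Output that legitimately changes from run to run (assumed formats).
VOLATILE = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                 # dates like 1990-10-31
    re.compile(r"\bversion \d+(\.\d+)*\b", re.IGNORECASE)  # version numbers
]

def generalize(script):
    """Replace volatile text in the 'O' lines of a recorded script.

    Input ('I') lines are left alone: the inputs must be resubmitted
    exactly as they were recorded.
    """
    out = []
    for line in script.splitlines():
        if line.startswith("O "):
            for pattern in VOLATILE:
                line = pattern.sub(WILDCARD, line)
        out.append(line)
    return "\n".join(out)
```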

However,  taking  into  account  that  some editing will  be needed, a
data-collect feature can be quite convenient for testing features that
involve complicated I/O sequences.

A SAMPLE TEST ENVIRONMENT

Besides  having  the right test tools and  the right test suites, it's
important  to  internally  set  up  your  test  suites  and  your test
environment  so  that it is as easy as  possible to run all your tests
and  check  whether  or  not they succeeded. Here  are a few tips that
we've found handy ourselves:

   *  Have one test suite for each  major feature, not one big one for
     your  entire system. When you're  working on a particular feature
     and  want to see if it works, you'll probably want to re-run only
     that test suite after each change, and re-run all the test suites
     only at the very end.

   * As we mentioned before, have each test suite as self-contained as
     possible,  but  also try to have each  test case within each test
     suite  be relatively self-contained. The more a test case depends
     on  the results of the test cases that preceded it, the harder it
     will  be  for  you  to  fully  understand what the  state of your
     internal  files, databases, etc. is at  the time the test case is
     executed,  and  the  harder  it  will be to  maintain it, or even
     understand  why  the  "expected results" you have  for it in your
     test  suite  are really what should  be expected. Of course, some
     test  cases must not be self-contained precisely because you want
     to  make sure that they work  properly when done together, rather
     than separately.

   *  Have  each  test suite run in its  own group, with all the files
     needed  by  the  program being tested redirected  to the files in
     that  group.  The  first  thing that the test  suite should do is
     purge  the group (if it's logged on to it, this will merely purge
     all  the  files); this way, you'll be  sure that this test run is
     not influenced by previous runs of the same test suite, and since
     the  test suite runs in its own  group, it will not be influenced
     by concurrently-running other test suites.

   *  All the actual test suites and permanent support files should be
     in  their  own group, separate from the  groups in which the test
     suites run; this way, the test suites will be able to purge their
     own  groups, as discussed above. If  the test suite files are all
     in  a particular fileset (e.g.  "[email protected]"), they can be submitted
     in MPEX using a %REPEAT/%STREAM/%FORFILES construct.

   *  Each test suite should signal  that it completed successfully by
     building  a file called TESTOK, and  that it failed by building a
     file  called TESTERR (or, even better, TESTE###, where ### stands
     for   the   number   of   errors   discovered).  Then,  a  :LISTF
     [email protected],6 will show you which jobs had errors in them.
     In  case  you're  afraid that a job  might abort without building
     either  a TESTOK or TESTERR file,  you can build the TESTERR file
     at  the  very beginning and only purge it  at the end if all went
     well.

   *  Finally, if you use a  test-bench program, the test-bench should
     send  all its output, especially an  indication of all the errors
     (what  the input was, what the  output was, and what the expected
     output  was),  to  a  disc  file called, say,  TESTLOG, which can
     easily  be read, and will remain on  the system even if the spool
     file is deleted.
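
The TESTOK/TESTERR convention from the tips above, including the trick of building TESTERR first, might be sketched like this in Python. A directory stands in for an MPE group, and the function name run_suite and the shape of the test cases (a name plus a function returning true or false) are assumptions for the example.

```python
import os
import tempfile

def run_suite(group_dir, test_cases):
    """Run (name, callable) test cases and leave a marker file behind.

    Leaves TESTOK if all cases passed, TESTEnnn (nnn = error count) if
    some failed, and TESTERR if the run aborted before finishing.
    """
    # Purge the group so earlier runs cannot influence this one.
    # (Assumes a flat directory, like an MPE group.)
    for name in os.listdir(group_dir):
        os.remove(os.path.join(group_dir, name))

    # Be pessimistic: build TESTERR now, purge it only if we reach the end.
    testerr = os.path.join(group_dir, "TESTERR")
    open(testerr, "w").close()

    errors = 0
    with open(os.path.join(group_dir, "TESTLOG"), "w") as log:
        for name, case in test_cases:
            if not case():
                errors += 1
                log.write(f"FAILED: {name}\n")

    os.remove(testerr)
    marker = "TESTOK" if errors == 0 else f"TESTE{errors:03d}"
    open(os.path.join(group_dir, marker), "w").close()
    return marker

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as group:
        print(run_suite(group, [("addition", lambda: 1 + 1 == 2)]))
```

Listing the marker files across all the groups then plays the role of the :LISTF: anything other than TESTOK needs looking at.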

Thus, the configuration we use at VESOFT is:

   [email protected] -- command files used by the test suites.

   [email protected] -- MPEX test suites.

   [email protected] -- SECURITY test suites.

   [email protected] -- VEAUDIT test suites.

   TESTPROD.TEST.VESOFTD  --  a  command  file that  purges the VETEST
     account and %STREAMs [email protected][email protected][email protected].

   @.MALTFILE.VETEST -- group used by test suite
     MALTFILE.TEST.VESOFTD.

   @.MBATCH.VETEST -- group used by test suite MBATCH.TEST.VESOFTD.

   ...

TESTING SEEMINGLY HARD-TO-TEST PROGRAMS

Some  things  are  easier  to  test  than others;  procedure calls are
simplest,   command-driven   programs   are   rather  straightforward,
"conversational"  character-mode  programs  are  a  bit  harder.  Much
depends  on how easy it is to feed the program input and intercept the
program's  output; for example, if a program does input from tape, you
might redirect it by a :FILE equation to a disc file, but how will you
test  the  code in the program that tries  to handle tape errors? If a
program  is supposed to submit a job, how can you tell whether the job
was properly submitted?

There  are  several  general  tricks  that you can  use to solve these
problems,  though  these are more examples  of ingenious solutions for
you  to  emulate,  not  specific  instructions  that should  always be
followed:

   *  Inputs:  Have  ways to "fake"  hard-to-trigger input conditions,
     like  bad tapes, control-Y, I/O errors, etc. For instance, have a
     "***BAD  TAPE***" record in a file  be interpreted by the program
     as a tape error; if the program expects a tape to contain several
     things separated by EOF markers (which can't normally be emulated
     by disc files), have it treat an "***EOF***" as an EOF marker.

     Again, this has the same problem as the PRETENDDATE/PRETENDONLINE
     features that we suggested above -- what you'll really be testing
     is  not the actual execution of the program, but the execution of
     the  program in testing mode. However, though problems with, say,
     the  actual condition code check that detects the tape error will
     not  be found, all the other  aspects of tape error handling will
     be properly tested.

   *  Outputs:  Find  commands or programs that  can convert an output
     that  is hard to test for into one  that is easy to test for; for
     instance,  if your program is supposed to  do a :DOWN, test it by
     doing  a :SHOWDEV afterwards to see if it is DOWNed or has a DOWN
     pending. If the program is supposed to do an :ABORTJOB, PAUSE for
     some  time  (since an :ABORTJOB may  not immediately take effect)
     and  then do a :SHOWJOB of that  job number to make sure that the
     job  no  longer exists. If your program  is supposed to shut down
     the system, you're out of luck...

   *  More  outputs:  But  maybe you're not out  of luck in the system
     shut-down  case,  after  all;  analogously  to  the  "fake input"
     suggestion  above,  you  might have your program  check to see if
     it's  in  testing mode, and if it  is, print a message instead of
     shutting   down   the   system   (or   doing   something  equally
     uncheckable-for).  Again, this won't make  sure that the ultimate
     operation  is done properly (since it  won't be done in this case
     at   all),  but  at  least  it'll  make  sure  that  all  of  the
     preliminaries will be handled correctly.

   *  General:  Find  ways  of  taking  care  of  timing  windows; for
     instance, if your program submits a job, a simple :SHOWJOB in the
     test  suite won't be a proper check (since a small job might have
     finished by the time the :SHOWJOB is done), and having the job
     build a file or leave some other permanent trace won't work either,
     since the job might still not have started up. Instead, your test
     suite  might build an empty message  file and then make sure that
     the job writes a record to this message file (possibly by setting
     up  a logon UDC for that user). Your test suite can then read the
     message  file, waiting until a record is written to it, no matter
     when the job actually gets around to executing.

     Message  files  are also quite useful when  the test suite has to
     check  something  at  a  particular  point  in the  son program's
     execution,  and if it checks it too early or too late the results
     will  not be quite right. This  is particularly so when your code
     is   supposed   to   properly   handle  concurrent  access  in  a
     non-standard  way  (i.e.  not  by  simply using  FLOCK/FUNLOCK or
     DBLOCK/DBUNLOCK). You might want to have your programs, when run
     in testing mode, try to read records from a message file at
     critical points, which will let you control when each program
     will hit a particular piece of code.
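
The message-file idea from the last item can be illustrated with a small Python sketch. Here a queue.Queue stands in for the MPE message file and a thread stands in for the submitted job; both are substitutions made only so the example can run anywhere, and the function names are made up. What matters is that the test blocks on the read, so it does not care whether the job starts immediately or several seconds later.

```python
import queue
import threading
import time

def submitted_job(message_file, delay):
    """Stand-in for a batch job that starts at some unpredictable time."""
    time.sleep(delay)
    message_file.put("JOB STARTED")   # the job's "I ran" record

def wait_for_job(message_file, timeout):
    """Block until the job writes its record; None means it never did."""
    try:
        return message_file.get(timeout=timeout)
    except queue.Empty:
        return None

def demo(delay=0.05, timeout=5.0):
    """Submit the fake job and wait for its record, however late it runs."""
    message_file = queue.Queue()      # built empty before the job is submitted
    threading.Thread(target=submitted_job, args=(message_file, delay),
                     daemon=True).start()
    return wait_for_job(message_file, timeout)
```

The timeout is the one addition not in the description above: without it, a job that is never submitted would hang the test suite instead of failing it.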

Again,  these  are  some sample solutions to  some (though by no means
all) testing problems. The $65,536 question of testing, however, still
remains:  How  do you test VPLUS  block-mode applications? Some of the
above  tricks  might  be  usable  --  instead  of  calling  the  VPLUS
intrinsics,  call  procedures  that, in testing  mode, will do normal,
unformatted terminal I/O (i.e. the input fields are to be input simply
as a data string, with all the fields run together), which can then be
run  under test-bench control. Unfortunately, it seems that this would
leave  too  much out of the testing  (for instance, the correctness of
the VPLUS calls themselves, and the correctness of any edits specified
in  the  VPLUS  forms),  and  the  test  suites  would  also  be quite
unreadable and unwritable. Someone might do something to intercept the
terminal  I/O  from  within  VPLUS  itself,  but  that's  getting  too
complicated for me. Any ideas?

CONCLUSION

To sum up, a few testing maxims:

   *  Automate  testing  --  both  the  input and the  checking of the
     output.

   *  Write test suites before or  while you're writing the program --
     that way, you can use them to do even the initial testing.

   *  Figure  out  the testing tools that you  need and don't skimp in
     building them; they can save you a lot of effort.

   * Make it as easy as possible to add new test cases (try to make it
     one test case per line), even if it means extra work up front.

   *  Have your test cases be in  job streams, not in source files, so
     that you can add new ones without recompiling.

   *  Change  your  programs  so that they  can "pretend" that today's
     date,  the  batch/online flag, your  logon information, and such,
     are  something  other than what they really  are. Do the same for
     hard-to-reproduce conditions, like I/O errors, control-Y, etc.

   *  Think about your code and come  up with test cases that exercise
     as much of it as possible; as new bugs arise, add test cases that
     would have caught them.

   *  Write  verification  routines  for  all  your  complicated  data
     structures,  especially  including  the  data  in your  files and
     databases.

   *  If feasible, write some sort  of test-bench program in which you
     can test the behavior of other programs by feeding them input and
     checking their output.

   * Think creatively about testing features that at first glance seem
     difficult  to check the results of.  Use message files to control
     timing problems.

   *  Be  prepared  to  spend a lot of  time and effort (and therefore
     money)  on  automated  testing,  but  expect  to save  a lot more
     effort, and come out with far fewer bugs, if you do it right.

Author	Message
Mithi25 Senior Member Joined: 23Jun2009 Online Status: Offline Posts: 288	Topic: AUTOMATED TESTING -- WHY AND HOW Posted: 22Oct2009 at 2:10am
	AUTOMATED TESTING -- WHY AND HOW Everyone knows how important testing is, and, with luck, everyone actually does test the software that they release. But do they really? Can they? Even a simple program often has many different possible behaviors, some of which only take place in rather unusual (and hard to duplicate) circumstances. Even if every possible behavior was tested when the program was first released to the users, what about the second release, or even a "minor" modification? The feature being modified will probably be re-tested, but what about other, seemingly unrelated, features that may have been inadvertently broken by the modification? Will every unusual test case from the first release's testing be remembered, much less retried, for the new release, especially if retrying the test would require a lot of preliminary work (e.g. adding appropriate test records to the database)? This problem arose for us several years ago, when we found that our software was getting so complicated that testing everything before release was a real chore, and a good many bugs (some of them very obvious) were getting out into the field. What's more, I found that I was actually afraid to add new features, concerned that they might break the rest of the software. It was this last problem that really drove home to me the importance of making it possible to quickly and easily test all the features of all our products. AUTOMATED TESTING The principle of automated testing is that there is a program (which could be a job stream) that runs the program being tested, feeding it the proper input, and checking the output against the output that was expected. Once the test suite is written, no human intervention is needed, either to run the program or to look to see if it worked; the test suite does all that, and somehow indicates (say, by a :TELL message and a results file) whether the program's output was as expected. 
We, for instance, have over two hundred test suites, all of which can be run overnight by executing one job stream submission command; after they run, another command can show which test suites succeeded and which failed. These test suites can help in many ways: * As discussed above, the test suites should always be run before a new version is released, no matter how trivial the modifications to the program. * If the software is internally different for different environments (e.g. MPE/V vs. MPE/XL), but should have the same external behavior, the test suites should be run on both environments. * As you're making serious changes to the software, you might want to run the test suites even before the release, since they can tell you what still needs to be fixed. * If you have the discipline to -- believe it or not -- write the test suite before you've written your program, you can even use the test suite to do the initial testing of your code. After all, you'd have to initially test the code anyway; you might as well use your test suites to do that initial testing as well as all subsequent tests. Note also that the test suites not only run the program, but set up the proper environment for the program; this might mean filling up a test database, building necessary files, etc. WRITING TEST SUITES Let's switch for a moment to a concrete example -- a date-handling package, something that, unfortunately, many people have had to write on their own, from scratch. Say that one of the routines in your package is DATEADD, which adds a given number of days to a date, and returns the new date. Here's the code that you might write to test it (the dates are represented as YYYYMMDD 32-bit integers): IF DATEADD (19901031, 7) <> 19901107 THEN BEGIN WRITELN ('Error: DATEADD (19901031, 7) <> 19901107'); GOT_ERROR:=TRUE; END; IF DATEADD (19901220, 20) <> 19910109 THEN BEGIN WRITELN ('Error: DATEADD (19901220, 20) <> 19910109'); GOT_ERROR:=TRUE; END; ... 
As you see, the code calls DATEADD several times, and each time checks the result against the expected result; if the result is incorrect, it prints an error message and sets GOT_ERROR to TRUE. After all the tests are done, the program can check if GOT_ERROR is TRUE, and if it is, say, build a special "got error" file, or write an error record to some special log record. This way, the test suites can be truly automatic -- you can run many test suites in the background, and after they're done, find out if all went well by just checking one file, not looking through many large spool files for error messages. The first thing that you might notice is that the DATEADD test suite can easily grow to be much larger than the DATEADD procedure itself! No doubt about it -- writing test suites is a very expensive proposition. Our test suites for MPEX/3000, SECURITY/3000, and VEAUDIT/3000 take up almost 30,000 lines, not counting supporting files and supporting code in the actual programs; the total source code of our products is less than 100,000 lines. Often, writing a test suite for a feature takes as long or almost as long as actually implementing the feature. Sometimes, instead of being reluctant to add a new feature for fear of breaking something, I am now reluctant to add a new feature because I don't want to bother writing a test suite for it. Fortunately, the often dramatic costs of writing test suites are recouped not just by the decrease in the number of bugs, but also by the fact that test suites, once written, save a lot of testing time. It's much easier for someone to run an already-written test suite than to execute by hand even a fraction of the tests included in the suite, especially if they require complicated set-up. 
Since a typical program will actually have to be tested several times before it finally works, the costs of writing a test suite (assuming that it's written at the same time as the code, or even earlier) can be recouped before the program is ever released. Also, test suites tend to have longer lives than code. A program can be dramatically changed -- even re-written in another language -- and, assuming that it was intended to behave the same as before, the test suite will work every bit as well. Once the substantial up-front costs of writing test suites have been paid, the pay-offs can be very substantial. But even though we should be willing to invest time and effort into writing test suites, there's no reason to invest more than we have to. In fact, precisely because test suites at first glance seem like a luxury, and people are thus not very willing to work on them, creating test suites should be as easy as possible. What can we do to make writing test suites simpler and more efficient? One goal that I try to shoot for is to make it as easy as possible to add new test cases, even if this means doing some additional work up front. I try to make every new test case, if possible, to fit on one line. The reason is quite simple: I want to have as little disincentive as possible to add new test cases. A really fine test suite would have tests for many different situations, including as many obscure boundary conditions and exceptions as possible; also, any time a new bug is found, a test should be added to the test suite that would have caught the bug, just in case the bug re-surfaces (a remarkably frequent event). If we grit our teeth and write some convenient testing tools up front, we can make it much easier to create a full test suite. 
Here, for instance is one example: PROCEDURE TESTDATEADD (DATE, NUMDAYS, EXPECTEDRESULT: INTEGER); BEGIN IF DATEADD (DATE, NUMDAYS) <> EXPECTEDRESULT THEN BEGIN WRITELN ('Error: DATEADD (', DATE, ', ', NUMDAYS, ') <> ', EXPECTEDRESULT); GOT_ERROR:=TRUE; END; END; ... TESTDATEADD (19901031, 10, 19901110); TESTDATEADD (19901220, 20, 19910109); TESTDATEADD (19920301, -2, 19920228); ... By this model, each procedure that you test would have a test procedure like this one written for it; then the main body of your test program would just be calls to these test procedures. This is especially useful for procedures that require some special processing before or after being called; for instance, they might have reference parameters that need to be put into variables before they're passed, record structure parameters to be filled, multiple by-reference output parameters that all need to be compared against expected values, and so on. You can make up other, even more general-purpose testing tools, such as the following procedure: PROCEDURE MUSTBE (TAG: stringtype; RESULT, EXPECTEDRESULT: INTEGER); BEGIN IF RESULT<>EXPECTEDRESULT THEN BEGIN WRITELN ('Error: ', TAG, ': ', RESULT, ' <> ', EXPECTEDRESULT); (* error handling code ) END; END; This procedure can be used to check the result of any function that returns an integer value, e.g. MUSTBE ('DATEADD #1', DATEADD (19901031, 10), 19901110); MUSTBE ('DATEADD #2', DATEADD (19901220, 20), 19910109); MUSTBE ('DATEADD #3', DATEADD (19920301, -2), 19920228); Other, similar, procedures might be written to help test functions that return other types (REALs, STRINGs, etc.). On the other hand, for functions that can't easily be called in one statement (because they take by-reference or specially-formatted parameters), you might want to consider writing a special test procedure. 
Finally, one other alternative (which I personally prefer) is writing a special "shell" program that asks for a procedure name, its parameters, and the expected result, calls the procedure, and checks the result: PROGRAM TESTSHELL ... ... READLN (PROCNAME, P1, P2, EXPECTEDRESULT); WHILE PROCNAME<>'EXIT' DO BEGIN IF PROCNAME='DATEADD' THEN RESULT:=DATEADD (P1, P2) ELSE IF PROCNAME='DATEDIFF' THEN RESULT:=DATEDIFF (P1, P2) ELSE IF PROCNAME='DATEYEAR' THEN RESULT:=DATEYEAR (P1) ... IF RESULT <> EXPECTEDRESULT THEN ... output error ... READLN (PROCNAME, P1, P2, EXPECTEDRESULT); END; ... This way, your actual test suite could be a job stream, to which you can add as many test cases as you like -- one line per test case -- without having to recompile anything: !JOB TESTDATE, ... !RUN TESTSHEL DATEADD 19901031 10 19901110 DATEADD 19920228 2 19920301 DATEYEAR 19920228 0 1992 ... !EOJ Whenever you make a change to your procedures, you just rerun the TESTDATE job, and you'll either find some bugs or be reasonably confident (though, of course, never 100% confident) that the software works. TESTING PROGRAMS THAT DO I/O It's rather easy to test a procedure whose only inputs are its parameters and whose only output is its result (or even a by-reference parameter). The more places a program derives its input from, or sends its output to, the harder it becomes to test. Let's take a simple I/O program, one which reads a file, reformats it in some way, and writes the result to another file. Obviously, to test it, we should fill up the input file, run the program, and compare the output file against the expected output file. As we discussed in the previous section, it would be nice if we could build a program -- it might be a 3GL or 4GL program, or even an MPE or MPEX command file -- that takes as parameters the input data and the expected output data, so that we can easily add new test cases. 
A first try on this might be a job stream like the following: :PURGE TESTIN :FCOPY FROM;TO=TESTIN;NEW LINE ONE LINE TWO LINE THREE :FILE MYPROGI=TESTIN :PURGE TESTOUT :FILE MYPROGO=TESTOUT :RUN MYPROG :PURGE TESTCOMP :FCOPY FROM;TO=TESTCOMP;NEW PROCESSED LINE A PROCESSED LINE B :SETJCW JCW=0 :CONTINUE :FCOPY FROM=TESTCOMP;TO=TESTOUT;COMPARE=1 :IF JCW<>0 THEN : handle error :ENDIF or, if the commands are put into a separate command file or UDC, :TESTCMDS LINE ONE LINE TWO LINE THREE :EOD PROCESSED LINE A PROCESSED LINE B :EOD (the data would go as input to the :FCOPY commands in the command file). Note how the :FILE equations come in handy to redirect the program's input and output files. Not only does this avoid the need to overwrite the production input and output files, but it makes it possible for several test suites which test programs that normally use the same files (e.g. this program, the program that created this program's input file, and the one that reads this one's output file) to run at once. If for some reason your programs don't allow :FILE equations (e.g. they issue their own :FILE equations to refer to these files), try to change them so they do, or at least so they have a special "test" mode that will read :FILE-equatable files. Note also that the job stream regenerates the input and comparison files every time it runs. I recommend this, since then each job stream would be a more or less self-contained unit (if it uses a special command file that no other test job uses, I suggest that you build even this command file inside the test job). It is easier to move or maintain, and is less likely to suffer from "software rot" (a condition that causes software that's been left on the shelf too long to stop working, largely because some outside things that it depends on have changed). Back to our example. 
One problem with it is that :FCOPY ;COMPARE= is rather finicky about the files it's comparing -- for instance, they must both have exactly the same record size. TESTCOMP, built by an :FCOPY FROM;TO=TESTCOMP would normally have the same record size as the job input device, so you might need a :FILE equation to work around this. A more serious problem is that :FCOPY FROM;TO= can only be easily used for creating files that contain ASCII data. What if some of the columns of the file need to contain binary data? Here is where I think you ought to grit your teeth and write a special program (unless, of course, you have a 4GL that can do this for you). Yes, I know that it seems like a pain to write code that will never be run in production, but is only needed to test other code, but this rather simple program could, if designed right, prove to be a highly reusable building block. The program would first prompt for some sort of "layout" of the file -- a list of the starting column numbers, lengths, and datatypes of each field in the file. Then, it would prompt for each record in the file, specified as a list of fields, separated by, say, commas; it would format the fields into the file record, and write it into the file. Thus, you'd say: :RUN BLDFILE S,1,8, I,9,2, S,11,10, P,21,8 << string, integer, string, packed >> SMITH, 100, XYZZY, 1234567 JONES, 55, PLUGH, 554927 ... Once you write this program, incidentally, you might find that it has other uses, say, to do manual testing of your program once you've already found that it has a bug and are trying to isolate it. And, of course, if you make it general enough, it should be usable in all of your test suites. Also, your input file had to have been created by some program, and your output file must be intended as to input to some other program; there's nothing that says that you can't run those programs in the test job to create the input file from data you've input and then format the output file into readable text. 
The problems come in if the other programs are too hard to run in batch (e.g. require block mode), or if you'd like to be able to test each program separately from the others, perhaps because you want to see how your program reacts to illegal data in its input file, data that shouldn't normally appear in the input generated by the other program.

What if your program reads and writes an IMAGE database? This is in some ways simpler to test and in other ways more difficult. You can use QUERY to fill the input sets and create output (using >LIST or >REPORT) that will be usable by :FCOPY ;COMPARE=. Be sure, though, that you sort any master sets that you dump using >REPORT -- since the order of the entries in the master set depends on the hashing algorithm, which depends on the capacity, unsorted output will make the test suite find an "error" every time you change the capacity.

However, the setup of the IMAGE database might also be a bit more cumbersome, largely since you probably want to have your own special test database built by the job (for the reasons discussed above -- independence from the production data, from other test suites' data, and self-containedness). You might want to create a simple program or command file that takes a schema file, lowers the dataset capacities on it, runs DBSCHEMA, and then does a DBUTIL,CREATE -- you'll find that a lot of your test suites can use it.

ADJUSTING FOR ENVIRONMENT INFORMATION

Our pass-input-and-compare-output-against-expected-result strategy works just fine if the same input is always supposed to yield the same output, but what if the output can vary? The most common variables are based on current date and time -- reports that contain this information in headers, output files that have each value date-stamped, a date-handling procedure that returns today's day-of-week, and so on.
Another related problem is with programs that check whether they're being run online or in batch, and do different things in these cases -- how can your batch test suite make sure that the online features work properly?

What we really have here is a different sort of input, input not from a file or a database, but from the system clock or the WHO intrinsic. There are a few ways of handling this; for example, instead of doing an FCOPY ;COMPARE=, which demands exact matches, you can have your own comparison program that lets you specify that some particular field -- e.g., the date on a report header -- will not get compared. Even better, your comparison program can let you specify that a particular field should be equal to, say, the current year, month, or day, calculated at the time the comparison program runs.

However, more flexible still -- and necessary for things like pretending you're online rather than in batch -- you can try to redirect this "input" from the environment, just as you redirected input from files and databases using :FILE equations.

Now how are you going to do this redirection? Believe it or not, after having the gall to ask you to write test suites that are as long as your source code, I'm suggesting that you change your programs to accommodate testing requirements. Instead of calling CALENDAR or DATELINE, for instance -- or using whatever language construct may give you this information -- you might write your own procedure:

   FUNCTION MYCALENDAR: SHORTINT;
   BEGIN
   get the value of the "PRETENDCALENDAR" jcw;
   IF the value is 0 THEN
     MYCALENDAR:=CALENDAR
   ELSE
     MYCALENDAR:=value of jcw;
   END;

This way, your program would normally get the current date from the CALENDAR intrinsic, but when the PRETENDCALENDAR JCW is non-zero, will use that value instead.
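For readers outside MPE, the same wrapper idea can be sketched in Python. This is only an illustration under stated assumptions: an environment variable stands in for the JCW, and the names `my_today` and `PRETEND_DATE` are hypothetical, not part of any library.

```python
import datetime
import os


def my_today(environ=os.environ):
    """Return the date the program should treat as "today".

    PRETEND_DATE is a hypothetical variable name playing the role of the
    PRETENDCALENDAR JCW: when it is set (as YYYY-MM-DD), the program uses
    it; when it is missing or empty, the real system clock is consulted.
    """
    pretend = environ.get("PRETEND_DATE", "")
    if pretend:
        # Testing mode: the test suite has supplied the date to use.
        return datetime.date.fromisoformat(pretend)
    # Production mode: ask the system clock.
    return datetime.date.today()
```

Passing the environment in as a parameter keeps the testing-only branch down to a single line, in the spirit of minimizing the code that behaves differently under test.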
You might, for efficiency's sake, want to get the JCW value only once, and then save it somewhere; for ease of use, you might want to look at the PRETENDYEAR, PRETENDMONTH, and PRETENDDAY JCWs, and assemble the CALENDAR-format value from them (possibly using the date-handling package that we so thoroughly tested a few pages ago).

A similar procedure might be written to determine whether the program is running online or in batch -- it'll check the PRETENDONLINE JCW, and if it doesn't exist, or is set to some default value, will call the WHO intrinsic. If your program does different things depending on the user's capabilities or logon id, you might want to have similar procedures for them, too (wrapped around the WHO intrinsic call) -- although it's possible for your test suite to actually be several jobs, each of which logs on under a different user, with different capabilities, it may be more convenient for you if one job can make itself look like each one of these users in turn. In fact, it might even be convenient for your own manual debugging (say, when you want to duplicate the program's behavior as a particular user id, or on a particular date, but don't want to re-logon or change the system clock).

Of course, the drawback to this approach is that you're not actually testing the program as it really behaves, but rather as it behaves with the testing flag set; the code you're executing in testing mode is somewhat different from what is normally executed, and if, say, there's a bug in the CALENDAR call or the WHO call, your test suite won't catch it, since in testing mode the intrinsics aren't called at all. Unfortunately, this seems to be a necessary evil; the only solution is to minimize the amount of code whose execution depends on whether or not you're in testing mode.
One thing that you might do -- if you want to be really fancy -- is create a library of procedures called CALENDAR, CLOCK, WHO, etc., which would, depending on some testing flag, either call the real CALENDAR, CLOCK, or WHO, or return "pretend" values; you can then put these procedures into an RL, SL, or XL, and not have to change your source file. Once you debug your library procedures, you should have more confidence that your testing in test mode actually simulates how the program will really behave in production. One thing that you may have to do, however, is intercept not just the intrinsics that you call directly, but also whatever procedures might be called by language constructs (like COBOL's facility for returning today's date).

WHAT TEST CASES SHOULD YOU USE?

So far, we've talked a lot about how to write tools that make it easy for you to add test cases to your test suites, but not much about what your test cases should be. Say that you're testing a DATEADD procedure, one that returns a date that is X days after date Y. (Let's assume that X could be negative -- X = -5 means a date that is 5 days before date Y.) What test cases should you use? Think about this before reading the answers!

Well, it seems to me that there are quite a few:

* Add days so that it stays in the same month (e.g. 1990/05/10+7).
* Add days so that it changes months (e.g. 1990/05/10+30).
* Add days so that it changes years (e.g. 1990/05/10+300).
* Add days so that it changes months or years over February 28th in a non-leap year (e.g. 1990/02/10+30).
* Add days so that it changes months or years over February 29th in a leap year (e.g. 1992/02/10+30).
* Handle years that are divisible by 100 but not by 400 (like 1900 or 2100), which are not leap years (did you know this?).
* Add 0 days.
* Add days so that it goes outside of your accepted date range (e.g. beyond 1999, or whatever other date is your limit).
* Add to an invalid date -- one with an invalid year, month, or day.
* All the above, but with subtracting days.

Wow! That's a lot of work. But, you'll have to admit, all the above are things that you really should test for (unless they're not relevant to your particular interpretation, e.g. if your date range doesn't extend to 1900 or 2100, or if you've consciously decided not to check for certain error conditions), manually if not automatically; it's especially important to test for "boundary conditions" (did you know that, in the DATELINE intrinsic, the next day after DEC 31, 1999 is JAN 1, 19:0?), for cases that require special handling (like leap year), and for proper handling of errors.

These are the obvious tests -- tests for bugs that you expect might happen. As other bugs come up, however, you ought to add test cases that would have caught these bugs: firstly, you'll have to test your fix anyway, and if you add the test case before implementing it, it won't cost you anything extra; secondly, the same bug (or a similar one) may come up later, but this time will get caught.

Still, there's no need to get extreme about things; shortcuts are still possible. Say, for instance, that DATEADD works by converting the date into a "century date" format (number of days since a particular base date), adding the number of days, and then converting back into a year/month/day format -- if you're sure that this is all it does, you might just have one test case (preferably one that seems to exercise as much of the internal logic as possible, such as one that changes months and years). Of course, you'd still have to properly test the date conversion routines.

In general, what you test should depend on how your code works. Whenever you know your code treats two different types of input differently, you should test both.
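The DATEADD case list lends itself to a table with one test case per line. Here is a minimal sketch in Python, assuming a stand-in `date_add` built on the standard `datetime` module rather than the real DATEADD; the function names and the table layout are illustrative only.

```python
import datetime


def date_add(year, month, day, days):
    """Stand-in for DATEADD: the (year, month, day) that is `days` after the date."""
    start = datetime.date(year, month, day)  # raises ValueError for an invalid date
    result = start + datetime.timedelta(days=days)
    return (result.year, result.month, result.day)


# One test case per line: (starting date, days to add, expected date).
CASES = [
    ((1990, 5, 10), 7, (1990, 5, 17)),    # stays in the same month
    ((1990, 5, 10), 30, (1990, 6, 9)),    # changes months
    ((1990, 5, 10), 300, (1991, 3, 6)),   # changes years
    ((1990, 2, 10), 30, (1990, 3, 12)),   # crosses Feb 28 in a non-leap year
    ((1992, 2, 10), 30, (1992, 3, 11)),   # crosses Feb 29 in a leap year
    ((1900, 2, 28), 1, (1900, 3, 1)),     # 1900 is not a leap year
    ((2000, 2, 28), 1, (2000, 2, 29)),    # 2000 is a leap year
    ((1990, 5, 10), 0, (1990, 5, 10)),    # add 0 days
    ((1990, 3, 12), -30, (1990, 2, 10)),  # subtract across a month boundary
    ((1990, 1, 5), -10, (1989, 12, 26)),  # subtract across a year boundary
]


def run_cases():
    """Run every case and return a list of (start, days, expected, actual) failures."""
    failures = []
    for start, days, expected in CASES:
        actual = date_add(*start, days)
        if actual != expected:
            failures.append((start, days, expected, actual))
    return failures


def rejects_invalid_date():
    """Error handling check: February 30th must be refused, not silently accepted."""
    try:
        date_add(1990, 2, 30, 1)
    except ValueError:
        return True
    return False
```

Adding a case for a newly discovered bug is then a one-line change to the table.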
If you're fairly certain that a single test will test many features, you can just use that one test; if, for instance, you know that testing one module or routine will also adequately test the modules or routines that it calls, you can make do with just testing the top-level one. However, try to resist this temptation; firstly, the top-level module probably doesn't exercise all the functions of the bottom-level one, and secondly, it's very convenient to have a test suite for the bottom-level module -- that way, if you're making substantial changes to your system and you know the top-level module is broken, you can still test the bottom-level one independently.

Finally, an obvious point, but one that is often neglected -- it's better to test a little than not at all. If you find something that's hard to test in all possible ways, test it in at least a few; if, for instance, its results are hard to automatically verify, at least make sure that they're in the right format, or even that they're returned at all (i.e. that the program doesn't just abort). There's really no 90-10 rule in testing -- 10% of the effort won't get you much more than 10% of the benefit -- but you can at least avoid some of the more obvious (and more embarrassing) bugs. Then, once the groundwork is laid, you might try to get back to it periodically, adding a new test case here or there. Don't let perfectionism get in the way of doing at least something.

VERIFYING DATA STRUCTURES

Most sufficiently complicated data structures -- anything from your data stored in an IMAGE database to your own linked lists, hash files, or B-trees, if you write such things yourself -- have internal consistency requirements. Certain fields in your databases may only contain particular values; other fields must have corresponding records in other datasets or in other databases. If any of these internal consistency requirements are not met, you know you have a bug somewhere.
You can get a lot of benefit out of writing a verification routine for each such data structure that checks it for internal consistency. This is somewhat different from the test suites we discussed before, which check for the validity of the ultimate results, but it can still be very useful, since internal inconsistency must have, by definition, been caused by a program error, and is likely to eventually lead to incorrect results (incorrect results that your test suites might not otherwise check for). You should call this verification routine at the end of each test suite to verify the consistency of the structures (again, usually data in the database) that the program being tested built; you might even run it after each step in the test suite, to isolate exactly where an error might be sneaking in. You may also want to run the verification routine against your production database every night, to check your programs as they run in the real world, not just the controlled test environment; and, you can run it whenever you suspect that something may be wrong (either in testing or in production), to figure out if an internal inconsistency might be causing it. The verification routine shouldn't be hard to write; if you can do it using a 4GL or some other tool (like Robelle's fast SUPRTOOL -- speed is important, since you want to make it as quick and easy as possible to verify your data), all the better. Simply put, check all the fields for which at least one possible value would be invalid, whether it is because it's not one of a list of allowable values for this field, or because it's out of range, or because it's inconsistent with some other values in this record, or some other values in the dataset/database. Possible checks include: * Flag fields may contain only certain allowable values. * Numbers, like salaries or prices, must be within certain ranges (e.g. non-negative, below a certain amount, etc.). * Dates must be valid (valid year, month, day). 
* Strings must at least not include non-alphanumeric characters.
* Some fields must have corresponding entries in a different dataset or database (do a DBGET mode 7, for instance, to make sure they're there).
* Some fields are calculated from other fields, and must match (e.g. a total price field in an invoice that must be the sum of the price fields in the line items).

Not only can this check for bugs in your programs, but it can also check for invalid production data that your programs might not have detected (e.g. garbage characters in string fields, bad states, state codes that don't match phone numbers, etc.). And, again, if written as a QUERY >XEQ file, or as a 4GL program, it can be very easily created, and used over and over again.

TESTING COMMAND-DRIVEN AND CHARACTER-MODE INTERACTIVE PROGRAMS

As we discussed before, the key to successful automated testing is to have the proper tools that make adding test cases easy. One in particular -- which takes some work to construct but can make writing test suites much simpler -- is very much worth discussing.

This test-bench lets you run another program under its control, with the son program's $STDIN and $STDLIST redirected to message files. The test-bench can let you specify input to be passed to the son program, and the expected output that the son program should display; for instance, a typical test suite might look like:

   :RUN TESTBENC
   RUN SONPROG     << command to start son process >>
   I CALC 10+20    << command for son to execute >>
   O 30            << expected result >>
   DOIT            << tells test-bench to compare the results >>
   I SQUARES 3     << command for son to execute >>
   O 1             << expected result >>
   O 4             << expected result >>
   O 9             << expected result >>
   DOIT            << tells test-bench to compare the results >>
   ...

What are the advantages of this approach?
* It lets you specify the expected output right after the input; this makes the test suite much easier to write (and read and maintain) than if you had to specify all the input up front and all the expected output (from all the commands) at the end.
* It lets you compare the output and the expected output much more flexibly than a simple :FCOPY ;COMPARE= would; you can specify special commands that indicate, say, that the output needn't be exactly as you specified, but might include some variations here and there (e.g. date- or environment-dependent information).
* It tells you exactly which commands got errors, rather than just telling you an error was found.

Note, however, that all this test-bench does is feed input into the son process' $STDIN and read output from its $STDLIST; what about other output, say output to files, databases, JCWs, or MPE/XL variables, and input from the same places?

Fortunately, output to one of those places can easily be converted into output to $STDLIST simply by executing an MPE command, like :PRINT (or :FCOPY on MPE/V), :SHOWJCW, :RUN QUERY, etc. If our program can not only check the output of the son process and feed input to a son process, but execute MPE commands and check their output and feed them input, our problems will be solved.

Let's say your program is supposed to build a file, and you want to make sure that the file is built with the right structure and the right contents. Then, your test suite might look like this:

   :RUN TESTBENC
   RUN SONPROG          << command to start son process >>
   I BUILDFILE XXX      << command to test >>
   O File was built.    << expected output to $STDLIST >>
   DOIT
   MPE :LISTF XXX,2     << MPE command to execute >>
   O ...
   O XXX 123 32W ...    << expected :LISTF output >>
   O ...
   DOIT
   MPE :PRINT XXX       << MPE command to execute >>
   O ...                << expected :PRINT xxx output >>
   DOIT

How does this program work?
Well, as we said before, it runs the son program with $STDIN and $STDLIST redirected to message files; "I" commands write stuff to the input message file, "O" commands write the expected output records to a special temporary file, and "DOIT" commands read the output message file and compare it against the O-command temporary file. One problem is how "DOIT" will recognize that the son program is done with its output and has issued another input prompt. If the son program always uses the same prompt (or one of a few prompts), DOIT can check for this; if, however, the son program's prompt isn't easily distinguishable from normal output, you should make the son program print a special line (e.g. "*INPUT") before doing any input when it's in "testing mode"; so long as this happens before any input, DOIT can recognize these lines and realize that the output is done. What about the MPE commands that can be used to "convert" output to files, databases, etc. into output to $STDLIST? When we test MPEX (the test-bench I'm describing is essentially the one that we use to test all of our software), this is no problem, since we can just pass MPEX an MPE command as input, and MPEX will execute it. You might do the same yourself -- make sure that the program you're testing executes MPE commands -- or you can have TESTBENC have two son processes, one the program being tested, and the other a simple program that prompts for an MPE command and executes it. The only other problem that you'll face here is executing :FCOPY or :RUN commands on MPE/V (where they can't be done with the COMMAND intrinsic); however, if you're an MPEX customer, you can actually use MPEX as this MPE-command-executing son process -- MPEX can execute :FCOPYs, :PRINTs, :RUNs, etc. This test-bench $STDIN-and-$STDLIST-redirection solution, it seems to me, would work quite well for any command-driven or interactive character-mode programs. 
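To make the I/O/DOIT mechanism concrete, here is a minimal sketch of such a test-bench in Python. It rests on several assumptions: ordinary pipes stand in for MPE message files, the son program prints a `*INPUT` marker line before every read (the "testing mode" convention described above), and each DOIT is preceded by exactly one I command. The toy son program and every name here are hypothetical.

```python
import subprocess
import sys

PROMPT_MARKER = "*INPUT"  # line the son prints before each read in testing mode

# A toy son program understanding CALC a+b and SQUARES n, for demonstration only.
SON_SOURCE = r'''
import sys
print("*INPUT", flush=True)
for line in sys.stdin:
    words = line.split()
    if words and words[0] == "CALC":
        a, b = words[1].split("+")
        print(int(a) + int(b))
    elif words and words[0] == "SQUARES":
        for i in range(1, int(words[1]) + 1):
            print(i * i)
    print("*INPUT", flush=True)
'''
SON_COMMAND = [sys.executable, "-c", SON_SOURCE]


def read_until_prompt(son):
    """Collect the son's output lines up to (not including) the next prompt marker."""
    lines = []
    while True:
        line = son.stdout.readline()
        if line == "" or line.rstrip("\n") == PROMPT_MARKER:
            return lines
        lines.append(line.rstrip("\n"))


def run_script(script_lines, son_command):
    """Run a test script; return a list of (input, expected, actual) mismatches."""
    errors = []
    expected = []
    last_input = None
    with subprocess.Popen(son_command, stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True) as son:
        read_until_prompt(son)  # skip the son's first prompt
        for raw in script_lines:
            command, _, rest = raw.partition(" ")
            if command == "I":       # feed one input line to the son
                son.stdin.write(rest + "\n")
                son.stdin.flush()
                last_input = rest
            elif command == "O":     # remember one expected output line
                expected.append(rest)
            elif command == "DOIT":  # compare actual output against expected
                actual = read_until_prompt(son)
                if actual != expected:
                    errors.append((last_input, expected, actual))
                expected = []
        son.stdin.close()            # end of input lets the son terminate
    return errors
```

For example, `run_script(["I CALC 10+20", "O 30", "DOIT"], SON_COMMAND)` returns an empty list, while a script expecting `5` from `CALC 2+2` returns one entry naming the input, the expected lines, and the actual lines. A fuller version would add the MPE-command step and wildcard matching discussed in the text.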
If you want to use it to test procedures, you'll have to write a simple shell program that prompts for input parameters, calls the procedure, and prints the output parameters, and run this program as a son of the test-bench. Testing block-mode programs, I suspect, would be much more difficult; I'll have a few words about it later, but it's still an unsolved problem as far as I'm concerned.

Of course, the more complicated your test-bench is, the more important it is to write a test suite for it! (A bug in the test-bench that keeps it from properly checking things could be almost unnoticeable, since it will falsely tell you that your test suite ran fine.) We test our test-bench by feeding it a lot of test commands, some of which are calculated to produce errors and others to succeed, and check the results of these operations (not using the test-bench itself, of course) to see if they're as expected.

AUTOMATIC TEST SUITE GENERATION

No matter how sophisticated your test-bench, you'll still have to write your test cases. For simple one-line-input, one-line-output operations, there's little that you need to do beyond specifying the input and the expected output; however, for things that require a lot of set-up, or are actually conversations, with many output prompts and many inputs, you'd like a better way.

One idea that some automated testing people like is having you run the program once, specifying all the right inputs, and making sure that all the outputs are correct; these inputs and outputs will then be saved, ready to be "re-played" by the test-bench, which will resubmit exactly the same inputs, and expect to get exactly the same outputs. In effect, then, the test suite will check that all subsequent executions of the program behave exactly the same way as the initial one (which was presumably correct).

The way you'd do this is by modifying the test-bench program we discussed above (what? you mean to say you haven't written it yet?)
to have a special "data-collection" mode that will accept user inputs, pass them to the program, collect the outputs, and create a file that can later be used by the normal mode of the test-bench.

Now, there are a few problems with this, which lead me to conclude that, even if this data-collect feature is used, the test suite that it generates must be easy to modify. Firstly, the user will doubtless make errors while entering the original inputs; since the data-collect feature doesn't know what's a user error and what should be part of the test suite, there needs to be some way of editing out these errors. (Technically, you need not do this, since exactly the same inputs should yield exactly the same error outputs in the future, but if you don't edit them out, the test suite will be very unreadable and unmaintainable.)

Secondly, the future output probably won't exactly match the current output -- dates and other environment information (version numbers, etc.) will doubtless change. There needs to be some way to edit the generated test suite to replace the expected output with some sort of "wildcard" characters that tell the test-bench that any output would be acceptable in this case.

However, taking into account that some editing will be needed, a data-collect feature can be quite convenient for testing features that involve complicated I/O sequences.

A SAMPLE TEST ENVIRONMENT

Besides having the right test tools and the right test suites, it's important to internally set up your test suites and your test environment so that it is as easy as possible to run all your tests and check whether or not they succeeded. Here are a few tips that we've found handy ourselves:

* Have one test suite for each major feature, not one big one for your entire system. When you're working on a particular feature and want to see if it works, you'll probably want to re-run only that test suite after each change, and re-run all the test suites only at the very end.
* As we mentioned before, have each test suite as self-contained as possible, but also try to have each test case within each test suite be relatively self-contained. The more a test case depends on the results of the test cases that preceded it, the harder it will be for you to fully understand what the state of your internal files, databases, etc. is at the time the test case is executed, and the harder it will be to maintain it, or even understand why the "expected results" you have for it in your test suite are really what should be expected. Of course, some test cases must not be self-contained precisely because you want to make sure that they work properly when done together, rather than separately. * Have each test suite run in its own group, with all the files needed by the program being tested redirected to the files in that group. The first thing that the test suite should do is purge the group (if it's logged on to it, this will merely purge all the files); this way, you'll be sure that this test run is not influenced by previous runs of the same test suite, and since the test suite runs in its own group, it will not be influenced by concurrently-running other test suites. * All the actual test suites and permanent support files should be in their own group, separate from the groups in which the test suites run; this way, the test suites will be able to purge their own groups, as discussed above. If the test suite files are all in a particular fileset (e.g. "[email protected]"), they can be submitted in MPEX using a %REPEAT/%STREAM/%FORFILES construct. * Each test suite should signal that it completed successfully by building a file called TESTOK, and that it failed by building a file called TESTERR (or, even better, TESTE###, where ### stands for the number of errors discovered). Then, a :LISTF [email protected],6 will show you which jobs had errors in them. 
In case you're afraid that a job might abort without building either a TESTOK or TESTERR file, you can build the TESTERR file at the very beginning and only purge it at the end if all went well.
* Finally, if you use a test-bench program, the test-bench should send all its output, especially an indication of all the errors (what the input was, what the output was, and what the expected output was), to a disc file called, say, TESTLOG, which can easily be read, and will remain on the system even if the spool file is deleted.

Thus, the configuration we use in VESOFT is:

   [email protected] -- command files used by the test suites.
   [email protected] -- MPEX test suites.
   [email protected] -- SECURITY test suites.
   [email protected] -- VEAUDIT test suites.
   TESTPROD.TEST.VESOFTD -- a command file that purges the VETEST account and %STREAMs [email protected][email protected][email protected].
   @.MALTFILE.VETEST -- group used by test suite MALTFILE.TEST.VESOFTD.
   @.MBATCH.VETEST -- group used by test suite MBATCH.TEST.VESOFTD.
   ...

TESTING SEEMINGLY HARD-TO-TEST PROGRAMS

Some things are easier to test than others; procedure calls are simplest, command-driven programs are rather straightforward, "conversational" character-mode programs are a bit harder. Much depends on how easy it is to feed the program input and intercept the program's output; for example, if a program does input from tape, you might redirect it by a :FILE equation to a disc file, but how will you test the code in the program that tries to handle tape errors? If a program is supposed to submit a job, how can you tell whether the job was properly submitted?

There are several general tricks that you can use to solve these problems, though these are more examples of ingenious solutions for you to emulate, not specific instructions that should always be followed:

* Inputs: Have ways to "fake" hard-to-trigger input conditions, like bad tapes, control-Y, I/O errors, etc.
For instance, have a "*BAD TAPE" record in a file be interpreted by the program as a tape error; if the program expects a tape to contain several things separated by EOF markers (which can't normally be emulated by disc files), have it treat an "EOF" as an EOF marker. Again, this has the same problem as the PRETENDDATE/PRETENDONLINE features that we suggested above -- what you'll really be testing is not the actual execution of the program, but the execution of the program in testing mode. However, though problems with, say, the actual condition code check that detects the tape error will not be found, all the other aspects of tape error handling will be properly tested.
* Outputs: Find commands or programs that can convert an output that is hard to test for into one that is easy to test for; for instance, if your program is supposed to do a :DOWN, test it by doing a :SHOWDEV afterwards to see if it is DOWNed or has a DOWN pending. If the program is supposed to do an :ABORTJOB, PAUSE for some time (since an :ABORTJOB may not immediately take effect) and then do a :SHOWJOB of that job number to make sure that the job no longer exists. If your program is supposed to shut down the system, you're out of luck...
* More outputs: But maybe you're not out of luck in the system shut-down case, after all; analogously to the "fake input" suggestion above, you might have your program check to see if it's in testing mode, and if it is, print a message instead of shutting down the system (or doing something equally uncheckable-for). Again, this won't make sure that the ultimate operation is done properly (since it won't be done in this case at all), but at least it'll make sure that all of the preliminaries will be handled correctly.
* General: Find ways of taking care of timing windows; for instance, if your program submits a job, a simple :SHOWJOB in the test suite won't be a proper check (since a small job might have finished by the time the :SHOWJOB is done), and having the job build a file or leave some such permanent file won't work either, since the job might still not have started up. Instead, your test suite might build an empty message file and then make sure that the job writes a record to this message file (possibly by setting up a logon UDC for that user). Your test suite can then read the message file, waiting until a record is written to it, no matter when the job actually gets around to executing.

Message files are also quite useful when the test suite has to check something at a particular point in the son program's execution, and if it checks it too early or too late the results will not be quite right. This is particularly so when your code is supposed to properly handle concurrent access in a non-standard way (i.e. not by simply using FLOCK/FUNLOCK or DBLOCK/DBUNLOCK). You might want to have your programs, when run in testing mode, try to read records from a message file at critical points, which will let you control when each program will hit a particular piece of code.

Again, these are some sample solutions to some (though by no means all) testing problems. The $65,536 question of testing, however, still remains: How do you test VPLUS block-mode applications? Some of the above tricks might be usable -- instead of calling the VPLUS intrinsics, call procedures that, in testing mode, will do normal, unformatted terminal I/O (i.e. the input fields are to be input simply as a data string, with all the fields run together), which can then be run under test-bench control.
Unfortunately, it seems that this would leave too much out of the testing (for instance, the correctness of the VPLUS calls themselves, and the correctness of any edits specified in the VPLUS forms), and the test suites would also be quite unreadable and unwritable. Someone might do something to intercept the terminal I/O from within VPLUS itself, but that's getting too complicated for me. Any ideas?

CONCLUSION

To sum up, a few testing maxims:

* Automate testing -- both the input and the checking of the output.
* Write test suites before or while you're writing the program -- that way, you can use them to do even the initial testing.
* Figure out the testing tools that you need and don't skimp in building them; they can save you a lot of effort.
* Make it as easy as possible to add new test cases (try to make it one test case per line), even if it means extra work up front.
* Have your test cases be in job streams, not in source files, so that you can add new ones without recompiling.
* Change your programs so that they can "pretend" that today's date, the batch/online flag, your logon information, and such, are something other than what they really are. Do the same for hard-to-reproduce conditions, like I/O errors, control-Y, etc.
* Think about your code and come up with test cases that exercise as much of it as possible; as new bugs arise, add test cases that would have caught them.
* Write verification routines for all your complicated data structures, especially including the data in your files and databases.
* If feasible, write some sort of test-bench program in which you can test the behavior of other programs by feeding them input and checking their output.
* Think creatively about testing features that at first glance seem difficult to check the results of. Use message files to control timing problems.
* Be prepared to spend a lot of time and effort (and therefore money) on automated testing, but expect to save a lot more effort, and come out with far fewer bugs, if you do it right.