Using the OS/2 debugging interface to monitor the system


Introduction

    One of the components of OS/2 which nearly all OS/2 developers have used,
    but which very few have programmed themselves, is the debugging interface.

    Both CodeView and Multiscope use this interface to provide you with the
    ability to debug your programs.  It is this interface which enables you to
    single-step your program, to set breakpoints and to examine or modify data.

    This article is a simple introduction to using this interface, and provides
    an example program which may be of some use in its own right as well.


Overview of DosPTrace

    OS/2 is a protected-mode operating system, which normally PREVENTS one
    process from interfering with another.  Unfortunately this is exactly what
    a debugger is required to do, and so, in order to support debugging, a
    standardised debugging interface was included in OS/2.

    The debugging interface allows one process (the debugger) to start another
    process in debug mode.  OS/2 provides the mechanism for the debugger to
    control execution of the child process, set/clear breakpoints, read/write
    registers and memory, etc.

    The debugger starts the program to be debugged using the appropriate
    debugging option on DosExecPgm or DosStartSession, which informs OS/2
    that this program will be debugged.

    The debugger then controls this process using the DosPTrace function. This
    function takes one parameter - a pointer to a debug buffer. Some of the
    fields in this buffer are set up before DosPTrace is called, and others
    are filled in by the operating system upon return.
    (See source listing below for a definition of the PTRACEBUF structure)

    The two most important fields in the buffer are the 'pid' and 'cmd' fields.
    The 'cmd' field identifies the action required of DosPTrace (for example
    read memory or stop process), and the 'pid' field identifies WHICH
    process the command refers to.

    On return the 'cmd' field is replaced with the return code.

    The first command issued for a new process should be the PT_STOP command,
    which initialises the DosPTrace buffer.  This will return with PT_LOADED,
    which indicates that the main program module has been loaded.  The command
    should then be reissued, and will return PT_LOADED for each of the DLLs
    which the program requires.  Once all the DLLs are loaded the PT_STOP
    command will return PT_SUCCESS.
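
    A minimal sketch of this start-up handshake follows.  It is written in
    portable C so that it can be tried with any current compiler, and is not
    OS/2 code: stub_ptrace() is a made-up stand-in for DosPTrace which simply
    reports three module loads and then success, and MiniTraceBuf is a
    cut-down, hypothetical version of the PTRACEBUF structure.  Only the
    PT_ values are taken from the source listing.

```c
#include <assert.h>
#include <stddef.h>

/* Command and return values copied from the article's listing. */
#define PT_STOP     0x000A
#define PT_SUCCESS  0x0000
#define PT_LOADED   0xFFF8

/* Cut-down stand-in for PTRACEBUF: only the fields this sketch touches. */
typedef struct {
    unsigned short pid;   /* process the command refers to           */
    unsigned short cmd;   /* command on entry, return code on exit   */
    unsigned short mte;   /* module handle reported with PT_LOADED   */
} MiniTraceBuf;

/* HYPOTHETICAL stub standing in for DosPTrace.  It reports one PT_LOADED
   per remaining module and then PT_SUCCESS; the real call is only assumed
   to behave this way, as described in the text. */
static int stub_modules_left = 3;

static unsigned short stub_ptrace(MiniTraceBuf *buf)
{
    assert(buf != NULL);
    if (buf->cmd != PT_STOP)
        return 1;                          /* non-zero = API error */
    if (stub_modules_left > 0) {
        buf->mte = (unsigned short)(0x100 + stub_modules_left);
        stub_modules_left--;
        buf->cmd = PT_LOADED;
    } else {
        buf->cmd = PT_SUCCESS;
    }
    return 0;
}

/* Reissue PT_STOP until it reports PT_SUCCESS.
   Returns the number of modules seen, or -1 on any unexpected result. */
int wait_until_loaded(unsigned short pid)
{
    MiniTraceBuf buf;
    int loaded = 0;

    buf.pid = pid;
    buf.mte = 0;
    for (;;) {
        buf.cmd = PT_STOP;   /* 'cmd' was overwritten by the last return */
        if (stub_ptrace(&buf) != 0)
            return -1;
        if (buf.cmd == PT_SUCCESS)
            return loaded;
        if (buf.cmd != PT_LOADED)
            return -1;
        loaded++;            /* first module is the program, rest are DLLs */
    }
}
```

    The point to notice is that 'cmd' must be set again before every call,
    because the previous call replaced it with the return code.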

    At this point the debugger can issue commands to read/write memory or
    registers, set/clear breakpoints, and to run the program.

    Control will be returned by OS/2 when the debugged program hits a breakpoint,
    causes a processor exception, or terminates.  OS/2 will fill in the buffer
    with the current register contents and an indication of which of the above
    caused the return.
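
    As an illustration of reading the buffer after such a return, here is a
    small sketch in portable C (C99, for snprintf).  MiniTraceBuf and
    describe_return() are hypothetical names of my own.  The use of the
    'value', 'segv' and 'offv' fields follows the source listing; the
    listing does not define the breakpoint return code, so that case simply
    falls through to the default branch.

```c
#include <stdio.h>

/* Return codes copied from the article's listing. */
#define PT_DYING     0xFFFA
#define PT_FAULT     0xFFF9
#define PT_CHILDPID  0xFFF4

/* Cut-down stand-in for PTRACEBUF: only the fields this sketch reads. */
typedef struct {
    unsigned short pid;
    unsigned short cmd;     /* return code after the call          */
    unsigned short value;   /* exit code, or PID of a new child    */
    unsigned short offv;    /* offset of the fault                 */
    unsigned short segv;    /* segment of the fault                */
} MiniTraceBuf;

/* Hypothetical helper: write a one-line description of why control came
   back to the debugger.  Returns the length that snprintf reports. */
int describe_return(const MiniTraceBuf *buf, char *out, size_t outsz)
{
    unsigned pid = buf->pid;

    switch (buf->cmd) {
    case PT_FAULT:
        return snprintf(out, outsz, "pid %04x: fault at %04x:%04x",
                        pid, (unsigned)buf->segv, (unsigned)buf->offv);
    case PT_DYING:
        return snprintf(out, outsz, "pid %04x: exiting with code %u",
                        pid, (unsigned)buf->value);
    case PT_CHILDPID:
        return snprintf(out, outsz, "pid %04x: child %04x starting",
                        pid, (unsigned)buf->value);
    default:
        return snprintf(out, outsz, "pid %04x: return code %04x",
                        pid, (unsigned)buf->cmd);
    }
}
```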


    In addition to this functionality, a second debug option for both DosExecPgm
    and DosStartSession allows the debugger to control both the initial process
    and also ALL processes begun by it, whether by DosExecPgm or
    DosStartSession.  When a descendant process starts, DosPTrace returns with
    a status of PT_CHILDPID and provides the PID of the descendant process.
    The debugger must then create a second thread, or another process, to
    manage the debugging of this process.


Background to the example program

    Unfortunately, DosPTrace is not particularly well described in the OS/2
    manuals which I have seen.  In particular, the description of the child
    process debug option covered in the previous section is incomplete - the
    PT_CHILDPID return is generally not mentioned at all!

    Space does not permit me to provide a complete description of DosPTrace -
    the range of commands which can be used, and the relevant fields for each
    command.  My best suggestion is to refer to as many OS/2 manuals as you
    can get hold of - and not to believe any of them...  I have found that
    the only way to find out how DosPTrace works is to write programs which
    use it.

    For this reason I thought that a simple, working example of a framework for
    using DosPTrace would help to make up for the inadequacies of the
    documentation, and might well be of interest, or even of use, to the readers
    of Pointers.


    While I was deciding what was the best way of going about this, it occurred
    to me that the main OS/2 shell program is, after all, just a program - it is
    PMSHELL.EXE.  After OS/2 has initialised it is in charge of starting other
    sessions, and hence programs.

    I decided my example program would use PMSHELL itself as the program to
    debug.  By using the 'child process' debug mode, all processes started after
    OS/2 has initialised will be run under control of the OS/2 debug interface
    and hence of this example program.

    I have called the example program 'BIGBRO', as in the 'big brother' of George
    Orwell's "1984", because it watches all the programs running without them
    being aware of it.


Functionality of the example program

    The program is installed in CONFIG.SYS (as described below), and the system
    rebooted.  After initialisation has completed, all the processes which are
    started by PMSHELL or its descendants are run under the control of 'BIGBRO',
    which does the following:

        o logs all process starts, with PID and full program name

        o logs all process terminations, with return code

        o replaces the 'trap 0D' hard error popup with a simple-minded
          log of the registers, and a beep.

    Obviously further enhancements would be easy to add, but space restricts
    them in this article!  Data about active processes could be maintained,
    a full walkback could be performed for process exceptions, remote network
    access to BIGBRO could be added, etc.

    There are a few points which must be borne in mind when using BIGBRO:

        o each active process requires a dedicated thread in BIGBRO, which
          consumes memory, and also limits the number of active processes
          to the maximum number of threads in a single process.

        o Program loading is slower because each DLL loaded requires action
          by BIGBRO to continue.

        o The simple-minded logging creates a file which grows without limit.

        o Trap 0D errors (general protection violation) are handled by BIGBRO,
          rather than the default HARDERR handling of a full-screen popup.
          Owing to the way DosPTrace works, processes which fail with a trap 0D
          do not return the same error code to the parent process.

        o It is very tricky to use a debugger while BIGBRO is active - OS/2
          only allows ONE process to be in control.  Both the debugger and
          BIGBRO are potential candidates, but only one will succeed!



Notes on the program itself

    I am using Microsoft C6.00 and OS/2 1.20 and 1.30.  The compilation is:

        cl /AL /Gs /MT /W4 /Zp bigbro.c

    For other compilers you need to find out how to create multi-threaded
    programs.  For example, for C5.1 you need to use the "mt\xxx" include
    files and the multithreaded C runtime library.


    (1) Each process is handled by a dedicated thread, which issues the PT_STOP
        command until it gets a successful completion.  The PT_LOADED completion
        indicates a module has been loaded, and the first module is the program
        itself.

    (2) PT_FAULT return indicates a protection fault (trap 0D).  The registers
        are written to the logfile and the program is terminated.

    (3) PT_CHILDPID return indicates that this process has started a child
        process, so a fresh thread is created to deal with this.  NOTE this
        simple program doesn't do any checking on the return from _beginthread!

    (4) PT_DYING indicates the process is terminating, and so the return code is
        saved for displaying at the end.

    (5) PT_SIGNAL and PT_ENDTHREAD are ignored - they are merely informative.

    (6) 'Unexpected' return codes cause the program to be aborted as a failsafe
        - if you have programs which cause some of the other errors, such as
        floating point errors, you may wish to treat these errors differently.
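
    The decisions in notes (1) to (6) can be summarised as a single function
    from (current command, return code) to the next action.  The following
    is a sketch in portable C rather than a piece of the program itself:
    TraceStep and decide_step() are hypothetical names, and the PT_ values
    are copied from the source listing.

```c
/* Command and return values copied from the article's listing. */
#define PT_GO        0x0007
#define PT_TERM      0x0008
#define PT_STOP      0x000A
#define PT_SUCCESS   0x0000
#define PT_ERROR     0xFFFF
#define PT_SIGNAL    0xFFFE
#define PT_DYING     0xFFFA
#define PT_FAULT     0xFFF9
#define PT_LOADED    0xFFF8
#define PT_ENDTHREAD 0xFFF6
#define PT_CHILDPID  0xFFF4

/* Hypothetical summary of what the monitor should do after one return. */
typedef struct {
    unsigned short next_cmd;   /* command to issue on the next call      */
    int finished;              /* stop issuing commands for this process */
    int log_fault;             /* note (2): dump registers to the log    */
    int start_thread;          /* note (3): new thread for the child PID */
    int save_exit_code;        /* note (4): remember the return code     */
} TraceStep;

TraceStep decide_step(unsigned short cmd, unsigned short result)
{
    TraceStep step = { 0, 0, 0, 0, 0 };

    step.next_cmd = cmd;              /* default: repeat the same command */

    switch (result) {
    case PT_SUCCESS:                  /* note (1): all modules are loaded */
        if (cmd == PT_STOP)
            step.next_cmd = PT_GO;
        break;
    case PT_LOADED:                   /* note (1): keep going as before   */
        break;
    case PT_FAULT:                    /* note (2)                         */
        step.log_fault = 1;
        step.next_cmd = PT_TERM;
        break;
    case PT_CHILDPID:                 /* note (3)                         */
        step.start_thread = 1;
        break;
    case PT_DYING:                    /* note (4)                         */
        step.save_exit_code = 1;
        break;
    case PT_SIGNAL:                   /* note (5): informative only       */
    case PT_ENDTHREAD:
        break;
    case PT_ERROR:                    /* nothing more can be done         */
        step.finished = 1;
        break;
    default:                          /* note (6): failsafe               */
        step.next_cmd = PT_TERM;
        break;
    }
    return step;
}
```

    Keeping the decision separate from the DosPTrace call itself makes it
    easy to change the treatment of a particular return code, as suggested
    in note (6), without touching the loop.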


Testing and installation

    The program can be tested by invoking it as:

       bigbro c:\os2\cmd.exe

    This starts a command prompt at which you can execute a few programs, and
    then type EXIT to quit.  Then display c:\shell.log (for example with the
    TYPE command) and check that the file contains data about processes
    starting and stopping.
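
    If you would rather check the log mechanically, the lines can be picked
    apart quite easily.  The sketch below is portable C and assumes the
    "hh:mm:ss - pid: text" layout produced by the fprintf calls in the
    source listing; LogKind and classify_log_line() are hypothetical names
    which are not part of BIGBRO.

```c
#include <stdio.h>
#include <string.h>

typedef enum { LOG_OTHER, LOG_STARTED, LOG_TERMINATED, LOG_TRAP } LogKind;

/* Classify one line of the log and store its (hexadecimal) PID in *pid.
   Lines without the timestamp/PID prefix, such as the register dump,
   are reported as LOG_OTHER and leave *pid untouched. */
LogKind classify_log_line(const char *line, unsigned *pid)
{
    int hour, min, sec;
    int used = 0;            /* set by %n only if the whole prefix matched */
    unsigned id;
    const char *rest;

    if (sscanf(line, "%2d:%2d:%2d - %4x: %n",
               &hour, &min, &sec, &id, &used) != 4 || used == 0)
        return LOG_OTHER;

    *pid = id;
    rest = line + used;      /* text following "hh:mm:ss - pid: " */

    if (strncmp(rest, "Started ", 8) == 0)
        return LOG_STARTED;
    if (strncmp(rest, "Terminated ", 11) == 0)
        return LOG_TERMINATED;
    if (strstr(rest, "  TRAP at ") != NULL)
        return LOG_TRAP;
    return LOG_OTHER;
}
```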


    The program is then installed by editing CONFIG.SYS.  The line:

        'PROTSHELL=C:\OS2\PMSHELL.EXE ....'

    is changed by inserting BIGBRO after the '=' as follows:

       'PROTSHELL=C:\DEBUG\BIGBRO.EXE C:\OS2\PMSHELL.EXE ....'

    where in this example I placed BIGBRO.EXE in the C:\DEBUG directory.
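
    The same edit can be expressed as a small string operation.  This is
    only a sketch in portable C (C99, for snprintf), with the hypothetical
    name wrap_protshell(); it assumes the keyword is written in upper case
    with no spaces around the '=', which a real CONFIG.SYS need not obey.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: insert 'wrapper' and a space after the '=' of a
   PROTSHELL line, leaving the rest of the line exactly as it was.
   Returns 0 on success, or -1 if the line is not a PROTSHELL statement
   or the result does not fit in 'out'. */
int wrap_protshell(const char *line, const char *wrapper,
                   char *out, size_t outsz)
{
    static const char key[] = "PROTSHELL=";
    size_t keylen = sizeof key - 1;
    int n;

    if (strncmp(line, key, keylen) != 0)
        return -1;

    n = snprintf(out, outsz, "%s%s %s", key, wrapper, line + keylen);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

    Whatever follows the shell name on the original line is carried over
    unchanged, so PMSHELL still receives its usual arguments.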

    As always when changing CONFIG.SYS for any reason, KEEP A BACKUP in case of
    finger trouble.


    Now reboot your machine.  Start an OS/2 command prompt, and display
    c:\shell.log for a list of which processes have started and terminated.



----------------------- Program source BIGBRO.C --------------------------

/*****************************************************************************/
/* Include files                                                             */
/*****************************************************************************/

#define         INCL_DOSPROCESS
#define         INCL_DOSINFOSEG
#include        <os2.h>

#include        <stdio.h>
#include        <stdlib.h>
#include        <string.h>
#include        <process.h>

/*****************************************************************************/
/* DosPTrace definitions (in some, but not all, versions of BSEDOS.H)        */
/*****************************************************************************/

USHORT APIENTRY DosPTrace(PBYTE pPtraceBuf);

/* structure of DosPTrace buffer */
typedef struct _PTRACEBUF {
        PID    pid;               /* Process ID.                             */
        TID    tid;               /* Thread ID, or zero.                     */
        USHORT cmd;               /* command/return code.                    */
        USHORT value;             /* supplementary info.                     */
        USHORT offv;              /* offset value.                           */
        USHORT segv;              /* segment value.                          */
        USHORT mte;               /* library module handle.                  */
        USHORT rAX, rBX, rCX, rDX, rSI, rDI, rBP; /* register contents.      */
        USHORT rDS, rES, rIP, rCS, rF, rSP, rSS;
} PTRACEBUF;
typedef PTRACEBUF FAR *PPTRACEBUF;

/* selected command values to DosPTrace() */
#define PT_GO                0x0007    /* Go                                 */
#define PT_TERM              0x0008    /* Terminate child process.           */
#define PT_STOP              0x000A    /* Stop child process.                */
#define PT_GET_MOD           0x0010    /* Get library-module name.           */

/* selected return codes from DosPTrace() */
#define PT_SUCCESS           0x0000    /* Success return code.               */
#define PT_ERROR             0xFFFF    /* Error.                             */
#define PT_SIGNAL            0xFFFE    /* About to receive signal.           */
#define PT_DYING             0xFFFA    /* Process dying.                     */
#define PT_FAULT             0xFFF9    /* General protection fault occurred. */
#define PT_LOADED            0xFFF8    /* Library module has been loaded.    */
#define PT_ENDTHREAD         0xFFF6    /* thread terminated                  */
#define PT_CHILDPID          0xFFF4    /* child process starting             */


/*****************************************************************************/
/* local variables and definitions                                           */
/*****************************************************************************/

#define STACKSIZE       2048      /* stack size (OS/2 recommended value)     */

static FILE *logfile = NULL;      /* file to write logging information to    */

static PGINFOSEG ginfo = NULL;    /* used for obtaining time-of-day          */


/*****************************************************************************/
/* fault: process fault message                                              */
/*****************************************************************************/

void fault(PTRACEBUF *bufp, char *program)
   {
   /* Log information about the trap */
   fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: %s  TRAP at %4.4x:%4.4x\n",
        ginfo->hour, ginfo->minutes, ginfo->seconds,
        bufp->pid, program, bufp->segv, bufp->offv);

   fprintf(logfile, "  Registers: AX: %4.4x  BX: %4.4x  CX: %4.4x  DX: %4.4x\n",
                bufp->rAX, bufp->rBX, bufp->rCX, bufp->rDX);

   fprintf(logfile, "             SI: %4.4x  DI: %4.4x  BP: %4.4x  SP: %4.4x\n",
                bufp->rSI, bufp->rDI, bufp->rBP, bufp->rSP);

   fprintf(logfile, "             DS: %4.4x  ES: %4.4x  SS: %4.4x  Fl: %4.4x\n",
                bufp->rDS, bufp->rES, bufp->rSS, bufp->rF);

   /* tell the user something died! */
   DosBeep(300,  250);      DosBeep(200,  250);
   DosBeep(300,  200);      DosBeep(200,  200);
   DosBeep(300,  150);      DosBeep(200,  150);
   DosBeep(300,  100);      DosBeep(200,  100);
   DosBeep(2000, 1000);
   }


/*****************************************************************************/
/* processthread: thread to handle each process                              */
/*****************************************************************************/

void far processthread(PVOID pvoid)
   {
   PTRACEBUF buf;                 /* buffer for DosPTrace                    */
   USHORT rc;                     /* return code from DosPTrace              */
   USHORT cmd = PT_STOP;          /* current DosPTrace command               */
   char modbuf[256];              /* module name buffer                      */
   PSZ progname = (PSZ)modbuf;    /* pointer to program name                 */
   USHORT done = FALSE;           /* TRUE when process finished              */
   USHORT normal_exit = FALSE;    /* TRUE when process exits normally        */
   USHORT return_code = 0;        /* process return code on normal exit      */


   /* clear module name and set up the DosPTrace buffer */
   *progname = '\0';
   buf.pid = (PID) ((ULONG)pvoid);
   buf.tid = 0;

   for (; !done; )
      {
      buf.cmd = cmd;
      if ( (rc = DosPTrace((PBYTE)&buf)) != 0)
         {
         fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: DosPTrace error %u\n",
                 ginfo->hour, ginfo->minutes, ginfo->seconds,
                 buf.pid, rc);
         break;
         }

      switch (buf.cmd)
         {
         case PT_SUCCESS:
            if (cmd == PT_STOP)
               {
               /* PT_STOP returns SUCCESS once all DLLs loaded */
               cmd = PT_GO;
               }
            break;

         case PT_ERROR:
            done = TRUE;
            break;

         case PT_LOADED:
            if (*progname == '\0')
               {
               /* first module is the program - get its name */
               buf.cmd = PT_GET_MOD;
               buf.value = buf.mte;
               buf.offv = OFFSETOF(progname);
               buf.segv = SELECTOROF(progname);
               DosPTrace((PBYTE)&buf);

               /* print out start message */
               fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: Started %s\n",
                   ginfo->hour, ginfo->minutes, ginfo->seconds,
                   buf.pid, progname);
               }
            break;

         case PT_CHILDPID:
            /* start another thread for the new process */
            _beginthread(processthread, NULL, STACKSIZE, (PVOID)(ULONG)buf.value );
            break;

         case PT_ENDTHREAD:
         case PT_SIGNAL:
            break;

         case PT_DYING:
            normal_exit = TRUE;
            return_code = buf.value;
            break;

         case PT_FAULT:
            fault(&buf, progname);
            cmd = PT_TERM;
            break;

         default:
            fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: Aborted - error %4.4x\n",
                      ginfo->hour, ginfo->minutes, ginfo->seconds,
                      buf.pid, buf.cmd);
            cmd = PT_TERM;
            break;
         }
      }

   fprintf(logfile, "%2.2i:%2.2i:%2.2i - %4.4x: Terminated %s",
      ginfo->hour, ginfo->minutes, ginfo->seconds,
      buf.pid, progname);

   if (normal_exit)
      fprintf(logfile, " returning %u", return_code);

   fprintf(logfile, "\n");
   }



/*****************************************************************************/
/* init: initialise the program                                              */
/*****************************************************************************/

short init(char **argv, PID *pidptr)
   {
   char program[256];             /* program name + arguments                */
   SEL gsel, lsel;
   RESULTCODES res;
   USHORT rc;
   USHORT len;                    /* length of program name                  */

   /* extract GINFOSEG pointer for time of day */
   DosGetInfoSeg(&gsel, &lsel);
   ginfo = MAKEPGINFOSEG(gsel);

   /* set up program name */
   strcpy(program, *argv++);
   len = strlen(program);

   /* append command line arguments, if any */
   for (; *argv != NULL; argv++)
      {
      strcat(program, " ");
      strcat(program, *argv);
      }

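   /* DosExecPgm wants "name\0arguments\0\0": add the extra terminating NUL */
   /* first, then split the program name from its arguments                */
   program[strlen(program) + 1] = '\0';
   program[len] = '\0';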

   rc = DosExecPgm(NULL, 0,
              EXEC_TRACE * 2,  /* magic number for CHILD process debug!   */
              program, NULL, &res, program);


   if (rc != 0)
      {
      fprintf(logfile, "Error %u executing %s\n", rc, program);
      }
   else
      {
      /* save PID */
      *pidptr = res.codeTerminate;
      }

   return (rc);
   }


/*****************************************************************************/
/* M A I N  P R O G R A M                                                    */
/*****************************************************************************/

int main(int argc, char **argv)
   {
   int rc;                        /* return code */
   PID mainpid;                   /* PID of PMSHELL */


   printf("Big Brother started...\n");

   argv++, argc--;                /* skip our own name */

   /* open log file and make sure PMSHELL does not get it too */
   logfile = fopen( "C:\\SHELL.LOG", "w" );
   if (logfile == NULL)
      {
      printf("Cannot open C:\\SHELL.LOG\n");
      return 1;
      }
   setbuf(logfile, NULL);
   DosSetFHandState(fileno(logfile), OPEN_FLAGS_NOINHERIT);

   if ((rc = init(argv, &mainpid)) == 0)
      processthread( (PVOID)(ULONG)mainpid );

   return rc;
   }


        Roger Orr                                       10-Aug-1991