microbenchmarking extension: using hardware counters

Introduction

Google Summer of Code 2011

Tcl/tk community

Design Document (First Draft)

Project:

Micro-benchmarking extension: access to CPU performance counters

Developer student: Saurabh Kumar

Project Mentor: Edward Brekelbaum

Repository: https://code.google.com/p/tcl/

Details

Project Details

The goal of this project is to design and implement a Tcl extension with commands to interact with the CPU's hardware counters. We first plan to code an extension that works under Linux. The proposed Tcl extension will be based on Tcl C API. There exist tools like the Linux performance counter subsystem (Perf) that provide access to the hardware counters. Perf provides rich abstractions over the hardware counters' capabilities. It provides per task, per CPU and per-workload counters, counter groups, and it provides sampling capabilities on top of those and more. It also provides abstraction for 'software events' - such as minor/major page faults, task migrations, task context-switches and tracepoints.

There are several ways one could use the Perf tool. For example, The perf tool can be used to get access to the hardware counters by using the perf syscalls ( sys_perf_counter_open syscall etc.).

There exist an open source software package Performance Application Programming Interface (PAPI http://icl.cs.utk.edu/papi/ ) that aims to provide the tool designer and application engineer with a consistent interface and methodology for use of the performance counter hardware found in most major microprocessors.

It provides a nice API which makes the access of hardware counters through C/C++ code quite easy. It appears to be a wrapper around Perf to allow some additional platforms.

We plan to use this software package to get access to the hardware counters.

Implementation details

Programming Language to be used: C++

Basic Structure of the code:

We plan to create an ensemble bound to a namespace, which consists of a collection of subcommands. The following code fragment illustrates what we plan to implement:

int Perf_Init(Tcl_Interp *interp)
{
 // create namespace for the commands
 Tcl_Namespace *ns = Tcl_CreateNamespace(interp, "perf", NULL, NULL);

 // tell Tcl to grab all subcommands on import
 Tcl_Export(interp, ns, "*", 0);

 // create the subcommands

 // Initialises the perf tool
 Tcl_CreateObjCommand(interp, "perf::init", Init_Cmd, NULL, NULL);

 // Runs the tool on the file name given as command line argument
 Tcl_CreateObjCommand(interp, "perf::run_file", Run_Cmd_File, NULL, NULL);

 // Runs the tool on the tcl script given as command line argument
 Tcl_CreateObjCommand(interp, "perf::run_script", Run_Cmd_Script, NULL, NULL);

 // create the ensemble
 Tcl_CreateEnsemble(interp, "perf", ns, 0);

 // success
 return TCL_OK;
}

This code creates a “perf” namespace. We plan to implement the following three sub-commands: 1) perf::init 2) perf::run_file 2) perf::run_script

The perf::init sub-command:

The init sub-command is meant for initializing the PAPI library. It takes a list of event names as argument. This list should contain all the events which should be under investigation during the run commands. e.g. if you want to count the following events: 1) L1 data cache misses 2) Total cycles 3) Instructions issued 4) Floating point operations

the following list should be passed as an argument to the init subcommand

{PAPI_L1_DCM PAPI_TOT_CYC PAPI_TOT_IIS PAPI_FP_OPS}

So we could write like

set events_list {PAPI_L1_DCM PAPI_TOT_CYC PAPI_TOT_IIS PAPI_FP_OPS}

and then init command can be called as

perf::init $events_list;

After initializing the PAPI library, init subcommand creates a PAPI eventset for the given event_list.

So the code would be somewhat like

static int Init_Cmd(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *const objv[]) {

// initaizing library
if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
 {
    printf("Error initializing the PAPI Library");
 exit(1);
 }
.
.
.
//creating event set
if (PAPI_create_eventset(&event_set) != PAPI_OK) {
    printf("Error in subroutine PAPIF_create_eventset");
   exit(1);
}
.
.
.
for all events
do
{
PAPI_add_event(event_set, event_code);
}
.
.
.

Apart from this the init subcommand will also complete certain other operations that should be completed before calling the run commands e.g. allocating sufficient memory for storing results etc.

After running the init subcommand tcl interpreter will respond like

% perf::init $events_list;
Hardware Counters Initialized

in case the initialization is successful or will print appropriate error message in case of some error in initialization.

The perf::run_file sub-command:

This command should follow the successful call of perf::init subcommand. After the initialization of PAPI library and the PAPI event_set, the run_file subcommand uses PAPI_start() function to start counting hardware events in an event set. It then calls appropriate functions to run the tcl script written in the the file name given as the command line argument.

After the script is over it calls PAPI_stop() to stop counting hardware events in an event set and collect the results from hardware counters. Moreover it also resets the event set and other relevant things so that perf::init can be called again later.

As a return value, it returns a Tcl object storing a string list containing the data recorded by the hardware counters.

The following code fragment illustrates the functioning.

The C++ code for this command will look like

static int Run_file_Cmd(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *const objv[])  {    
.
.
.
// start the counters
 if(PAPI_start(event_set)!= PAPI_OK) {
printf("Abort After PAPIF_start: ");
    exit(1);
   }
   
   // run the script
   Tcl_EvalFile(interp,file_name);


//stop counting

     if (PAPI_stop(event_set, values)!= PAPI_OK) {
     printf("Abort After PAPIF_stop: ");
         exit(1);
     }
.
.
.
   PAPI_cleanup_eventset(event_set);
..
   PAPI_destroy_eventset(&event_set);
..
   event_set=PAPI_NULL;
..

// return the result in appropriate format
.
.
.
.  
}

The command will work as illustrated below in the tcl interpreter (assuming perf::init has been already called):

% perf::run_file file_name

On a successful run it will return the recorded data in the following format e.g.

% set events_list {PAPI_L1_DCM PAPI_TOT_CYC PAPI_TOT_IIS PAPI_FP_OPS}
.
.
.
% perf::init $events_list
Hardware Counters Initialized
.
.
.

% perf::run_file file_name
.
.
PAPI_L1_DCM 3602 | PAPI_TOT_CYC 961924 | PAPI_TOT_IIS 963808 | PAPI_FP_OPS 1604

The perf::run_script sub-command:

This command is quite similar to the run_file command, the only difference being that the command line argument is a string variable containing the tcl script to be executed.

As a return value, it returns a Tcl object storing a string list containing the data recorded by the hardware counters.

The following code fragment illustrates the functioning.

The C++ code for this command will look like

static int Run_script_Cmd(ClientData cdata, Tcl_Interp *interp, int objc, Tcl_Obj *const objv[])  {    
.
.
.
// start the counters
 if(PAPI_start(event_set)!= PAPI_OK) {
printf("Abort After PAPIF_start: ");
    exit(1);
   }
   
   // run the script
   Tcl_EvalObjEx(interp, string_argument, flags);


//stop counting

     if (PAPI_stop(event_set, values)!= PAPI_OK) {
     printf("Abort After PAPIF_stop: ");
         exit(1);
     }
.
.
.
PAPI_cleanup_eventset(event_set);
   PAPI_destroy_eventset(&event_set);
   event_set=PAPI_NULL;

// return the result in appropriate format
.
.
.
.
}

The command will work as illustrated below in the tcl interpreter (assuming perf::init has been already called):

% perf::run_script $script_string

On a successful run it will return the recorded data in the following format e.g.

% set events_list {PAPI_L1_DCM PAPI_TOT_CYC PAPI_TOT_IIS PAPI_FP_OPS}
.
.
.
% perf::init $events_list
Hardware Counters Initialized
.
.
% set script_string {puts [expr 2+3]}
.

% perf::run_script $script_string
5
PAPI_L1_DCM 394 | PAPI_TOT_CYC 54897 | PAPI_TOT_IIS 46274 | PAPI_FP_OPS 27