Wednesday, September 11, 2013

This page contains a replay of a highly popular webinar series that introduces you to the world of multicore and manycore computing with Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. Expert technical teams at Intel discuss the development tools, programming models, vectorization, and execution models that will power up your development efforts and get the best out of your applications and platforms.

Topics
  • Introduction by James Reinders
  • Introduction to the Intel® Xeon Phi™ Coprocessor
  • Optimization and Compilation for the Intel® Xeon Phi™ Coprocessor
  • GNU Debugger on the Intel® Xeon Phi™ Coprocessor
  • Message Passing Interface (MPI) on Intel® Xeon Phi™ Coprocessor
  • Get Ready for Intel® Math Kernel Library on Intel® Xeon Phi™ Coprocessor
  • Performance analysis on Intel® Xeon Phi™ Coprocessor

Click here to open the Intel website link

Saturday, August 17, 2013


Debugging using gdb - A walkthrough


To illustrate the debugging process, there are C and Fortran example codes at the end of the tutorial that include both a floating point error and a segmentation fault. These examples are trivial, and are simply intended to show how easy it is to use the debugger. Note that the behaviour of the debugger is the same regardless of the language one is using, so we'll just show the C example in the walk-through that follows.

first bug: an FPE

First, to illustrate what happens when the code is run as is:

gcc bugs.c
./a.out
Floating point exception
Notice the Floating point exception message, and the fact that the program exited. To debug it in gdb, first compile with debugging symbols:

gcc -Wall -O0 -g bugs.c


Now start the debugger, specifying the program we want to debug:

gdb a.out


At this point, the program will be loaded, but is not running, so start it:

(gdb) r
Starting program: /nar_sfs/work/snuser/bugs/a.out
Program received signal SIGFPE, Arithmetic exception.
0x00000000004004db in divide (d=0, e=1) at /nar_sfs/work/snuser/bugs/bugs.c:5
5 printf("%f\n",e/d);


Note that the debugger will stop at the FPE, and show which function/routine it was in, what values the input arguments had, the line number of the source file where the problem occurred, and the actual line of the file. In this case this output is sufficient to diagnose the problem: clearly e/d is undefined since the denominator is zero. One can also look at a stack trace, to see what has been called up to this point:

(gdb) where
#0 0x00000000004004db in divide (d=0, e=1) at /nar_sfs/work/snuser/bugs/bugs.c:5
#1 0x00000000004005e7 in main (argc=1, argv=0x7fbfffeab8) at /nar_sfs/work/snuser/bugs/bugs.c:24


An important caveat concerning the stack trace is that the debugger may display a deep stack (i.e. a long list of functions that have been entered), indicating a problem triggered inside a system library. While the system library function is the last function that was executed before the program failed, it is unlikely that there is actually a bug in the system library. One should trace back through the stack to the last call from the program into the library and inspect the arguments that were given to the library function to ensure that they are sensible - typically, errors in system libraries occur when the library functions are called with incorrect arguments.


In addition to the stack trace, one may look at the source code file, centered around a particular line:

(gdb) l 5
1 #include <stdio.h>
2
3 void divide(float d, float e)
4 {
5 printf("%f\n",e/d);
6 }
7
8 void arrayq(float f[], int q)
9 {
10 printf("%f\n",f[q]);

One can inspect the values of different variables in the current level of the stack:

(gdb) p d
$1 = 0
(gdb) p e
$2 = 1

Or one can go "up" the stack to look at values in the calling function/routine:

(gdb) up
#1 0x00000000004005e7 in main (argc=1, argv=0x7fbfffeab8) at /nar_sfs/work/snuser/bugs/bugs.c:24
24 divide(a,b);
(gdb) p a
$3 = 0
(gdb) p b
$4 = 1
When one is finished, it's easy to exit:
(gdb) q
The program is running. Exit anyway? (y or n) y


second bug: a segmentation fault

Now, to illustrate a segfault, change the denominator to be non-zero, eg. a=4.0. Compile the modified code, and run it to see what happens:

./a.out
0.250000
Segmentation fault



Notice the Segmentation fault message, and the fact that the program exited with code 139 (128 plus 11, the SIGSEGV signal number). To debug it in gdb:

gdb a.out
(gdb) r
Starting program: /nar_sfs/work/snuser/bugs/a.out
0.250000
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400514 in arrayq (f=0x7fbfffe980, q=12000000) at /nar_sfs/work/snuser/bugs/bugs.c:10
10 printf("%f\n",f[q]);

Note that the program stops automatically when it hits the segmentation fault, and shows you which function it is in, the values of the input variables, and the line in the source. One can then try printing out the values of the array, to see why it would have a problem:

(gdb) p f
$1 = (float *) 0x7fbfffe980
(gdb) p f[1]
$2 = 1
(gdb) p f[9]
$3 = 9
(gdb) p f[q]
Cannot access memory at address 0x7fc2dc5580


So it is clear that the program is trying to access memory it shouldn't. Note that this is lucky - had one accidentally tried to access something just outside the array bounds:

(gdb) p f[11]
$4 = 0
(gdb) p f[1000]
$5 = 7.03598541e+22
(gdb) p f[10000]
Cannot access memory at address 0x7fc00085c0

it would have returned a valid number and the program would have carried on, but its results would have been wrong. So one cannot count on an out-of-bounds array access always resulting in a segmentation fault. Segmentation faults often occur when there are problems with pointers, since they may point to inaccessible addresses, or when a program tries to use too much memory. Using a debugger greatly helps in identifying these sorts of problems.

using core files


If a program uses a lot of memory, does not trigger an error condition in a reproducible manner, or takes a long time before it reaches the error condition then it shouldn't be debugged interactively (at least in the first instance). In these situations one should submit the debugging instrumented program to the cluster as a compute job such that it will produce a core file when it crashes. A core file contains the state of the program at the time it crashed - one can then load this file into the debugger to inspect the state and determine what caused the problem.


By default your Linux environment may not be configured to produce core files. To enable core files when using the bash shell (the default shell), one must set the core limit to be non-zero. Setting it to unlimited should suffice, eg.

ulimit -c unlimited

Then, when one runs a program that crashes, it should indicate that it has produced (dumped) a core file, eg.

gcc -g bugs.c
./a.out
0.250000
Segmentation fault (core dumped)

The core file should appear in the present working directory with a name of the form core.PID, where PID is the process id of the program instance that crashed. Note: for anything more complex than the examples provided in this tutorial you should submit this as a job to the cluster, in which case the core file will be placed in the working directory used by the job - but one must submit the job to sqsub with the -f permitcoredump option specified.


One can then load this into gdb as an additional argument to gdb, eg.

gdb a.out core.10966
#0 0x0000000000400514 in arrayq (f=0x7fbfffdfc0, q=12000000) at /home/merz/bugs/bugs.c:10
10 printf("%f\n",f[q]);
(gdb) where
#0 0x0000000000400514 in arrayq (f=0x7fbfffdfc0, q=12000000) at /home/merz/bugs/bugs.c:10
#1 0x00000000004005f3 in main (argc=1, argv=0x7fbfffe0f8) at /home/merz/bugs/bugs.c:26
(gdb) q


Note that in this case one does not need to run the program in the debugger - it will simply inspect the state of the core file and use the debugging-instrumented binary to display the type of error and where it occurs. One may then run the gdb where command to get the stack backtrace, etc., to further identify the problem.

As long as one sets their core size limit with the ulimit command before submitting their job, and submits their job with the sqsub -f permitcoredump flag, then this environment setting should propagate to their job and the program should generate a core. Keep in mind that this setting will not persist between logins, so you should either put it in your shell configuration file (eg. ~/.bash_profile ) or run it any time you log into a system if you want your programs to produce a core when they crash.

debugging interactively



If you need to view the state of the program leading up to the crash, perhaps repeatedly, then a core file won't suffice and it is suggested that one submit this as an interactive job (avoid running this on the login node!). If possible, one should try to resume the program from a checkpoint that is near to the crash to avoid waiting a long time while the program reaches the erroneous state.


One can start gdb as follows:

gdb ./a.out

One can then proceed to debug in the usual fashion:

r
(gdb) Starting program: /nar_sfs/work/snuser/bugs/a.out
Program received signal SIGSEGV, Segmentation fault.
0x0000000000400514 in arrayq (f=0x7fbfffd740, q=12000000)
at /nar_sfs/work/snuser/bugs/bugs.c:10
10 printf("%f\n",f[q]);

When you exit the debugger, the job will terminate automatically:

(gdb) q

Note: you may not see the (gdb) prompt, or it may appear out of order (as above), but you can proceed as though it were there.


Examples


FORTRAN CODE (bugs.f):
 program bugs
     implicit none
     real a,b
     real c(10)
     integer p
     a=0.0
     b=1.0
     do p=1,10
         c(p)=p
     enddo
     p=12000000
     call divide(a,b)
     call arrayq(c,p)
 end program
 
 subroutine divide(d,e)
     implicit none
     real d,e
     print *,e/d
 end subroutine
 
 subroutine arrayq(f,g)
     implicit none
     real f(10)
     integer g
     print *,f(g)
 end subroutine
C CODE (bugs.c):

 #include <stdio.h>
 
 void divide(float d, float e)
 {
     printf("%f\n",e/d);
 }
 
 void arrayq(float f[], int q)
 {
     printf("%f\n",f[q]);
 }
 
 int main(int argc, char **argv)
 {
     float a,b;
     float c[10];
     int p;
     a=0.0;
     b=1.0;
     for (p=0;p<10;p++)
     {
         c[p]=(float)p;
     }
     p=12000000;
    divide(a,b);
    arrayq(c,p);
    return(0);
 }




Wednesday, August 7, 2013



Matrix Market File format & PetSc Bin


If you have developed a CFD solver, then most likely you have generated a sparse linear system which needs to be solved one way or another. Apart from the standard Intel or other libraries, it would definitely be advantageous to use the PetSc library. PetSc has huge capabilities, ranging from linear solvers, preconditioners, and eigenvalue analysis to CPU / GPU implementations.

Getting to know PetSc will take some time, but it is greatly beneficial for matrix and vector operations. The only requirement is to convert the system we have into a format PetSc knows; this is good for initial testing. Here are two routines developed by P.Kumar (PetSc): one to convert a matrix in COO format or Matrix Market file format to PetSc binary format, and the other to convert vectors.

MatrixMarket_to_PetScBin.c

VectorMarket_to_PetScBin.c

README.TXT

Thursday, July 18, 2013


The GPGPU Continuum from mWatts to Petaflops


The GPU continuum workshop addresses the current and future challenges of the GPGPU (heterogeneous systems) community. GPUs that started as graphics accelerators are now widely used in many different computational domains, such as medical, defense, silicon inspection, computer vision, signal processing and more, due to their dramatic improvement in performance per watt for domain-specific applications.
 
This workshop aims to address two important directions of future GPGPU-based systems: at the high end, it will focus on using GPUs as part of clusters, clouds, compute farms, etc., that aim to achieve petaflops of computation. At the low end, it will focus on using GPUs as part of mobile devices, which limits the power consumption of the GPU to mWatts. The workshop concludes with a panel discussion on the differences and similarities, and on the challenges each domain faces.



The videos and slides of the workshop are here... [Link]

Monday, May 6, 2013



Wave Rider at 5.1 Mach

Tuesday, April 23, 2013



Note on Technical Writing

Writing is hard yet essential for any researcher, considering that propagating knowledge is the very objective of research. Many of us fall behind in this aspect (myself included), and to help alleviate this there are guidelines and best practices written by experts. A few that I find useful are:

1. NASA - Technical Report Writing
2. Style and Ethics of Communication in Science and Engineering
3. Note on Literature Review

Happy writing!

Thursday, April 11, 2013


Instrumentation using gprof

gprof is a free performance analysis tool on Linux. One is always concerned about the efficiency of the code one develops, and timing it manually can be tedious. To alleviate this, we can use performance analysis tools to time the code automatically and find possible bottlenecks. This process is called profiling; there are other, more advanced tools available, but gprof is a good starting point for any beginner or for early analysis.

Ok, quickly getting to the topic. The following steps could be followed for profiling a simple C or FORTRAN code.

Step 1:
Enabling profiling is as simple as adding -pg to the gcc compile flags.

gcc mycode.c -pg

Step 2:
Now run the code normally for a sufficiently large number of samples. The code could be set up to run for a few samples rather than the full runtime; however, it is good to keep the execution running for a few seconds so that gprof can collect statistics accurately.

./a.out [arguments if any] 

Step 3:
Finally, the profiler is called; the two inputs for gprof are the name of the executable and gmon.out, the monitoring file generated during execution.

gprof a.out gmon.out -p

Here the -p option produces a flat profile; alternatively, -q produces a call graph.

The flat profile prints the functions called and the runtime of each function or subroutine, whereas the call graph additionally gives the details of each function's callers and callees.
Additionally, the -A option prints the source code annotated with timing and call counts.

One can simply check the validity of the timings using the time command:

time ./a.out [arguments if any]

It is to be noted that system commands in the code should be avoided, as they are not timed by gprof. A discussion can be found here. For a longer and better treatment, see this article.







Code instrumentation for timers and other runtime data

1. Tau Link
2. Scalasca Link
3. Open Speedshop Link

Friday, April 5, 2013



Essential list of Validation & Verification cases for CFD solvers



This list is mainly for compressible flows; suggestions for incompressible flows are greatly appreciated!


1. Best Practices Guide

A tutorial from NASA on best practices in verification and validation of CFD solvers.
[ NASA Link ]


2. Turbulence Models 

NASA provides a series of test cases to validate & verify turbulence models from Spalart-Allamaras to LES / DNS.
[ NASA LARC - Turbulence models V&V ]


3. Comprehensive Test cases

A comprehensive list of test cases for various CFD solvers and their components.
[ NPARC Alliance Verification and Validation Archive ]


4. Another set of Test cases

A list of test cases for NASA's CFL3D solver.
[ CFL3D Test cases ]


5. Drag prediction workshop

Test cases for production grade solvers
[ AIAA Drag prediction Workshop  ]


6. A collection of resources

From CFD-online
[ CFD-online Validation & Verification ]


7. AGARD Experimental data for CFD code validation

PDF1 4mb
PDF2 22mb

Also AIAA papers
1. AIAA 93-0002 Dryden Lectureship in Research - A perspective on CFD validation.
2. Von Karman Institute, Verification and validation of computational fluid dynamics. In association with the Thematic Network FLOWNET (European Commission DG XII) June 5 - 8, 2000; F. Grasso, J. Périaux, H. Deconinck


8. Higher-order Schemes

A list of test cases for higher-order scheme accuracy. [Wang]

9. Some other validation cases

SimJournal: Ammar Hakim’s Simulation Journal. [Ammar Hakim]


Thursday, April 4, 2013



CUDA Video Tutorials

Prerequisites: a moderate level of C/C++ programming


Learn the fundamentals of parallel computing with the GPU and the CUDA programming environment! In this class, you'll learn about parallel programming by coding a series of image processing algorithms, such as you might find in Photoshop or Instagram. You'll be able to program and run your assignments on high-end GPUs, even if you don't own one yourself.
