Debugging Segmentation Faults and Pointer Problems
For new programmers, debugging errors associated with pointers can be a
nightmare. "Segmentation Fault (core dumped)" is a pretty vague error
message, and it's even worse when strange bugs start appearing that
don't cause segmentation faults -- but that result in things like memory
getting overwritten in unexpected ways.
But finding problems with pointers is easier than you'd think. Those
segfaults often turn out to be among the easiest bugs to find, and using
special tools such as Valgrind, even finding buffer overflows is
simplified.
This tutorial assumes that you have a basic knowledge of pointers such
as can be acquired by reading a pointer tutorial. It would help to be
running a system that has a debugger such as GDB, or to at least have
sufficient familiarity with GDB-like debuggers to understand the
examples presented. Finally, for finding buffer overflows and other
invalid uses of memory, you will fare best with Valgrind, though none of
the examples will use it.
What is a segmentation fault?
When your program runs, it has access to certain portions of memory.
First, you have local variables in each of your functions; these are
stored in the stack. Second, you may have some memory, allocated during
runtime (using either malloc, in C, or new, in C++), stored on the heap
(you may also hear it called the "free store"). Your program is only
allowed to touch memory that belongs to it -- the memory previously
mentioned. Any access outside that area typically causes a segmentation
fault. Segmentation faults are commonly referred to as segfaults.
There are four common mistakes that lead to segmentation faults:
dereferencing NULL, dereferencing an uninitialized pointer,
dereferencing a pointer that has been freed (or deleted, in C++) or that
has gone out of scope (in the case of arrays declared in functions), and
writing off the end of an array.
A fifth way of causing a segfault is a recursive function that uses all
of the stack space. On some systems, this will cause a "stack overflow"
report, and on others, it will merely appear as another type of
segmentation fault.
The strategy for debugging all of these problems is the same: load the
core file into GDB, do a backtrace, move into the scope of your code,
and list the lines of code that caused the segmentation fault.
For instance, running on a Linux system, here's an example session:
% gdb example core
This just loads the program called example using the core file called
"core". The core file contains all the information needed by GDB to
reconstruct the state of execution when the invalid operation caused a
segmentation fault.
Once we've loaded up gdb, we get the following:
Some copyright info
Core was generated by `example'.
Program terminated with signal 11, Segmentation fault.
Some information about loading symbols
#0 0x0804838c in foo() () at t.cpp:4
4 *x = 3;
So, execution stopped inside the function called foo() on line 4, which
happened to be the assignment of the number 3 to the location pointed to
by x. This is a goldmine of information: we already know exactly where
the problem happened and which pointer was involved.
(gdb) list
1 void foo()
2 {
3 char *x = 0;
4 *x = 3;
5 }
6
7 int main()
8 {
9 foo();
10 return 0;
(gdb)
Since this is a somewhat contrived example, we can immediately find the
error. The pointer x is initialized to 0, equivalent to NULL (in fact,
NULL is a stand-in for 0), and we know that it's a no-no to then try to
access that pointer.
But what if it weren't so obvious? Simply printing the value of the
pointer can often lead to the solution. In this case:
(gdb) print x
$1 = 0x0
Printing out x reveals that it points to memory address 0x0 (the 0x
indicates that the value following it is in hexadecimal, traditional for
printing memory addresses). The address 0x0 is invalid -- in fact, it's
NULL. If you dereference a pointer that stores the location 0x0 then
you'll definitely get a segmentation fault, just as we did.
If we'd gotten something more complicated, such as execution crashing
inside a system call or library function (perhaps because we passed an
uninitialized pointer to fgets), we'd need to figure out where we called
the library function and what might have happened to cause a segfault
within it. Here's an example from another debugging session:
#0 0x40194f93 in strcat () from /lib/tls/libc.so.6
(gdb)
This time, the segfault occurred because of something inside strcat.
Does this mean the library function did something wrong? Nope! It means
that we probably passed a bad value to the function. To debug this, we
need to see what we passed into strcat.
So let's see what function call we made that led to the segfault.
(gdb) backtrace
#0 0x40194f93 in strcat () from /lib/tls/libc.so.6
#1 0x080483c9 in foo() () at t.cpp:6
#2 0x080483e3 in main () at t.cpp:11
(gdb)
Backtrace lists the function calls that had been made at the time the
program crashed. Each function is directly above the function that
called it. So foo was called by main in this case. The numbers on the
side (#0, #1, #2) also indicate the order of calls, from most recent to
longest ago.
To move between the states of the individual functions (each one
encapsulated in a stack frame), we can use the up and down commands. Right now,
we know we're in the strcat stack frame, which contains all of the local
variables of strcat, because it's the top function on the stack. We want
to move "up" (toward the higher numbers); this is the opposite of how
the stack is printed.
(gdb) up
#1 0x080483c9 in foo() () at t.cpp:6
6 strcat(x, "end");
(gdb)
This helps a little -- we know that we have a variable called x and a
constant string. We should probably look up the strcat function at this
point to make sure that we got the order of arguments correct. Since we
did, the problem must be with x.
(gdb) print x
$1 = 0x0
There it is again: a NULL pointer. The strcat function must be
dereferencing a NULL pointer that we gave it, and even though it's a
library function, it doesn't do anything magical.
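For comparison, here is a minimal sketch of a call that would not crash, assuming the intent was to append "end" to an existing string. The helper name append_end and its size parameter are made up for this illustration; they are not part of the program from the debugging session.

```c
#include <string.h>

/* Hypothetical helper: appends "end" to dest, which must be a valid,
   NUL-terminated buffer of dest_size bytes. Returns 0 on success and
   -1 if dest is NULL or there is not enough room. */
int append_end(char *dest, size_t dest_size)
{
    const char *suffix = "end";

    if (dest == NULL)
    {
        return -1; /* the case that crashed inside strcat */
    }
    if (strlen(dest) + strlen(suffix) + 1 > dest_size)
    {
        return -1; /* appending would write off the end of dest */
    }
    strcat(dest, suffix);
    return 0;
}
```

A caller would pass a real buffer and its size, for instance a char buf[16] that already holds "the ", and check the return value before using the result.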
NULL pointers are generally pretty easy to work with -- once we've found
one, we know that somewhere along the line, we didn't allocate some
memory that we should have. It's just a question of where. A common
mistake is to not check the return from malloc to make sure that the
system isn't out of memory. Another common mistake is to assume that a
function that calls malloc doesn't return NULL even though it returns
the result of malloc. Note that in C++, when you call new, it will throw
an exception, bad_alloc, if sufficient memory cannot be allocated. Your
code should be prepared to handle this situation cleanly, and if you
choose to catch the exception and return NULL inside a function that
ordinarily returns a new'ed pointer, this advice still holds.
#include <stdlib.h>
#include <string.h>

char *create_memory()
{
    char *x = malloc(10);
    if(x == NULL)
    {
        return NULL;
    }
    strcpy(x, "a string");
    return x;
}

void use_memory()
{
    char *new_memory = create_memory();
    /* BUG: new_memory may be NULL here */
    new_memory[0] = 'A'; /* make it a capital letter */
}
We did a good thing by checking to make sure that malloc succeeds before
using the memory in create_memory, but we don't check to make sure that
create_memory returns a valid pointer! Shame on us. This is a bug that
won't catch you until you're running your code on a real system unless
you explicitly test your code in low memory situations.
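One possible fix is sketched below. It assumes that use_memory is allowed to report failure to its caller; the int return value and the call to free are additions made for this sketch, not part of the original example.

```c
#include <stdlib.h>
#include <string.h>

char *create_memory(void)
{
    char *x = malloc(10);
    if (x == NULL)
    {
        return NULL;
    }
    strcpy(x, "a string"); /* 8 characters plus the terminator fit in 10 */
    return x;
}

/* Returns 0 on success, -1 if no memory could be allocated. */
int use_memory(void)
{
    char *new_memory = create_memory();
    if (new_memory == NULL)
    {
        return -1; /* report the failure instead of dereferencing NULL */
    }
    new_memory[0] = 'A'; /* make it a capital letter */
    free(new_memory);
    return 0;
}
```

The point is simply that every function in the chain that can hand back NULL gets checked by whoever receives the pointer.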
Dereferencing an Uninitialized Pointer
Figuring out whether or not a pointer has been initialized is a bit
harder than figuring out whether a pointer is NULL. The best way to
avoid using an uninitialized pointer is to set your pointers to NULL
when you declare them (or immediately initialize them). That way, if you
do use a pointer that hasn't had memory allocated for it, you will
immediately be able to tell.
If you don't set your pointers to NULL when you declare them, then
you'll have a much harder time of it (remember that non-static local
variables aren't automatically initialized to anything in C or C++). You might
need to figure out if 0x4025e800 is valid memory. One way you can get a
sense of this in GDB is by printing out the addresses stored in other
pointers you've allocated. If they're fairly close together, you've
probably correctly allocated memory. Of course, there's no guarantee
that this rule of thumb will hold on all systems.
In some cases, your debugger can tell you that an address is invalid
based on the value stored in the pointer. For instance, in the following
example, GDB indicates that the char* x, which I set to point to the
memory address "30", is not accessible.
(gdb) print x
$1 = 0x1e
(gdb) print *x
Cannot access memory at address 0x1e
Generally, though, the best way to handle such a situation is just to
avoid having to rely on memory's being close together or obviously
invalid. Set your variables to NULL from the beginning.
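As a rough sketch of that habit, the hypothetical function below only allocates on one code path. Because the pointer starts out as NULL, the path that never allocates is easy to detect. The function name and its parameters are invented for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: copies src into a new buffer only when
   want_copy is nonzero. Returns the length of the copy, or -1 if
   no buffer was set up. */
int copy_length(const char *src, int want_copy)
{
    char *copy = NULL; /* initialized at declaration, never garbage */
    int length;

    if (want_copy)
    {
        copy = malloc(strlen(src) + 1);
        if (copy != NULL)
        {
            strcpy(copy, src);
        }
    }

    if (copy == NULL)
    {
        return -1; /* easy to detect: the pointer is NULL, not random */
    }

    length = (int)strlen(copy);
    free(copy);
    return length;
}
```

Had copy been left uninitialized, the NULL test would be meaningless on the path where want_copy is zero, and the bug would look like any other valid-looking address in GDB.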
Dereferencing Freed Memory
This is another tricky bug to find because you're working with memory
addresses that look valid. The best way to handle such a situation is
again preventative: set your pointer to point to NULL as soon as you've
freed it. That way, if you do try to use it later, then you'll have
another "dereferencing NULL" bug, which should be much easier to track.
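A small helper can make this habit harder to forget. The sketch below is one way to do it; the name free_and_clear is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *ptr and resets it to NULL so that a later
   use shows up as a NULL dereference instead of a use of freed memory. */
void free_and_clear(char **ptr)
{
    if (ptr != NULL)
    {
        free(*ptr);  /* free(NULL) is a no-op, so a second call is harmless */
        *ptr = NULL;
    }
}
```

It would be called as free_and_clear(&p). Note that this only clears the one pointer you pass in; any other copies of the same address are still dangling.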
Another form of this bug is the problem of dealing with memory that has
gone out of scope. If you declare a local array such as
char *return_buffer()
{
    char x[10];
    strncpy(x, "a string", sizeof(x));
    return x;
}
then the array, x, will no longer be valid once the function returns.
This is a really tricky bug to find because once again the memory
address will look valid when you print it out in GDB. In fact, your code
might even work sometimes (or just display weird behavior by printing
whatever happens to be on the stack in the location that used to be the
memory of the array x). Generally, the way you'll know if you have this
kind of bug is that you'll get garbage when you print out the variable
even though you know that it's initialized. Watch out for the pointers
returned from functions. If that pointer is causing you trouble, check
the function and look for whether the pointer is pointing to a *local
variable* in the function. Note that it is perfectly fine to return a
pointer to memory allocated in the function using new or malloc, but not
to return a pointer to a local array (e.g., char x[10]).
Tools such as Valgrind can be immensely helpful in tracking down these
bugs because they watch memory to ensure that it's valid. If it isn't,
Valgrind will alert you. Our Valgrind tutorial goes into more detail
about finding this sort of bug.
Of course, the best solution is simply to avoid ever doing anything like
this. Technically, you could use a static buffer, which would allow you
to have a permanent buffer you could pass around. But this is only
asking for trouble if you later decide, for whatever reason, that you
don't need it to be static (if you forget why you made it static in the
first place, for instance).
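Two alternatives that avoid both the local array and the static buffer are sketched below. The function names are made up for this example, and which option fits depends on who should own the memory.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1: the caller owns the buffer. Returns buffer, or NULL if
   buffer is NULL or size is 0. */
char *fill_buffer(char *buffer, size_t size)
{
    if (buffer == NULL || size == 0)
    {
        return NULL;
    }
    strncpy(buffer, "a string", size - 1);
    buffer[size - 1] = '\0'; /* strncpy does not always terminate */
    return buffer;
}

/* Option 2: heap memory outlives the function; the caller must free it.
   Returns NULL if the allocation fails. */
char *make_buffer(void)
{
    char *x = malloc(10);
    if (x != NULL)
    {
        strcpy(x, "a string");
    }
    return x;
}
```

With the first option the array lives in the caller's stack frame, so it stays valid for as long as the caller needs it. With the second, the memory stays valid until someone calls free on it.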
Writing off the end of the array
Generally, if you're writing off the bounds of an array, then the line
that caused the segfault in the first place should be an array access.
(There are a few times when this won't actually be the case -- notably,
if the fact that you wrote off an array causes the stack to be smashed
-- basically, overwriting the pointer that stores where to return after
the function completes.)
Of course, sometimes, you won't actually cause a segfault writing off
the end of the array. Instead, you might just notice that some of your
variable values are changing periodically and unexpectedly. This is a
tough bug to crack; one option is to set up your debugger to watch a
variable for changes and run your program until the variable's value
changes. Your debugger will break on that instruction, and you can poke
around to figure out if that behavior is unexpected.
(gdb) watch [variable name]
Hardware watchpoint 1: [variable name]
(gdb) continue
...
Hardware watchpoint 1: [variable name]
Old value = [value1]
New value = [value2]
This approach can get tricky when you're dealing with a lot of
dynamically allocated memory and it's not entirely clear what you should
watch. To simplify things, use simple test cases, keep working with the
same inputs, and *turn off randomized seeds* if you're using random
numbers!
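While hunting for this kind of bug, it can also help to route suspicious writes through a checked helper for a while, so that an out-of-range index is reported instead of silently changing a neighboring variable. The following is only a sketch; the name checked_store and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: stores value at index only if index is inside
   the array. Returns 0 on success, -1 if the write was rejected. */
int checked_store(int *array, size_t count, size_t index, int value)
{
    if (array == NULL || index >= count)
    {
        return -1; /* would have written off the end of the array */
    }
    array[index] = value;
    return 0;
}
```

Setting a breakpoint on the line that returns -1 gives you a place to stop in GDB at the moment the bad index shows up.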
Stack Overflows
A stack overflow isn't the same type of pointer-related problem as the
others. In this
case, you don't need to have a single explicit pointer in your program;
you just need a recursive function without a base case. Nevertheless,
this is a tutorial about segmentation faults, and on some systems, a
stack overflow will be reported as a segmentation fault. (This makes
sense because running out of memory on the stack will violate memory
segmentation.)
To diagnose a stack overflow in GDB, typically you just need to do a
backtrace:
(gdb) backtrace
#0 foo() () at t.cpp:5
#1 0x08048404 in foo() () at t.cpp:5
#2 0x08048404 in foo() () at t.cpp:5
#3 0x08048404 in foo() () at t.cpp:5
[...]
#20 0x08048404 in foo() () at t.cpp:5
#21 0x08048404 in foo() () at t.cpp:5
#22 0x08048404 in foo() () at t.cpp:5
---Type <return> to continue, or q <return> to quit---
If you find a single function call piling up an awfully large number of
times, this is a good indication of a stack overflow.
Typically, you need to analyze your recursive function to make sure that
all the base cases (the cases in which the function should not call
itself) are covered correctly. For instance, in computing the factorial
function
int factorial(int n)
{
    // What about n < 0?
    if(n == 0)
    {
        return 1;
    }
    return factorial(n-1) * n;
}
In this case, the base case of n being zero is covered, but what about n
< 0? On "valid" inputs, the function will work fine, but not on
"invalid" inputs like -1.
You also have to make sure that your base case is reachable. Even if you
have the correct base case, if you don't correctly progress toward the
base case, your function will never terminate.
int factorial(int n)
{
    if(n <= 0)
    {
        return 1;
    }
    // Oops, we forgot to subtract 1 from n
    return factorial(n) * n;
}
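Putting both fixes together, a version that covers the negative case and actually moves toward the base case might look like the sketch below. Returning -1 for invalid input is a choice made for this example; your program may prefer a different way to report the error.

```c
/* Returns n! for n >= 0, or -1 to signal an invalid (negative) input. */
int factorial(int n)
{
    if (n < 0)
    {
        return -1; /* invalid input: refuse instead of recursing forever */
    }
    if (n == 0)
    {
        return 1; /* base case */
    }
    return factorial(n - 1) * n; /* n - 1 moves toward the base case */
}
```

Keep in mind that the result overflows a 32-bit int once n goes past 12, which is a separate problem from the stack overflow discussed here.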
Summary
While segmentation faults can be nasty and difficult to track down when
you are first learning to program, over time you will start to see them
as falling into a small number of patterns that are relatively easy to
track down. This tutorial hasn't covered every possible scenario for
causing segmentation faults, but it touches on many of the basic
problems you may encounter.
------------------------------------------------------------------------
Copyright © 1997-2005
Cprogramming.com. All rights reserved.