The Unix programs used a shared library to provide some optional functionality and this had been working fine on Linux/Solaris. The shared library performed some asynchronous functions, like logging, and used pthreads to provide the threading model. I wanted to extend the target architectures supported to include the Windows platform.
There were three possible approaches that I considered.
1) Use Cygwin [1] to give a "Linux API emulation layer" or "Linux-like environment" on Windows. Alan Griffiths (and others) object to Cygwin describing itself as an 'emulation'; be that as it may, Cygwin does provide a very familiar environment for programmers coming to Windows from Unix.
2) Use pthreads-win32 [2] to provide the threading functions I needed on Windows (this library is LGPL-licensed)
3) Port the codebase to use Windows APIs.
The first approach was the easiest, as the make files and compiler settings were compatible with Linux, but it required target machines to have cygwin installed. I wasn't entirely sure I wanted this restriction (apparently you can get away with copying a single DLL but I've not been able to verify that this is officially supported). The second approach seemed better: pthreads-win32 is a fairly mature library which seems to have a good record and it can be linked with the program or shipped as a single DLL to the target machines.
The last approach was not favourable as there was quite an overhead in producing (and maintaining) a separate code base for two different platforms. Obviously I could abstract away some of these differences - but this would effectively be trying to write the code for one of the first two approaches on my own!
However, problems occurred with a program which dynamically loaded the sharable library. The program worked fine until it unloaded the library, at which point it just hung. Ctrl-c and Ctrl-break didn't stop it either; I had to use Task Manager to kill the process. One of the times I was killing the program this way I noticed that the column for 'Threads' (selectable using View->Select Columns->Thread count) was showing four threads, not the two I was expecting. Further investigation revealed that the thread count went up by one each time I pressed Ctrl-c or Ctrl-break. What was going on?
It looked like some sort of multi-threaded deadlock was occurring and my initial thought was that I had found a bug in pthreads-win32. So I decided to try out cygwin, hoping that this environment would not have the same problem. I was fairly quickly able to compile my sharable library and test program using gcc on cygwin. I discovered though that the behaviour of the program was exactly the same - even down to creating extra threads when I typed ctrl-c - so it looked like something more complicated was going on than just a bug in pthreads-win32.
I tried to attach the Visual Studio 2003 debugger to the hung process, but wasn't able to get it to connect successfully. It simply didn't debug - and if I tried to break into the process I got a dialog box saying "Unable to break into the process 'testwindows.exe'. Please wait until the debugee has finished loading, and try again."
I tried again, this time starting testWindows.exe under the debugger rather than attaching when the problem surfaced. When it hung I tried to break into the program and got a message box saying "The process appears to be deadlocked (or is not running any user-mode code). All threads have been stopped." I was however able to view the list of threads: one thread was inside pthread_join() and the other was inside __endthreadex.
I abandoned the Visual Studio debugger as it wasn't helping me resolve the issue and reached for a different tool: WinDbg. This is one of the Microsoft Debugging Tools [3] and gives a lower level, but sometimes more informative, view of your process than the Visual Studio toolset. This time I hit paydirt. When the test program hung I attached WinDbg and got the following lines embedded in the output from the debugger:
Break-in sent, waiting 30 seconds...
WARNING: Break-in timed out, suspending.
This is usually caused by another thread holding the loader lock
I executed the WinDbg command "!locks" and it displayed the following:
CritSec ntdll!LdrpLoaderLock+0 at 77FC2340
LockCount 2
RecursionCount 1
OwningThread f38
EntryCount 2
ContentionCount 2
*** Locked
Scanned 102 critical sections
At this point the debugging session had provided enough information for me to understand the cause of the problem.
The Windows API uses a per-process lock internally (as the output from WinDbg shows, this lock is defined inside ntdll.dll, the lowest level user-space library in the system). This critical section is locked while loading and unloading sharable libraries, and the same lock is also used during thread exit. (There is more information about this lock on MSDN [4].)
My program was calling FreeLibrary, and the implementation of this call takes the loader lock and then calls the DLL entry point to notify the DLL that it is being unloaded. The C++ runtime hooks into this call in order to run destructors for any static objects, including the destructor for 'theWorker'. This destructor signalled the asynchronous thread to exit (by setting timeToEnd to true and waking the thread up) and then joined against the thread. However, the thread exit also needs the internal loader lock (since thread detach is notified to shared libraries using the same mechanism as when unloading libraries).
A classic deadlock has occurred: the main program thread has the loader lock and is waiting for the worker thread to complete; the worker thread needs the loader lock to complete its exit.
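The lock ordering can be modelled portably, without any Windows calls at all. The following is only a sketch under stated assumptions: 'loaderLock' is a hypothetical stand-in for the real loader lock, 'workerCanExit' is an illustrative name, the code uses C++11 threading rather than pthreads for brevity, and the worker uses a timed lock attempt so that the model reports the deadlock instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the process-wide loader lock.
std::timed_mutex loaderLock;

// Models a library unload. Returns true if the worker thread managed to
// take the lock it needs in order to exit, false if it timed out (which
// corresponds to the hang seen in the real program).
bool workerCanExit( bool joinWhileHoldingLock )
{
    bool exited = false;
    std::thread worker;
    {
        // "FreeLibrary" takes the lock before running static destructors.
        std::unique_lock<std::timed_mutex> unloading( loaderLock );

        worker = std::thread( [&exited]
        {
            // Thread exit needs the same lock; give up after 200ms
            // rather than blocking for ever as the real thread does.
            if ( loaderLock.try_lock_for( std::chrono::milliseconds( 200 ) ) )
            {
                exited = true;
                loaderLock.unlock();
            }
        } );

        // The destructor joins the worker while the lock is still held.
        if ( joinWhileHoldingLock )
            worker.join();
    }
    // Otherwise join only after the lock has been released.
    if ( worker.joinable() )
        worker.join();
    assert( ! worker.joinable() );
    return exited;
}
```

Calling workerCanExit( true ) reproduces the ordering used by the Worker destructor and the worker fails to get the lock; calling workerCanExit( false ) joins after the lock is released and the worker can finish.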
It seems that the designers of Windows wished to ensure that a DLL's entry point function (its 'main') was only active in one thread at a time. This does have some advantages: for example, it makes it easy to avoid common race conditions when initialising and terminating shared libraries. The downside is the possibility of deadlock.
Since the loader lock is an internal implementation decision of the Win32 API, and its use is outside direct application control, it is hard to see how the pthreads-win32 library or Cygwin could avoid the deadlock.
It also means that option (3) - re-implementing the library using Win32 calls - would not help either since the problem was with the lock used by the Win32 API itself.
The sharable library has only one entry point: 'callLibrary()'. This makes sure the Worker instance is initialised and then calls a method on this instance. The worker makes use of a helper thread to perform its function; in this example the worker thread simply waits for 'timeToEnd' to be set - the original code used some internal queueing to pass work items to the helper thread.
The driving program is also very simple. It loads the library, looks up the address of the entry point (callLibrary) and invokes it. Then it unloads the library and exits.
The code looks like this:
------------------------------- Loadable.cpp --------------------------------
#include <pthread.h>
#include <iostream>
#include <memory>

// Owns a helper thread which is started on first use and is stopped
// (and joined) by the destructor.
class Worker
{
public:
    Worker();
    ~Worker();
    void dosomething();

private:
    void start();
    void stop();
    void run();
    static void *threadStart( void * );

    pthread_t thread;
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool timeToEnd;
    bool started;
};

// Static object: its destructor runs when the library is unloaded.
std::auto_ptr<Worker> theWorker;

Worker::Worker()
: timeToEnd( false )
, started( false )
{
    std::cout << "Worker ctor" << std::endl;
    pthread_mutex_init( &mutex, 0 );
    pthread_cond_init( &cond, 0 );
}

Worker::~Worker()
{
    std::cout << "Worker dtor" << std::endl;
    stop();
    pthread_cond_destroy( &cond );
    pthread_mutex_destroy( &mutex );
}

void Worker::dosomething()
{
    start();
    std::cout << "Queue something" << std::endl;
}

// Start the helper thread if it is not already running.
void Worker::start()
{
    pthread_mutex_lock( &mutex );
    if ( ! started )
    {
        pthread_create( &thread, 0, threadStart, this );
        started = true;
    }
    pthread_mutex_unlock( &mutex );
}

// Ask the helper thread to finish and wait for it to exit.
void Worker::stop()
{
    if ( ! started )
        return;
    pthread_mutex_lock( &mutex );
    timeToEnd = true;
    pthread_mutex_unlock( &mutex );
    pthread_cond_signal( &cond );
    pthread_join( thread, 0 );
    started = false;
}

// Body of the helper thread: wait until told to end.
void Worker::run()
{
    pthread_mutex_lock( &mutex );
    while ( ! timeToEnd )
    {
        pthread_cond_wait( &cond, &mutex );
    }
    pthread_mutex_unlock( &mutex );
}

void *Worker::threadStart( void * arg )
{
    Worker *self = static_cast<Worker *>(arg);
    self->run();
    std::cout << "Thread exiting" << std::endl;
    return 0;
}

extern "C"
{
    void
#if defined (_MSC_VER)
    __declspec( dllexport )
#endif
    callLibrary()
    {
        if ( ! theWorker.get() )
        {
            theWorker.reset( new Worker );
        }
        theWorker->dosomething();
    }
}
------------------------------- testWindows.cpp ---------------------------
#include <windows.h>
#include <iostream>
#include <string>

int main( int argc, char ** argv )
{
    std::cout << "About to load library" << std::endl;
    std::string library( argc == 1 ? "Loadable" : argv[1] );
    HMODULE handle = LoadLibrary( library.c_str() );
    if ( ! handle )
    {
        std::cout << "Failed to load '" << library << "': " << GetLastError() << std::endl;
        return 1;
    }
    std::cout << "Loaded library '" << library << "'" << std::endl;

    PROC pCall = GetProcAddress( handle, "callLibrary" );
    if ( pCall )
    {
        typedef void (*pfn)();
        pfn pfnCall = (pfn)pCall;
        std::cout << "About to call library" << std::endl;
        pfnCall();
    }

    // Fake some work before we close the library...
    Sleep( 5 * 1000 );

    // closeLibrary is optional: it is only called if the library exports it.
    PROC pClose = GetProcAddress( handle, "closeLibrary" );
    if ( pClose )
    {
        typedef void (*pfn)();
        pfn pfnClose = (pfn)pClose;
        std::cout << "About to close library" << std::endl;
        pfnClose();
    }

    std::cout << "About to unload library" << std::endl;
    if ( FreeLibrary( handle ) == 0 )
    {
        std::cout << "Failed to unload: " << GetLastError() << std::endl;
        return 1;
    }
    std::cout << "Unloaded library" << std::endl;

    // Fake some more work before we eventually exit...
    Sleep( 5 * 1000 );
    std::cout << "About to exit" << std::endl;
    return 0;
}
------------------------------- end ---------------------------
When I ran this program on Windows here is the output I saw:
About to load library
Loaded library 'Loadable'
About to call library
Worker ctor
Queue something
About to unload library
Worker dtor
Thread exiting
and the program was hung.
I still had my hang -- how could I prevent it? The fundamental problem was trying to stop a thread while unloading a shared library.
At first I thought about taking out the pthread_join(): mark the thread as stopping and then return from the destructor. Unfortunately this produces a nasty race condition. We are calling the destructor while handling the unload of the DLL; after we return from FreeLibrary the DLL's code and data have gone from memory, so if the second thread is still running it will get an access violation as there is no longer any code for it to execute!
Eventually I decided that the best solution was to add a closeLibrary() call to the shared library that would stop the thread and to ensure that client programs using dynamic loading called this function before unloading the DLL. This still left a problem if the client code neglected to call this function of course, but at least a diagnostic message could be produced warning that the library had been unloaded without closing.
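A minimal sketch of this arrangement might look like the following. It is not the code from the real library: the file-scope state replaces the Worker class to keep the example short, 'UnloadCheck' and 'workerRunning' are hypothetical names introduced for illustration, the __declspec( dllexport ) decoration is omitted, and closeLibrary() is assumed to be called from a single client thread.

```cpp
#include <pthread.h>
#include <cassert>
#include <iostream>

namespace
{
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    pthread_t thread;
    bool started = false;
    bool timeToEnd = false;

    // Helper thread: waits until it is told to end.
    void *threadStart( void * )
    {
        pthread_mutex_lock( &mutex );
        while ( ! timeToEnd )
            pthread_cond_wait( &cond, &mutex );
        pthread_mutex_unlock( &mutex );
        return 0;
    }

    // Destroyed when the library is unloaded. On Windows this happens
    // with the loader lock held, so it only warns and never joins.
    struct UnloadCheck
    {
        ~UnloadCheck()
        {
            if ( started )
                std::cerr << "Warning: library unloaded without closeLibrary()"
                          << std::endl;
        }
    } unloadCheck;
}

// Hypothetical helper so callers (and tests) can query the state.
bool workerRunning()
{
    pthread_mutex_lock( &mutex );
    bool const result = started;
    pthread_mutex_unlock( &mutex );
    return result;
}

extern "C" void callLibrary()
{
    pthread_mutex_lock( &mutex );
    if ( ! started )
    {
        timeToEnd = false;
        started = ( pthread_create( &thread, 0, threadStart, 0 ) == 0 );
    }
    pthread_mutex_unlock( &mutex );
}

// Stops and joins the helper thread. Called by the client in normal
// code, before FreeLibrary, so the loader lock is not held here.
extern "C" void closeLibrary()
{
    pthread_mutex_lock( &mutex );
    bool const running = started;
    if ( running )
        timeToEnd = true;
    pthread_mutex_unlock( &mutex );
    if ( ! running )
        return;

    pthread_cond_signal( &cond );
    pthread_join( thread, 0 );

    pthread_mutex_lock( &mutex );
    started = false;
    pthread_mutex_unlock( &mutex );
    assert( ! workerRunning() );
}
```

The important design point is that the join has moved out of any static destructor: closeLibrary() runs as an ordinary call from the client, and the object destroyed at unload time does nothing except report the mistake.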
A slightly safer approach is for the library to call LoadLibrary *on itself* when it creates the thread and FreeLibrary when closeLibrary() completes. This ensures that if the main program tries to unload the library without calling closeLibrary() the library will stay in memory, which prevents the free-running thread from causing a crash.
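One possible way of taking that extra reference is sketched below. It uses GetModuleHandleEx with the "from address" flag rather than a literal LoadLibrary call, which has the same effect on the reference count without needing to know the DLL's file name; this substitution is my assumption, not something the original code did. The names 'pinThisModule', 'unpinThisModule' and 'isPinned' are hypothetical, the functions are assumed to be called with the worker's mutex held (or from a single thread), and the non-Windows branch is a do-nothing fallback that exists only so the sketch compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif
#include <cassert>

namespace
{
#ifdef _WIN32
    // Any object that lives inside this module will do as an address.
    const char moduleAnchor = 0;
    HMODULE selfReference = 0;
#endif
    bool pinned = false;
}

bool isPinned()
{
    return pinned;
}

// Take an extra reference on the module containing this code.
// Intended to be called just before the worker thread is created.
bool pinThisModule()
{
    if ( pinned )
        return true;
#ifdef _WIN32
    // FROM_ADDRESS looks the module up by an address inside it. As
    // UNCHANGED_REFCOUNT is not passed, the module's reference count
    // is incremented and must be balanced by FreeLibrary later.
    if ( ! GetModuleHandleExA( GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
                               &moduleAnchor, &selfReference ) )
        return false;
    assert( selfReference != 0 );
#endif
    pinned = true;
    return true;
}

// Drop the extra reference. Intended to be called at the end of
// closeLibrary(), after the worker thread has been joined.
void unpinThisModule()
{
    if ( ! pinned )
        return;
    pinned = false;
#ifdef _WIN32
    FreeLibrary( selfReference );
    selfReference = 0;
#endif
}
```

With this in place a client that forgets closeLibrary() leaks the library and its thread for the rest of the process lifetime, which is untidy but considerably better than an access violation.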
Microsoft published an MSJ article some years ago which solved the problem by splitting the single DLL into two; the second DLL contained the actual worker threads each of which called LoadLibrary on the second DLL itself to keep it in memory while they executed. These worker threads called FreeLibraryAndExitThread when they completed so the second DLL was only unloaded asynchronously by Windows when all the threads completed. This would have been rather more of a change to the sharable library than I had time for (and also the use of FreeLibraryAndExitThread would make the resultant code rather more Windows-specific than I wanted).
Firstly, the interaction of threads, sharable libraries and C++ destructors is a bit of a minefield. Although my program worked happily on Unix this might just be luck -- I don't think there are any guarantees in the standards that the code should work. Actually, this article could be viewed as another example of the singleton anti-pattern since the single instance of the worker thread object is the root cause of the problem.
Secondly, when debugging, having multiple tools in your toolkit is a good thing. In this example the Visual Studio debugger was unable to provide much assistance with this particular problem, but WinDbg produced helpful information as soon as I used it. In fact, I find that WinDbg produces additional information so often that sometimes I attach it non-invasively to a process already under the Visual Studio debugger. This allows both debuggers to be used to investigate the state of the stopped program.
And finally, this exercise shows the truth of the phrase "abstractions are leaky" [5]. I was trying to use pthreads-win32 and Cygwin to provide Unix-like threading on a non-Unix operating system. In this particular case an implementation detail of the underlying operating system leaked into the application and caused a failure. It is hard to prepare for this in advance, since it is difficult to predict where the abstraction is going to break down. It also indicates that the better your knowledge of the underlying platform-specific implementation, the more likely you are to be able to recognise and solve such problems.
Thanks are due to Alan Griffiths, Phil Bass and others who reviewed a draft of this article and suggested improvements.