A friend and I have recently been working on a library that can replace P/Invoke in C#. The existing solutions have many shortcomings, and when you're targeting multiple platforms, the headaches only get worse.

In short, we're using a delegate-based approach, together with each platform's native dynamic library loading system (libdl and kernel32's LoadLibrary, on Unix-like systems and Windows respectively). The code is available on GitHub under GPLv3, or via a commercial license.

While working on disposal code for the native libraries, we came across a peculiar case where calling dlclose on our library handle broke DllImport declarations that accessed the same library, causing a segfault at runtime when we attempted to call them. In our testing code, we had the following setup (or an equivalent one, at least):

test.c

// Build with: gcc -shared -fPIC -o libTest.so test.c
int TestFunc(void)
{
    return 1;
}

test.cs

using System;
using System.Runtime.InteropServices;

public static class Program
{
    [DllImport("libTest.so")]
    public static extern int TestFunc();

    [DllImport("dl")]
    public static extern IntPtr dlopen(string fileName, int flags);

    [DllImport("dl")]
    public static extern IntPtr dlsym(IntPtr handle, string name);

    [DllImport("dl")]
    public static extern int dlclose(IntPtr handle);

    [DllImport("dl")]
    public static extern IntPtr dlerror();
    
    public delegate int TestFunc_dt();

    public static void Main()
    {
        string library = "libTest.so";

        var handle = dlopen(library, 1); // 1 == RTLD_LAZY
        
        var func = Marshal.GetDelegateForFunctionPointer<TestFunc_dt>(dlsym(handle, "TestFunc"));
        Console.WriteLine("DlSym Call: {0}", func());
        
        dlclose(handle);
        Console.WriteLine("DllImport Call: {0}", TestFunc());
    }
}

When run, this code segfaults at the call to the DllImported function. What gives?

DlSym Call: 1
<segfault>

Commenting out the dlclose call lets it run without problems.

DlSym Call: 1
DllImport Call: 1

After a bit of head-scratching, I noticed the following line:

string library = "libTest.so";

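On Windows, this would have been perfectly valid, since LoadLibrary searches the application's directory before anything else. Not so with libdl: a bare name such as libTest.so is resolved only through the dynamic linker's search path (the executable's RPATH/RUNPATH, LD_LIBRARY_PATH, /etc/ld.so.cache, and the system library directories). Neither the current directory nor the application directory is searched unless it has been added to one of those, typically via the LD_LIBRARY_PATH environment variable.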

The cause, then, seemed obvious: we weren't getting a valid library handle from dlopen! However, that couldn't be right - we could retrieve a function pointer, and not only that, we could instantiate a delegate from it that worked exactly as expected. wat.

I took a look at the library handle, and sure enough:

Console.WriteLine("Library loaded via Dlopen: {0}", handle);

Library loaded via Dlopen: 0
DlSym Call: 1
DllImport Call: 1

wat.

At this point, the head-scratching continued. We quickly switched to an absolute path, which produced a valid handle, and the problem went away: dlclose worked as expected, and we had no problem calling the DllImport function.

What was going on? We didn't have a valid library handle, and yet we could retrieve a function pointer (from a NULL library handle, no less!) that worked fine.

Now, libdl has some interesting handling when it comes to NULL inputs. According to the documentation, dlopen accepts NULL as a valid library name.

dlopen()

If filename is NULL, then the returned handle is for the main program.

An interesting behaviour, to say the least: you can introspect the running program as if it were a dynamic library and load function pointers from it. In our case, however, the input was most certainly not NULL - we were passing a library name that dlopen could not resolve, and getting a NULL handle back.

dlsym, on the other hand, doesn't mention anything about defined behaviour for a NULL input handle, which is what we were giving it in this instance. At this point, you can probably guess where the trail was leading us.

Did dlsym work the same way? Would giving it a NULL handle result in main program introspection, and search it for the given symbol? That still didn't make much sense - we were working with a managed C# program running under the CLR, after all, and we weren't defining anything in unmanaged code. Or were we?

[DllImport("libTest.so")]
public static extern int TestFunc();

It can't be, can it?

using System;
using System.Runtime.InteropServices;

public static class Program
{
    //[DllImport("libTest.so")]
    //public static extern int TestFunc();

    [DllImport("dl")]
    public static extern IntPtr dlopen(string fileName, int flags);

    [DllImport("dl")]
    public static extern IntPtr dlsym(IntPtr handle, string name);

    [DllImport("dl")]
    public static extern int dlclose(IntPtr handle);

    [DllImport("dl")]
    public static extern IntPtr dlerror();
    
    public delegate int TestFunc_dt();

    public static void Main()
    {
        string library = "libTest.so";

        var handle = dlopen(library, 1); // 1 == RTLD_LAZY
        
        var func = Marshal.GetDelegateForFunctionPointer<TestFunc_dt>(dlsym(handle, "TestFunc"));
        Console.WriteLine("Library loaded via Dlopen: {0}", handle);
        Console.WriteLine("DlSym Call: {0}", func());
        
        dlclose(handle);
        //Console.WriteLine("DllImport Call: {0}", TestFunc());
    }
}

Unhandled Exception:
System.ArgumentNullException: Value cannot be null.
Parameter name: ptr
  at System.Runtime.InteropServices.Marshal.GetDelegateForFunctionPointer (System.IntPtr ptr, System.Type t) [0x00068] in <2e7c1c96edae44d496118948ca617c11>:0 
  at System.Runtime.InteropServices.Marshal.GetDelegateForFunctionPointer[TDelegate] (System.IntPtr ptr) [0x00000] in <2e7c1c96edae44d496118948ca617c11>:0 
  at Program.Main () [0x0001a] in <73e65435aac14958b130ed2e0a8de941>:0 
[ERROR] FATAL UNHANDLED EXCEPTION: System.ArgumentNullException: Value cannot be null.
Parameter name: ptr
  at System.Runtime.InteropServices.Marshal.GetDelegateForFunctionPointer (System.IntPtr ptr, System.Type t) [0x00068] in <2e7c1c96edae44d496118948ca617c11>:0 
  at System.Runtime.InteropServices.Marshal.GetDelegateForFunctionPointer[TDelegate] (System.IntPtr ptr) [0x00000] in <2e7c1c96edae44d496118948ca617c11>:0 
  at Program.Main () [0x0001a] in <73e65435aac14958b130ed2e0a8de941>:0 

Oh boy.

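So, dlsym searches the running process when passed a NULL handle, and DllImport - somehow - loads the library in such a way that dlsym can see its symbols there. In hindsight, the dlsym half is less undocumented than it looked: glibc's <dlfcn.h> defines the pseudo-handle RTLD_DEFAULT as ((void *) 0), so a NULL handle is RTLD_DEFAULT, which searches the global symbol scope (the executable, the libraries it was linked against, and anything loaded with RTLD_GLOBAL). How the DllImport half works behind the scenes would be very interesting to learn, but we didn't have time to investigate further.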

On top of this, we were calling dlclose on a NULL handle. It would seem that the Linux version of libdl doesn't check its input, and tries to free the given NULL handle and its associated symbols. For some reason this call passes, and doesn't produce any adverse effects until we call the DllImported version of the function, which dlclose has apparently unloaded out from under the CLR.

(an interesting aside is that this was also reproducible on Windows, using the kernel32 API)

The end result? AdvancedDLSupport would never actually hit this edge case - we were already checking for NULL handles when loading libraries, and if we did get one, we threw an exception.

The issue had come up during prototyping and hadn't been revisited until now. We did a pass over our checking code to make sure, congratulated ourselves on learning something new, called it a day, and that was the end of it.

What a wild ride. From invalid handles to undocumented behaviour, the world of interop really is a jungle.

AdvancedDLSupport is available on GitHub, MyGet, and soon, NuGet.