What would it take to dynamically load an ELF shared object at runtime and execute its functions, without compile-time knowledge of those functions? There are approximately 3300 shared object files on my computer. Imagine having runtime access to all that functionality without needing a compilation step.
#include <dlfcn.h>
Gives access to dlopen(). Two problems:
1. dlsym() requires you to know the name of the symbol.
2. How can we programmatically assemble parameters to prepare to call a function?
We can use libffi to assemble function parameters for calling. From this example:
ffi_call(&cif, function_pointer, &rc, values);
Where function_pointer is something we got from dlsym() on a handle returned by dlopen().
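To make the lookup half concrete, here is a minimal sketch of where function_pointer comes from. It uses only dlopen() and dlsym(); the helper name call_unary_double is made up for illustration, and the signature double f(double) is hard-coded, which is exactly the part that libffi's ffi_call() would replace with a call interface assembled at runtime.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: loads `library`, looks up `symbol`, and calls it as
   double f(double). Returns 0 on success and stores the result in *out. */
static int call_unary_double(const char *library, const char *symbol,
                             double arg, double *out)
{
    assert(out != NULL);

    void *handle = dlopen(library, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror(); /* clear any stale error before the lookup */
    void *function_pointer = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || function_pointer == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol resolved to NULL");
        dlclose(handle);
        return -1;
    }

    /* Assumption: the caller knows the symbol really has this signature.
       Nothing in the shared object lets us verify that. */
    double (*fn)(double) = (double (*)(double))function_pointer;
    *out = fn(arg);

    dlclose(handle);
    return 0;
}
```

Calling call_unary_double("libm.so.6", "cos", 0.0, &result) should leave 1.0 in result on a glibc system. Older glibc versions need -ldl at link time.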
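The first problem is finding symbol names. nm can list the dynamic symbols that a shared object exports:
nm -D /usr/lib/x86_64-linux-gnu/libgtk-3.so | grep ' T '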
But how do we do this programmatically? What does nm itself use?
ldd /usr/bin/nm
linux-vdso.so.1 (0x00007fff1b773000)
libbfd-2.28-system.so => /usr/lib/x86_64-linux-gnu/libbfd-2.28-system.so (0x00007f4324599000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f432437f000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f432417b000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4323ddc000)
/lib64/ld-linux-x86-64.so.2 (0x00007f4324aed000)
libbfd looks interesting. It is the Binary File Descriptor library from binutils, and it seems to do what we might expect. From a Stack Overflow answer:
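#define PACKAGE "example"  /* bfd.h refuses to compile unless PACKAGE or PACKAGE_VERSION is defined */
#include <bfd.h>
#include <stdio.h>

int main(void) {
    const char *filename = "/path/to/my/file";
    bfd *abfd;
    asection *p;

    bfd_init();
    if ((abfd = bfd_openr(filename, NULL)) == NULL) {
        bfd_perror("bfd_openr");
        return 1;
    }
    if (!bfd_check_format(abfd, bfd_object)) {
        bfd_perror("bfd_check_format");
        bfd_close(abfd);
        return 1;
    }
    /* Print every section that contains code. The two-argument accessors
       match the binutils 2.28 seen in the ldd output; newer releases
       changed them to take only the section. */
    for (p = abfd->sections; p != NULL; p = p->next) {
        bfd_vma base_addr = bfd_section_vma(abfd, p);
        bfd_size_type size = bfd_section_size(abfd, p);
        const char *name = bfd_section_name(abfd, p);
        flagword flags = bfd_get_section_flags(abfd, p);
        if (flags & SEC_CODE) {
            /* bfd_vma and bfd_size_type are integers, not pointers */
            printf("%s: addr=0x%llx size=%llu\n", name,
                   (unsigned long long)base_addr, (unsigned long long)size);
        }
    }
    bfd_close(abfd);
    return 0;
}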
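The libbfd snippet above lists sections rather than symbols. As a rough sketch of what nm -D piped through grep ' T ' is doing, here is a version that skips libbfd entirely and reads the .dynsym table straight out of the file using glibc's <elf.h>. The names for_each_exported_function, symbol_cb and print_name are mine, not from any library. It assumes a well-formed 64-bit ELF file in the host's byte order and does only minimal bounds checking, so treat it as an illustration rather than a robust parser.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef void (*symbol_cb)(const char *name, void *user);

/* Example callback: print one symbol name per line. */
static void print_name(const char *name, void *user)
{
    (void)user;
    puts(name);
}

/* Calls cb for every defined function symbol in .dynsym.
   Returns the number of symbols reported, or -1 on error. */
static int for_each_exported_function(const char *path, symbol_cb cb, void *user)
{
    assert(cb != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    int count = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)map;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) <= len) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(map + eh->e_shoff);
        count = 0;
        for (int i = 0; i < eh->e_shnum; i++) {
            if (sh[i].sh_type != SHT_DYNSYM || sh[i].sh_link >= eh->e_shnum)
                continue;
            /* sh_link of a symbol table points at its string table */
            const Elf64_Sym *sym = (const Elf64_Sym *)(map + sh[i].sh_offset);
            const char *strtab = (const char *)(map + sh[sh[i].sh_link].sh_offset);
            size_t n = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < n; j++) {
                /* Skip undefined entries: those are imports, not exports */
                if (ELF64_ST_TYPE(sym[j].st_info) == STT_FUNC &&
                    sym[j].st_shndx != SHN_UNDEF) {
                    cb(strtab + sym[j].st_name, user);
                    count++;
                }
            }
        }
    }
    munmap(map, len);
    return count;
}
```

Something like for_each_exported_function("/usr/lib/x86_64-linux-gnu/libgtk-3.so", print_name, NULL) should print roughly the names nm reports, although nm's classification is more detailed than a plain STT_FUNC check.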
Here is where things get a little tricky. The shared objects themselves don't carry information about types. For example, a struct may be defined in the source, but once it's compiled that sub-type information can get lost and the function simply receives a pointer to some blob in memory. From what I can tell, the only way to know is to parse the header file.
Some prior art:
Could use the GCC::TranslationUnit Perl module, which uses the -fdump-translation-unit flag:
gcc `pkg-config --cflags gtk+-3.0` `pkg-config --libs gtk+-3.0` -fdump-translation-unit -o test test.c
# head test.c.001t.tu
@1 type_decl name: @2 type: @3 chain: @4
@2 identifier_node strg: int lngt: 3
@3 integer_type name: @1 size: @5 algn: 32
prec: 32 sign: signed min : @6
max : @7
@4 type_decl name: @8 type: @9 chain: @10
@5 integer_cst type: @11 int: 32
@6 integer_cst type: @3 int: -2147483648
@7 integer_cst type: @3 int: 2147483647
@8 identifier_node strg: char lngt: 4
An example for the GtkWindow struct:
@52321 identifier_node strg: _GtkWindow lngt: 10
@52322 identifier_node strg: bin lngt: 3
@52323 record_type name: @52373 unql: @52374 size: @1878
algn: 64 tag : struct flds: @52375
@52324 field_decl name: @9177 type: @52376 scpe: @52273
srcp: gtkwindow.h:57 size: @22
algn: 64 bpos: @1878
From the header:
struct _GtkWindow
{
GtkBin bin;
GtkWindowPrivate *priv;
};
Looking at name: @52373:
@52373 type_decl name: @52430 type: @52323 scpe: @154
srcp: gtkbin.h:45 chain: @52431
Its name is @52430:
@52430 identifier_node strg: GtkBin lngt: 6
Looking at size: @1878:
@1878 integer_cst type: @11 int: 384
And type: @11:
@11 integer_type name: @18 size: @19 algn: 128
prec: 128 sign: unsigned min : @20
max : @21
So it looks like this translation unit file has all the information needed to understand what size ints are being used, map all the typedefs back to their core types, etc. Advantages are that gcc is doing all the preprocessing and parsing of the text. Disadvantages are that you need to compile something to get this dump, and then you need to parse the dump. This could be something done once per target architecture and then stored in some DB.
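Parsing the dump does not look too bad, at least for the lines shown above. Here is a small sketch that pulls the node id, the node kind and individual fields out of one line. The names tu_node, tu_node_header, tu_field and tu_ref are hypothetical. It assumes values contain no whitespace, which holds for the samples here but may not hold for every strg: entry, and it leaves joining continuation lines to the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct tu_node {
    long id;        /* the number after '@' at the start of the line */
    char kind[64];  /* e.g. "integer_type", "record_type" */
};

/* Parses "@3 integer_type ..." into id=3, kind="integer_type". */
static int tu_node_header(const char *line, struct tu_node *node)
{
    return sscanf(line, "@%ld %63s", &node->id, node->kind) == 2;
}

/* Copies the value that follows "key:" (or "key :") into out.
   Returns 1 if the key was found, 0 otherwise. */
static int tu_field(const char *line, const char *key, char *out, size_t outsz)
{
    assert(outsz > 0);
    size_t klen = strlen(key);
    for (const char *p = strstr(line, key); p != NULL; p = strstr(p + 1, key)) {
        /* The key must start a token, so "type" does not match "integer_type" */
        if (p != line && !isspace((unsigned char)p[-1]))
            continue;
        const char *q = p + klen;
        while (*q == ' ')
            q++;
        if (*q != ':')
            continue; /* e.g. "type_decl" when looking for "type" */
        q++;
        while (*q == ' ')
            q++;
        size_t n = 0;
        while (q[n] != '\0' && !isspace((unsigned char)q[n]) && n + 1 < outsz)
            n++;
        memcpy(out, q, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}

/* Turns a reference such as "@1878" into 1878, or -1 if it is not one. */
static long tu_ref(const char *value)
{
    return value[0] == '@' ? strtol(value + 1, NULL, 10) : -1;
}
```

With these, following size: from the record_type line to the integer_cst node is a matter of calling tu_field() for "size", converting with tu_ref(), and looking up the line whose header carries that id.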
What process ids are dynamically linking to a particular object?
lsof /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0
What objects are linked to a given process id?
lsof -p 22159 | grep '\.so'
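The second question can also be answered from C without lsof. A minimal sketch, assuming a Linux /proc filesystem: read /proc/<pid>/maps and print each mapped file whose path contains ".so". The function name list_shared_objects is made up for illustration, and only consecutive duplicates are filtered, so a library mapped in non-adjacent regions could show up twice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Prints the shared objects mapped into process `pid`, one per line.
   Returns the number of paths printed, or -1 if the maps file is unreadable. */
static int list_shared_objects(pid_t pid, FILE *out)
{
    assert(out != NULL);
    char path[64], line[4096], last[4096] = "";
    snprintf(path, sizeof path, "/proc/%ld/maps", (long)pid);

    FILE *maps = fopen(path, "r");
    if (maps == NULL)
        return -1;

    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* The pathname column is the only one that contains a '/' */
        char *file = strchr(line, '/');
        if (file == NULL)
            continue; /* anonymous mapping, [heap], [stack], ... */
        file[strcspn(file, "\n")] = '\0';
        if (strstr(file, ".so") == NULL)
            continue;
        if (strcmp(file, last) == 0)
            continue; /* same file as the previous mapping */
        strcpy(last, file);
        fprintf(out, "%s\n", file);
        count++;
    }
    fclose(maps);
    return count;
}
```

list_shared_objects(getpid(), stdout) lists the libraries of the current process. Reading another user's process needs the same permissions lsof would.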
It's theoretically possible to dynamically load shared objects and dynamically call their functions (naturally, computers are flexible). The main blocker is knowledge of the types needed for function calls. What I would like is a C library that can somehow provide detailed type information for a given object. This could be built by either parsing or compiling header files, which would prove difficult because:
The most fruitful path here would be to use parts of LLVM. However, we seem to be missing a pure C version of this functionality. Another avenue is to use existing tools in a one-off operation and save this information to some global database. This could even be a SQL database like SQLite, making it really transparent. I am tempted to make this database and provide it online for all libraries distributed in Debian. It would need the following dimensions:
There would potentially need to be a separate definition for each combination of all of the above. An example:
"x86_64" + "libX11.so" + "6.3.0" + "XCreateWindow" + "DEFAULT"
Which would identify a specific symbol in the shared object. For that symbol there would be a structured machine readable record describing various types and alignments. Software could read this definition, dynamically load the library and assemble function arguments or structures based on this information. Programs could download the SQLite file of this database, or select a subset of the database for their architecture or list of libs. This database could also be installed via Debian packaging.
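To give a feel for what such a record might look like to a consumer, here is a sketch of the composite key and a lookup over an in-memory table. Everything here is an assumption for illustration: the struct and field names (including calling the fifth component "variant"), the comma-separated type encoding, and the linear scan standing in for a real SQLite query. The one table entry describes XCreateWindow as declared in Xlib, with Window taken to be an unsigned long as on x86_64.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical composite key, mirroring the five-part example above. */
struct symbol_key {
    const char *arch, *library, *version, *symbol, *variant;
};

/* Hypothetical record: types are encoded as short names a caller
   could map onto libffi's ffi_type_* descriptors. */
struct symbol_record {
    struct symbol_key key;
    const char *return_type;
    const char *arg_types; /* comma-separated, in call order */
};

static const struct symbol_record sample_table[] = {
    { { "x86_64", "libX11.so", "6.3.0", "XCreateWindow", "DEFAULT" },
      "ulong",
      "pointer,ulong,sint,sint,uint,uint,uint,sint,uint,pointer,ulong,pointer" },
};

static int key_equal(const struct symbol_key *a, const struct symbol_key *b)
{
    return strcmp(a->arch, b->arch) == 0 &&
           strcmp(a->library, b->library) == 0 &&
           strcmp(a->version, b->version) == 0 &&
           strcmp(a->symbol, b->symbol) == 0 &&
           strcmp(a->variant, b->variant) == 0;
}

/* Returns the record matching all five key parts, or NULL if there is none. */
static const struct symbol_record *find_symbol(const struct symbol_record *table,
                                               size_t n,
                                               const struct symbol_key *key)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (key_equal(&table[i].key, key))
            return &table[i];
    }
    return NULL;
}
```

A caller would look up the record, dlopen() the library, dlsym() the symbol, and then build the call from return_type and arg_types. A record for a different architecture is simply a different row.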