arm-none-eabi-nm: some symbols are not related to any source file

In my embedded project I compile `amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.c` in this way:

/opt/gcc-arm-none-eabi-8-2019-q3-update/bin/arm-none-eabi-gcc \
	-std=gnu11 \
	-mcpu=cortex-m7 \
	-mthumb \
	-mapcs \
	-mfloat-abi=hard \
	-mfpu=fpv5-d16 \
	-fno-common \
	-fno-math-errno \
	-fsingle-precision-constant \
	-fno-trapping-math \
	-fno-signaling-nans \
	-fno-builtin \
	-fstrict-aliasing \
	-fstack-usage \
	-Wstack-usage=300 \
	-DCPU_MIMXRT1051DVL6B  \
	-D__FREERTOS__=1 \
	-DFSL_RTOS_FREE_RTOS \
	-DFSL_FEATURE_PHYKSZ8081_USE_RMII50M_MODE \
	-D__MCUXPRESSO \
	-D__USE_CMSIS \
	-DARM_MATH_CM7 \
	-D__NEWLIB__ \
	-DDEBUG=0 \
	-IDSP/source/ \
	-Iamazon-freertos/lib/FreeRTOS-Plus-TCP/ \
	-Iamazon-freertos/lib/FreeRTOS-Plus-TCP/portable/BufferManagement/ \
	-Iamazon-freertos/lib/FreeRTOS-Plus-TCP/portable/NetworkInterface/imxrt105x/ \
	-Iamazon-freertos/lib/FreeRTOS/ \
	-Iamazon-freertos/lib/include/ \
	-Iamazon-freertos/lib/FreeRTOS/portable/GCC/ARM_CM4F/ \
	-Iamazon-freertos/lib/FreeRTOS/portable/MemMang/ \
	-Og \
	-g3 \
	-Wall \
	-ffunction-sections \
	-fdata-sections \
	-c \
	-MMD \
	-MP \
	-Werror \
	-D"ARCPRINTF( ... )=(void)0" \
	--specs=nano.specs  \
	-Wa,-anhlmsd=build/DSP/amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.lst \
	-o build/DSP/amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.o amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.c 

I have `ipconfigUSE_TCP` set to 1 in `amazon-freertos/lib/FreeRTOS-Plus-TCP/include/FreeRTOSIPConfig.h`.

`FreeRTOS_Sockets.c` defines `xBoundUDPSocketsList` and `xBoundTCPSocketsList`:

/* The list that contains mappings between sockets and port numbers.  Accesses
to this list must be protected by critical sections of one kind or another. */
List_t xBoundUDPSocketsList;

#if ipconfigUSE_TCP == 1
	List_t xBoundTCPSocketsList;
#endif /* ipconfigUSE_TCP == 1 */

Once the ELF executable is linked, I run this command:

$ /opt/gcc-arm-none-eabi-8-2019-q3-update/bin/arm-none-eabi-nm -a -l -n -t x --print-size image/DSP.elf | grep -E '^[[:xdigit:]]{8} [[:xdigit:]]{8} B' | grep SocketsList
2001ac7c 00000014 B xBoundTCPSocketsList
2001ac90 00000014 B xBoundUDPSocketsList	/home/max/Lavori/4202/src/repos/toremove/FW/amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.c:162

Both symbols exist in the executable, but one (`xBoundTCPSocketsList`) does not seem to belong to any .c source.
Both appear in the map file:

$ grep -n -A 1 -E 'xBoundTCPSocketsList|xBoundUDPSocketsList' image/DSP.map
61974: .bss.xBoundTCPSocketsList
61975-                0x000000002001ac7c       0x14 ./build/DSP/amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.o
61976:                0x000000002001ac7c                xBoundTCPSocketsList
61977: .bss.xBoundUDPSocketsList
61978-                0x000000002001ac90       0x14 ./build/DSP/amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.o
61979:                0x000000002001ac90                xBoundUDPSocketsList

Even addr2line fails:

$ arm-none-eabi-addr2line -a -e image/DSP.elf  2001ac7c 2001ac90
0x2001ac7c
??:0
0x2001ac90
/home/max/Lavori/4202/src/repos/toremove/FW/amazon-freertos/lib/FreeRTOS-Plus-TCP/FreeRTOS_Sockets.c:162

Even `FreeRTOS_Sockets.lst` doesn't tell me anything more:

 7337              		.global	xBoundTCPSocketsList
 7338              		.global	xBoundUDPSocketsList
 7339              		.section	.bss.xBoundTCPSocketsList,"aw",%nobits
 7340              		.align	2
 7341              		.set	.LANCHOR2,. + 0
 7344              	xBoundTCPSocketsList:
 7345 0000 00000000 		.space	20
 7345      00000000 
 7345      00000000 
 7345      00000000 
 7345      00000000 
 7346              		.section	.bss.xBoundUDPSocketsList,"aw",%nobits
 7347              		.align	2
 7348              		.set	.LANCHOR1,. + 0
 7351              	xBoundUDPSocketsList:
 7352 0000 00000000 		.space	20
 7352      00000000 
 7352      00000000 
 7352      00000000 
 7352      00000000 

Many other symbols are present in the executable but do not seem to be associated with any .c source file.

Why does this happen? What differs between the two symbols `xBoundTCPSocketsList` and `xBoundUDPSocketsList`? Am I doing something wrong, or omitting some debugging option when compiling? How can I get nm, or some other tool, to report the .c source file in which a symbol is defined?

best regards

Max

  • The variable xBoundTCPSocketsList is declared as extern in the header FreeRTOS_IP_Private.h. It is defined in FreeRTOS_Sockets.c, which also includes FreeRTOS_IP_Private.h. As a result, the extern declaration and the definition of the variable both appear, in that order, in the same compilation unit. This causes the DWARF debug information about the variable to be split into two entries. It is likely that nm/objdump cannot cope with this split. I do not know whether they accept a switch that makes them handle such splits.

    However, I was able to work around this by keeping the extern declaration of the variable out of FreeRTOS_Sockets.c. The steps are:

    1. At the top of FreeRTOS_Sockets.c, right after the license block and right before the first #include, define a new, otherwise unused macro, e.g. FREERTOS_SOCKETS_C_FILE. At the very end of the file, add a line to undefine that same macro. See code snippet below.
    2. In FreeRTOS_IP_Private.h, restrict the extern declaration of that variable to files that do not define the macro FREERTOS_SOCKETS_C_FILE.

    /* FreeRTOS_IP_Private.h */
    
    /* Replace the preprocessor directives surrounding
     * the extern declaration, as shown below
     */
     
    /* Defined in FreeRTOS_Sockets.c */
    #if ( ipconfigUSE_TCP == 1 ) && (!defined(FREERTOS_SOCKETS_C_FILE))
    	extern List_t xBoundTCPSocketsList;
    #endif

    Edit: I think step 1 can also be modified to define and undefine the new macro just around the #include "FreeRTOS_IP_Private.h" inside FreeRTOS_Sockets.c, instead of around the whole file.

    Step 1 can then be written more succinctly as:

    /* FreeRTOS_Sockets.c */
    
    /* ... */
    
    #define FREERTOS_SOCKETS_C_FILE
    #include "FreeRTOS_IP_Private.h"
    #undef FREERTOS_SOCKETS_C_FILE
    
    /* ... */
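
A minimal single-file sketch of how the guard behaves, assuming the guarded part of the header is pasted inline instead of being included. `List_t` here is a stand-in struct, and `EXTERN_DECL_VISIBLE` and `extern_decl_visible()` are hypothetical names added only to make the preprocessor's decision observable; none of them come from FreeRTOS.

```c
#include <assert.h>   /* static_assert (C11) */

/* Stand-in for the real FreeRTOS list type (assumption, illustration only). */
typedef struct { int placeholder; } List_t;

#define ipconfigUSE_TCP 1

/* What FreeRTOS_Sockets.c would do just before the include. */
#define FREERTOS_SOCKETS_C_FILE

/* Stand-in for the guarded part of FreeRTOS_IP_Private.h. */
#if ( ipconfigUSE_TCP == 1 ) && ( !defined( FREERTOS_SOCKETS_C_FILE ) )
    extern List_t xBoundTCPSocketsList;
    #define EXTERN_DECL_VISIBLE 1   /* hypothetical marker */
#else
    #define EXTERN_DECL_VISIBLE 0   /* hypothetical marker */
#endif

/* What FreeRTOS_Sockets.c would do just after the include. */
#undef FREERTOS_SOCKETS_C_FILE

/* The definition is now the only declaration this translation unit sees. */
List_t xBoundTCPSocketsList;

/* Fails the build if the extern declaration slipped through the guard. */
static_assert( EXTERN_DECL_VISIBLE == 0, "extern declaration should be hidden" );

/* Hypothetical helper: reports whether the extern declaration was compiled in. */
int extern_decl_visible( void )
{
    return EXTERN_DECL_VISIBLE;
}
```

Any other source file that includes the header without defining FREERTOS_SOCKETS_C_FILE still sees the extern declaration, so references from those files are unaffected.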

    Edit2: Here is a bug in gcc in which an extern declaration of an array and its definition, as separate statements in the same file (the situation seen above), resulted in gcc not placing the array bounds in the DWARF debug information, causing gdb to report an incorrect size for the array.

  • It is defined in the file FreeRTOS_Sockets.c which also happens to include FreeRTOS_IP_Private.h.

    That, surely, is a standard practice?

  • True. Extern-declaring a global inside a header, and then including that header (or pasting the extern-declaration) in the source files that need to refer to that global is a standard practice.

    (1) The files, where the global isn't defined, have no other way to reference it.

    (2) The file, where the global is defined, may or may not need the extern-declaration.

    (2a) It may need the extern declaration if, e.g., some code in the file refers to the global but appears before the global's definition. By including the header, or by providing the same extern declaration at the top, such code can refer to the global without the definition having to be moved.

    But this situation (2a) causes gcc to split the DWARF debug info about the variable into two entries, with one of them chained to the other through a DW_AT_specification attribute. The tools nm/objdump/addr2line seem to be troubled by such splitting/chaining. The gcc bug mentioned in my first reply shows that gcc/binutils are troubled by situation (2a) in more than one way.
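
To make the chaining concrete, here is a rough sketch of how a debug-info reader could recover the name of the definition entry by following the DW_AT_specification link. The `var_die` struct and `die_name()` are hypothetical simplifications for illustration; they are not types or functions from binutils or any DWARF library.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of a DW_TAG_variable entry (hypothetical, not a real type). */
struct var_die {
    const char           *name;          /* DW_AT_name, or NULL if absent */
    unsigned int          decl_line;     /* DW_AT_decl_line */
    int                   has_location;  /* non-zero if DW_AT_location is present */
    const struct var_die *specification; /* DW_AT_specification target, or NULL */
};

/* Walk the DW_AT_specification chain until an entry carrying a name is found.
 * Returns NULL if no entry in the chain has a name. */
const char *die_name(const struct var_die *die)
{
    assert(die != NULL);
    while (die != NULL && die->name == NULL)
        die = die->specification;
    return die != NULL ? die->name : NULL;
}
```

With an extern-declaration entry (name present, no location) and a definition entry (no name, location present, specification pointing at the declaration), `die_name()` on the definition entry yields the declaration's name, while the line of the definition stays on the definition entry itself.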

    However, see here, where a dwarf engineer says, referring to his/her language expert:

    "If this is for C/C++, for example, my shaky language lawyer suggests that it is not valid for both an external declaration and an allocating declaration to occur in the same unit, ...".

    IIUC, it says that the extern-decl-statement and the corresponding definition-statement, if they are separate, may not appear together in the same compilation unit; IOW, it suggests that situation (2a) is 'forbidden'/'invalid'.

    Edit: formatting the quote.

    Edit2: Clang doesn't split the DWARF debug-info entry in situation (2a):

    extern int xTCP;
    int xTCP=20;
    int main() { return xTCP; }


    [user@mach bin]$ gcc -Og -gdwarf-2 -g3 a.c
    [user@mach bin]$ objdump --dwarf
    . . .
    <1><31>: Abbrev Number: 2 (DW_TAG_variable)
        <32>   DW_AT_name        : (indirect string, offset: 0x2629): xTCP
        <36>   DW_AT_decl_file   : 1
        <37>   DW_AT_decl_line   : 1
        <38>   DW_AT_decl_column : 12
        <39>   DW_AT_type        : <0x3f>
        <3d>   DW_AT_external    : 1
        <3e>   DW_AT_declaration : 1
    . . .
     <1><46>: Abbrev Number: 4 (DW_TAG_variable)
        <47>   DW_AT_specification: <0x31>
        <4b>   DW_AT_decl_line   : 2
        <4c>   DW_AT_decl_column : 5
        <4d>   DW_AT_location    : 9 byte block: 3 28 40 0 0 0 0 0 0        (DW_OP_addr: 4028)
        . . .
     DW_MACRO_start_file - lineno: 0 filenum: 1 filename: a.c

    [user@mach bin]$ ./clang -Og -gdwarf-2 -g3 a.c
    [user@mach bin]$ ./llvm-dwarfdump --debug-info --name=xTCP
    a.out:	file format ELF64-x86-64
    
    0x0000002e: DW_TAG_variable
                  DW_AT_name	("xTCP")
                  DW_AT_type	(0x00000044 "int")
                  DW_AT_external	(0x01)
                  DW_AT_decl_file	("/home/user/llvm/usr/bin/a.c")
                  DW_AT_decl_line	(2)
                  DW_AT_location	(DW_OP_addr 0x4028)

  • OK, so I built another example.

    This is a simple source file:

    extern int g_my_externd_global;
    int g_my_externd_global;
    int g_my_private_global;
    int main(void)
    {
    	g_my_externd_global=1;
    	g_my_private_global=1;
    	return 0;
    }
    

    Then I compile and link in two distinct stages:

    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-gcc -g3 -gdwarf-4 -mthumb -mcpu=cortex-m4 -nostdlib -ffunction-sections -fdata-sections -O1 -fno-common -c main.c -o example-O1.o 
    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-nm -s -l example-O1.o 
    00000000 B g_my_externd_global
    00000000 B g_my_private_global	/home/max/Dropbox/4202/prog/test-dwarf/main.c:3
    00000000 T main	/home/max/Dropbox/4202/prog/test-dwarf/main.c:4
    00000000 n wm4.0.771cfe8abdd6d09387c8f60d9b1eb4ff
    

    This is basically the same behaviour.


    Then I compile and link using -O0 instead of -O1:

    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-gcc -g3 -gdwarf-4 -mthumb -mcpu=cortex-m4 -nostdlib -ffunction-sections -fdata-sections -O0 -fno-common -c main.c -o example-O0.o 
    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-nm -s -l example-O0.o 
    00000000 B g_my_externd_global	/home/max/Dropbox/4202/prog/test-dwarf/main.c:1
    00000000 B g_my_private_global	/home/max/Dropbox/4202/prog/test-dwarf/main.c:3
    00000000 T main	/home/max/Dropbox/4202/prog/test-dwarf/main.c:4
    00000000 n wm4.0.ca8be4477412fcb0d732c8d55bace892
    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-gcc -g3 -gdwarf-4 -mthumb -mcpu=cortex-m4 -nostdlib -ffunction-sections -fdata-sections -O0 -fno-common example-O0.o -o example-O0.elf
    /opt/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-nm -s -l example-O0.elf
    0001802c B __bss_end__
    0001802c B _bss_end__
    00018024 B __bss_start
    00018024 B __bss_start__
    00018024 T __data_start
    00018024 T _edata
    0001802c B __end__
    0001802c B _end
    00018024 B g_my_externd_global
    00018028 B g_my_private_global	/home/max/Dropbox/4202/prog/test-dwarf/main.c:3
    00008000 T main	/home/max/Dropbox/4202/prog/test-dwarf/main.c:4
    00080000 N _stack
             U _start
    

    nm correctly finds the source file and line of both variables when it is given the object file.

    But when it is given the ELF file, the behaviour changes: g_my_externd_global loses its file and line.


    Finally, I compiled using -O0 and -fcommon (instead of -fno-common):

    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-gcc -g3 -gdwarf-4 -mthumb -mcpu=cortex-m4 -nostdlib -ffunction-sections -fdata-sections -O0 -fcommon -c main.c -o example-O0-fcommon.o 
    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-nm -s -l example-O0-fcommon.o 
    00000004 C g_my_externd_global
    00000004 C g_my_private_global
    00000000 T main	/home/max/Dropbox/4202/prog/test-dwarf/main.c:4
    00000000 n wm4.0.ca8be4477412fcb0d732c8d55bace892
    

    In this case neither variable has a file name and line.


    I also used objdump to analyse the content of .debug_info section:

    https://i.postimg.cc/4XFPPnmK/Screenshot-2020-03-13-15-57-41.png

    The differences do not seem to me to justify the different behaviours.

    best regards

    Max

  • Long post below.


    Conclusion:

    It seems binutils can't handle the split/linked nature of the debug-records for variables whose (separate) extern-decl and definition are found in the same compilation unit. Additionally, it considers two formally-same but semantically-different entities (namely, the zero addresses) as equal, when they aren't. A boolean can be added to prevent the differences between the entities from collapsing.

    Maybe there's a way to instruct gcc to not split the records in the first place. (Edit: Unlikely, as it explicitly creates a new tuple for the definition, although this is done in order to support the info for C++ class-level static variables where the language forces the (non-const) static variable to be separately declared and defined).

    Further investigation is needed for the fcommon-test. But it is likely, given the special nature of the *COM* label (it isn't even a proper section in the .o file), that the lack of info can be easily explained away. You may want to debug binutils yourself to find the answer to that and any other questions. It isn't necessary to build a cross-compiler in this case.

    If the focus is on reviewing, reading and understanding software, cross-referencers (lxr, cscope, ctags, etags, or similar), and tools like Coccinelle, which can derive various relationships from the source-code, or even grep if the situation calls for it, are better options.


    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-gcc -g3 -gdwarf-4 -mthumb -mcpu=cortex-m4 -nostdlib -ffunction-sections -fdata-sections -O0 -fno-common -c main.c -o example-O0.o 
    max@jarvis:~/Dropbox/4202/prog/test-dwarf$ arm-none-eabi-nm -s -l example-O0.o 
    00000000 B g_my_externd_global	/home/max/Dropbox/4202/prog/test-dwarf/main.c:1
    00000000 B g_my_private_global	/home/max/Dropbox/4202/prog/test-dwarf/main.c:3

    Below is applicable to the O0-fno-common-test pasted above.


    Given that the problem under consideration is about binutils' inability to print the expected info, one may be inclined to see all instances where it does print the expected info, as a sign that binutils is 'working as expected' at least in those instances.

    That sign is not necessarily accurate.

    Let P be the proposition "binutils doesn't print the expected info", and Q be the proposition "binutils is faulty".

    Then, the proposition "P implies Q" can be generally accepted as true, assuming the expectations are valid (they are in this case).

    But that says nothing about the truth-value of the proposition "~P implies ~Q".

    The info for g_my_private_global was printed by binutils by its working as designed. It was able to fetch the information from the unbroken debug-record for that variable.

    The info for g_my_externd_global happened to be printed, unintentionally, as if by a fluke. The phrase "even a stopped clock is right twice a day" explains binutils' behaviour here.


    For simplicity, consider that the DWARF debug records maintain a (name, addr, file, line) tuple for each variable. The tool nm builds a query tuple and tries to find a tuple in the DWARF debug records that matches/satisfies the query.

    But a variable, with its extern-decl-followed-by-definition in the same compilation-unit/source-file, has its debug-record split into two tuples.

    E.g., g_my_externd_global has two tuples instead of one:

    • t0 = (name, AAA, file, line); points to the extern-decl.
    • t1 = (BBB, addr, file, line); points to the definition.

    Here, AAA denotes an invalid address; address isn't available at the extern-decl site.
    BBB denotes an invalid name; name isn't collected from the definition site. (Note that a field (not shown here) in t1 links t1 to t0, so t1 indirectly has the name too).

    binutils represents AAA by the value 0, and BBB by the value NULL. Therefore, the tuples for the g_my_externd_global variable are:

    • t0 = ("g_my_externd_global", 0, file, line).
    • t1 = (NULL, &g_my_externd_global, file, line).

    When nm requests the info for g_my_externd_global, it passes a query-tuple q = ("g_my_externd_global", &g_my_externd_global).

    Tuple t1 cannot satisfy q because t1.name != q.name. Any chance of getting the site/location of the definition thus evaporates because the definition-holding tuple is rejected.

    If q.addr is 0, then t0 does satisfy q (although inadvertently).

    With q.addr set to 0, t0 should ideally not satisfy q because the semantics of the two zero-address-values are different:

    • In a query tuple, when addr is 0, it means that the variable's address is 0.
    • In a dwarf-debug-record tuple, when addr is 0, it means that the variable's address isn't yet available in this tuple.

    But binutils does not seem to consider such semantic differences; it claims that t0 satisfies q, and extracts/prints the (file, line) info from t0. All one gets then is the site of the extern-decl.
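
The matching described above can be modelled in a few lines of C. This is only a sketch of the simplified tuple model in this post, not the actual code in bfd/dwarf2.c; `var_tuple`, `lookup()` and `line_reported()` are hypothetical names, and the file/line values are taken from the small main.c example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified (name, addr, file, line) tuple; hypothetical, not a bfd type. */
struct var_tuple {
    const char   *name;  /* NULL stands for the missing name "BBB" */
    unsigned long addr;  /* 0 stands for the missing address "AAA" */
    const char   *file;
    unsigned int  line;
};

/* Return the first tuple whose name and address both equal the query. */
const struct var_tuple *lookup(const struct var_tuple *table, size_t count,
                               const char *name, unsigned long addr)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        const struct var_tuple *t = &table[i];
        if (t->addr == addr && t->name != NULL && strcmp(t->name, name) == 0)
            return t;
    }
    return NULL;
}

/* Build t0/t1 for g_my_externd_global and run the query nm would issue.
 * Returns the line that would be printed, or 0 if nothing matches. */
unsigned int line_reported(unsigned long real_addr)
{
    const struct var_tuple table[] = {
        { "g_my_externd_global", 0,         "main.c", 1 },  /* t0: extern-decl */
        { NULL,                  real_addr, "main.c", 2 },  /* t1: definition  */
    };
    const struct var_tuple *hit =
        lookup(table, 2, "g_my_externd_global", real_addr);
    return hit != NULL ? hit->line : 0;
}
```

In this model, `line_reported(0)` returns 1 (the extern-decl line, matched through t0 by accident), while any non-zero address such as `line_reported(0x18024)` returns 0 because neither tuple satisfies the query.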


    When is &g_my_externd_global zero?

    When nm generates the query, it calculates the address of the variable (in a couple of different ways).

    Relevant for us is the equation, addr = section->vma + offset_var.

    • section is the section inside which the variable resides.
    • offset_var is the offset of the variable, relative to the containing-section's start.

    If one dumps the section info for a .o file, it can be seen that section->vma is 0 for every section. Therefore, a variable that resides at offset 0 in its containing section (i.e. at the very start of the section) has an address = 0+0 = 0.

    In case of g_my_externd_global, it so happens, perhaps because its definition appears first in the source, that it is the first variable to be placed in .bss (after resolving the effect of -fdata-sections on it). Thus, its address is calculated as 0.


    To see the behaviour, one can run a few tests:

    /* GT0 */
    
    extern int g_my_externd_global;
    extern int g_my_externd_global1;
    extern int g_my_externd_global2;
    extern int g_my_externd_global3;
    
    /* Note below the order in which the symbols are defined. */
    int g_my_externd_global;	/* Consider this line as slot#1 */
    int g_my_externd_global1;
    int g_my_externd_global2;
    int g_my_externd_global3;
    int g_my_private_global;
    
    /* Result: (file,line) printed for g_my_externd_global and g_my_private_global.*/

    Now swap g_my_externd_global with another externd global:

    /* GT1 */
    int g_my_externd_global3;
    int g_my_externd_global1;
    int g_my_externd_global2;
    int g_my_externd_global;
    int g_my_private_global;
    
    /* Result: (file,line) printed for g_my_externd_global3 and g_my_private_global. */

    Now swap g_my_externd_global3 with g_my_private_global:

    /* GT2 */
    
    int g_my_private_global;
    int g_my_externd_global1;
    int g_my_externd_global2;
    int g_my_externd_global;
    int g_my_externd_global3;
    
    /* Result: (file,line) printed for g_my_private_global, but not for any of the externd variables. */


    To have binutils print the tuples, find and modify the function lookup_symbol_in_variable_table inside binutils-2.34/bfd/dwarf2.c.

    The function is of the form:

    for
      if
        break;

    Add a printf:

    for {
        printf("q = (%s,%lx), t = (%s, %lx)\n", name, addr, each->name, each->addr);
        if
            break;
    }

    (Re)build binutils, and run the newly-built nm/nm-new on the .o file. Sample output for the GT0 test below:

    0000000000000000 B g_my_externd_global
    q = (g_my_externd_global,0), t = (g_my_private_global, 10)
    q = (g_my_externd_global,0), t = ((null), c)
    q = (g_my_externd_global,0), t = ((null), 8)
    q = (g_my_externd_global,0), t = ((null), 4)
    q = (g_my_externd_global,0), t = ((null), 0) <----------------- Valid 0 address here, but name-mismatch.
    q = (g_my_externd_global,0), t = (g_my_externd_global3, 0)
    q = (g_my_externd_global,0), t = (g_my_externd_global2, 0)
    q = (g_my_externd_global,0), t = (g_my_externd_global1, 0)
    q = (g_my_externd_global,0), t = (g_my_externd_global, 0) <---- Match! But these two zeroes here aren't 'equal'.
            /home/user/extern/main.c:1 <--------------------------- Success!? Not really.
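
The boolean suggested in the conclusion could look roughly like the sketch below. It only illustrates the idea of telling "address 0" apart from "address not recorded"; `var_record`, `addr_known` and `record_matches()` are hypothetical names, and this is not the fix that was later committed to binutils.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tuple extended with a flag saying whether addr holds a real address. */
struct var_record {
    const char   *name;
    unsigned long addr;
    bool          addr_known;  /* false for an extern-decl record */
    unsigned int  line;
};

/* A record matches only if its address was actually recorded and is equal
 * to the queried one, and its name equals the queried name. */
bool record_matches(const struct var_record *rec, const char *name,
                    unsigned long addr)
{
    assert(rec != NULL && name != NULL);
    return rec->addr_known
        && rec->addr == addr
        && rec->name != NULL
        && strcmp(rec->name, name) == 0;
}
```

With the flag in place, the extern-decl record no longer matches a query for address 0. The flag alone does not make the definition record match, since that record still has no name of its own; the name would additionally have to be resolved through the link to the extern-decl record.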

    Edit: Added a comment about gcc in the conclusion above.

    Edit2: gdb may be of some help:

    [user@mach extern]$ gdb -q example-O0.elf
    Reading symbols from example-O0.elf...
    (gdb) info variables g_my_*
    All variables matching regular expression "g_my_*":
    
    File main.c:
    5:	int g_my_externd_global;
    6:	int g_my_externd_global1;
    7:	int g_my_externd_global2;
    8:	int g_my_externd_global3;
    9:	int g_my_private_global;
    (gdb)

  • FYI: A bug has been opened, presumably by the OP. A fix was committed yesterday.

  • Yes, I opened the report. I tried the patch on both x86 and ARM (recompiling your arm-none-eabi toolchain) and published the results.
    nm now fails to retrieve file names only for object files compiled with -fcommon, while the related ELF files are OK.

    So I assume there's still something to fix.

    I have also attached to the bug report the Python script I used to run the tests.