Hello, I'm trying to compile some old code on an AArch64 processor, but it's leading to memory leaks. After some troubleshooting, I have located one potential hot spot that includes the following definitions:
#define MAXCHAR 255
#define MAXSHORT 32767
#define MINSHORT -32768
#define MAXTABLE 32500
#define BITS_PER_WORD 32
#define WORDSIZE(n) (((n)+(BITS_PER_WORD-1))/BITS_PER_WORD)
#define BIT(r, n) ((((r)[(n)>>5])>>((n)&31))&1)
#define SETBIT(r, n) ((r)[(n)>>5]|=((unsigned)1<<((n)&31)))
These values are unlikely to be accurate for an AArch64 CPU, and I was wondering if anyone has suggestions on what values to assign (note: the compiler is GCC 11). Thank you.
Is the code part of Berkeley Yacc? Here they say that "BITS_PER_WORD is the number of bits in a C unsigned". The size of unsigned on aarch64 is 32 bits, so the definitions remain valid. Moreover, Armv8 defines the "word" datatype as a 32-bit integer, which coincides with the meaning of "word" as the program uses it.
The term WORD in the snippet does not seem to mean the actual machine word, which is indeed 64 bits in size for aarch64.
As long as the parameters passed to these macros are unsigned integers or arrays of unsigned integers, the program should behave properly. For instance, the function set_EFF here calls SETBIT on an array of unsigned; that function should behave as expected on aarch64 without any changes.
Edit: The comments inside defs.h do say that these definitions are 'machine-dependent'. Even if we assume that the term WORD means the machine word, no changes are necessary for aarch64. The C unsigned type is only guaranteed to be at least 16 bits; the code may, at some point, have tried to optimize itself by creating arrays of machine-word-sized integers on machines whose machine word was either 16 or 32 bits, with the size of unsigned differing accordingly. Although the machine word on aarch64 is 64 bits, the unsigned 'abstraction' is still 32 bits, and aarch64 does allow accessing the lower 32-bit half of each 64-bit register through the w register names (w0, w1, ...).
Edit2: The code base within FreeBSD source is likely to have fixes that may help run the program on 64-bit architectures.
Hi @a.surati. The snippet is part of a larger postprocessing tool (NCL), but it seems to somewhat replicate BYACC. The memory leak seems to come from what they named nyacc (github.com/.../nyacc), which probably stands for NCL Yacc. If these definitions are not the root of the issue, I wonder whether some function is not passing (arrays of) unsigned integers, as you mention. Thanks.
The latest byacc is available here. The latest change at this time is from Aug 2021, where the author acknowledges a fix for some memory leaks in its CHANGES file.
Some of these leaks may not be a problem. If the allocation is supposed to happen only once during the lifetime of the nyacc program, and there's no memory pressure, then freeing the memory isn't necessary: all memory utilized by the program gets freed anyway when it exits.
If one wants to change the macros, then the type of the corresponding arrays must also be changed from 'unsigned' to 'unsigned long'.
The modified macros are:
#define BITS_PER_WORD 64
#define BIT(r, n) ((((r)[(n)>>6])>>((n)&63))&1)
#define SETBIT(r, n) ((r)[(n)>>6]|=((unsigned long)1<<((n)&63)))
Is it possible for NCL to just replace the nyacc with the latest available byacc? The nyacc package is indeed byacc, although an ancient one, as confirmed by the NO_WARRANTY license file included with nyacc.
Hi @a.surati. Unfortunately the developers haven't updated their Yacc over the years and, once you start pulling the thread, there are lots of things that could be modified or simply replaced. My guess is that substituting nyacc with the latest version of byacc would break not one but several things. Thanks.