Optimisations of AVR programs using avr-gcc
Programming Techniques
Data types
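- Prefer short data types. The AVR is an 8-bit core, so a 16-bit int needs two registers and two instructions for most operations. If you have a small counter, prefer unsigned char or uint8_t to int. Combined with the short-type compiler options listed under Compiler Flags, this can be a big win.
- Group logically related information into structures (smaller than 64 bytes in total). The size limit comes from the AVR's displacement addressing (LDD/STD), which reaches offsets 0 to 63 from the Y or Z pointer. That's not just good programming practice, but together with the next tip it will allow the compiler to produce smaller code.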
- Use a pointer to a structure when possible because it produces tighter code. A pointer to a structure makes GCC use indirect memory addressing. The onus is on the programmer to check whether direct or indirect addressing is appropriate in each case, and further to check whether the code compiles to the intended addressing mode. The following compiler trick will force GCC to use a local pointer instead of the address of a structure.
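// The "b" constraint asks for a base pointer register pair (Y or Z).
#define FIX_POINTER(_ptr) __asm__ __volatile__("" : "=b" (_ptr) : "0" (_ptr))

struct my_struct global_structure; // struct my_struct is defined elsewhere
//…
void my_function(void)
{
    struct my_struct *ptr = &global_structure;
    FIX_POINTER(ptr); // GCC can no longer assume that ptr equals &global_structure
    // from now on ptr will be used to access the members of the structure.
    //…
}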
Using ptr to access members of global_structure sometimes leads to smaller code.
- Frequently used variables (especially those used in ISRs) can be kept in registers instead of memory. You can bind a global variable to a register with GCC by using the following definition (the register keyword is required, otherwise the asm label only renames the symbol):
register uint8_t my_variable __asm__("r12");
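The first two tips can be combined in a small, portable sketch. Everything in it (ring_state, ring_put, ring_get, RING_SIZE) is a hypothetical name chosen for illustration, not part of any library. It assumes a byte-wide ring buffer whose state is grouped into one structure, uses uint8_t for every counter, and checks at compile time that the structure stays below the 64-byte limit mentioned above.

```c
#include <stdint.h>

#define RING_SIZE 16u /* illustrative buffer size */

/* All related state lives in one small structure with 8-bit members. */
struct ring_state {
    uint8_t head;            /* index of the next write */
    uint8_t tail;            /* index of the next read */
    uint8_t count;           /* number of stored bytes */
    uint8_t data[RING_SIZE]; /* payload */
};

/* Compile-time check: the array size becomes -1 (an error) if the limit is exceeded. */
typedef char ring_state_size_check[(sizeof(struct ring_state) < 64) ? 1 : -1];

/* Store one byte; returns 1 on success, 0 if the buffer is full. */
uint8_t ring_put(struct ring_state *r, uint8_t value)
{
    if (r->count == RING_SIZE) {
        return 0;
    }
    r->data[r->head] = value;
    r->head = (uint8_t)((r->head + 1u) % RING_SIZE);
    r->count++;
    return 1;
}

/* Fetch one byte into *out; returns 1 on success, 0 if the buffer is empty. */
uint8_t ring_get(struct ring_state *r, uint8_t *out)
{
    if (r->count == 0) {
        return 0;
    }
    *out = r->data[r->tail];
    r->tail = (uint8_t)((r->tail + 1u) % RING_SIZE);
    r->count--;
    return 1;
}
```

Because both functions receive a pointer to the structure, the compiler is free to address the members relative to that pointer. Whether this is actually smaller than direct addressing on a given project still has to be checked in the generated code.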
Loops and conditional statements
Loops can be written in an optimised way. Atmel suggests for (;;) {} for endless loops. For loops with an ending condition GCC produces slightly smaller code when the do {} while (condition); statement is used instead of a while (condition) {}.
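As a minimal sketch of the two loop forms, the hypothetical functions below (sum_while and sum_do_while are illustrative names) add up n bytes. The do/while variant tests its condition only at the end of each pass, so it assumes the caller guarantees n >= 1; with n == 0 it would wrap around and run 256 times.

```c
#include <stdint.h>

/* while form: the condition is tested before the first pass, so n == 0 is safe. */
uint16_t sum_while(const uint8_t *p, uint8_t n)
{
    uint16_t sum = 0;
    while (n != 0) {
        sum += *p++;
        n--;
    }
    return sum;
}

/* do/while form: the condition is tested after each pass.
 * Assumption: the caller guarantees n >= 1. */
uint16_t sum_do_while(const uint8_t *p, uint8_t n)
{
    uint16_t sum = 0;
    do {
        sum += *p++;
    } while (--n != 0);
    return sum;
}
```

The two functions return the same result for every n >= 1, so the do/while form is only a drop-in replacement when the loop is known to run at least once.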
Using pre-increments (++i) wisely can also reduce your code size. As a standalone statement the choice between pre-increment and post-increment makes no difference at all: the expression ++i; will generate the same amount of code as i++;. But when you use the result of the increment in a loop condition or a conditional statement, the situation changes. To see how, consider the following code:
if (i--) {
// do something
}
This code will be translated into something like:
- load the value of i into a temporary variable
- decrement i
- compare the temporary variable against zero
- if zero, jump to the instruction immediately after the block
When we use the pre-decrement form (if (--i) { … }), we can potentially save two operations: the copy and the compare. In fact, the generated code might read like:
- decrement i (this already sets the zero flag)
- if zero, jump to the instruction immediately after the block
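The two forms are not interchangeable, because they test different values. The sketch below uses two hypothetical helper functions (the names are illustrative) that count how often a loop body runs for a given start value: the post-decrement loop tests the old value of i, the pre-decrement loop tests the new one.

```c
#include <stdint.h>

/* Post-decrement: the old value of i is tested, then i is decremented.
 * The body runs i times. */
uint8_t passes_post_decrement(uint8_t i)
{
    uint8_t passes = 0;
    while (i--) {
        passes++;
    }
    return passes;
}

/* Pre-decrement: i is decremented first, then the new value is tested.
 * The body runs i - 1 times for i >= 1. For i == 0 the counter wraps to 255. */
uint8_t passes_pre_decrement(uint8_t i)
{
    uint8_t passes = 0;
    while (--i) {
        passes++;
    }
    return passes;
}
```

When switching an existing loop to the pre-decrement form, the start value (or the loop body) therefore has to be adjusted so that the number of passes stays the same.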
Compiler Flags
- Use short types
- -funsigned-char
- -funsigned-bitfields
- -fpack-struct
- -fshort-enums
- Set the cost of inline calls
- --param inline-call-cost=2
- -finline-limit=3
- -fno-inline-small-functions
- Prevents or limits the automatic inlining of small functions. Inlining usually makes the program faster, but if the routine is called from many places you'll end up with larger flash usage.
- Don't include unused function and data
- -ffunction-sections
- -fdata-sections
- Generally used with the linker option --gc-sections (passed as -Wl,--gc-sections). These options place each function and each data item into its own section, which --gc-sections can then discard if the section is unreferenced. Together they will only help if you have a bunch of unused functions in your code, such as when you are using a library compiled with these options and linked statically.
- Note that if you have any "naked" functions placed in an .initN or .finiN section, you should also mark the function with the "used" attribute to prevent its being optimised out. (Thanks to David Boone.)
- Compile a freestanding program
- -ffreestanding and void main() __attribute__ ((noreturn));
- Tells the compiler that your main function never returns. This saves some bytes on the stack.
- Linker Relaxation
- -Wl,--relax
- Almost always a win. Note that this is a linker flag that is passed to the linker using the -Wl flag.
- Enables linker relaxations. By default, the linker links function calls with a full CALL instruction, which is wasteful if two functions are near each other. Relaxations will do more in the future, but currently (AFAIK) just replace CALL instructions with RCALL where possible to save a few bytes.
- Call Prologues/Epilogues
- -mcall-prologues
- Your application must be large enough for this to be a win. Take a close look at the code size before and after to make sure that this will actually decrease your code size.
- Whole Program Optimisation
- --combine -fwhole-program
- Using these two flags will turn on whole program optimisation, but can only be done on C code (which is typical for an AVR). The catch is that you have to modify your Makefile to have it execute only a single command-line call to avr-gcc with *all* of your C files and all compiler and linker flags. GCC will then combine, compile, and link all the code into the final ELF file, but this also allows it to do whole program optimisation.
- Loop optimisation
- -fno-tree-scev-cprop
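- This option is new in GCC 4.3.0. The underlying pass, -ftree-scev-cprop, performs final value replacement: when the value a variable has on leaving a loop can be computed from its initial value and the number of iterations, uses of that final value are replaced by the computation. The pass is enabled by default when optimising; -fno-tree-scev-cprop switches it off, which can give a size benefit on some projects.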
- Wide types
- -fno-split-wide-types
- By default (-fsplit-wide-types), when a type occupies multiple registers, such as long long on a 32-bit system, GCC splits the registers apart and allocates them independently. On the AVR, with its 8-bit registers, this applies to every type wider than one byte. Splitting normally generates better code for those types, but may make debugging more difficult. The option -fno-split-wide-types will stop this.
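To illustrate what -fshort-enums from the "Use short types" group does, the sketch below applies GCC's packed attribute to a single enum, which has the same effect on that one type as the flag has on all enums. The enum and structure names are hypothetical and chosen only for this example. The sizes noted in the comments assume GCC or a compatible compiler.

```c
#include <stdint.h>

/* Default enum: without -fshort-enums GCC normally gives it the size of an int. */
enum led_mode { LED_OFF, LED_ON, LED_BLINK };

/* Packed enum: GCC picks the smallest integer type that holds all values,
 * here a single byte. -fshort-enums does this for every enum in the program. */
enum __attribute__((packed)) led_mode_small {
    LED_SMALL_OFF,
    LED_SMALL_ON,
    LED_SMALL_BLINK
};

/* With a one-byte enum the whole structure fits into two bytes. */
struct led {
    enum led_mode_small mode;
    uint8_t pin;
};
```

Using the attribute on selected types is an alternative when changing the enum size globally is not an option, for example because a precompiled library was built without -fshort-enums.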
Examples of compilation flags
The following section shows a set of compiler options that were useful on some of my projects. Please note that a given option can increase the code size as well as decrease it. The effect varies from project to project, so the proposed set of options should be taken with a pinch of salt.
avr-gcc v4.2.2
-std=gnu99 -W -Wall -pedantic -Wstrict-prototypes -Wundef -Werror
-funsigned-char -funsigned-bitfields -ffunction-sections -fpack-struct -fshort-enums
-ffreestanding -Os -g -gdwarf-2
--combine -fwhole-program
-Wl,--relax,--gc-sections
avr-gcc v4.3.3
-std=gnu99 -W -Wall -pedantic -Wstrict-prototypes -Wundef -Werror
-funsigned-char -funsigned-bitfields -ffunction-sections -fpack-struct -fshort-enums
-ffreestanding -Os -g -gdwarf-2
--combine -fwhole-program
-fno-inline-small-functions -fno-split-wide-types -fno-tree-scev-cprop
-Wl,--relax,--gc-sections