Where to begin
By convention, compiled programs start executing at their main() function. For
example, in this Rust program:
fn print_world() {
    println!(" world");
}

fn main() {
    print!("hello");
    print_world();
}
the first thing that happens is that "hello" is printed, because
print!("hello") is the first statement in main(). It doesn't matter that
print_world() is defined first.
Perhaps you know that main() is not actually the first thing that executes
when your program runs. The first thing that runs is a _start() function,
which sets things up and eventually calls main(). _start() is defined by the C
runtime, which the linker adds to your program. For details, see Matt
Godbolt's 2018 CppCon talk "The Bits Between the Bits: How We Get to main()".
You can demonstrate this by writing a small assembly program. To keep things
small, our program will simply exit with a distinctive code (I like 123) so
that we can recognize that something happened.
_start:
    movl $60, %eax      # syscall number 60 is exit on x86-64 Linux
    movq $123, %rdi     # first argument: the exit status
    syscall
.global _start
Save this as main.s and run it with:
clang -o main main.s -nostdlib -nodefaultlibs && ./main
# exit 123
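The "# exit 123" comment stands for the exit status of ./main. As a rough sketch, the same check can be made from Rust with std::process::Command. The helper name exit_code_of is made up for this example, and main() assumes the binary was built as ./main in the current directory.

```rust
use std::process::Command;

/// Runs `program` with `args` and returns its exit code, or None if it
/// could not be started or was terminated by a signal.
/// (Hypothetical helper, not part of the original program.)
fn exit_code_of(program: &str, args: &[&str]) -> Option<i32> {
    Command::new(program).args(args).status().ok()?.code()
}

fn main() {
    // Assumption: ./main is the binary produced by the clang command above.
    match exit_code_of("./main", &[]) {
        Some(code) => println!("exit code: {}", code),
        None => println!("could not run ./main, or it was killed by a signal"),
    }
}
```

In a shell, printing $? right after running ./main gives the same information.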
What is so special about _start? Is it compulsory to use a symbol called
_start, or could we use a symbol called begin?
In short: you can use any symbol as the entry point to a program; you just have to tell the linker which one. ld's -e (--entry) option is one way to do that, but here we will do it by writing a linker script.
A linker script is a file passed to the linker that specifies how the final executable should be laid out. Writing your own linker script gives you full control over the order of sections in the executable. This matters when you are writing something like an operating system, where the binary must have a defined shape to be usable by the bootloader.
ld has a default linker script that it uses when none is provided. You can
inspect it by running:
$ ld --verbose
GNU ld (GNU Binutils) 2.46.0
Supported emulations:
elf_x86_64
elf_i386
elf32_x86_64
elf_iamcu
i386pep
i386pe
elf64bpf
using internal linker script:
==================================================
/* Script for -z combreloc -z separate-code */
/* Copyright (C) 2014-2026 Free Software Foundation, Inc.
Copying and distribution of this script, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved. */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib64"); SEARCH_DIR("/usr/lib"); SEARCH_DIR("/usr/local/lib"); SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib");
...
Note the line ENTRY(_start). This tells the linker to use the address of the
_start symbol as the entry point. The linker stores this address in the ELF
header of the executable (the e_entry field), and the operating system (or
whatever is running this program) jumps to that address after loading the
executable into memory. We can modify this script and replace ENTRY(_start)
with ENTRY(begin) to accomplish our goal.
ENTRY(begin)
SECTIONS
{
  . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .text : { *( .text ) }
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
}
I took the default linker script and removed everything whose absence didn't
cause a warning. I don't fully understand the SEGMENT_START and
DATA_SEGMENT_ALIGN lines; I'll look into those someday. The line
.text : { *( .text ) }
simply copies the .text section (where the program's instructions are) of
every input file (that is what the * matches) to the output .text section.
Now, we can write this assembly language program:
begin:
    movl $60, %eax      # syscall number 60 is exit on x86-64 Linux
    movq $123, %rdi     # first argument: the exit status
    syscall
.global begin
Save the linker script as link.ld and the program as main.s, then run:
clang -o main main.s -nostdlib -nodefaultlibs -Wl,-T,link.ld && ./main
# exit 123
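To see the entry address that the linker recorded in the executable's header, here is a small sketch in Rust that reads it back. It assumes a little-endian 64-bit ELF file (which is what the x86-64 build above produces) and does only minimal validation. The function name elf64_entry and the default path "main" are choices made for this example.

```rust
use std::{env, fs};

/// Returns the entry point address (e_entry) of a little-endian ELF64 file,
/// or None if the bytes do not look like one.
/// (Hypothetical helper written for this example.)
fn elf64_entry(bytes: &[u8]) -> Option<u64> {
    // e_ident starts with the magic bytes, then the class (2 = 64-bit)
    // and the data encoding (1 = little-endian).
    if !bytes.starts_with(b"\x7fELF") || *bytes.get(4)? != 2 || *bytes.get(5)? != 1 {
        return None;
    }
    // In the ELF64 header, e_entry is the 8-byte field at offset 24.
    let mut field = [0u8; 8];
    field.copy_from_slice(bytes.get(24..32)?);
    Some(u64::from_le_bytes(field))
}

fn main() {
    // Assumption: with no argument, inspect the "main" binary built above.
    let path = env::args().nth(1).unwrap_or_else(|| "main".to_string());
    match fs::read(&path).ok().and_then(|b| elf64_entry(&b)) {
        Some(entry) => println!("{}: entry point {:#x}", path, entry),
        None => println!("{}: not a readable little-endian ELF64 file", path),
    }
}
```

The printed address should match the value of the begin symbol (or _start for the earlier build); readelf -h reports the same field as "Entry point address".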