By convention, a compiled program starts executing at its main() function. For example, in this Rust program:

fn print_world() {
    println!(" world");
}

fn main() {
    print!("hello");
    print_world();
}

the first thing that happens is that "hello" is printed, because printing it is the first statement in main(). It doesn't matter that print_world() is defined earlier in the file.

Perhaps you know that main() is not actually the first thing that executes when your program runs. A function called _start() runs first, and it is _start() that (eventually) calls main(). _start() is defined by the C runtime, which the linker adds to your program. For details, see Matt Godbolt's 2018 CppCon talk "The Bits Between the Bits: How We Get to main()".

You can demonstrate this by writing a small assembly program (for x86-64 Linux here). To keep things small, the program simply exits with a distinctive status code (I like 123) so that we can recognize that it ran.

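_start:
  movl $60, %eax     # syscall number 60 is exit on x86-64 Linux
  movq $123, %rdi    # first argument: the exit status
  syscall            # ask the kernel to exit(123)

.global _start       # export the symbol so the linker can find it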

Save this as main.s, then build and run it with:

clang -o main main.s -nostdlib -nodefaultlibs && ./main
# exits with status 123 (run "echo $?" afterwards to see it)
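
If you would rather check the exit status from a program than from the shell, here is a minimal Rust sketch. It assumes the binary built above sits in the current directory as ./main; the helper name exit_code_of is made up for this example.

```rust
use std::io;
use std::process::Command;

/// Runs `program` with `args` and returns its exit code
/// (None if it was killed by a signal instead of exiting normally).
fn exit_code_of(program: &str, args: &[&str]) -> io::Result<Option<i32>> {
    let status = Command::new(program).args(args).status()?;
    Ok(status.code())
}

fn main() {
    // Assumption: the executable from the step above is at ./main.
    match exit_code_of("./main", &[]) {
        Ok(Some(code)) => println!("./main exited with code {}", code),
        Ok(None) => println!("./main was terminated by a signal"),
        Err(err) => eprintln!("could not run ./main: {}", err),
    }
}
```

If everything worked, the code it reports should be the 123 we loaded into %rdi.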

What is so special about _start? Is it compulsory to use a symbol called _start, or could we use a symbol called begin?

In short: you can use any symbol as the entry point to a program; you just have to tell the linker which one, and a linker script is one way to do so.

A linker script is a file passed to the linker that specifies how the final executable should be laid out. Writing your own linker script gives you full control over which sections end up in the executable and in what order. This is important when you are writing something like an operating system, where the bootloader expects your binary to have a specific layout.

ld has a default linker script that it uses when none is provided. You can inspect it by running:

$ ld --verbose
GNU ld (GNU Binutils) 2.46.0
  Supported emulations:
   elf_x86_64
   elf_i386
   elf32_x86_64
   elf_iamcu
   i386pep
   i386pe
   elf64bpf
using internal linker script:
==================================================
/* Script for -z combreloc -z separate-code */
/* Copyright (C) 2014-2026 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib64"); SEARCH_DIR("/usr/lib"); SEARCH_DIR("/usr/local/lib"); SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib");
...

Note the line ENTRY(_start). This tells the linker to use the address of the _start symbol as the entry point. The linker stores this address in the ELF header of the executable (the e_entry field), and the operating system (or whatever is running the program) jumps to that address after loading the executable into memory. To accomplish our goal, we can replace ENTRY(_start) with ENTRY(begin). Here is a cut-down version of the script:

ENTRY(begin)
SECTIONS
{
  . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .text : { *( .text ) }
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
}

I took the default linker script and removed everything whose absence didn't cause a warning. I don't fully understand the SEGMENT_START and DATA_SEGMENT_ALIGN lines; I'll look into those someday. The line

.text : { *( .text ) }

simply copies the .text sections of all input files (where the program's instructions are) into the output .text section.

Save the linker script as link.ld. Now we can write this assembly program in main.s:

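begin:
  movl $60, %eax     # syscall number 60 is exit on x86-64 Linux
  movq $123, %rdi    # first argument: the exit status
  syscall            # ask the kernel to exit(123)

.global begin        # export the symbol so ENTRY(begin) can resolve it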

Build it with the custom linker script and run it:

clang -o main main.s -nostdlib -nodefaultlibs -Wl,-T,link.ld && ./main
# exits with status 123 (run "echo $?" afterwards to see it)
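
The entry address that the linker writes into the executable's header can be read back directly. Here is a minimal Rust sketch of that, assuming a little-endian ELF64 file (the format used on x86-64 Linux), where the entry address is the 8-byte e_entry field at offset 24 of the header. The function name elf64_entry is made up for this example, and the sketch does no validation beyond the first few identification bytes.

```rust
use std::{env, fs};

/// Returns the entry-point address stored in a little-endian ELF64 header,
/// or None if `bytes` does not look like such a file.
fn elf64_entry(bytes: &[u8]) -> Option<u64> {
    // The header starts with the magic "\x7fELF", followed by
    // EI_CLASS (2 = 64-bit) and EI_DATA (1 = little-endian).
    if bytes.len() < 32 || !bytes.starts_with(b"\x7fELF") || bytes[4] != 2 || bytes[5] != 1 {
        return None;
    }
    // In the ELF64 header, e_entry occupies bytes 24..32.
    let mut field = [0u8; 8];
    field.copy_from_slice(&bytes[24..32]);
    Some(u64::from_le_bytes(field))
}

fn main() {
    // With no argument, inspect this program's own executable (Linux-specific path).
    let path = env::args()
        .nth(1)
        .unwrap_or_else(|| String::from("/proc/self/exe"));
    match fs::read(&path) {
        Ok(bytes) => match elf64_entry(&bytes) {
            Some(addr) => println!("{}: entry point {:#x}", path, addr),
            None => eprintln!("{}: not a little-endian ELF64 file", path),
        },
        Err(err) => eprintln!("{}: {}", path, err),
    }
}
```

Pass it the path to main to see the address that ENTRY(begin) made the linker record. That address should be the address of the begin symbol.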