Linux Schedulers - Ensuring Optimal System Performance

October 28, 2023

by Davor Lozic, CTO

Introduction

The Linux scheduler is a crucial component in the architecture of one of the world's most popular operating systems. It determines the efficiency and responsiveness of various systems, ranging from mobile devices to powerful servers. Although the scheduler is mostly invisible to end-users, it is responsible for managing how processes share the CPU resources of the system. The task of the scheduler is simple in description but complex in execution: it decides which process runs at any given time, deciding a balance between quick response times and efficient CPU resource utilization.

At its core, a scheduler must navigate the trade-off between time division and throughput maximization. Prioritizing a highly responsive system can lead to frequent context switching, adding to overhead to overall CPU efficiency. On the other hand, focusing just on maximizing throughput might result in periods where the system appears less responsive or interactive tasks, like interface movements, where OS seems unresponsive. Good scheduler is about finding the balance between them.

Basics of Linux scheduling and process states

As already mentioned, Linux scheduler is responsible for determining which processes should run, for how long, and in what order. This ultimately affects the system's overall performance and responsiveness. The scheduler achieves this by using various algorithms and policies that are tailored to the nature of the task and the overall system goals. Key challenge for the scheduler is to maintain a balance between the time-sensitive requirements of interactive processes, like user interfaces, and the resource demands of background processes, such as system services or batch jobs.

Processes in Linux can exist in various states, reflecting different stages in their lifecycle. The primary states are:

Running: actively executing or ready to execute when a CPU becomes available
Sleeping: blocked, waiting for an event or resource
Stopped: suspended from execution, typically by a signal
Zombie: completed execution but awaiting the parent process to retrieve its exit status.

Understanding these states is important as the scheduler's decision-making heavily relies on them. For example, a running process may be preempted by a higher-priority task, shifting it back to a ready state. Similarly, a sleeping process might transition to running once the resource it is waiting for becomes available. The scheduler manages these transitions, constantly overseeing and orchestrating the interplay of the processes. It responds to system activity and user commands, ensuring system stability and efficiency.

Some historical context

Starting from the earliest versions of Linux, the scheduling mechanism has undergone significant transformations, each aimed at optimizing system performance based on the prevailing computing environments and user demands.

The journey began with the simplest round-robin scheduling in Linux 0.01 (around 1991), designed for minimalistic task management. This approach evolved into a more refined strategy with the Linux 1.3.xx and 2.2 series (1995 - 1999), establishing basic priority systems, introducing fundamental concepts of task scheduling that would be built upon in the future.

An important moment in Linux's scheduler development was the introduction of the O(1) scheduler (2003) in the Linux 2.5 kernel series. This scheduler offered constant time complexity regardless of the number of tasks, greatly improving system responsiveness and scalability, particularly for systems handling numerous concurrent processes. However, as the computing world progressed, the need for a more adaptive and fairness-oriented scheduler became apparent, especially in desktop and interactive computing.

Creation of Completely Fair Scheduler (CFS), introduced in Linux kernel 2.6.23 (2007), was important historical moment as well, replacing the O(1) scheduler. It aimed to provide equitable CPU time distribution based on task priority and weight. By using a red-black tree for task management, CFS dynamically adjusted the CPU time allocated to each process, ensuring a more balanced and proportional distribution of resources. This change greatly benefited general-purpose and desktop computing, where interactive responsiveness is crucial.

Name of Scheduler	Published Year	Specialized For	Description
Linux 0.01 Scheduler	1991	Basic task management	The first and simplest Linux scheduler, based on a round-robin algorithm without process priorities.
Linux 1.3.xx Scheduler	Mid-1990s	Improved task prioritization	An evolution of the initial scheduler, introducing a basic priority system for tasks.
Linux 2.2 Scheduler	1999	Enhanced priority handling	Improved upon earlier versions with better process priority management and scheduling efficiency.
O(1) Scheduler	2003	High-volume task management	Named for its constant-time complexity, it significantly improved performance for systems with many tasks.
Completely Fair Scheduler (CFS)	2007	Fair resource distribution, General-purpose	Utilizes a red-black tree for managing tasks, focusing on equitable CPU time distribution according to task priority and weight.
Brain Fuck Scheduler (BFS)	2009	Desktop performance, Lower-latency	Designed for better desktop performance with a focus on lower latency and simplicity; not merged into the mainline kernel.
Multiple Queue Skiplist Scheduler (MuQSS)	2016	Desktop performance, Task management efficiency	Successor to BFS, offering optimizations for desktop performance and efficiency using a skiplist data structure; also not part of the mainline kernel.

Figure 1: List of most popular Linux schedulers

Over time, other specialized schedulers like the Brain Fuck Scheduler (BFS) in 2009. and its successor, the Multiple Queue Skiplist Scheduler (MuQSS) in 2016. emerged to target specific use cases such as desktop performance. Next section will delve into details of the most commonly used scheduler, CFS.

Completely Fair Scheduler (CFS)

Introduced in 2007, CFS brought a significant change in process scheduling in Linux. Developed by Ingo Molnar, CFS replaced the earlier O(1) Scheduler and focused on achieving maximum fairness in CPU time allocation among processes. Unlike its predecessors, CFS prioritized fair distribution of CPU time rather than just task prioritization and time slicing.

Internally, CFS utilizes a time-division multiplexing scheme. Each process is assigned a slice of time on the CPU, known as its virtual runtime. The scheduler maintains a red-black tree, a self-balancing binary tree data structure, for all runnable tasks. Each node in the tree represents a process, and the key is the virtual runtime. The process with the smallest virtual runtime becomes the leftmost node and is chosen to run next. This design ensures that the process with the least runtime is given priority, promoting fairness. The virtual runtime of each process is updated as it runs, moving the process deeper into the right side of the tree. This allows processes with lower runtimes to emerge and receive their turn.

Figure 2: Red-black tree in CFS

CFS also incorporates sleeper fairness in its task scheduling. This feature ensures that interactive tasks, which frequently sleep and wake up, are not deprived of their fair share of CPU time. Additionally, CFS dynamically adjusts the time slice of each process based on the system load, further refining its granularity. This makes CFS highly effective in both server and desktop environments. The adoption of a balanced and fairness-oriented scheduling approach significantly enhanced the responsiveness and overall user experience of Linux systems, especially in environments with heavy multitasking and diverse process loads.

Real-time schedulers in Linux

Real-time scheduling in Linux is a crucial feature for applications that have strict timing constraints. This is especially important in embedded systems, telecommunications, and multimedia. Real-time schedulers are specifically designed to provide deterministic task scheduling. Their goal is to ensure that high-priority tasks receive CPU time within predictable time frames. This is essential, as certain applications like a heart-rate monitor cannot afford to wait for background tasks.

Unlike conventional schedulers that prioritize fair time sharing among all processes, real-time schedulers prioritize tasks based on their urgency and importance. They often prioritize these factors over notions of fairness. In the Linux kernel, real-time scheduling is primarily handled by two scheduling classes: SCHED_FIFO (First In, First Out) and SCHED_RR (Round Robin).

SCHED_FIFO is a simple scheduling algorithm where each process is assigned a priority. The CPU is given to the highest priority process, and within the same priority level, the first process to come is the first to be served (hence FIFO). This scheduler does not enforce time slicing; thus, a high-priority task can run indefinitely, as long as no higher-priority task is ready to run. This can lead to starvation of lower-priority tasks, but it's acceptable in environments where certain tasks must be guaranteed to run without interruption.

In contrast, SCHED_RR is similar to SCHED_FIFO, but adds time slicing at each priority level. This means that tasks at the same priority level are scheduled in a round-robin fashion, each running for a fixed slice of time before passing control to the next task in line. This mechanism provides a more balanced approach among tasks of the same priority, preventing the monopolization of CPU resources.

The introduction of the CONFIG_PREEMPT_RT patch (real-time patch) into the Linux kernel brought significant enhancements to real-time scheduling. This patch transforms Linux into a preemptive kernel, allowing a higher-priority task to preempt a lower-priority task. Preemption improves the responsiveness of real-time tasks, essential in systems where delay or lag cannot be tolerated.

Conclusion

The exploration of the Linux scheduler highlights the complex challenges involved in managing CPU resources in diverse computing environments. From its early stages in Linux 0.01 to the current dominance of the Completely Fair Scheduler (CFS) and real-time alternatives like SCHED_FIFO and SCHED_RR, the Linux scheduling mechanism has evolved to meet changing user demands. Each iteration of the scheduler has introduced new strategies to balance responsiveness, fairness, and throughput.

While CFS prioritizes fair distribution of CPU time and excels in general-purpose and desktop systems, real-time schedulers like SCHED_FIFO and SCHED_RR specialize to environments requiring strict timing and predictability, such as embedded systems and critical multimedia applications.

The scheduler, although often unseen, plays a crucial role in efficient process management and system performance, showcasing Linux's relevance and adaptability in the ever-changing landscape of technology.

Our offices

Follow us

Linux Schedulers - Ensuring Optimal System Performance

Introduction

Basics of Linux scheduling and process states

Some historical context

Completely Fair Scheduler (CFS)

Real-time schedulers in Linux

Conclusion

More articles

Number System: Binary, Decimal, Octal, Hexadecimal

Binary 101: The Language of Computers

Tell us about your project

Our offices