==========================================
ARM idle states binding description
==========================================

==========================================
1 - Introduction
==========================================

ARM systems contain HW capable of managing power consumption dynamically,
where cores can be put in different low-power states (ranging from simple
wfi to power gating) according to OS PM policies. The CPU states representing
the range of dynamic idle states that a processor can enter at run-time can be
specified through device tree bindings representing the parameters required
to enter/exit specific idle states on a given processor.

According to the Server Base System Architecture document (SBSA, [3]), the
power states an ARM CPU can be put into are identified by the following list:

  • Running
  • Idle_standby
  • Idle_retention
  • Sleep
  • Off

The power states described in the SBSA document define the basic CPU states on
top of which ARM platforms implement power management schemes that allow an OS
PM implementation to put the processor in different idle states (which include
the states listed above; the "off" state is not an idle state since it does not
have wake-up capabilities, hence it is not considered in this document).

Idle state parameters (e.g. entry latency) are platform-specific and need to be
characterized with bindings that provide the required information to OS PM
code so that it can build the required tables and use them at runtime.

The device tree binding definition for ARM idle states is the subject of this
document.

===========================================
2 - idle-states definitions
===========================================

Idle states are characterized for a specific system through a set of
timing and energy related properties that underlie the HW behaviour
triggered upon idle state entry and exit.

The following diagram depicts the CPU execution phases and related timing
properties required to enter and exit an idle state:

..__[EXEC]__|__[PREP]__|__[ENTRY]__|__[IDLE]__|__[EXIT]__|__[EXEC]__..
            |          |           |          |          |

            |<------ entry ------->|
            |       latency        |
                                              |<- exit ->|
                                              |  latency |
            |<-------- min-residency -------->|
                       |<-------  wakeup-latency ------->|

    Diagram 1: CPU idle state execution phases

EXEC: Normal CPU execution.

PREP: Preparation phase before committing the hardware to idle mode,
such as cache flushing. This is abortable on pending wake-up
event conditions. The abort latency is assumed to be negligible
(i.e. less than the ENTRY + EXIT duration). If aborted, the CPU
goes back to EXEC. This phase is optional. If not abortable,
this should be included in the ENTRY phase instead.

ENTRY: The hardware is committed to idle mode. This period must run
to completion up to IDLE before anything else can happen.

IDLE: This is the actual energy-saving idle period. This may last
between 0 and infinite time, until a wake-up event occurs.

EXIT: Period during which the CPU is brought back to operational
mode (EXEC).

entry-latency: Worst case latency required to enter the idle state. The
exit-latency may be guaranteed only after entry-latency has passed.

min-residency: Minimum period, including preparation and entry, for a given
idle state to be worthwhile energywise.

wakeup-latency: Maximum delay between the signaling of a wake-up event and the
CPU being able to execute normal code again. If not specified, this is assumed
to be entry-latency + exit-latency.

These timing parameters can be used by an OS in different circumstances.

An idle CPU requires the expected min-residency time to select the most
appropriate idle state based on the expected expiry time of the next IRQ
(i.e. wake-up) that causes the CPU to return to the EXEC phase.
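As a rough illustration of this selection step, the C sketch below picks the
deepest state whose min-residency fits within a predicted idle period. It
assumes the states are already sorted from shallowest to deepest and that a
prediction of the time to the next wake-up is available; struct
idle_state_params and select_idle_state() are hypothetical names, not a kernel
API, and real cpuidle governors apply further heuristics.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of one state's timing parameters (microseconds).
 * Only min_residency_us is used by the selection below. */
struct idle_state_params {
	uint32_t entry_latency_us;
	uint32_t exit_latency_us;
	uint32_t min_residency_us;
};

/*
 * Return the index of the deepest state (array assumed sorted from
 * shallowest to deepest) whose min-residency fits in the predicted idle
 * period, or -1 if none fits, i.e. the CPU should only execute wfi.
 */
int select_idle_state(const struct idle_state_params *states, size_t count,
		      uint32_t predicted_idle_us)
{
	int selected = -1;

	for (size_t i = 0; i < count; i++) {
		if (states[i].min_residency_us <= predicted_idle_us)
			selected = (int)i;
	}
	return selected;
}
```

For example, with min-residency values of 80, 250, 950 and 2700 us and a
predicted idle period of 500 us, the sketch returns index 1. A result of -1
leaves the CPU in the implicit wfi state, which is never listed in the
idle-states node.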

An operating system scheduler may need to compute the shortest wake-up delay
for CPUs in the system by detecting how long it will take to get a CPU out
of an idle state, e.g.:

wakeup-delay = exit-latency + max(entry-latency - (now - entry-timestamp), 0)

In other words, the scheduler can make its scheduling decision by selecting
(e.g. waking up) the CPU with the shortest wake-up latency.
The wake-up latency must take into account the entry latency if that period
has not expired. The abortable nature of the PREP period can be ignored
if it cannot be relied upon (e.g. the PREP deadline may occur much sooner than
the worst case since it depends on the CPU operating conditions, i.e. cache
state).
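The wakeup-delay formula and the "shortest wake-up latency" choice described
above can be sketched in C as follows. The per-CPU snapshot structure and both
function names are hypothetical; the sketch assumes times in microseconds from
a monotonic clock, with now_us never earlier than the entry timestamp, and it
ignores any PREP abort, as the text allows.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot: timing of the state being entered and
 * the time at which entry started (all values in microseconds). */
struct cpu_idle_snapshot {
	uint64_t entry_timestamp_us;
	uint32_t entry_latency_us;
	uint32_t exit_latency_us;
};

/*
 * wakeup-delay = exit-latency + max(entry-latency - (now - entry-timestamp), 0)
 * Assumes now_us >= entry_timestamp_us.
 */
uint64_t wakeup_delay_us(const struct cpu_idle_snapshot *cpu, uint64_t now_us)
{
	uint64_t elapsed = now_us - cpu->entry_timestamp_us;
	uint64_t remaining_entry = 0;

	/* Entry latency still counts until it has fully elapsed. */
	if (elapsed < cpu->entry_latency_us)
		remaining_entry = cpu->entry_latency_us - elapsed;

	return cpu->exit_latency_us + remaining_entry;
}

/* Index of the CPU with the shortest wake-up delay, or -1 if count is 0. */
int pick_fastest_cpu(const struct cpu_idle_snapshot *cpus, size_t count,
		     uint64_t now_us)
{
	int best = -1;
	uint64_t best_delay = UINT64_MAX;

	for (size_t i = 0; i < count; i++) {
		uint64_t delay = wakeup_delay_us(&cpus[i], now_us);

		if (delay < best_delay) {
			best_delay = delay;
			best = (int)i;
		}
	}
	return best;
}
```

For a CPU that started entering a state with entry-latency 250 us and
exit-latency 500 us 100 us ago, wakeup_delay_us() yields 500 + 150 = 650 us;
once the full entry-latency has elapsed, the delay drops to the exit-latency
alone.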

An OS has to reliably probe the wakeup-latency since some devices rely on
latency constraint guarantees to work properly. The OS therefore has to detect
the worst case wake-up latency it can incur if a CPU is allowed to enter an
idle state, and possibly prevent the CPU from entering that state to guarantee
reliable device operation.
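One way an OS could track this is sketched below, assuming it keeps a per-state
worst case wake-up latency (wakeup-latency-us, or entry-latency-us +
exit-latency-us when omitted) together with an enabled flag. The structure and
function names are hypothetical; in Linux, comparable constraints are expressed
through the PM QoS framework, which is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-state bookkeeping kept by an OS. */
struct idle_state_qos {
	uint32_t wakeup_latency_us; /* wakeup-latency-us, else entry + exit */
	bool enabled;               /* whether CPUs may enter this state */
};

/* Worst case wake-up latency over the enabled states (0 if none is
 * enabled; the latency of plain wfi is not modelled). */
uint32_t worst_case_wakeup_us(const struct idle_state_qos *states, size_t count)
{
	uint32_t worst = 0;

	for (size_t i = 0; i < count; i++)
		if (states[i].enabled && states[i].wakeup_latency_us > worst)
			worst = states[i].wakeup_latency_us;
	return worst;
}

/* Disable every enabled state whose wake-up latency exceeds a device's
 * limit; returns how many states this call disabled. */
size_t apply_latency_limit(struct idle_state_qos *states, size_t count,
			   uint32_t limit_us)
{
	size_t disabled = 0;

	for (size_t i = 0; i < count; i++) {
		if (states[i].enabled && states[i].wakeup_latency_us > limit_us) {
			states[i].enabled = false;
			disabled++;
		}
	}
	return disabled;
}
```

With the CPU0 states of Example 1 (worst case wake-up latencies of 60, 130,
750 and 1500 us), a 200 us limit disables the two deepest states and lowers
the worst case to 130 us.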

The min-residency time parameter deserves further explanation since it is
expressed in time units but must factor in energy consumption coefficients.

The energy consumption of a CPU when it enters a power state can be roughly
characterised by the following graph:

           |
           |
           |
       e   |
       n   |                                      /---
       e   |                               /------
       r   |                        /------
       g   |                  /-----
       y   |           /------
           |       ----
           |      /|
           |     / |
           |    /  |
           |   /   |
           |  /    |
           | /     |
           |/      |
      -----|-------+----------------------------------
          0|       1                              time(ms)

    Graph 1: Energy vs time example

The graph is split into two parts delimited by time 1ms on the X-axis.
The graph curve with X-axis values = { x | 0 < x < 1ms } has a steep slope
and denotes the energy costs incurred whilst entering and leaving the idle
state.
The graph curve in the area delimited by X-axis values = { x | x > 1ms } has
a shallower slope and essentially represents the energy consumption of the
idle state.

min-residency is defined for a given idle state as the minimum expected
residency time for a state (inclusive of preparation and entry) after
which choosing that state becomes the most energy efficient option. A good
way to visualise this is to take the graph above and compare the energy
consumption plots of several states.

For the sake of simplicity, let's consider a system with two idle states,
IDLE1 and IDLE2:

      |
      |
      |
      |                                                  /-- IDLE1
   e  |                                              /---
   n  |                                         /----
   e  |                                     /---
   r  |                                /-----/--------- IDLE2
   g  |                    /-------/---------
   y  |        ------------    /---|
      |       /           /----    |
      |      /        /---         |
      |     /    /----             |
      |    / /---                  |
      |   ---                      |
      |  /                         |
      | /                          |
      |/                           |                  time
   ---/----------------------------+------------------------
      |IDLE1-energy < IDLE2-energy | IDLE2-energy < IDLE1-energy
                                   |
                            IDLE2-min-residency

    Graph 2: idle states min-residency example

In graph 2 above, which takes into account idle state entry/exit energy
costs, it is clear that if the idle state residency time (i.e. time until the
next wake-up IRQ) is less than IDLE2-min-residency, IDLE1 is the better idle
state choice energywise.

This is mainly down to the fact that IDLE1 entry/exit energy costs are lower
than IDLE2.

However, the lower power consumption (i.e. shallower energy curve slope) of
idle state IDLE2 implies that after a suitable time, IDLE2 becomes more energy
efficient.

The time at which IDLE2 becomes more energy efficient than IDLE1 (and other
shallower states in a system with multiple idle states) is defined as
IDLE2-min-residency and corresponds to the time when the energy consumption
of the IDLE1 and IDLE2 states breaks even.
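Under the simplifying assumption that each state's energy grows linearly with
residency (a fixed entry/exit cost plus a constant idle power, as graphs 1 and
2 roughly suggest), the break-even point solves E1 + P1*t = E2 + P2*t, i.e.
t = (E2 - E1) / (P1 - P2). The C sketch below computes it; the structure,
function names and units (nJ and mW, so that 1 mW over 1 us is 1 nJ) are
assumptions for illustration, and measured platform data would normally
replace the linear model.

```c
/* Hypothetical linear energy model of one idle state:
 * E(t) = transition_nj + power_mw * t_us   (1 mW * 1 us = 1 nJ). */
struct idle_energy_model {
	double transition_nj; /* combined entry + exit energy cost */
	double power_mw;      /* power drawn while resident in the state */
};

/* Energy (nJ) spent when residing in the state for t_us microseconds. */
double state_energy_nj(const struct idle_energy_model *m, double t_us)
{
	return m->transition_nj + m->power_mw * t_us;
}

/*
 * Residency (us) after which 'deep' costs no more energy than 'shallow'.
 * Returns -1.0 if 'deep' never becomes cheaper (no power saving), and
 * 0.0 if 'deep' is already at least as cheap from the start.
 */
double break_even_us(const struct idle_energy_model *shallow,
		     const struct idle_energy_model *deep)
{
	double power_saving = shallow->power_mw - deep->power_mw;

	if (power_saving <= 0.0)
		return -1.0;

	double t = (deep->transition_nj - shallow->transition_nj) / power_saving;

	return t > 0.0 ? t : 0.0;
}
```

With hypothetical values IDLE1 = {100 nJ, 2 mW} and IDLE2 = {1000 nJ, 0.5 mW},
the break-even residency is 900 / 1.5 = 600 us, which would be a candidate
IDLE2 min-residency-us value.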

The definitions provided in this section underpin the idle states
properties specification that is the subject of the following sections.

===========================================
3 - idle-states node
===========================================

ARM processor idle states are defined within the idle-states node, which is
a direct child of the cpus node [1] and provides a container where the
processor idle states, defined as device tree nodes, are listed.

  • idle-states node

    Usage: Optional - On ARM systems, it is a container of processor idle
           states nodes. If the system does not provide CPU
           power management capabilities or the processor just
           supports idle_standby, an idle-states node is not
           required.

    Description: idle-states node is a container node, where its
                 subnodes describe the CPU idle states.

    Node name must be "idle-states".

    The idle-states node's parent node must be the cpus node.

    The idle-states node's child nodes can be:

    • one or more state nodes

    Any other configuration is considered invalid.

    An idle-states node defines the following properties:

    • entry-method
      Value type: <stringlist>
      Usage and definition depend on ARM architecture version.
        # On ARM v8 64-bit this property is required and must
          be one of:
           - "psci" (see bindings in [2])
        # On ARM 32-bit systems this property is optional

The nodes describing the idle states (state nodes) can only be defined within
the idle-states node; any other configuration is considered invalid and
therefore must be ignored.
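The entry-method rules above lend themselves to a simple check. This C sketch
is hypothetical (the function name and the way the property value is passed
in are assumptions) and covers only the entry-method property, not the rules
on child nodes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Validate the entry-method property of an idle-states node.
 * entry_method is NULL when the property is absent.
 * ARM v8 64-bit: required and must be "psci".
 * ARM 32-bit: optional; this binding does not constrain its value.
 */
bool entry_method_valid(bool arm64, const char *entry_method)
{
	if (!arm64)
		return true;
	return entry_method != NULL && strcmp(entry_method, "psci") == 0;
}
```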

===========================================
4 - state node
===========================================

A state node represents an idle state description and must be defined as
follows:

  • state node

    Description: must be a child of the idle-states node

    The state node name shall follow standard device tree naming
    rules ([5], 2.2.1 "Node names"); in particular, state nodes which
    are siblings within a single common parent must be given a unique name.

    The idle state entered by executing the wfi instruction (idle_standby
    in SBSA [3][4]) is considered standard on all ARM platforms and therefore
    must not be listed.

    With the definitions provided above, the following list represents
    the valid properties for a state node:

    • compatible
      Usage: Required
      Value type: <stringlist>
      Definition: Must be "arm,idle-state".

    • local-timer-stop
      Usage: See definition
      Value type: <none>
      Definition: If present, the CPU local timer control logic is
                  lost on state entry; otherwise it is retained.

    • entry-latency-us
      Usage: Required
      Value type: <prop-encoded-array>
      Definition: u32 value representing worst case latency in
                  microseconds required to enter the idle state.
                  The exit-latency-us duration may be guaranteed
                  only after entry-latency-us has passed.

    • exit-latency-us
      Usage: Required
      Value type: <prop-encoded-array>
      Definition: u32 value representing worst case latency
                  in microseconds required to exit the idle state.

    • min-residency-us
      Usage: Required
      Value type: <prop-encoded-array>
      Definition: u32 value representing minimum residency duration
                  in microseconds, inclusive of preparation and
                  entry, for this idle state to be considered
                  worthwhile energywise (refer to section 2 of
                  this document for a complete description).

    • wakeup-latency-us
      Usage: Optional
      Value type: <prop-encoded-array>
      Definition: u32 value representing maximum delay between the
                  signaling of a wake-up event and the CPU being
                  able to execute normal code again. If omitted,
                  this is assumed to be equal to:

                      entry-latency-us + exit-latency-us

                  It is important to supply this value on systems
                  where the duration of the PREP phase (see diagram 1,
                  section 2) is non-negligible.
                  In such systems entry-latency-us + exit-latency-us
                  will exceed wakeup-latency-us by this duration.

    • status
      Usage: Optional
      Value type: <string>
      Definition: A standard device tree property [5] that indicates
                  the operational status of an idle-state.
                  If present, it shall be:
                  "okay": to indicate that the idle state is
                          operational.
                  "disabled": to indicate that the idle state has
                              been disabled in firmware so it is not
                              operational.
                  If the property is not present, the idle-state must
                  be considered operational.

    • idle-state-name
      Usage: Optional
      Value type: <string>
      Definition: A string used as a descriptive name for the idle
                  state.
      

    In addition to the properties listed above, a state node may require
    additional properties specific to the entry-method defined in the
    idle-states node; please refer to the entry-method bindings
    documentation for property definitions.
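The defaults stated for the optional properties can be captured in a small
helper. This is a sketch only: struct parsed_idle_state and both functions are
hypothetical and assume the properties have already been read from the device
tree; treating a status value other than "okay" as non-operational is a choice
of this sketch, since the binding only defines "okay" and "disabled".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result of parsing one state node. */
struct parsed_idle_state {
	uint32_t entry_latency_us;
	uint32_t exit_latency_us;
	uint32_t min_residency_us;
	bool has_wakeup_latency;    /* wakeup-latency-us present? */
	uint32_t wakeup_latency_us; /* valid only if has_wakeup_latency */
	const char *status;         /* NULL if status is absent */
	bool local_timer_stop;      /* local-timer-stop present? */
};

/* wakeup-latency-us defaults to entry-latency-us + exit-latency-us. */
uint32_t effective_wakeup_latency_us(const struct parsed_idle_state *s)
{
	if (s->has_wakeup_latency)
		return s->wakeup_latency_us;
	return s->entry_latency_us + s->exit_latency_us;
}

/* Absent status means operational; otherwise only "okay" is. */
bool idle_state_operational(const struct parsed_idle_state *s)
{
	return s->status == NULL || strcmp(s->status, "okay") == 0;
}
```

For CPU_SLEEP_0_0 in Example 1, which omits wakeup-latency-us, the effective
value is 250 + 500 = 750 us.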

===========================================
5 - Examples
===========================================

Example 1 (ARM 64-bit, 16-cpu system, PSCI enable-method):

cpus {
    #size-cells = <0>;
    #address-cells = <2>;

    CPU0: cpu@0 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x0>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU1: cpu@1 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x1>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU2: cpu@100 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x100>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU3: cpu@101 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x101>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU4: cpu@10000 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x10000>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU5: cpu@10001 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x10001>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU6: cpu@10100 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x10100>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU7: cpu@10101 {
        device_type = "cpu";
        compatible = "arm,cortex-a57";
        reg = <0x0 0x10101>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_0_0 &CPU_SLEEP_0_0
                           &CLUSTER_RETENTION_0 &CLUSTER_SLEEP_0>;
    };

    CPU8: cpu@100000000 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x0>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU9: cpu@100000001 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x1>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU10: cpu@100000100 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x100>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU11: cpu@100000101 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x101>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU12: cpu@100010000 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x10000>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU13: cpu@100010001 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x10001>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU14: cpu@100010100 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x10100>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    CPU15: cpu@100010101 {
        device_type = "cpu";
        compatible = "arm,cortex-a53";
        reg = <0x1 0x10101>;
        enable-method = "psci";
        cpu-idle-states = <&CPU_RETENTION_1_0 &CPU_SLEEP_1_0
                           &CLUSTER_RETENTION_1 &CLUSTER_SLEEP_1>;
    };

    idle-states {
        entry-method = "psci";

        CPU_RETENTION_0_0: cpu-retention-0-0 {
            compatible = "arm,idle-state";
            arm,psci-suspend-param = <0x0010000>;
            entry-latency-us = <20>;
            exit-latency-us = <40>;
            min-residency-us = <80>;
        };

        CLUSTER_RETENTION_0: cluster-retention-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            arm,psci-suspend-param = <0x1010000>;
            entry-latency-us = <50>;
            exit-latency-us = <100>;
            min-residency-us = <250>;
            wakeup-latency-us = <130>;
        };

        CPU_SLEEP_0_0: cpu-sleep-0-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            arm,psci-suspend-param = <0x0010000>;
            entry-latency-us = <250>;
            exit-latency-us = <500>;
            min-residency-us = <950>;
        };

        CLUSTER_SLEEP_0: cluster-sleep-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            arm,psci-suspend-param = <0x1010000>;
            entry-latency-us = <600>;
            exit-latency-us = <1100>;
            min-residency-us = <2700>;
            wakeup-latency-us = <1500>;
        };

        CPU_RETENTION_1_0: cpu-retention-1-0 {
            compatible = "arm,idle-state";
            arm,psci-suspend-param = <0x0010000>;
            entry-latency-us = <20>;
            exit-latency-us = <40>;
            min-residency-us = <90>;
        };

        CLUSTER_RETENTION_1: cluster-retention-1 {
            compatible = "arm,idle-state";
            local-timer-stop;
            arm,psci-suspend-param = <0x1010000>;
            entry-latency-us = <50>;
            exit-latency-us = <100>;
            min-residency-us = <270>;
            wakeup-latency-us = <100>;
        };

        CPU_SLEEP_1_0: cpu-sleep-1-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            arm,psci-suspend-param = <0x0010000>;
            entry-latency-us = <70>;
            exit-latency-us = <100>;
            min-residency-us = <300>;
            wakeup-latency-us = <150>;
        };

        CLUSTER_SLEEP_1: cluster-sleep-1 {
            compatible = "arm,idle-state";
            local-timer-stop;
            arm,psci-suspend-param = <0x1010000>;
            entry-latency-us = <500>;
            exit-latency-us = <1200>;
            min-residency-us = <3500>;
            wakeup-latency-us = <1300>;
        };
    };

};

Example 2 (ARM 32-bit, 8-cpu system, two clusters):

cpus {
    #size-cells = <0>;
    #address-cells = <1>;

    CPU0: cpu@0 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0x0>;
        cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
    };

    CPU1: cpu@1 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0x1>;
        cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
    };

    CPU2: cpu@2 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0x2>;
        cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
    };

    CPU3: cpu@3 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0x3>;
        cpu-idle-states = <&CPU_SLEEP_0_0 &CLUSTER_SLEEP_0>;
    };

    CPU4: cpu@100 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x100>;
        cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
    };

    CPU5: cpu@101 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x101>;
        cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
    };

    CPU6: cpu@102 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x102>;
        cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
    };

    CPU7: cpu@103 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x103>;
        cpu-idle-states = <&CPU_SLEEP_1_0 &CLUSTER_SLEEP_1>;
    };

    idle-states {
        CPU_SLEEP_0_0: cpu-sleep-0-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            entry-latency-us = <200>;
            exit-latency-us = <100>;
            min-residency-us = <400>;
            wakeup-latency-us = <250>;
        };

        CLUSTER_SLEEP_0: cluster-sleep-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            entry-latency-us = <500>;
            exit-latency-us = <1500>;
            min-residency-us = <2500>;
            wakeup-latency-us = <1700>;
        };

        CPU_SLEEP_1_0: cpu-sleep-1-0 {
            compatible = "arm,idle-state";
            local-timer-stop;
            entry-latency-us = <300>;
            exit-latency-us = <500>;
            min-residency-us = <900>;
            wakeup-latency-us = <600>;
        };

        CLUSTER_SLEEP_1: cluster-sleep-1 {
            compatible = "arm,idle-state";
            local-timer-stop;
            entry-latency-us = <800>;
            exit-latency-us = <2000>;
            min-residency-us = <6500>;
            wakeup-latency-us = <2300>;
        };
    };

};

===========================================
6 - References
===========================================

[1] ARM Linux Kernel documentation - CPUs bindings
Documentation/devicetree/bindings/arm/cpus.txt

[2] ARM Linux Kernel documentation - PSCI bindings
Documentation/devicetree/bindings/arm/psci.txt

[3] ARM Server Base System Architecture (SBSA)
http://infocenter.arm.com/help/index.jsp

[4] ARM Architecture Reference Manuals
http://infocenter.arm.com/help/index.jsp

[5] Devicetree Specification
https://www.devicetree.org/specifications/