1 Xen Hypervisor Command Line Options

This document covers the command line options which the Xen Hypervisor accepts.

1.1 Types of parameter

Most parameters take the form option=value. Different options on the command line should be space delimited. All options are case sensitive, as are all values unless explicitly noted.

1.1.1 Boolean (<boolean>)

All boolean options may be explicitly enabled using a value of: yes, on, true, enable or 1

They may be explicitly disabled using a value of: no, off, false, disable or 0

In addition, a boolean option may be enabled by simply stating its name, and may be disabled by prefixing its name with no-.

Examples

Enable noreboot mode: noreboot=true

Disable x2apic support (if present): x2apic=off

Enable synchronous console mode: sync_console

Explicitly specifying any value other than those listed above is undefined, as is stacking a no- prefix with an explicit value.
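The boolean rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not Xen's parser: the function name `parse_boolean` is hypothetical, and raising an error for the cases the text calls undefined is a choice made for this sketch.

```python
TRUE_VALUES = {"yes", "on", "true", "enable", "1"}
FALSE_VALUES = {"no", "off", "false", "disable", "0"}


def parse_boolean(token: str):
    """Return (option name, enabled) for one boolean command line token."""
    name, has_value, value = token.partition("=")
    if not has_value:
        # A bare name enables the option; a "no-" prefix disables it.
        if name.startswith("no-"):
            return name[len("no-"):], False
        return name, True
    if name.startswith("no-"):
        # Stacking "no-" with an explicit value is undefined; this sketch rejects it.
        raise ValueError(f"'no-' prefix combined with a value: {token!r}")
    if value in TRUE_VALUES:
        return name, True
    if value in FALSE_VALUES:
        return name, False
    # Any other value is undefined; this sketch rejects it as well.
    raise ValueError(f"not a boolean value: {value!r}")
```

Values are matched case sensitively, in line with the general rule stated under "Types of parameter".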

1.1.2 Integer (<integer>)

An integer parameter will default to decimal and may be prefixed with a - for negative numbers. Alternatively, a hexadecimal number may be used by prefixing the number with 0x, or an octal number may be used if a leading 0 is present.

Providing a string which does not validly convert to an integer is undefined.
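As a rough illustration of these conversion rules, the following Python sketch accepts decimal, 0x-prefixed hexadecimal and leading-0 octal input. The name `parse_integer` is hypothetical, allowing a leading - in front of hexadecimal and octal input is an assumption, and rejecting invalid strings is a choice made here because the documented behaviour is undefined.

```python
def parse_integer(text: str) -> int:
    """Convert an <integer> string using the decimal / 0x / leading-0 rules."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body.startswith("0x"):
        base, digits, allowed = 16, body[2:], "0123456789abcdefABCDEF"
    elif len(body) > 1 and body.startswith("0"):
        base, digits, allowed = 8, body[1:], "01234567"
    else:
        base, digits, allowed = 10, body, "0123456789"
    if not digits or any(ch not in allowed for ch in digits):
        # Undefined in the documentation; this sketch raises instead.
        raise ValueError(f"not a valid integer: {text!r}")
    value = int(digits, base)
    return -value if negative else value
```

With this sketch, `parse_integer("0x1f")` yields 31 and `parse_integer("017")` yields 15.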

1.1.3 Size (<size>)

A size parameter may be any integer, with a single size suffix

Without a size suffix, the default will be kilo. Providing a suffix other than those listed above is undefined.
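The default-to-kilo rule is easy to get wrong, so a small Python sketch may help. The suffix table below is an assumption (the conventional binary suffixes B, K, M, G and T, in either case), since the list is not reproduced in this section. The name `parse_size` is hypothetical, and only decimal numbers are handled.

```python
# Assumed suffix table: power-of-two shift per suffix (not taken from this section).
SUFFIX_SHIFT = {"b": 0, "k": 10, "m": 20, "g": 30, "t": 40}


def parse_size(text: str) -> int:
    """Return the size in bytes for a decimal <size> string."""
    suffix = text[-1:].lower()
    if suffix in SUFFIX_SHIFT:
        number, shift = text[:-1], SUFFIX_SHIFT[suffix]
    else:
        # No suffix: the documented default is kilo.
        number, shift = text, 10
    return int(number, 10) << shift
```

Under these assumptions, `parse_size("16")` and `parse_size("16k")` both give 16384 bytes.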

1.1.4 String

Many parameters are more complicated and require more intricate configuration. The detailed description of each individual parameter specifies which values are valid.

1.1.5 List

Some options take a comma separated list of values.

1.1.6 Combination

Some parameters act as combinations of the above, most commonly a mix of Boolean and String. These are noted in the relevant sections.

1.2 Parameter details

1.2.1 acpi

= force | ht | noirq | <boolean> | verbose

String, or Boolean to disable.

By default, Xen will scan the DMI data and blacklist certain systems which are known to have broken ACPI setups. Providing acpi=force will cause Xen to ignore the blacklist and attempt to use all ACPI features.

Using acpi=ht causes Xen to parse the ACPI tables enough to enumerate all CPUs, but will not use other ACPI features. This is not common, and only has an effect if your system is blacklisted.

The acpi=noirq option causes Xen to not parse the ACPI MADT table looking for IO-APIC entries. This is also not common, and any system which requires this option to function should be blacklisted. Additionally, this will not prevent Xen from finding IO-APIC entries from the MP tables.

Further, any of the boolean false options can be used to disable ACPI usage entirely.

Because responsibility for ACPI processing is shared between Xen and the domain 0 kernel, this option is automatically propagated to the domain 0 command line.

Finally, acpi=verbose will enable per-processor information logging, which may otherwise be too noisy, in particular on large systems.

1.2.2 acpi_apic_instance

= <integer>

Specify which ACPI MADT table to parse for APIC information, if more than one is present.

1.2.3 acpi_pstate_strict (x86)

= <boolean>

Default: false

Enforce checking that P-state transitions by the ACPI cpufreq driver actually result in the nominated frequency being established. A warning message will be logged if that isn’t the case.

1.2.4 acpi_skip_timer_override (x86)

= <boolean>

Instruct Xen to ignore timer-interrupt override.

1.2.5 acpi_sleep (x86)

= s3_bios | s3_mode

s3_bios instructs Xen to invoke video BIOS initialization during S3 resume.

s3_mode instructs Xen to set up the boot time (option vga=) video mode during S3 resume.

1.2.6 allow_unsafe (x86)

= <boolean>

Default: false

Force boot on potentially unsafe systems. By default Xen will refuse to boot on systems with the following errata:

1.2.7 altp2m (Intel)

= <boolean>

Default: false

Permit multiple copies of host p2m.

1.2.8 apic (x86)

= bigsmp | default

Override Xen’s logic for choosing the APIC driver. By default, if there are more than 8 CPUs, Xen will switch to bigsmp over default.

1.2.9 apicv (Intel)

= <boolean>

Default: true

Permit Xen to use APIC Virtualisation Extensions. This is an optimisation available as part of VT-x, and allows hardware to take care of the guest’s APIC handling, rather than requiring emulation in Xen.

1.2.10 apic_verbosity (x86)

= verbose | debug

Increase the verbosity of the APIC code from the default value.

1.2.11 arat (x86)

= <boolean>

Default: true

Permit Xen to use “Always Running APIC Timer” support on compatible hardware in combination with cpuidle. This option is only expected to be useful for developers wishing Xen to fall back to older timing methods on newer hardware.

1.2.12 argo

= List of [ <bool>, mac-permissive=<bool> ]

Controls for the Argo hypervisor-mediated interdomain communication service.

The functionality that this option controls is only available when Xen has been compiled with the build setting for Argo enabled in the build configuration.

Argo is an interdomain communication mechanism, where Xen acts as the central point of authority. Guests may register memory rings to receive messages, query the status of other domains, and send messages by hypercall, all subject to appropriate auditing by Xen. Argo is disabled by default.

1.2.13 asid (x86)

= <boolean>

Default: true

Permit Xen to use Address Space Identifiers. This is an optimisation which tags the TLB entries with an ID per vcpu. This allows for guest TLB flushes to be performed without the overhead of a complete TLB flush.

1.2.14 async-show-all (x86)

= <boolean>

Default: false

Forces all CPUs’ full state to be logged upon certain fatal asynchronous exceptions (watchdog NMIs and unexpected MCEs).

1.2.15 ats (x86)

= <boolean>

Default: false

Permits Xen to set up and use PCI Address Translation Services. This is a performance optimisation for PCI Passthrough.

WARNING: Xen cannot currently safely use ATS because of its synchronous wait loops for Queued Invalidation completions.

1.2.16 availmem

= <size>

Default: 0 (no limit)

Specify a maximum amount of available memory, to which Xen will clamp the e820 table.

1.2.17 badpage

= List of [ <integer> | <integer>-<integer> ]

Specify that certain pages, or certain ranges of pages contain bad bytes and should not be used. For example, if your memory tester says that byte 0x12345678 is bad, you would place badpage=0x12345 on Xen’s command line.
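The byte-to-page conversion in the example amounts to dropping the page offset bits. A minimal Python sketch, assuming 4 KiB pages (a 12-bit shift) and using the hypothetical helper name `badpage_option`:

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as implied by the example above


def badpage_option(bad_byte_addresses):
    """Build a badpage= option from the byte addresses a memory tester reports."""
    # Several bad bytes may fall in the same page, so de-duplicate the page numbers.
    pages = sorted({addr >> PAGE_SHIFT for addr in bad_byte_addresses})
    return "badpage=" + ",".join(hex(page) for page in pages)
```

`badpage_option([0x12345678])` returns `badpage=0x12345`. The sketch lists every page individually and does not collapse neighbouring pages into the range form.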

1.2.18 bootscrub

= idle | <boolean>

Default: idle

Scrub free RAM during boot. This is a safety feature to prevent accidentally leaking sensitive VM data into other VMs if Xen crashes and reboots.

In idle mode, RAM is scrubbed in the background on all CPUs during the idle loop, with a guarantee that memory allocations always provide scrubbed pages. This option reduces boot time on machines with a large amount of RAM while still providing security benefits.

1.2.19 bootscrub_chunk

= <size>

Default: 128M

Maximum size of the RAM chunks to be scrubbed whilst holding the page heap lock and not running softirqs. Reduce this if softirqs are not being run frequently enough. Setting this to a high value may cause boot failure, particularly if the NMI watchdog is also enabled.

1.2.20 cet

= List of [ shstk=<bool>, ibt=<bool> ]

Applicability: x86

Controls for the use of Control-flow Enforcement Technology. CET is a group of hardware features designed to combat Return-oriented Programming (ROP, also call/jmp COP/JOP) attacks.

CET is incompatible with 32bit PV guests. If any CET sub-options are active, they will override the pv=32 boolean to false. Backwards compatibility can be maintained with the pv-shim mechanism.

1.2.21 clocksource (x86)

= pit | hpet | acpi | tsc

If set, override Xen’s default choice for the platform timer. Having TSC as platform timer requires being explicitly set. This is because TSC can only be safely used if CPU hotplug isn’t performed on the system. On some platforms, the “maxcpus” option may need to be used to further adjust the number of allowed CPUs. When running on platforms that can guarantee a monotonic TSC across sockets you may want to adjust the “tsc” command line parameter to “stable:socket”.

1.2.22 cmci-threshold (Intel)

= <integer>

Default: 2

Specify the event count threshold for raising Corrected Machine Check Interrupts. Specifying zero disables CMCI handling.

1.2.23 cmos-rtc-probe (x86)

= <boolean>

Default: false

Flag to indicate whether to probe for a CMOS Real Time Clock irrespective of ACPI indicating none to be there.

1.2.24 com1 (x86)

1.2.25 com2 (x86)

= <baud>[/<base-baud>][,[DPS][,[<io-base>|pci|amt][,[<irq>|msi][,[<port-bdf>][,[<bridge-bdf>]]]]]]

Both options com1 and com2 follow the same format.

A typical setup for most situations might be com1=115200,8n1

In addition to the above positional specification for UART parameters, name=value pair specifications are also supported. This is used to add flexibility for UART devices which require additional UART parameter configurations.

The comma separation still delineates positional parameters. Hence, unless the parameter is explicitly specified with name=value option, it will be considered a positional parameter.

The syntax consists of com1=(comma-separated positional parameters),(comma separated name-value pairs)

The accepted name keywords for name=value pairs are:

The following are examples of correct specifications:

com1=115200,8n1,0x3f8,4
com1=115200,8n1,0x3f8,4,reg-width=4,reg-shift=2
com1=baud=115200,parity=n,stop-bits=1,io-base=0x3f8,reg-width=4

1.2.26 conring_size

= <size>

Default: conring_size=16k

Specify the size of the console ring buffer.

1.2.27 console

= List of [ vga | com1[H,L] | com2[H,L] | pv | dbgp | ehci | xhci | none ]

Default: console=com1,vga

Specify which console(s) Xen should use.

vga indicates that Xen should try and use the vga graphics adapter.

com1 and com2 indicate that Xen should use serial ports 1 and 2 respectively. Optionally, these arguments may be followed by an H or L. H indicates that transmitted characters will have their MSB set, while received characters must have their MSB set. L indicates the converse; transmitted and received characters will have their MSB cleared. This allows a single port to be shared by two subsystems (e.g. console and debugger).

pv indicates that Xen should use Xen’s PV console. This option is only available when used together with pv-in-pvh.

dbgp or ehci indicates that Xen should use a USB2 debug port.

xhci indicates that Xen should use a USB3 debug port.

none indicates that Xen should not use a console. This option only makes sense on its own.

1.2.28 console_timestamps

= none | date | datems | boot | raw

Default: none

Can be modified at runtime

Specify which timestamp format Xen should use for each console line.

For compatibility with the older boolean parameter, specifying console_timestamps alone will enable the date option.

1.2.29 console_to_ring

= <boolean>

Default: false

Flag to indicate whether all guest console output should be copied into the console ring buffer.

1.2.30 conswitch

= <switch char>[x]

Default: conswitch=a

Can be modified at runtime

Specify which character should be used to switch serial input between Xen and dom0. The required sequence is CTRL-<switch char> three times.

The optional trailing x indicates that Xen should not automatically switch the console input to dom0 during boot. Any other value, including omission, causes Xen to automatically switch to the dom0 console during dom0 boot. Use conswitch=ax to keep the default switch character but have Xen keep the console.

1.2.31 core_parking

= power | performance

Default: power

1.2.32 cpu_type (x86)

= arch_perfmon

If set, force use of the performance counters for oprofile, rather than detecting available support.

1.2.33 cpufreq

= none | {{ <boolean> | xen } { [:[powersave|performance|ondemand|userspace][,[<maxfreq>]][,[<minfreq>]]] } [,verbose]} | dom0-kernel | hwp[:[<hdc>][,verbose]]

Default: xen

Indicate where the responsibility for driving power states lies. Note that the choice of dom0-kernel is deprecated and not supported by all Dom0 kernels.

There is also support for ;-separated fallback options: cpufreq=hwp;xen,verbose. This first tries hwp and falls back to xen if unavailable. Note: The verbose suboption is handled globally. Setting it for either the primary or fallback option applies to both irrespective of where it is specified.

Note: grub2 requires ‘;’ to be escaped or quoted, so "cpufreq=hwp;xen" should be specified within double quotes inside grub.cfg. Refer to the grub2 documentation for more information.

1.2.34 cpuid (x86)

= List of comma separated booleans

This option allows for fine tuning of the facilities Xen will use, after accounting for hardware capabilities as enumerated via CPUID.

Unless otherwise noted, options only have any effect in their negative form, to hide the named feature(s). Ignoring a feature using this mechanism will cause Xen not to use the feature, nor offer it as usable to guests.

Currently accepted:

The Speculation Control hardware features srbds-ctrl, md-clear, ibrsb, stibp, ibpb, l1d-flush and ssbd are used by default if available and applicable. They can all be ignored.

rdrand and rdseed have multiple interactions.

1.2.35 cpuid_mask_cpu

= fam_0f_rev_[cdefg] | fam_10_rev_[bc] | fam_11_rev_b

Applicability: AMD

If none of the other cpuid_mask_* options are given, Xen has a set of pre-configured masks to make the current processor appear to be family/revision specified.

See below for general information on masking.

Warning: This option is not fully effective on Family 15h processors or later.

1.2.36 cpuid_mask_ecx

1.2.37 cpuid_mask_edx

1.2.38 cpuid_mask_ext_ecx

1.2.39 cpuid_mask_ext_edx

1.2.40 cpuid_mask_l7s0_eax

1.2.41 cpuid_mask_l7s0_ebx

1.2.42 cpuid_mask_thermal_ecx

1.2.43 cpuid_mask_xsave_eax

= <integer>

Applicability: x86. Default: ~0 (all bits set)

The availability of these options is model specific. Some processors don’t support any of them, and no processor supports all of them. Xen will ignore options on processors which are lacking support.

These options can be used to alter the features visible via the CPUID instruction. Settings applied here take effect globally, including for Xen and all guests.

Note: Since Xen 4.7, it is no longer necessary to mask a host to create migration safety in heterogeneous scenarios. All necessary CPUID settings should be provided in the VM configuration file. Furthermore, it is recommended not to use this option, as doing so causes an unnecessary reduction of features at Xen’s disposal to manage guests.

1.2.44 cpuidle (x86)

= <boolean>

1.2.45 cpuinfo (x86)

= <boolean>

1.2.46 crash-debug-debugkey

1.2.47 crash-debug-hwdom

1.2.48 crash-debug-kexeccmd

1.2.49 crash-debug-panic

1.2.50 crash-debug-watchdog

= <string>

Can be modified at runtime

Specify debug-key actions in cases of crashes. Each of the parameters applies to a different crash reason. The <string> is a sequence of debug key characters, with + having the special meaning of a 10 millisecond pause.

crash-debug-debugkey will be used for crashes induced by the C debug key (i.e. manually induced crash).

crash-debug-hwdom denotes a crash of dom0.

crash-debug-kexeccmd is an explicit request of dom0 to continue with the kdump kernel via kexec. Only available on hypervisors built with CONFIG_KEXEC.

crash-debug-panic is a crash of the hypervisor.

crash-debug-watchdog is a crash due to the watchdog timer expiring.

It should be noted that dumping diagnosis data to the console can fail in multiple ways (missing data, hanging system, …) depending on the reason for the crash, which might have left the hypervisor in a bad state. If a debug-key action leads to another crash, recursion will be avoided, so no additional debug-key actions will be performed in this case. A crash in the early boot phase will not result in any debug-key action, as the system might not yet be in a state where the handlers can work.

So e.g. crash-debug-watchdog=0+0r would dump dom0 state twice with 10 milliseconds between the two state dumps, followed by the run queues of the hypervisor, if the system crashes due to a watchdog timeout.

Depending on the reason of the system crash it might happen that triggering some debug key action will result in a hang instead of dumping data and then doing a reboot or crash dump.
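To make the action string format concrete, the following Python sketch expands a string such as the 0+0r example above into an ordered list of steps. It only models the documented syntax; the function name `expand_debug_keys` and the step labels are hypothetical and do not correspond to anything in Xen.

```python
PAUSE_MS = 10  # documented meaning of '+': a 10 millisecond pause


def expand_debug_keys(actions: str):
    """Expand a crash-debug-* string into (kind, value) steps, in order."""
    steps = []
    for ch in actions:
        if ch == "+":
            steps.append(("pause_ms", PAUSE_MS))
        else:
            # Every other character is treated as a debug key to trigger.
            steps.append(("debug_key", ch))
    return steps
```

For `"0+0r"` this gives debug key 0, a 10 ms pause, debug key 0 again, and then debug key r.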

1.2.51 crashinfo_maxaddr

= <size>

Default: 4G

Specify the maximum address to allocate certain structures, if used in combination with the low_crashinfo command line option.

1.2.52 crashkernel

= <ramsize-range>:<size>[,...][{@,<}<offset>]

= <size>[{@,<}<offset>]

= <size>,below=offset

Specify sizes and optionally placement of the crash kernel reservation area. The <ramsize-range>:<size> pairs indicate how much memory to set aside for a crash kernel (<size>) for a given range of installed RAM (<ramsize-range>). Each <ramsize-range> is of the form <start>-[<end>].

A trailing @<offset> specifies the exact address this area should be placed at, whereas < in place of @ just specifies an upper bound of the address range the area should fall into.

< and below are synonymous, the latter being useful for grub2 systems which would otherwise require escaping of the < option.

1.2.53 credit2_balance_over

= <integer>

1.2.54 credit2_balance_under

= <integer>

1.2.55 credit2_cap_period_ms

= <integer>

Default: 10

Domains subject to a cap receive a replenishment of their runtime budget once every cap period interval. Default is 10 ms. The amount of budget they receive depends on their cap. For instance, a domain with a 50% cap will receive 50% of 10 ms, so 5 ms.
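The budget arithmetic described above can be written out as a one-line calculation. This is a sketch of the stated relationship only, with a hypothetical function name, and says nothing about how the scheduler accounts for budget internally.

```python
def budget_per_period_ms(cap_percent: float, period_ms: int = 10) -> float:
    """Runtime budget a capped domain receives in each cap period."""
    return period_ms * cap_percent / 100
```

With the default 10 ms period, a 50% cap yields 5 ms per period, matching the example in the text.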

1.2.56 credit2_load_precision_shift

= <integer>

Default: 18

Specify the number of bits to use for the fractional part of the values involved in Credit2 load tracking and load balancing math.

1.2.57 credit2_load_window_shift

= <integer>

Default: 30

Specify the number of bits to use to represent the length of the window (in nanoseconds) we use for load tracking inside Credit2. This means that, with the default value (30), we use 2^30 nsec ~= 1 sec long window.

Load tracking is done by means of a variation of exponentially weighted moving average (EWMA). The window length defined here determines how long the previous history of the load is taken into account. In fact, after a full window has passed, all previous history is discarded entirely.

A short window will make the load balancer quick at reacting to load changes, but also short-sighted about previous history (and hence, e.g., long term load trends). A long window will make the load balancer thoughtful of previous history (and hence capable of capturing, e.g., long term load trends), but also slow in responding to load changes.

The default value of 1 sec is rather long.
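To see how a given shift value translates into a window length, the power-of-two conversion can be computed directly. A minimal sketch with a hypothetical function name:

```python
def window_seconds(shift: int = 30) -> float:
    """Length of the load tracking window, in seconds, for a given shift."""
    # The window is 2^shift nanoseconds long.
    return (1 << shift) / 1e9
```

The default shift of 30 gives roughly 1.07 seconds, while a shift of 20 gives roughly one millisecond.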

1.2.58 credit2_runqueue

= cpu | core | socket | node | all

Default: socket

Specify how host CPUs are arranged in runqueues. Runqueues are kept balanced with respect to the load generated by the vCPUs running on them. Smaller runqueues (as with core) mean more accurate load balancing (for instance, it will deal better with hyperthreading), but also more overhead.

Available alternatives, with their meaning, are:

* cpu: one runqueue per logical pCPU of the host;
* core: one runqueue per physical core of the host;
* socket: one runqueue per physical socket (which often, but not always, matches a NUMA node) of the host;
* node: one runqueue per NUMA node of the host;
* all: just one runqueue shared by all the logical pCPUs of the host.

Regardless of the above choice, Xen attempts to respect sched_credit2_max_cpus_runqueue limit, which may mean more than one runqueue for the all value. If that isn’t intended, raise the sched_credit2_max_cpus_runqueue value.

1.2.59 dbgp

= ehci[ <integer> | @pci<bus>:<slot>.<func> ]

= xhci[ <integer> | @pci<bus>:<slot>.<func> ][,share=<bool>|hwdom]

Specify the USB controller to use, either by instance number (when going over the PCI busses sequentially) or by PCI device (must be on segment 0).

Use ehci for the EHCI debug port, and xhci for the XHCI debug capability. The XHCI driver will wait indefinitely for the debug host to connect, so make sure the cable is connected. The share option for xhci controls who else can use the controller:

* no: use the controller exclusively for the console; even the hardware domain (dom0) cannot use it
* hwdom: the hardware domain may use the controller too, and ports not used for the debug console will be available for normal devices; this is the default
* yes: the controller can be assigned to any domain; it is not safe to assign the controller to an untrusted domain

Choosing share=hwdom (the default) or share=yes allows a domain to reset the controller, which may cause a small portion of the console output to be lost.

The share=yes configuration is not security supported.

1.2.60 debug_stack_lines

= <integer>

Default: 20

Limits the number of lines printed in Xen stack traces.

1.2.61 debugtrace

= [cpu:]<size>

Default: 128

Specify the size of the console debug trace buffer. By specifying cpu: additionally a trace buffer of the specified size is allocated per cpu. The debug trace feature is only enabled in debugging builds of Xen.

1.2.62 dit (x86/Intel)

= <boolean>

Default: CONFIG_DIT_DEFAULT

Specify whether Xen and guests should operate in Data Independent Timing mode (Intel calls this DOITM, Data Operand Independent Timing Mode). Note that enabling this option cannot guarantee anything beyond what underlying hardware guarantees (with, where available and known to Xen, respective tweaks applied).

1.2.63 dma_bits

= <integer>

Specify the bit width of the DMA heap.

1.2.64 dom0

= List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
            cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86)

= List of [ sve=<integer> ] (Arm64)

Controls for how dom0 is constructed on x86 systems.

Enables features on dom0 on Arm systems.

1.2.65 dom0-cpuid

= List of comma separated booleans

Applicability: x86

This option allows for fine tuning of the facilities dom0 will use, after accounting for hardware capabilities and Xen settings as enumerated via CPUID.

Options are accepted in positive and negative form, to enable or disable specific features. All selections via this mechanism are subject to normal CPU Policy safety and dependency logic.

This option is intended for developers to opt dom0 into non-default features, and is not intended for use in production circumstances. If using this option is necessary to fix an issue, please report a bug.

1.2.66 dom0-iommu

= List of [ passthrough=<bool>, strict=<bool>, map-inclusive=<bool>,
            map-reserved=<bool>, none ]

Controls for the dom0 IOMMU setup.

1.2.67 dom0_ioports_disable (x86)

= List of <hex>-<hex>

Specify a list of IO ports to be excluded from dom0 access.

1.2.68 dom0_max_vcpus

Either:

= <integer>.

The number of VCPUs to give to dom0. This number of VCPUs can be more than the number of PCPUs on the host. The default is the number of PCPUs.

Or:

= <min>-<max> where <min> and <max> are integers.

Gives dom0 a number of VCPUs equal to the number of PCPUs, but always at least <min> and no more than <max>. Using <min> may give more VCPUs than PCPUs. <min> or <max> may be omitted and the defaults of 1 and unlimited respectively are used instead.

For example, with dom0_max_vcpus=4-8:

   Number of
PCPUs | Dom0 VCPUs
 2    |  4
 4    |  4
 6    |  6
 8    |  8
10    |  8
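The table follows from clamping the number of PCPUs to the given range. A short Python sketch of that rule, with a hypothetical function name and the documented defaults of 1 and unlimited for an omitted <min> or <max>:

```python
def dom0_vcpus(pcpus: int, min_vcpus: int = 1, max_vcpus=None) -> int:
    """Number of dom0 VCPUs for dom0_max_vcpus=<min>-<max>."""
    # At least <min>, even if that exceeds the number of PCPUs.
    vcpus = max(pcpus, min_vcpus)
    # No more than <max>; None stands for "unlimited".
    if max_vcpus is not None:
        vcpus = min(vcpus, max_vcpus)
    return vcpus
```

Calling `dom0_vcpus(n, 4, 8)` for n in 2, 4, 6, 8 and 10 reproduces the right-hand column of the table: 4, 4, 6, 8, 8.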

1.2.69 dom0_mem (ARM)

= <size>

Set the amount of memory for the initial domain (dom0). It must be greater than zero. This parameter is required.

1.2.70 dom0_mem (x86)

= List of ( min:<sz> | max:<sz> | <sz> )

Set the amount of memory for the initial domain (dom0). If a size is positive, it represents an absolute value. If a size is negative, it is subtracted from the total available memory.

If <sz> is not specified, the default is all the available memory minus some reserve. The reserve is 1/16 of the available memory or 128 MB (whichever is smaller).

The amount of memory will be at least the minimum but never more than the maximum (i.e., max overrides the min option). If there isn’t enough memory then as much as possible is allocated.

max:<sz> also sets the maximum reservation (the maximum amount of memory dom0 can balloon up to). If this is omitted then the maximum reservation is unlimited.

For example, to set dom0’s initial memory allocation to 512MB but allow it to balloon up as far as 1GB use dom0_mem=512M,max:1G

<sz> is: <size> | [<size>+]<frac>%

<frac> is an integer < 100

So <sz> being 1G+25% on a 256 GB host would result in 65 GB.
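The 65 GB figure comes from adding the fixed size to the given percentage of host memory. A sketch of that arithmetic follows, with a hypothetical function name and integer rounding chosen for illustration. It assumes the percentage is taken of the total host memory, as in the example.

```python
GIB = 1 << 30


def resolve_sz(size_bytes: int, frac_percent: int, total_bytes: int) -> int:
    """Evaluate <size>+<frac>% against the host's memory, in bytes."""
    if not 0 <= frac_percent < 100:
        raise ValueError("<frac> must be an integer below 100")
    return size_bytes + total_bytes * frac_percent // 100
```

`resolve_sz(1 * GIB, 25, 256 * GIB)` evaluates to 65 GiB: 1 GiB plus a quarter of 256 GiB.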

If you use this option then it is highly recommended that you disable any dom0 autoballooning feature present in your toolstack. See the xl.conf(5) man page or Xen Best Practices.

This option doesn’t have effect if pv-shim mode is enabled.

1.2.71 dom0_nodes (x86)

= List of [ <integer> | relaxed | strict ]

Default: strict

Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created and memory assigned to Dom0 will be adjusted to match the node restrictions set up here. Note that the values to be specified here are ACPI PXM ones, not Xen internal node numbers. relaxed sets up vCPU affinities to prefer, but not be limited to, the specified node(s).

1.2.72 dom0_vcpus_pin

= <boolean>

Default: false

Pin dom0 vcpus to their respective pcpus.

1.2.73 dtuart (ARM)

= path [:options]

Default: ""

Specify the full path in the device tree for the UART. If the path doesn’t start with /, it is assumed to be an alias. The options are device specific.

1.2.74 e820-mtrr-clip (x86)

= <boolean>

Flag that specifies if RAM should be clipped to the highest cacheable MTRR.

Default: true on Intel CPUs, otherwise false

1.2.75 e820-verbose (x86)

= <boolean>

Default: false

Flag that enables verbose output when processing e820 information and applying clipping.

1.2.76 edd (x86)

= off | on | skipmbr

Control retrieval of Enhanced Disk Drive (EDD) data from the BIOS during boot.

1.2.77 edid (x86)

= no | force

Either force retrieval of monitor EDID information via VESA DDC, or disable it (edid=no). This option should not normally be required except for debugging purposes.

1.2.78 efi

= List of [ rs=<bool>, attr=no|uc ]

Controls for interacting with the system Extensible Firmware Interface.

1.2.79 ept

= List of [ ad=<bool>, pml=<bool>, exec-sp=<bool> ]

Applicability: Intel

Extended Page Tables are a feature of Intel’s VT-x technology, whereby hardware manages the virtualisation of HVM guest pagetables. EPT was introduced with the Nehalem architecture.

1.2.80 extra_guest_irqs

= [<domU number>][,<dom0 number>]

Default: 32,<variable>

Change the number of PIRQs available for guests. The optional first number is common for all domUs, while the optional second number (preceded by a comma) is for dom0. Changing the setting for domU has no impact on dom0 and vice versa. For example to change dom0 without changing domU, use extra_guest_irqs=,512. The default value for Dom0 and an eventual separate hardware domain is architecture dependent. Note that specifying zero as domU value means zero, while for dom0 it means to use the default.
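The two optional fields and the different meaning of zero are easy to misread, so here is a small Python sketch of the format. The function name is hypothetical, and `None` is used here to stand for the architecture dependent dom0 default, which this section does not quantify.

```python
def parse_extra_guest_irqs(value: str, domu_default: int = 32):
    """Return (domU PIRQs, dom0 PIRQs), with None meaning the dom0 default."""
    domu_text, _, dom0_text = value.partition(",")
    # An omitted domU field leaves the domU setting at its default.
    domu = int(domu_text) if domu_text else domu_default
    dom0 = int(dom0_text) if dom0_text else None
    if dom0 == 0:
        # Zero for dom0 means "use the default"; zero for domU really means zero.
        dom0 = None
    return domu, dom0
```

For example, `parse_extra_guest_irqs(",512")` returns `(32, 512)`, while `parse_extra_guest_irqs("0,0")` returns `(0, None)`.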

1.2.81 ext_regions (Arm)

= <boolean>

Default: true

Flag to enable or disable support for extended regions for Dom0 and Dom0less DomUs.

Extended regions are ranges of unused address space exposed to the guest as “safe to use” for special memory mappings. Disable if your board device tree is incomplete.

1.2.82 flask

= permissive | enforcing | late | disabled

Default: enforcing

Specify how the FLASK security server should be configured. This option is only available if the hypervisor was compiled with FLASK support. This can be enabled by running either:

- make -C xen config and enabling XSM and FLASK.
- make -C xen menuconfig and enabling ‘FLux Advanced Security Kernel support’ and ‘Xen Security Modules support’

1.2.83 font

= <height> where height is 8x8 | 8x14 | 8x16

Specify the font size when using the VESA console driver.

1.2.84 force-ept (Intel)

= <boolean>

Default: false

Allow EPT to be enabled when VMX feature VM_ENTRY_LOAD_GUEST_PAT is not present.

Warning: Due to CVE-2013-2212, VMX feature VM_ENTRY_LOAD_GUEST_PAT is by default required as a prerequisite for using EPT. If you are not using PCI Passthrough, or trust the guest administrator who would be using passthrough, then the requirement can be relaxed. This option is particularly useful for nested virtualization, to allow the L1 hypervisor to use EPT even if the L0 hypervisor does not provide VM_ENTRY_LOAD_GUEST_PAT.

1.2.85 gnttab

= List of [ max-ver:<integer>, transitive=<bool>, transfer=<bool> ]

Default (Arm): gnttab=max-ver:1
Default (x86,PV): gnttab=max-ver:2,transitive,transfer
Default (x86,HVM): gnttab=max-ver:2,transitive

Control various aspects of the grant table behaviour available to guests.

The usage of gnttab v2 is not security supported on ARM platforms.

1.2.86 gnttab_max_frames

= <integer>

Default: 64

Can be modified at runtime

Specify the default upper bound on the number of frames which any domain may use as part of its grant table unless a different value is specified at domain creation.

Note this value is the effective upper bound for dom0.

1.2.87 gnttab_max_maptrack_frames

= <integer>

Default: 1024

Can be modified at runtime

Specify the default upper bound on the number of frames which any domain may use as part of its maptrack array unless a different value is specified at domain creation.

Note this value is the effective upper bound for dom0.

1.2.88 global-pages

= <boolean>

Applicability: x86
Default: true unless running virtualized on AMD or Hygon hardware

Control whether to use global pages for PV guests, and thus the need to perform TLB flushes by writing to CR4. This is a performance trade-off.

AMD SVM does not support selective trapping of CR4 writes, which means that a global TLB flush (two CR4 writes) takes two VMExits, massively outweighing the benefit of using global pages to begin with. This case is easy for Xen to spot, and is accounted for in the default setting.

Other cases where this option might be of benefit are on VT-x hardware when selective CR4 writes are not supported/enabled by the hypervisor, or in any virtualised case using shadow paging. These are not easy for Xen to spot, so are not accounted for in the default setting.

1.2.89 guest_loglvl

= <level>[/<rate-limited level>] where level is none | error | warning | info | debug | all

Default: guest_loglvl=none/warning

Can be modified at runtime

Set the logging level for Xen guests. Any log message with equal or more importance will be printed.

The optional <rate-limited level> option instructs which severities should be rate limited.

1.2.90 hap (x86)

= <boolean>

Default: true

Flag to globally enable or disable support for Hardware Assisted Paging (HAP).

1.2.91 hap_1gb (x86)

= <boolean>

Default: true

Flag to enable 1 GB host page table support for Hardware Assisted Paging (HAP).

1.2.92 hap_2mb (x86)

= <boolean>

Default: true

Flag to enable 2 MB host page table support for Hardware Assisted Paging (HAP).

1.2.93 hardware_dom

= <domid>

Default: 0

Enable late hardware domain creation using the specified domain ID. This is intended to be used when domain 0 is a stub domain which builds a disaggregated system including a hardware domain with the specified domain ID. This option is supported only when compiled with XSM on x86.

1.2.94 hest_disable

= <boolean>

Default: false

Control Xen’s use of the APEI Hardware Error Source Table, should one be found.

1.2.95 highmem-start (x86)

= <size>

Specify the memory boundary past which memory will be treated as highmem (x86 debug hypervisor only).

1.2.96 hmp-unsafe (arm)

= <boolean>

Default: false

Say yes at your own risk if you want to enable heterogeneous computing (such as big.LITTLE). This may result in an unstable and insecure platform, unless you manually specify the cpu affinity of all domains so that all vcpus are scheduled on the same class of pcpus (big or LITTLE but not both). vcpu migration between big cores and LITTLE cores is not supported. See docs/misc/arm/big.LITTLE.txt for more information.

When the hmp-unsafe option is disabled (default), CPUs that are not identical to the boot CPU will be parked and not used by Xen.

1.2.97 hpet

= List of [ <bool> | broadcast=<bool> | legacy-replacement=<bool> ]

Applicability: x86

Controls Xen’s use of the system’s High Precision Event Timer. By default, Xen will use an HPET when available and not subject to errata. Use of the HPET can be disabled by specifying hpet=0.

1.2.98 hpetbroadcast (x86)

= <boolean>

Deprecated alternative of hpet=broadcast.

1.2.99 hvm_debug (x86)

= <integer>

The specified value is a bit mask with the individual bits having the following meaning:

Bit  0 - debug level 0 (unused at present)
Bit  1 - debug level 1 (Control Register logging)
Bit  2 - debug level 2 (VMX logging of MSR restores when context switching)
Bit  3 - debug level 3 (unused at present)
Bit  4 - I/O operation logging
Bit  5 - vMMU logging
Bit  6 - vLAPIC general logging
Bit  7 - vLAPIC timer logging
Bit  8 - vLAPIC interrupt logging
Bit  9 - vIOAPIC logging
Bit 10 - hypercall logging
Bit 11 - MSR operation logging

Recognized in debug builds of the hypervisor only.
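Because the value is a bit mask, several logging classes can be combined by OR-ing their bits together. The Python sketch below computes such a mask from the table above. The dictionary keys and the function name hvm_debug_mask are hypothetical labels chosen for this example; only the bit positions come from the table.

```python
# Bit positions taken from the table above; the key names are made up here.
HVM_DEBUG_BITS = {
    "level0": 0,
    "level1": 1,
    "level2": 2,
    "level3": 3,
    "io": 4,
    "vmmu": 5,
    "vlapic": 6,
    "vlapic_timer": 7,
    "vlapic_interrupt": 8,
    "vioapic": 9,
    "hypercall": 10,
    "msr": 11,
}


def hvm_debug_mask(*names):
    """Return the integer mask with the bit for each named class set."""
    mask = 0
    for name in names:
        mask |= 1 << HVM_DEBUG_BITS[name]
    return mask


# I/O operation logging (bit 4) plus hypercall logging (bit 10):
print(f"hvm_debug={hvm_debug_mask('io', 'hypercall'):#x}")  # hvm_debug=0x410
```

The hexadecimal form can be passed as is, since integer parameters accept a 0x prefix.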

1.2.100 hvm_fep (x86)

= <boolean>

Default: false

Allow use of the Forced Emulation Prefix in HVM guests, to allow emulation of arbitrary instructions.

This option is intended for development and testing purposes.

Warning: As this feature opens up the instruction emulator to arbitrary instructions from an HVM guest, do not use it on production systems. No security support is provided when this flag is set.

1.2.101 hvm_port80 (x86)

= <boolean>

Default: true

Specify whether guests are to be given access to physical port 80 (often used for debugging purposes), to override the DMI based detection of systems known to misbehave upon accesses to that port.

1.2.102 idle_latency_factor (x86)

= <integer>

1.2.103 ioapic_ack (x86)

= old | new

Default: new unless directed-EOI is supported

1.2.104 iommu

= List of [ <bool>, verbose, debug, force, required,
            quarantine=<bool>|scratch-page,
            sharept, superpages, intremap, intpost, crash-disable,
            snoop, qinval, igfx, amd-iommu-perdev-intremap,
            dom0-{passthrough,strict} ]

All sub-options are boolean in nature.

I/O Memory Management Units perform a function similar to the CPU MMU (hence the name), but typically exist as a discrete device, integrated as part of a PCI Root Complex. The most common configuration is to have one IOMMU per package (for on-die PCIe devices and directly attached PCIe lanes), and one IOMMU covering the remaining I/O in the system.

The functionality in an IOMMU commonly falls into two orthogonal categories:

  1. DMA remapping which uses a pagetable-like hierarchical structure and maps I/O Virtual Addresses (DFNs - Device Frame Numbers in Xen’s terminology) to System Physical Addresses (MFNs - Machine Frame Numbers in Xen’s terminology).

  2. Interrupt Remapping, which controls incoming Message Signalled Interrupt requests, including their routing to specific CPUs.

IOMMU functionality can be used to provide a translation which the hardware device driver isn’t aware of (e.g. PCI Passthrough and a native driver inside the guest) and/or to enforce fine-grained control over the memory and interrupts which a device is attempting to access.

By default, IOMMUs are configured for use if they are available. An overall boolean (e.g. iommu=no) can override this and leave the IOMMUs disabled.

The following options are specific to Intel VT-d hardware:

The following options are specific to AMD-Vi hardware:

WARNING: The dom0-passthrough and dom0-strict booleans are both deprecated, and superseded by dom0-iommu={passthrough,strict} respectively - using both the old and new command line options in combination is undefined.

1.2.105 iommu_dev_iotlb_timeout

= <integer>

Default: 1000

Specify the timeout of the device IOTLB invalidation in milliseconds. By default, the timeout is 1000 ms. When you see error ‘Queue invalidate wait descriptor timed out’, try increasing this value.

1.2.106 iommu_inclusive_mapping

= <boolean>

WARNING: This command line option is deprecated, and superseded by dom0-iommu=map-inclusive - using both options in combination is undefined.

1.2.107 irq-max-guests (x86)

= <integer>

Default: 32

Maximum number of guests any individual IRQ could be shared between, i.e. a limit on the number of guests it is possible to start each having assigned a device sharing a common interrupt line. Accepts values between 1 and 255.

1.2.108 irq_ratelimit (x86)

= <integer>

1.2.109 irq_vector_map (x86)

1.2.110 ivmd (x86)

= <start>[-<end>][=<bdf1>[-<bdf1'>][,<bdf2>[-<bdf2'>][,...]]][;<start>...]

Define IVMD-like ranges that are missing from ACPI tables along with the device(s) they belong to, and use them for 1:1 mapping. End addresses can be omitted when exactly one page is meant. The ranges are inclusive when start and end are specified. Note that only PCI segment 0 is supported at this time, but it is fine to specify it explicitly.

‘start’ and ‘end’ values are page numbers (not full physical addresses), in hexadecimal format (can optionally be preceded by “0x”).

Omitting the optional (range of) BDF specifiers signals that the range is to be applied to all devices.

Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be reserved, and devices 0:0:1a.0…0:0:1a.3 collectively require three pages (0xd5d46 thru 0xd5d48) to be reserved, one usage would be:

ivmd=d5d45=0:1d.0;0xd5d46-0xd5d48=0:1a.0-0:1a.3

Note: grub2 requires special characters, like ‘;’, to be escaped or quoted when multiple ranges are specified - refer to the grub2 documentation.
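To make the ivmd syntax concrete, the Python sketch below takes a value apart into its page ranges and device specifiers. It is an illustration under stated assumptions rather than a model of Xen's parser: the function name parse_ivmd is hypothetical, BDF specifiers are kept as opaque strings, and no validation beyond the page ordering is attempted.

```python
def parse_ivmd(value):
    """Parse an ivmd= value into (start_page, end_page, bdf_specs) tuples.

    Hypothetical helper. Page numbers are hexadecimal with an optional 0x
    prefix; an omitted end means exactly one page; an empty bdf_specs list
    means the range applies to all devices.
    """
    ranges = []
    for entry in value.split(";"):
        # Split on the first '=' only, so '-' inside BDF ranges is untouched.
        pages, _, devices = entry.partition("=")
        start_text, _, end_text = pages.partition("-")
        start = int(start_text, 16)  # base 16 also accepts a leading 0x
        end = int(end_text, 16) if end_text else start
        if end < start:
            raise ValueError(f"end before start in {entry!r}")
        bdf_specs = devices.split(",") if devices else []
        ranges.append((start, end, bdf_specs))
    return ranges


for start, end, bdfs in parse_ivmd("d5d45=0:1d.0;0xd5d46-0xd5d48=0:1a.0-0:1a.3"):
    print(f"pages {start:#x}..{end:#x} -> {bdfs or 'all devices'}")
```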

1.2.111 ivrs_hpet[<hpet>] (AMD)

=[<seg>:]<bus>:<device>.<func>

Force the use of [<seg>:]<bus>:<device>.<func> as device ID of HPET <hpet> instead of the one specified by the IVHD sub-tables of the IVRS ACPI table.

1.2.112 ivrs_ioapic[<ioapic>] (AMD)

=[<seg>:]<bus>:<device>.<func>

Force the use of [<seg>:]<bus>:<device>.<func> as device ID of IO-APIC <ioapic> instead of the one specified by the IVHD sub-tables of the IVRS ACPI table.

1.2.113 lapic (x86)

= <boolean>

Force the use of the local APIC on a uniprocessor system, even if left disabled by the BIOS.

1.2.114 lapic_timer_c2_ok (x86)

= <boolean>

1.2.115 ler (x86)

= <boolean>

Default: false

This option is intended for debugging purposes only. Enable MSR_DEBUGCTL.LBR in hypervisor context to be able to dump the Last Interrupt/Exception To/From record with other registers.

1.2.116 lock-depth-size

= <integer>

Default: lock-depth-size=64

Specifies the maximum number of nested locks tested for illegal recursions. Higher nesting levels still work, but recursion testing is omitted for those levels. In case an illegal recursion is detected the system will crash immediately. Specifying 0 will disable all testing of illegal lock nesting.

This option is available for hypervisors built with CONFIG_DEBUG_LOCKS only.

1.2.117 loglvl

= <level>[/<rate-limited level>] where level is none | error | warning | info | debug | all

Default: loglvl=info

Can be modified at runtime

Set the logging level for Xen. Any log message with equal or greater importance will be printed.

The optional <rate-limited level> option instructs which severities should be rate limited.

1.2.118 low_crashinfo

= none | min | all

Default: none if not specified at all, or min if low_crashinfo is present without qualification.

This option is only useful for hosts with a 32bit dom0 kernel, wishing to use kexec functionality in the case of a crash. It represents which data structures should be deliberately allocated in low memory, so the crash kernel may find them. Should be used in combination with crashinfo_maxaddr.

1.2.119 low_mem_virq_limit

= <size>

Default: 64M

Specify the threshold below which Xen will inform dom0 that the quantity of free memory is getting low. Specifying 0 will disable this notification.

1.2.120 maxcpus

= <integer>

Specify the maximum number of CPUs that should be brought up.

This option is ignored in pv-shim mode.

WARNING: On Arm big.LITTLE systems, when the hmp-unsafe option is enabled, this command line option does not guarantee which CPU types will be used.

1.2.121 max_cstate (x86)

= <integer>[,<integer>]

Specify the deepest C-state CPUs are permitted to be placed in, and optionally the maximum sub C-state to be used. The latter only applies to the highest permitted C-state.

1.2.122 max_gsi_irqs (x86)

= <integer>

Specifies the number of interrupts to be used for pin (IO-APIC or legacy PIC) based interrupts. Any higher IRQs will be available for use via PCI MSI.

1.2.123 max_lpi_bits (arm)

= <integer>

Specifies the number of ARM GICv3 LPI interrupts to allocate on the host, presented as the number of bits needed to encode it. This must be at least 14 and not exceed 32, and each LPI requires one byte (configuration) and one pending bit to be allocated. Defaults to 20 bits (to cover at most 1048576 interrupts).
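The memory cost implied by a given max_lpi_bits value follows directly from the description above. The Python sketch below works through the arithmetic; the function name lpi_table_bytes is hypothetical, and the figures are for a single configuration table and a single pending table. How many copies of each table are allocated is not covered here.

```python
def lpi_table_bytes(bits=20):
    """Return (lpi_count, config_bytes, pending_bytes) for max_lpi_bits.

    Hypothetical helper. lpi_count is the upper bound 2**bits; each LPI
    needs one configuration byte and one pending bit.
    """
    if not 14 <= bits <= 32:
        raise ValueError("max_lpi_bits must be between 14 and 32")
    lpi_count = 1 << bits
    config_bytes = lpi_count        # one byte per LPI
    pending_bytes = lpi_count // 8  # one bit per LPI
    return lpi_count, config_bytes, pending_bytes


count, config, pending = lpi_table_bytes(20)
print(f"{count} LPIs: {config // 1024} KiB configuration, {pending // 1024} KiB pending")
```

With the default of 20 bits this gives 1024 KiB of configuration data and 128 KiB of pending bits.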

1.2.124 mce (x86)

= <boolean>

Default: true

Allows disabling the use of Machine Check Exceptions. Note that doing so may result in silent shutdown of the system in case an event occurs which would have resulted in raising a Machine Check Exception. Silent here is as far as Xen is concerned; firmware may offer to retrieve some collected data.

1.2.125 mce_fb (Intel)

= <boolean>

Default: false

Force broadcasting of Machine Check Exceptions, suppressing the use of Local MCE functionality available in newer Intel hardware.

1.2.126 mce_verbosity (x86)

= verbose

Specify verbose machine check output.

1.2.127 mem (x86)

= <size>

Specify the maximum address of physical RAM. Any RAM beyond this limit is ignored by Xen.

1.2.128 memop-max-order

= [<domU>][,[<ctldom>][,[<hwdom>][,<ptdom>]]]

x86 default: 9,18,12,12
ARM default: 9,18,10,10

Change the maximum order permitted for allocation (or allocation-like) requests issued by the various kinds of domains (in this order: ordinary DomU, control domain, hardware domain, and - when supported by the platform - DomU with pass-through device assigned).
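To get a feel for what these orders mean in bytes, the Python sketch below converts each field, assuming the usual meaning of an order (a request for 2^order contiguous pages) and a 4 KiB page size. Both are assumptions of this example, as are the names order_to_bytes and describe_memop_max_order. Fields are read as decimal only, to keep the sketch short.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
KINDS = ("domU", "ctldom", "hwdom", "ptdom")  # field order from the syntax line


def order_to_bytes(order):
    """Size in bytes of an allocation of 2**order pages."""
    return (1 << order) * PAGE_SIZE


def describe_memop_max_order(value):
    """Map each non-empty positional field to its size in bytes.

    Hypothetical helper; empty fields (which leave a default untouched)
    are skipped.
    """
    sizes = {}
    for kind, field in zip(KINDS, value.split(",")):
        if field:
            sizes[kind] = order_to_bytes(int(field))
    return sizes


for kind, size in describe_memop_max_order("9,18,12,12").items():
    print(f"{kind}: {size // (1024 * 1024)} MiB")
```

Under these assumptions the x86 defaults correspond to 2 MiB, 1024 MiB, 16 MiB and 16 MiB respectively.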

1.2.129 mmcfg (x86)

= <boolean>[,amd-fam10]

Default: 1

Specify if the MMConfig space should be enabled.

1.2.130 mmio-relax (x86)

= <boolean> | all

Default: false

By default, domains may not create cached mappings to MMIO regions. This option relaxes the check for Domain 0 (or when using all, all PV domains), to permit the use of cacheable MMIO mappings.

1.2.131 msi (x86)

= <boolean>

Default: true

Force Xen to (not) use PCI-MSI, even if ACPI FADT says otherwise.

1.2.132 mtrr.show (x86)

= <boolean>

Default: false

Print boot time MTRR state.

1.2.133 mwait-idle (x86)

= <boolean>

Default: true

Use the MWAIT idle driver (with model specific C-state knowledge) instead of the ACPI based one.

1.2.134 nmi (x86)

= ignore | dom0 | fatal

Default: fatal for a debug build, or dom0 for a non-debug build

Specify what Xen should do in the event of an NMI parity or I/O error. ignore discards the error; dom0 causes Xen to report the error to dom0, while ‘fatal’ causes Xen to print diagnostics and then hang.

1.2.135 noapic (x86)

Instruct Xen to ignore any IOAPICs that are present in the system, and instead continue to use the legacy PIC. This is not recommended with pvops type kernels.

Because responsibility for APIC setup is shared between Xen and the domain 0 kernel this option is automatically propagated to the domain 0 command line.

1.2.136 invpcid (x86)

= <boolean>

Default: true

By default, Xen will use the INVPCID instruction for TLB management if it is available. This option can be used to cause Xen to fall back to older mechanisms, which are generally slower.

1.2.137 load-balance-ratelimit

= <integer>

The minimum interval between load balancing events on a given pcpu, in microseconds. A value of ‘0’ will disable rate limiting. Maximum value 1 second. At the moment only credit honors this parameter. Default 1ms.

1.2.138 noirqbalance (x86)

= <boolean>

Disable software IRQ balancing and affinity. This can be used on systems such as Dell 1850/2850 that have workarounds in hardware for IRQ routing issues.

1.2.139 nolapic (x86)

= <boolean>

Default: false

Ignore the local APIC on a uniprocessor system, even if enabled by the BIOS.

1.2.140 no-real-mode (x86)

= <boolean>

Do not execute real-mode bootstrap code when booting Xen. This option should not be used except for debugging. It will effectively disable the vga option, which relies on real mode to set the video mode.

1.2.141 noreboot

= <boolean>

Do not automatically reboot after an error. This is useful for catching debug output. Defaults to automatically reboot after 5 seconds.

1.2.142 nosmp (x86)

= <boolean>

Disable SMP support. No secondary processors will be booted. Defaults to booting secondary processors.

This option is ignored in pv-shim mode.

1.2.143 nr_irqs (x86)

= <integer>

1.2.144 numa (x86)

= on | off | fake=<integer> | noacpi

Default: on

1.2.145 pci

= List of [ serr=<bool>, perr=<bool> ]

Default: Signaling left as set by firmware.

Override the firmware settings, and explicitly enable or disable the signalling of PCI System and Parity errors.

1.2.146 pci-phantom

=[<seg>:]<bus>:<device>,<stride>

Mark a group of PCI devices as using phantom functions without actually advertising so, so the IOMMU can create translation contexts for them.

All numbers specified must be hexadecimal ones.

This option can be specified more than once (up to 8 times at present).

1.2.147 pci-passthrough (arm)

= <boolean>

Default: false

Flag to enable or disable support for PCI passthrough

1.2.148 pcid (x86)

= <boolean> | xpti=<bool>

Default: xpti

Can be modified at runtime (change takes effect only for domains created afterwards)

If available, control usage of the PCID feature of the processor for 64-bit pv-domains. PCID can be used either for no domain at all (false), for all of them (true), only for those subject to XPTI (xpti) or for those not subject to XPTI (no-xpti). The feature is used only in case INVPCID is supported and not disabled via invpcid=false.
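The four settings and the INVPCID precondition can be summarised as a small decision function. The Python sketch below is only a restatement of the paragraph above for 64-bit PV domains; the function and parameter names are hypothetical.

```python
def pcid_in_use(setting, domain_uses_xpti, invpcid_usable=True):
    """Decide whether PCID would be used for a 64-bit PV domain.

    Hypothetical helper. setting is one of 'false', 'true', 'xpti' or
    'no-xpti'; invpcid_usable is False when INVPCID is unsupported or
    disabled via invpcid=false.
    """
    if not invpcid_usable:
        return False
    choices = {
        "false": False,
        "true": True,
        "xpti": domain_uses_xpti,
        "no-xpti": not domain_uses_xpti,
    }
    if setting not in choices:
        raise ValueError(f"unknown pcid setting: {setting!r}")
    return choices[setting]


# With the default (xpti), only domains subject to XPTI use PCID:
print(pcid_in_use("xpti", domain_uses_xpti=True))   # True
print(pcid_in_use("xpti", domain_uses_xpti=False))  # False
```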

1.2.149 ple_gap

= <integer>

1.2.150 ple_window (Intel)

= <integer>

1.2.151 preferred-cstates (x86)

= ( <integer> | List of ( C1 | C1E | C2 | ... ) )

This is a mask of C-states which are to be used preferably. This option is applicable only on hardware where certain C-states are exclusive of one another.

1.2.152 psr (Intel)

= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> | cos_max:<integer> | cdp:<boolean> )

Default: psr=cmt:0,rmid_max:255,cat:0,cos_max:255,cdp:0

Platform Shared Resource (PSR) Services. Intel Haswell and later server platforms offer information about the sharing of resources.

To use the PSR monitoring service for a certain domain, a Resource Monitoring ID (RMID) is used to bind the domain to the corresponding shared resource. RMID is a hardware-provided layer of abstraction between software and logical processors.

To use the PSR cache allocation service for a certain domain, a capacity bitmask (CBM) is used to bind the domain to the corresponding shared resource. CBM represents cache capacity and indicates the degree of overlap and isolation between domains. In the hypervisor, a Class of Service (COS) ID is allocated for each unique CBM.
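The psr value is a comma separated list of key:value items. As a rough illustration, the Python sketch below overlays such a list onto the documented defaults. The function name parse_psr is hypothetical; the boolean spellings are those listed in the Boolean section of this document, and the integer handling is simplified (decimal and 0x-prefixed hexadecimal only).

```python
TRUE_WORDS = {"yes", "on", "true", "enable", "1"}
FALSE_WORDS = {"no", "off", "false", "disable", "0"}

# Defaults from: psr=cmt:0,rmid_max:255,cat:0,cos_max:255,cdp:0
PSR_DEFAULTS = {
    "cmt": False,
    "rmid_max": 255,
    "cat": False,
    "cos_max": 255,
    "cdp": False,
}


def parse_psr(value):
    """Return the effective psr settings for a given option value.

    Hypothetical helper, not Xen's parser.
    """
    settings = dict(PSR_DEFAULTS)
    for item in value.split(","):
        key, sep, raw = item.partition(":")
        if not sep or key not in PSR_DEFAULTS:
            raise ValueError(f"unrecognised psr item: {item!r}")
        if isinstance(PSR_DEFAULTS[key], bool):
            if raw in TRUE_WORDS:
                settings[key] = True
            elif raw in FALSE_WORDS:
                settings[key] = False
            else:
                raise ValueError(f"not a boolean: {raw!r}")
        else:
            # Base 0 accepts decimal and 0x hex; C-style leading-zero
            # octal is not handled in this sketch.
            settings[key] = int(raw, 0)
    return settings


print(parse_psr("cmt:1,rmid_max:0x3f"))
```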

The following resources are available:

1.2.153 pv

= List of [ 32=<bool> ]

Applicability: x86

Controls for aspects of PV guest support.

1.2.154 pv-linear-pt (x86)

= <boolean>

Default: true

Only available if Xen is compiled with CONFIG_PV_LINEAR_PT support enabled.

Allow PV guests to have pagetable entries pointing to other pagetables of the same level (i.e., allowing L2 PTEs to point to other L2 pages). This technique is often called “linear pagetables”, and is sometimes used to allow operating systems a simple way to consistently map the current process’s pagetables into its own virtual address space.

Linux and MiniOS don’t use this technique. NetBSD and Novell Netware do; there may be other custom operating systems which do. If you’re certain you don’t plan on having PV guests which use this feature, turning it off can reduce the attack surface.

1.2.155 pv-l1tf (x86)

= List of [ <bool>, dom0=<bool>, domu=<bool> ]

Default: false on believed-unaffected hardware, or in pv-shim mode. domu on believed-affected hardware.

Mitigations for L1TF / XSA-273 / CVE-2018-3620 for PV guests.

For backwards compatibility, we may not alter an architecturally-legitimate pagetable entry a PV guest chooses to write. We can however force such a guest into shadow mode so that Xen controls the PTEs which are reachable by the CPU pagewalk.

Shadowing is performed at the point where a PV guest first tries to write an L1TF-vulnerable PTE. Therefore, a PV guest kernel which has been updated with its own L1TF mitigations will not trigger shadow mode if it is well behaved.

If CONFIG_SHADOW_PAGING is not compiled in, this mitigation instead crashes the guest when an L1TF-vulnerable PTE is written, which still allows updated, well-behaved PV guests to run, despite Shadow being compiled out.

In the pv-shim case, Shadow is expected to be compiled out, and a malicious guest kernel can only leak data from the shim Xen, rather than the host Xen.

1.2.156 pv-shim (x86)

= <boolean>

Default: false

This option is intended for use by a toolstack, when choosing to run a PV guest compatibly inside an HVM container.

In this mode, the kernel and initrd passed as modules to the hypervisor are constructed into a plain unprivileged PV domain.

1.2.157 rcu-idle-timer-period-ms

= <integer>

Default: 10

How frequently a CPU which has gone idle, but with pending RCU callbacks, should be woken up to check if the grace period has completed, and the callbacks are safe to be executed. Expressed in milliseconds; maximum is 100, and it can’t be 0.

1.2.158 reboot (x86)

= t[riple] | k[bd] | a[cpi] | p[ci] | P[ower] | e[fi] | n[o] [, [w]arm | [c]old]

Default: 0

Specify the host reboot method.

warm instructs Xen to not set the cold reboot flag.

cold instructs Xen to set the cold reboot flag.

no instructs Xen to not automatically reboot after panics or crashes.

triple instructs Xen to reboot the host by causing a triple fault.

kbd instructs Xen to reboot the host via the keyboard controller.

acpi instructs Xen to reboot the host using RESET_REG in the ACPI FADT.

pci instructs Xen to reboot the host using PCI reset register (port CF9).

Power instructs Xen to power-cycle the host using PCI reset register (port CF9).

‘efi’ instructs Xen to reboot using the EFI reboot call (in EFI mode by default it will use that method first).

xen instructs Xen to reboot using Xen’s SCHEDOP hypercall (this is the default when running nested Xen)

1.2.159 rmrr

= start<-end>=[s1]bdf1[,[s1]bdf2[,...]];start<-end>=[s2]bdf1[,[s2]bdf2[,...]]

Define RMRR units that are missing from the ACPI tables, along with the devices they belong to, and use them for 1:1 mapping. End addresses can be omitted, in which case one page will be mapped. The ranges are inclusive when start and end are specified. If the segment of the first device is not specified, segment zero will be used. If other segments are not specified, the first device’s segment will be used. If a segment is specified for a device other than the first and it does not match the one specified for the first, an error will be reported.

‘start’ and ‘end’ values are page numbers (not full physical addresses), in hexadecimal format (can optionally be preceded by “0x”).

Usage example: If device 0:0:1d.0 requires one page (0xd5d45) to be reserved, and device 0:0:1a.0 requires three pages (0xd5d46 thru 0xd5d48) to be reserved, one usage would be:

rmrr=d5d45=0:0:1d.0;0xd5d46-0xd5d48=0:0:1a.0

Note: grub2 requires special characters, namely ‘;’, to be escaped or quoted; refer to the grub2 documentation if multiple ranges are specified.
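Since start and end are page numbers rather than physical addresses, it can help to see which byte range an entry covers. The Python sketch below does the conversion for the usage example, assuming 4 KiB pages; that page size and the function name rmrr_byte_range are assumptions of this example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def rmrr_byte_range(start_page, end_page=None):
    """Return the inclusive (first_byte, last_byte) covered by a page range.

    Hypothetical helper. An omitted end page means a single page.
    """
    if end_page is None:
        end_page = start_page
    if end_page < start_page:
        raise ValueError("end page precedes start page")
    first_byte = start_page << PAGE_SHIFT
    last_byte = ((end_page + 1) << PAGE_SHIFT) - 1
    return first_byte, last_byte


# The two ranges from the usage example:
for pages in ((0xD5D45,), (0xD5D46, 0xD5D48)):
    first, last = rmrr_byte_range(*pages)
    print(f"{first:#x}-{last:#x} ({last - first + 1} bytes)")
```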

1.2.160 ro-hpet (x86)

= <boolean>

Default: true

Map the HPET page as read only in Dom0. If disabled the page will be mapped with read and write permissions.

1.2.161 sched

= credit | credit2 | arinc653 | rtds | null

Default: sched=credit2

Choose the default scheduler. Note the default scheduler is selectable via Kconfig and depends on enabled schedulers. Check CONFIG_SCHED_DEFAULT to see which scheduler is the default.

1.2.162 sched_credit2_max_cpus_runqueue

= <integer>

Default: 16

Defines how many CPUs will be put, at most, in each Credit2 runqueue.

Runqueues are still arranged according to the host topology (and following what is indicated by the ‘credit2_runqueue’ parameter). But we also have a cap on the number of CPUs that share each runqueue.

A value that is a submultiple of the number of online CPUs is recommended, as that would likely produce a perfectly balanced runqueue configuration.
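A quick way to find candidate values is to list the divisors of the online CPU count that do not exceed the intended cap. The Python sketch below does just that; the function name is hypothetical, and it deliberately ignores host topology, which also influences how runqueues end up being arranged.

```python
def balanced_runqueue_caps(online_cpus, limit=16):
    """List submultiples of online_cpus that are <= limit, largest first.

    Hypothetical helper; limit defaults to the documented default of 16.
    """
    return [
        n
        for n in range(min(limit, online_cpus), 0, -1)
        if online_cpus % n == 0
    ]


# On a host with 24 online CPUs, 16 does not divide evenly but 12 does:
print(balanced_runqueue_caps(24))  # [12, 8, 6, 4, 3, 2, 1]
```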

1.2.163 sched_credit2_migrate_resist

= <integer>

1.2.164 sched_credit_tslice_ms

= <integer>

Set the timeslice of the credit1 scheduler, in milliseconds. The default is 30ms. Reasonable values may include 10, 5, or even 1 for very latency-sensitive workloads.

1.2.165 sched-gran (x86)

= cpu | core | socket

Default: sched-gran=cpu

Set the scheduling granularity. In case the granularity is larger than 1 (e.g. core on an SMT-enabled system, or socket) multiple vcpus are assigned statically to a “scheduling unit” which will then be subject to scheduling. This assignment of vcpus to scheduling units is fixed.

cpu: Vcpus will be scheduled individually on single cpus (e.g. a hyperthread using x86/Intel terminology)

core: As many vcpus as there are cpus on a physical core are scheduled together on a physical core.

socket: As many vcpus as there are cpus on a physical socket are scheduled together on a physical socket.

Note: a value other than cpu will result in rejecting a runtime modification attempt of the “smt” setting.

Note: for AMD x86 processors before Fam17 the terminology in the official data sheets is different: a cpu is named “core” and multiple “cores” are running in the same “compute unit”. As from Fam17 on AMD is using the same names as Intel (“thread” and “core”) the topology levels are named “cpu”, “core” and “socket” even on older AMD processors.

1.2.166 sched_ratelimit_us

= <integer>

In order to limit the rate of context switching, set the minimum amount of time that a vcpu can be scheduled for before preempting it, in microseconds. The default is 1000us (1ms). Setting this to 0 disables it altogether.

1.2.167 sched_smt_power_savings

= <boolean>

Normally Xen will try to maximize performance and cache utilization by spreading out vcpus across as many different divisions as possible (i.e., numa nodes, sockets, cores, threads, &c). This often maximizes throughput, but also maximizes energy usage, since it reduces the depth to which a processor can sleep.

This option inverts the logic, so that the scheduler in effect tries to keep the vcpus on the smallest amount of silicon possible; i.e., first fill up sibling threads, then sibling cores, then sibling sockets, &c. This will reduce performance somewhat, particularly on systems with hyperthreading enabled, but should reduce power by enabling more sockets and cores to go into deeper sleep states.

1.2.168 scrub-domheap

= <boolean>

Default: false

Scrub domains’ freed pages. This is a safety net against a (buggy) domain accidentally leaking secrets by releasing pages without proper sanitization.

1.2.169 serial_tx_buffer

= <size>

Default: 16kB

Set the serial transmit buffer size.

1.2.170 serrors (ARM)

= diverse | panic

Default: diverse

This parameter is provided to administrators to determine how the hypervisor handles SErrors.

1.2.171 shim_mem (x86)

= List of ( min:<size> | max:<size> | <size> )

Set the amount of memory that xen-shim uses. Only has effect if pv-shim mode is enabled. Note that this value accounts for the memory used by the shim itself plus the free memory slack given to the shim for runtime allocations.

By default, the amount of free memory slack given to the shim for runtime usage is 1MB.

1.2.172 smap (x86)

= <boolean> | hvm

Default: true unless running in pv-shim mode on AMD or Hygon hardware

Flag to enable Supervisor Mode Access Prevention. Use smap=hvm to allow SMAP use by HVM guests only.

In PV shim mode on AMD or Hygon hardware, the option defaults to false, due to the significant performance impact in some cases and the generally lower security risk.

1.2.173 smep (x86)

= <boolean> | hvm

Default: true unless running in pv-shim mode on AMD or Hygon hardware

Flag to enable Supervisor Mode Execution Protection. Use smep=hvm to allow SMEP use by HVM guests only.

In PV shim mode on AMD or Hygon hardware, the option defaults to false, due to the significant performance impact in some cases and the generally lower security risk.

1.2.174 smt (x86)

= <boolean>

Default: true

Control bring up of multiple hyper-threads per CPU core.

1.2.175 snb_igd_quirk

= <boolean> | cap | <integer>

A true boolean value enables legacy behavior (1s timeout), while cap enforces the maximum theoretically necessary timeout of 670ms. Any number is interpreted as a custom timeout in milliseconds. Zero or boolean false disables the quirk workaround, which is also the default.

1.2.176 spec-ctrl (Arm)

= List of [ ssbd=force-disable|runtime|force-enable ]

Controls for speculative execution sidechannel mitigations.

The option ssbd= is used to control the state of Speculative Store Bypass Disable (SSBD) mitigation.

By default SSBD will be mitigated at runtime (i.e. ssbd=runtime).

1.2.177 spec-ctrl (x86)

= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>, {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>, bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd, eager-fpu,l1d-flush,branch-harden,srb-lock, unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]

Controls for speculative execution sidechannel mitigations. By default, Xen will pick the most appropriate mitigations based on compiled in support, loaded microcode, and hardware details, and will virtualise appropriate mitigations for guests to use.

WARNING: Any use of this option may interfere with heuristics. Use with extreme care.

An overall boolean value, spec-ctrl=no, can be specified to turn off all mitigations, including pieces of infrastructure used to virtualise certain mitigation features for guests. This also includes settings which xpti, smt, pv-l1tf, tsx control, unless the respective option(s) have been specified earlier on the command line.

Alternatively, a slightly more restricted spec-ctrl=no-xen can be used to turn off all of Xen’s mitigations, while leaving the virtualisation support in place for guests to use.

Use of a positive boolean value for either of these options is invalid.

The pv=, hvm=, msr-sc=, rsb=, verw= and ibpb-entry= options offer fine grained control over the primitives used by Xen. These impact Xen’s ability to protect itself, and/or Xen’s ability to virtualise support for guests to use.

If Xen was compiled with CONFIG_INDIRECT_THUNK support, bti-thunk= can be used to select which of the thunks gets patched into the __x86_indirect_thunk_%reg locations. The default thunk is retpoline (generally preferred), with the alternatives being jmp (a jmp *%reg gadget, minimal overhead), and lfence (an lfence; jmp *%reg gadget).

On hardware supporting IBRS (Indirect Branch Restricted Speculation), the ibrs= option can be used to force or prevent Xen using the feature itself. If Xen is not using IBRS itself, functionality is still set up so IBRS can be virtualised for guests.

On hardware supporting STIBP (Single Thread Indirect Branch Predictors), the stibp= option can be used to force or prevent Xen using the feature itself. By default, Xen will use STIBP when IBRS is in use (IBRS implies STIBP), and when hardware hints recommend using it as a blanket setting.

On hardware supporting SSBD (Speculative Store Bypass Disable), the ssbd= option can be used to force or prevent Xen using the feature itself. The feature is virtualised for guests, independently of Xen’s choice of setting. On AMD hardware, disabling Xen SSBD usage on the command line (ssbd=0, which is the default value) can lead to Xen running with the guest SSBD selection, depending on hardware support. On the same hardware, setting ssbd=1 will result in SSBD always being enabled, regardless of guest choice.

On hardware supporting PSFD (Predictive Store Forwarding Disable), the psfd= option can be used to force or prevent Xen using the feature itself. By default, Xen will not use PSFD. PSFD is implied by SSBD, and SSBD is off by default.

On hardware supporting IBPB (Indirect Branch Prediction Barrier), the ibpb= option can be used to force (the default) or prevent Xen from issuing branch prediction barriers on vcpu context switches.

On all hardware, the eager-fpu= option can be used to force or prevent Xen from using fully eager FPU context switches. This is currently implemented as a global control. By default, Xen will choose to use fully eager context switches on hardware believed to speculate past #NM exceptions.

On hardware supporting L1D_FLUSH, the l1d-flush= option can be used to force or prevent Xen from issuing an L1 data cache flush on each VMEntry. Irrespective of Xen’s setting, the feature is virtualised for HVM guests to use. By default, Xen will enable this mitigation on hardware believed to be vulnerable to L1TF.

If Xen is compiled with CONFIG_SPECULATIVE_HARDEN_BRANCH, the branch-harden= boolean can be used to force or prevent Xen from using speculation barriers to protect selected conditional branches. By default, Xen will enable this mitigation.

On hardware supporting SRBDS_CTRL, the srb-lock= option can be used to force or prevent Xen from protecting the Special Register Buffer from leaking stale data. By default, Xen will enable this mitigation, except on parts where MDS is fixed and TAA is fixed/mitigated and there are no unprivileged MMIO mappings (in which case, there is believed to be no way for an attacker to obtain stale data).

The unpriv-mmio= boolean indicates whether the system has (or will have) less than fully privileged domains granted access to MMIO devices. By default, this option is disabled. If enabled, Xen will use the FB_CLEAR and/or SRBDS_CTRL functionality available in the Intel May 2022 microcode release to mitigate cross-domain leakage of data via the MMIO Stale Data vulnerabilities.

On all hardware, the gds-mit= option can be used to force or prevent Xen from mitigating the GDS (Gather Data Sampling) vulnerability. By default, Xen will mitigate GDS on hardware believed to be vulnerable. On hardware supporting GDS_CTRL (requires the August 2023 microcode), and where firmware has elected not to lock the configuration, Xen will use GDS_CTRL to mitigate GDS. Otherwise, Xen will mitigate by disabling AVX, which blocks the use of the AVX2 Gather instructions.

On all hardware, the div-scrub= option can be used to force or prevent Xen from mitigating the DIV-leakage vulnerability. By default, Xen will mitigate DIV-leakage on hardware believed to be vulnerable.

If Xen is compiled with CONFIG_SPECULATIVE_HARDEN_LOCK, the lock-harden= boolean can be used to force or prevent Xen from using speculation barriers to protect lock critical regions. This mitigation won’t be engaged by default, and needs to be explicitly enabled on the command line.

1.2.178 sync_console

= <boolean>

Default: false

Flag to force synchronous console output. Useful for debugging, but not suitable for production environments due to incurred overhead.

1.2.179 tboot (x86)

= 0x<phys_addr>

Specify the physical address of the trusted boot shared page.

1.2.180 tbuf_size

= <integer>

Specify the per-cpu trace buffer size in pages.

1.2.181 tdt (x86)

= <boolean>

Default: true

Flag to enable TSC deadline as the APIC timer mode.

1.2.182 tevt_mask

= <integer>

Specify a mask for Xen event tracing. This allows Xen tracing to be enabled at boot. Refer to the xentrace(8) documentation for a list of valid event mask values. In order to enable tracing, a buffer size (in pages) must also be specified via the tbuf_size parameter.
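
Because tevt_mask has no effect without a trace buffer, the two options are normally given together. The sketch below builds such a command line fragment and refuses a zero buffer size. The helper name is hypothetical, and the mask value in the usage line is only a placeholder; valid masks are listed in xentrace(8).

```python
def trace_boot_options(tevt_mask: int, tbuf_size_pages: int) -> str:
    """Build a command line fragment enabling tracing at boot (hypothetical helper)."""
    if tbuf_size_pages <= 0:
        # Per the description above, tracing needs a buffer size as well.
        raise ValueError("tevt_mask requires a non-zero tbuf_size")
    # Hexadecimal is accepted for <integer> parameters via the 0x prefix.
    return f"tbuf_size={tbuf_size_pages} tevt_mask={tevt_mask:#x}"


if __name__ == "__main__":
    # 0x1000 is a placeholder mask, not a recommended value.
    print(trace_boot_options(0x1000, 32))
```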

1.2.183 tickle_one_idle_cpu

= <boolean>

1.2.184 timer_slop

= <integer>

1.2.185 tsc (x86)

= unstable | skewed | stable:socket

1.2.186 tsx

= <bool>

Applicability: x86
Default: false on parts vulnerable to TAA, true otherwise

Controls for the use of Transactional Synchronization eXtensions.

Several microcode updates are relevant:

On systems with the ability to configure TSX, this boolean offers system wide control of whether TSX is enabled or disabled.

When TSX is disabled, transactions unconditionally abort. This is compatible with the TSX spec, which requires software to have a non-transactional path as a fallback. The RTM and HLE CPUID bits are hidden from VMs by default, but can be re-enabled if required. This allows VMs which previously saw RTM/HLE to be migrated in, although any TSX-enabled software will run with reduced performance.

1.2.187 ucode

= List of [ <integer> | scan=<bool>, nmi=<bool>, allow-same=<bool> ]

Applicability: x86
Default: `nmi`

Controls for CPU microcode loading. For early loading, this parameter can specify how and where to find the microcode update blob. For late loading, this parameter specifies if the update happens within a NMI handler.

‘integer’ specifies the CPU microcode update blob module index. When positive, this specifies the n-th module (in the GrUB entry, zero based) to be used for updating CPU microcode. When negative, counting starts at the end of the modules in the GrUB entry (so with the blob commonly being last, one could specify ucode=-1). Note that the value of zero is not valid here (entry zero, i.e. the first module, is always the Dom0 kernel image). Note further that use of this option has an unspecified effect when used with xen.efi (there the concept of modules doesn’t exist, and the blob gets specified via the ucode=<filename> config file/section entry; see EFI configuration file description).
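
The index rules above (zero based, negative values counted from the end, zero rejected) can be illustrated with a short sketch. This is a model of the documented behaviour only, not Xen's implementation; the function name and the extra range check are assumptions.

```python
def resolve_ucode_module(index: int, num_modules: int) -> int:
    """Map a ucode=<integer> value to a zero-based module position.

    Hypothetical helper: num_modules is the number of modules in the
    boot entry, with module 0 being the Dom0 kernel image.
    """
    if index == 0:
        raise ValueError("ucode=0 is not valid: module 0 is the Dom0 kernel")
    # Negative values count back from the end of the module list.
    pos = index if index > 0 else num_modules + index
    if not 0 < pos < num_modules:
        # Assumption: an index outside the module list, or one that lands
        # on the Dom0 kernel, is treated as an error in this sketch.
        raise ValueError("index does not name a module after the Dom0 kernel")
    return pos


if __name__ == "__main__":
    # Boot entry with three modules: kernel, initrd, microcode blob.
    print(resolve_ucode_module(2, 3))
    print(resolve_ucode_module(-1, 3))
```

With the blob as the last of three modules, ucode=2 and ucode=-1 select the same module.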

‘scan’ instructs the hypervisor to scan the multiboot images for a cpio image that contains microcode. Depending on the platform, the blob with the microcode in the cpio name space must be:

- on Intel: kernel/x86/microcode/GenuineIntel.bin
- on AMD: kernel/x86/microcode/AuthenticAMD.bin

When using xen.efi, the ucode=<filename> config file setting takes precedence over scan.

‘nmi’ determines whether late loading is performed in an NMI handler or just in stop_machine context. In an NMI handler, even NMIs are blocked, which is considered safer. The default value is true.

‘allow-same’ alters the default acceptance policy for new microcode to permit trying to reload the same version. Many CPUs will actually reload microcode of the same version, and this allows for easy testing of the late microcode loading path.

1.2.188 unrestricted_guest (Intel)

= <boolean>

1.2.189 vcpu_migration_delay

= <integer>

Default: 0

Specify a delay, in microseconds, between migrations of a VCPU between PCPUs when using the credit1 scheduler. This prevents rapid fluttering of a VCPU between CPUs, and reduces the implicit overheads such as cache-warming. 1ms (1000) has been measured as a good value.

1.2.190 vesa-ram

= <integer>

Default: 0

This option overrides the amount of video RAM, in MiB, determined to be present.

1.2.191 vga

= ( ask | current | text-80x<rows> | gfx-<width>x<height>x<depth> | mode-<mode> )[,keep]

ask causes Xen to display a menu of available modes and request the user to choose one of them.

current causes Xen to use the graphics adapter in its current state, without further setup.

text-80x<rows> instructs Xen to set up text mode. Valid values for <rows> are 25, 28, 30, 34, 43, 50, 80

gfx-<width>x<height>x<depth> instructs Xen to set up graphics mode with the specified width, height and depth.

mode-<mode> instructs Xen to use a specific mode, as shown with the ask option. (N.B. menu modes are displayed in hex, so <mode> should be a hexadecimal number.)

The optional keep parameter causes Xen to continue using the vga console even after dom0 has been started. The default behaviour is to relinquish control to dom0.
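
To make the vga= grammar concrete, here is a rough sketch of a validator for the forms listed above. It is not Xen's parser. The regular expression, the returned dictionary layout, and the acceptance of an optional 0x prefix on mode-<mode> are all assumptions made for illustration.

```python
import re

# One alternative per documented form, followed by the optional ",keep".
VGA_RE = re.compile(
    r"^(?P<base>ask|current"
    r"|text-80x(?P<rows>\d+)"
    r"|gfx-(?P<width>\d+)x(?P<height>\d+)x(?P<depth>\d+)"
    r"|mode-(?P<mode>(?:0[xX])?[0-9a-fA-F]+))"
    r"(?P<keep>,keep)?$"
)
VALID_ROWS = {25, 28, 30, 34, 43, 50, 80}


def parse_vga(value: str) -> dict:
    """Split a vga= value into its parts (hypothetical helper)."""
    m = VGA_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised vga= value: {value!r}")
    result = {"keep": m.group("keep") is not None}
    if m.group("rows") is not None:
        rows = int(m.group("rows"))
        if rows not in VALID_ROWS:
            raise ValueError(f"unsupported text row count: {rows}")
        result.update(kind="text", rows=rows)
    elif m.group("width") is not None:
        result.update(kind="gfx", width=int(m.group("width")),
                      height=int(m.group("height")),
                      depth=int(m.group("depth")))
    elif m.group("mode") is not None:
        # Menu modes are shown in hex, so the number is read as base 16.
        result.update(kind="mode", mode=int(m.group("mode"), 16))
    else:
        result["kind"] = m.group("base")  # "ask" or "current"
    return result


if __name__ == "__main__":
    print(parse_vga("gfx-1024x768x32,keep"))
    print(parse_vga("text-80x50"))
```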

1.2.192 viridian-spinlock-retry-count (x86)

= <integer>

Default: 2047

Specify the maximum number of retries before an enlightened Windows guest will notify Xen that it has failed to acquire a spinlock.

1.2.193 viridian-version (x86)

= [<major>],[<minor>],[<build>]

Default: 6,0,0x1772

<major>, <minor> and <build> must be integers. The values will be encoded in guest CPUID 0x40000002 if viridian enlightenments are enabled.
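
The bracketed syntax suggests that each of the three fields may be left empty. The sketch below parses such a value, applying the integer rules from section 1.1.2 (decimal, 0x for hexadecimal, leading 0 for octal). It assumes, without confirmation from the text above, that an empty field keeps its default and that exactly three comma-separated fields are given. Both helper names are hypothetical.

```python
DEFAULT_VERSION = (6, 0, 0x1772)  # documented default: 6,0,0x1772


def parse_xen_integer(text: str) -> int:
    """Parse an <integer>: decimal, 0x-prefixed hex, or leading-0 octal."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    if digits.lower().startswith("0x"):
        value = int(digits, 16)
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits, 8)
    else:
        value = int(digits, 10)
    return -value if negative else value


def parse_viridian_version(value: str) -> tuple:
    """Return (major, minor, build); empty fields keep their defaults.

    Assumption: an omitted field falls back to the documented default.
    """
    fields = value.split(",")
    if len(fields) != 3:
        raise ValueError("expected [<major>],[<minor>],[<build>]")
    return tuple(
        parse_xen_integer(field) if field else default
        for field, default in zip(fields, DEFAULT_VERSION)
    )


if __name__ == "__main__":
    print(parse_viridian_version("10,0,"))
    print(parse_viridian_version(",,0x4a61"))
```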

1.2.194 vm-notify-window (Intel)

= <integer>

Default: 0

Specify the value of the VM Notify window used to detect locked VMs. Set to -1 to disable the feature. Value is in units of crystal clock cycles.

Note the hardware might add a threshold to the provided value in order to make it safe, and hence using 0 is fine.

1.2.195 vpid (Intel)

= <boolean>

Default: true

Use Virtual Processor ID support if available. This prevents the need for TLB flushes on VM entry and exit, increasing performance.

1.2.196 vpmu (x86)

= List of [ <bool>, bts, ipc, arch, rtm-abort=<bool> ]

Applicability: x86
Default: false

Controls for Performance Monitoring Unit virtualisation.

Performance monitoring facilities tend to be very hardware specific, and provide access to a wealth of low level processor information.

Warning: As the virtualisation is not 100% safe, don’t use the vpmu flag on production systems (see https://xenbits.xen.org/xsa/advisory-163.html)!

1.2.197 vwfi (arm)

= trap | native

Default: trap

WFI is the ARM instruction to “wait for interrupt”. WFE is similar and means “wait for event”. This option, which is ARM specific, changes the way guest WFI and WFE are implemented in Xen. By default, Xen traps both instructions. In the case of WFI, Xen blocks the guest vcpu; in the case of WFE, Xen yields the guest vcpu. When setting vwfi to native, Xen doesn’t trap either instruction, running them in guest context. Setting vwfi to native reduces irq latency significantly. It can also lead to suboptimal scheduling decisions, but only when the system is oversubscribed (i.e., in total there are more vCPUs than pCPUs).

1.2.198 watchdog (x86)

= force | <boolean>

Default: false

Run an NMI watchdog on each processor. If a processor is stuck for longer than the watchdog_timeout, a panic occurs. When force is specified, in addition to running an NMI watchdog on each processor, unknown NMIs will still be processed.

1.2.199 watchdog_timeout (x86)

= <integer>

Default: 5

Set the NMI watchdog timeout in seconds. Specifying 0 will turn off the watchdog.

1.2.200 x2apic (x86)

= <boolean>

Default: true

Permit use of x2apic setup for SMP environments.

1.2.201 x2apic-mode (x86)

= physical | cluster | mixed

Default: physical if FADT mandates physical mode, otherwise set at build time by CONFIG_X2APIC_{PHYSICAL,LOGICAL,MIXED}.

In the case that x2apic is in use, this option switches between modes to address APICs in the system as interrupt destinations.

1.2.202 x2apic_phys (x86)

= <boolean>

Default: true if FADT mandates physical mode or if interrupt remapping is not available, false otherwise.

In the case that x2apic is in use, this option switches between physical and clustered mode. The default, given no hint from the FADT, is cluster mode.

WARNING: x2apic_phys is deprecated and superseded by x2apic-mode. The latter takes precedence if both are set.
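
The interaction between x2apic-mode, the deprecated x2apic_phys and the FADT hint can be summarised as a precedence chain. The following sketch models only what is stated above. The function and parameter names are hypothetical, and it deliberately ignores the interrupt remapping condition mentioned in the x2apic_phys default.

```python
VALID_MODES = {"physical", "cluster", "mixed"}


def resolve_x2apic_mode(x2apic_mode=None, x2apic_phys=None,
                        fadt_mandates_physical=False,
                        build_default="cluster"):
    """Pick the x2APIC destination mode (hypothetical model).

    build_default stands in for the CONFIG_X2APIC_* build-time choice.
    """
    if x2apic_mode is not None:
        # x2apic-mode takes precedence over the deprecated boolean.
        if x2apic_mode not in VALID_MODES:
            raise ValueError(f"unknown x2apic-mode: {x2apic_mode!r}")
        return x2apic_mode
    if x2apic_phys is not None:
        return "physical" if x2apic_phys else "cluster"
    if fadt_mandates_physical:
        return "physical"
    return build_default


if __name__ == "__main__":
    # Both options given: x2apic-mode wins.
    print(resolve_x2apic_mode(x2apic_mode="mixed", x2apic_phys=True))
```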

1.2.203 xenheap_megabytes (arm32)

= <size>

Default: 0 (1/32 of RAM)

Amount of RAM to set aside for the Xenheap. Must be an integer multiple of 32.

By default, Xen will use 1/32 of the RAM up to a maximum of 1GB and with a minimum of 32M, subject to a suitably aligned and sized contiguous region of memory being available.
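
As a worked example of the default sizing rule, the sketch below computes 1/32 of RAM and clamps it to the documented 32M to 1GB range. It ignores the requirement for a suitably aligned contiguous region, and any rounding Xen applies is not modelled. The function names are hypothetical.

```python
def default_xenheap_mb(ram_mb: int) -> int:
    """Default Xenheap size in MiB: 1/32 of RAM, clamped to [32, 1024]."""
    return max(32, min(1024, ram_mb // 32))


def check_xenheap_megabytes(value_mb: int) -> int:
    """Validate an explicit xenheap_megabytes value (hypothetical check)."""
    if value_mb % 32 != 0:
        raise ValueError("xenheap_megabytes must be a multiple of 32")
    return value_mb


if __name__ == "__main__":
    for ram in (512, 4096, 65536):
        print(ram, default_xenheap_mb(ram))
```

For instance, 4096 MiB of RAM gives 4096 / 32 = 128 MiB, while 512 MiB would give 16 MiB and is raised to the 32 MiB minimum.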

1.2.204 xpti (x86)

= List of [ default | <boolean> | dom0=<bool> | domu=<bool> ]

Default: false on hardware known not to be vulnerable to Meltdown (e.g. AMD)
Default: true everywhere else

Override default selection of whether to isolate 64-bit PV guest page tables.

true activates page table isolation for all domains, even on hardware not vulnerable to Meltdown.

false deactivates page table isolation on all systems for all domains.

default sets the default behaviour.

With dom0 and domu it is possible to control page table isolation for dom0 or guest domains only.
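
Since xpti= takes a list, several items can be combined in one value. The sketch below shows one plausible reading: items are applied left to right, a bare boolean sets both dom0 and domU, and default restores the hardware-dependent default. The ordering semantics and all names here are assumptions, not a description of Xen's parser.

```python
TRUE_WORDS = {"yes", "on", "true", "enable", "1"}
FALSE_WORDS = {"no", "off", "false", "disable", "0"}


def parse_bool(text: str) -> bool:
    """Parse a <boolean> value as described in section 1.1.1."""
    if text in TRUE_WORDS:
        return True
    if text in FALSE_WORDS:
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_xpti(value: str, hw_default: bool) -> dict:
    """Resolve xpti= into per-domain settings (hypothetical model).

    hw_default is True on hardware believed vulnerable to Meltdown.
    """
    state = {"dom0": hw_default, "domu": hw_default}
    for item in value.split(","):
        if item == "default":
            state = {"dom0": hw_default, "domu": hw_default}
        elif item.startswith("dom0="):
            state["dom0"] = parse_bool(item[len("dom0="):])
        elif item.startswith("domu="):
            state["domu"] = parse_bool(item[len("domu="):])
        else:
            # A bare boolean applies to all domains.
            enabled = parse_bool(item)
            state = {"dom0": enabled, "domu": enabled}
    return state


if __name__ == "__main__":
    print(parse_xpti("dom0=false,domu=true", hw_default=True))
```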

1.2.205 xsave (x86)

= <boolean>

Default: true

Permit use of the xsave/xrstor instructions.

1.2.206 xsm

= dummy | flask | silo

Default: selectable via Kconfig. Depends on enabled XSM modules.

Specify which XSM module should be enabled. This option is only available if the hypervisor was compiled with CONFIG_XSM enabled.