Diffstat (limited to 'man/io_uring_setup.2')
-rw-r--r-- man/io_uring_setup.2 153
1 files changed, 139 insertions, 14 deletions
diff --git a/man/io_uring_setup.2 b/man/io_uring_setup.2
index cb8eba9..75c69ff 100644
--- a/man/io_uring_setup.2
+++ b/man/io_uring_setup.2
@@ -37,7 +37,8 @@ struct io_uring_params {
__u32 sq_thread_cpu;
__u32 sq_thread_idle;
__u32 features;
- __u32 resv[4];
+ __u32 wq_fd;
+ __u32 resv[3];
struct io_sqring_offsets sq_off;
struct io_cqring_offsets cq_off;
};
@@ -170,7 +171,7 @@ then it will be clamped at
.B IORING_MAX_CQ_ENTRIES .
.TP
.B IORING_SETUP_ATTACH_WQ
-This flag should be set in conjunction with
+This flag should be set in conjunction with
.IR "struct io_uring_params.wq_fd"
being set to an existing io_uring ring file descriptor. When set, the
io_uring instance being created will share the asynchronous worker
@@ -183,6 +184,61 @@ In this state, restrictions can be registered, but submissions are not allowed.
See
.BR io_uring_register (2)
for details on how to enable the ring. Available since 5.10.
+.TP
+.B IORING_SETUP_SUBMIT_ALL
+Normally io_uring stops submitting a batch of requests if one of these requests
+results in an error. This can cause submission of fewer requests than expected,
+if a request ends in error while being submitted. If the ring is created with
+this flag,
+.BR io_uring_enter (2)
+will continue submitting requests even if it encounters an error submitting
+a request. CQEs are still posted for errored requests regardless of whether or
+not this flag is set at ring creation time; the only difference is whether the
+submit sequence is halted or continued when an error is observed. Available
+since 5.18.
+.TP
+.B IORING_SETUP_COOP_TASKRUN
+By default, io_uring will interrupt a task running in userspace when a
+completion event comes in. This is to ensure that completions run in a timely
+manner. For a lot of use cases, this is overkill and can cause reduced
+performance from the inter-processor interrupt used to do this, the
+kernel/user transition, the needless interruption of the task's userspace
+activities, and reduced batching if completions come in at a rapid rate. Most
+applications don't need the forceful interruption, as the events are processed
+at any kernel/user transition. The exceptions are setups where the application
+uses multiple threads operating on the same ring, where the application
+waiting on completions isn't the one that submitted them. For most other
+use cases, setting this flag will improve performance. Available since 5.19.
+.TP
+.B IORING_SETUP_TASKRUN_FLAG
+Used in conjunction with
+.B IORING_SETUP_COOP_TASKRUN,
+this provides a flag,
+.B IORING_SQ_TASKRUN,
+which is set in the SQ ring
+.I flags
+whenever completions are pending that should be processed. liburing will check
+for this flag even when doing
+.BR io_uring_peek_cqe (3)
+and enter the kernel to process them, and applications can do the same. This
+makes
+.B IORING_SETUP_TASKRUN_FLAG
+safe to use even when applications rely on a peek style operation on the CQ
+ring to see if anything might be pending to reap. Available since 5.19.
+.TP
+.B IORING_SETUP_SQE128
+If set, io_uring will use 128-byte SQEs rather than the normal 64-byte sized
+variant. This is a requirement for using certain request types. As of 5.19,
+only the
+.B IORING_OP_URING_CMD
+passthrough command for NVMe passthrough needs this. Available since 5.19.
+.TP
+.B IORING_SETUP_CQE32
+If set, io_uring will use 32-byte CQEs rather than the normal 16-byte sized
+variant. This is a requirement for using certain request types. As of 5.19,
+only the
+.B IORING_OP_URING_CMD
+passthrough command for NVMe passthrough needs this. Available since 5.19.
.PP
If no flags are specified, the io_uring instance is setup for
interrupt driven I/O. I/O may be submitted using
@@ -202,27 +258,30 @@ If this flag is set, the two SQ and CQ rings can be mapped with a single
.I mmap(2)
call. The SQEs must still be allocated separately. This brings the necessary
.I mmap(2)
-calls down from three to two.
+calls down from three to two. Available since kernel 5.4.
.TP
.B IORING_FEAT_NODROP
If this flag is set, io_uring supports never dropping completion events.
If a completion event occurs and the CQ ring is full, the kernel stores
the event internally until such a time that the CQ ring has room for more
entries. If this overflow condition is entered, attempting to submit more
-IO with fail with the
+IO will fail with the
.B -EBUSY
error value, if it can't flush the overflown events to the CQ ring. If this
happens, the application must reap events from the CQ ring and attempt the
-submit again.
+submit again. Available since kernel 5.5.
.TP
.B IORING_FEAT_SUBMIT_STABLE
If this flag is set, applications can be certain that any data for
-async offload has been consumed when the kernel has consumed the SQE.
+async offload has been consumed when the kernel has consumed the SQE. Available
+since kernel 5.5.
.TP
.B IORING_FEAT_RW_CUR_POS
If this flag is set, applications can specify
.I offset
-== -1 with
+==
+.B -1
+with
.B IORING_OP_{READV,WRITEV}
,
.B IORING_OP_{READ,WRITE}_FIXED
@@ -234,10 +293,13 @@ and
.I pwritev2(2)
with
.I offset
-== -1. It'll use (and update) the current file position. This obviously comes
+==
+.B -1.
+It'll use (and update) the current file position. This obviously comes
with the caveat that if the application has multiple reads or writes in flight,
then the end result will not be as expected. This is similar to threads sharing
-a file descriptor and doing IO using the current file position.
+a file descriptor and doing IO using the current file position. Available since
+kernel 5.6.
.TP
.B IORING_FEAT_CUR_PERSONALITY
If this flag is set, then io_uring guarantees that both sync and async
@@ -253,7 +315,7 @@ still register different personalities through
io_uring_register(2)
with
.B IORING_REGISTER_PERSONALITY
-and specify the personality to use in the sqe.
+and specify the personality to use in the sqe. Available since kernel 5.6.
.TP
.B IORING_FEAT_FAST_POLL
If this flag is set, then io_uring supports using an internal poll mechanism
@@ -262,20 +324,81 @@ write data to a file no longer need to be punted to an async thread for
handling, instead they will begin operation when the file is ready. This is
similar to doing poll + read/write in userspace, but eliminates the need to do
so. If this flag is set, requests waiting on space/data consume a lot less
-resources doing so as they are not blocking a thread.
+resources doing so as they are not blocking a thread. Available since kernel
+5.7.
.TP
.B IORING_FEAT_POLL_32BITS
If this flag is set, the
.B IORING_OP_POLL_ADD
command accepts the full 32-bit range of epoll based flags. Most notably
.B EPOLLEXCLUSIVE
-which allows exclusive (waking single waiters) behavior.
+which allows exclusive (waking single waiters) behavior. Available since kernel
+5.9.
.TP
.B IORING_FEAT_SQPOLL_NONFIXED
If this flag is set, the
.B IORING_SETUP_SQPOLL
feature no longer requires the use of fixed files. Any normal file descriptor
-can be used for IO commands without needing registration.
+can be used for IO commands without needing registration. Available since
+kernel 5.11.
+.TP
+.B IORING_FEAT_EXT_ARG
+If this flag is set, then the
+.BR io_uring_enter (2)
+system call supports passing in an extended argument instead of just the
+.IR "sigset_t"
+of earlier kernels. This
+extended argument is of type
+.IR "struct io_uring_getevents_arg"
+and allows the caller to pass in both a
+.IR "sigset_t"
+and a timeout argument for waiting on events. The struct layout is as follows:
+.TP
+.in +8n
+.EX
+struct io_uring_getevents_arg {
+ __u64 sigmask;
+ __u32 sigmask_sz;
+ __u32 pad;
+ __u64 ts;
+};
+.EE
+
+A pointer to this struct must be passed in if
+.B IORING_ENTER_EXT_ARG
+is set in the flags for the enter system call. Available since kernel 5.11.
+.TP
+.B IORING_FEAT_NATIVE_WORKERS
+If this flag is set, io_uring is using native workers for its async helpers.
+Previous kernels used kernel threads that assumed the identity of the
+original io_uring owning task, but later kernels will actively create what
+looks more like regular process threads instead. Available since kernel
+5.12.
+.TP
+.B IORING_FEAT_RSRC_TAGS
+If this flag is set, then io_uring supports a variety of features related
+to fixed files and buffers. In particular, it indicates that registered
+buffers can be updated in-place, whereas before the full set would have to
+be unregistered first. Available since kernel 5.13.
+.TP
+.B IORING_FEAT_CQE_SKIP
+If this flag is set, then io_uring supports setting
+.B IOSQE_CQE_SKIP_SUCCESS
+in the submitted SQE, indicating that no CQE should be generated for this
+SQE if it executes normally. If an error happens processing the SQE, a
+CQE with the appropriate error value will still be generated. Available since
+kernel 5.17.
+.TP
+.B IORING_FEAT_LINKED_FILE
+If this flag is set, then io_uring supports sane assignment of files for SQEs
+that have dependencies. For example, if a chain of SQEs is submitted with
+.B IOSQE_IO_LINK,
+then kernels without this flag will prepare the file for each link upfront.
+If a previous link opens a file with a known index, e.g. if direct descriptors
+are used with open or accept, then file assignment needs to happen post
+execution of that SQE. If this flag is set, then the kernel will defer
+file assignment until execution of a given request is started. Available since
+kernel 5.17.
.PP
The rest of the fields in the
@@ -425,7 +548,9 @@ or
.BR io_uring_enter (2)
system calls.
-On error, -1 is returned and
+On error,
+.B -1
+is returned and
.I errno
is set appropriately.
.PP
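
The feature flags documented in this patch are reported as a bitmask in the features field of struct io_uring_params after io_uring_setup(2) returns. As a rough illustration of how an application might test that mask, here is a minimal sketch in C. The helpers has_features() and print_features() and the feat_table array are hypothetical names made up for this example. The bit positions are copied from linux/io_uring.h as I understand them, only so the sketch builds without kernel headers, and the "since" versions are the ones quoted in the man page text above. Real code should include the kernel header rather than repeat the values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One row per feature bit. The bit values are an assumption taken from
 * <linux/io_uring.h>; the "since" strings repeat the man page text. */
struct feat_info {
    uint32_t bit;
    const char *name;
    const char *since;
};

static const struct feat_info feat_table[] = {
    { 1U << 0,  "IORING_FEAT_SINGLE_MMAP",     "5.4"  },
    { 1U << 1,  "IORING_FEAT_NODROP",          "5.5"  },
    { 1U << 2,  "IORING_FEAT_SUBMIT_STABLE",   "5.5"  },
    { 1U << 3,  "IORING_FEAT_RW_CUR_POS",      "5.6"  },
    { 1U << 4,  "IORING_FEAT_CUR_PERSONALITY", "5.6"  },
    { 1U << 5,  "IORING_FEAT_FAST_POLL",       "5.7"  },
    { 1U << 6,  "IORING_FEAT_POLL_32BITS",     "5.9"  },
    { 1U << 7,  "IORING_FEAT_SQPOLL_NONFIXED", "5.11" },
    /* Spelled IORING_FEAT_EXT_ARG in the kernel header. */
    { 1U << 8,  "IORING_FEAT_EXT_ARG",         "5.11" },
    { 1U << 9,  "IORING_FEAT_NATIVE_WORKERS",  "5.12" },
    { 1U << 10, "IORING_FEAT_RSRC_TAGS",       "5.13" },
    { 1U << 11, "IORING_FEAT_CQE_SKIP",        "5.17" },
    { 1U << 12, "IORING_FEAT_LINKED_FILE",     "5.17" },
};

#define FEAT_COUNT (sizeof(feat_table) / sizeof(feat_table[0]))

/* Hypothetical helper: returns 1 only if every bit in `wanted` is set
 * in `features`, otherwise 0. */
static int has_features(uint32_t features, uint32_t wanted)
{
    return (features & wanted) == wanted;
}

/* Hypothetical helper: writes one line per known feature bit that is set
 * in `features` and returns how many were written. Bits that are not in
 * the table are silently ignored. */
static int print_features(FILE *out, uint32_t features)
{
    int count = 0;

    for (size_t i = 0; i < FEAT_COUNT; i++) {
        if (features & feat_table[i].bit) {
            fprintf(out, "%s (since %s)\n",
                    feat_table[i].name, feat_table[i].since);
            count++;
        }
    }
    return count;
}
```

A caller would pass params.features from a successful io_uring_setup(2) call to print_features(stdout, ...), or use has_features() to gate an optional code path, for example using the extended io_uring_enter(2) argument only when bit 8 is reported. Testing the mask at runtime is generally more robust than comparing kernel version strings, since distributions may backport features.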