# virtio-queue

The `virtio-queue` crate provides a device-side implementation of a virtio
queue, a virtio descriptor, and a chain of such descriptors.
Two formats of virtio queues are defined in the specification: split virtqueues
and packed virtqueues. The `virtio-queue` crate offers support only for the
[split virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-240006)
format.
The purpose of the virtio-queue API is to be consumed by virtio device
implementations (such as the block device or vsock device).
The main abstraction is the `Queue`. The crate also defines a state object for
the queue, `QueueState`.

## Usage

Let’s take a concrete example of how a device would work with a queue, using
the MMIO bus.

First, it is important to mention that the mandatory parts of the virtio
interface are the following:

- the device status field → provides an indication of
  [the completed steps](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
  of the device initialization routine, 
- the feature bits →
  [the features](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
  the driver/device understand(s),
- [notifications](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-170003),
- one or more
  [virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-230005)
  → the mechanism for data transport between the driver and device.

Each virtqueue consists of three parts:

- Descriptor Table,
- Available Ring,
- Used Ring.
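
As a rough illustration of how much guest memory each part occupies, the sketch below computes the sizes from the layout given in the split virtqueue section of the virtio 1.1 specification. The function names are hypothetical and are not part of the `virtio-queue` API.

```rust
/// Size in bytes of the descriptor table.
/// Each descriptor is 16 bytes: addr (u64), len (u32), flags (u16), next (u16).
pub fn descriptor_table_bytes(queue_size: u16) -> u64 {
    16 * u64::from(queue_size)
}

/// Size in bytes of the available ring:
/// flags (u16) + idx (u16) + one u16 per entry + used_event (u16).
pub fn avail_ring_bytes(queue_size: u16) -> u64 {
    6 + 2 * u64::from(queue_size)
}

/// Size in bytes of the used ring:
/// flags (u16) + idx (u16) + one 8-byte element per entry + avail_event (u16).
pub fn used_ring_bytes(queue_size: u16) -> u64 {
    6 + 8 * u64::from(queue_size)
}
```

For a queue of 256 entries this gives 4096 bytes for the descriptor table, 518 bytes for the available ring and 2054 bytes for the used ring.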

Before booting the virtual machine (VM), the VMM performs the following setup:

1. initializes an array of queues using the `Queue` constructor.
2. registers the device on the MMIO bus, so that the driver can later send
   read/write requests to the MMIO space. Some of those requests also set up
   the queues’ state.
3. performs other pre-boot configuration, such as registering an fd for the
   interrupt assigned to the device. The device later uses this fd to inform
   the driver that it has information to communicate.

After the boot of the VM, the driver starts sending read/write requests to
configure things like:

* the supported features;
* queue parameters. The following setters are used for the queue set up:
    * `set_size` → for setting the size of the queue.
    * `set_ready` → configure the queue to the `ready for processing` state.
    * `set_desc_table_address`, `set_avail_ring_address`,
      `set_used_ring_address` → configure the guest address of the constituent
      parts of the queue.
    * `set_event_idx` → called as part of the feature negotiation in the
      `virtio-device` crate to enable or disable the VIRTIO_F_RING_EVENT_IDX
      feature.
* the device activation. As part of this activation, the device can also create
  a queue handler, which can later be used to process the queue.

Once the queues are ready, the device can be used.
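
The configuration sequence can be pictured with a simplified stand-in for the queue. `ToyQueueConfig` below is hypothetical: it only borrows the setter names listed above, and its signatures and validation rules are assumptions of this sketch rather than the crate's actual behavior. The power-of-two size rule and the alignment requirements (16, 2 and 4 bytes) come from the split virtqueue section of the specification.

```rust
/// Toy stand-in for the state a driver programs into a queue (not the real `Queue`).
#[derive(Debug, Default)]
pub struct ToyQueueConfig {
    max_size: u16,
    size: u16,
    ready: bool,
    desc_table: u64,
    avail_ring: u64,
    used_ring: u64,
    event_idx: bool,
}

impl ToyQueueConfig {
    pub fn new(max_size: u16) -> Self {
        ToyQueueConfig { max_size, size: max_size, ..Default::default() }
    }

    /// Accepts only non-zero power-of-two sizes up to `max_size`.
    /// Returning a bool is an assumption of this sketch.
    pub fn set_size(&mut self, size: u16) -> bool {
        if size > self.max_size || !size.is_power_of_two() {
            return false; // keep the previous size (0 is not a power of two)
        }
        self.size = size;
        true
    }

    pub fn set_ready(&mut self, ready: bool) { self.ready = ready; }
    pub fn set_desc_table_address(&mut self, addr: u64) { self.desc_table = addr; }
    pub fn set_avail_ring_address(&mut self, addr: u64) { self.avail_ring = addr; }
    pub fn set_used_ring_address(&mut self, addr: u64) { self.used_ring = addr; }
    pub fn set_event_idx(&mut self, enabled: bool) { self.event_idx = enabled; }

    pub fn size(&self) -> u16 { self.size }
    pub fn event_idx_enabled(&self) -> bool { self.event_idx }

    /// Ready bit set and the three parts aligned as the specification requires.
    pub fn is_valid(&self) -> bool {
        self.ready
            && self.desc_table % 16 == 0
            && self.avail_ring % 2 == 0
            && self.used_ring % 4 == 0
    }
}
```

In this sketch the queue is not considered usable until the driver has set the ready bit, which mirrors the ordering described above: parameters first, then activation.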

The steady state operation of a virtio device follows a model where the driver
produces descriptor chains which are consumed by the device, and both parties
need to be notified when new elements have been placed on the associated ring to
avoid busy polling. The precise notification mechanism is left up to the VMM
that incorporates the devices and queues (it usually involves things like MMIO
VM exits and interrupt injection into the guest). The queue implementation is
agnostic to the notification mechanism in use, and it exposes methods and
functionality (such as iterators) that are called from the outside in response
to a notification event.

### Data transmission using virtqueues

The basic principle of how the queues are used by the device/driver is the
following, as shown in the diagram below:

1. when the guest driver has a new request (buffer), it allocates free
   descriptor(s) for the buffer in the descriptor table, chaining as necessary.
2. the driver adds a new entry with the head index of the descriptor chain
   describing the request, in the available ring entries.
3. the driver increments the `idx` by the number of new entries. The diagram
   shows the simple case of a single new entry.
4. the driver sends an available buffer notification to the device if such 
   notifications are not suppressed.
5. the device will at some point consume that request, by first reading the
   `idx` field from the available ring. This can be done directly with
   `Queue::avail_idx`, but we do not recommend that consumers of the crate
   call it, because the iterator over all available descriptor chain heads
   already calls it behind the scenes.
6. the device gets the index of the descriptor chain(s) corresponding to the
   read `idx` value.
7. the device reads the corresponding descriptor(s) from the descriptor table.
8. the device adds a new entry in the used ring by using `Queue::add_used`; the
   entry is defined in the spec as `virtq_used_elem`, and in `virtio-queue` as
   `VirtqUsedElem`. This structure holds both the index of the descriptor
   chain and the number of bytes that were written to memory as part of
   serving the request.
9. the device increments the `idx` of the used ring; this is done as part of
   the `Queue::add_used` call mentioned above.
10. the device sends a used buffer notification to the driver if such
    notifications are not suppressed.

![queue](https://raw.githubusercontent.com/rust-vmm/vm-virtio/main/virtio-queue/docs/images/queue.png)
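
The ring bookkeeping in the steps above can be modeled without any guest memory. The following is a minimal sketch with hypothetical names (`ToyRings`, `driver_publish`, `device_pop`, `device_add_used`). It keeps both rings in plain vectors and ignores descriptors and notifications, so it only illustrates how the two `idx` counters advance and wrap.

```rust
/// Toy model of the available and used rings, held in plain vectors.
pub struct ToyRings {
    size: u16,
    avail_ring: Vec<u16>,        // heads of descriptor chains, written by the driver
    avail_idx: u16,              // free-running counter, written by the driver
    used_ring: Vec<(u32, u32)>,  // (chain head, bytes written), written by the device
    used_idx: u16,               // free-running counter, written by the device
    next_avail: u16,             // device-private: next available entry to consume
}

impl ToyRings {
    pub fn new(size: u16) -> Self {
        assert!(size.is_power_of_two(), "queue size must be a power of two");
        ToyRings {
            size,
            avail_ring: vec![0; usize::from(size)],
            avail_idx: 0,
            used_ring: vec![(0, 0); usize::from(size)],
            used_idx: 0,
            next_avail: 0,
        }
    }

    /// Driver, steps 2 and 3: store the chain head, then bump `idx`.
    pub fn driver_publish(&mut self, head: u16) {
        let slot = usize::from(self.avail_idx % self.size);
        self.avail_ring[slot] = head;
        self.avail_idx = self.avail_idx.wrapping_add(1);
    }

    /// Device, steps 5 and 6: compare against `idx`, then fetch the next head.
    pub fn device_pop(&mut self) -> Option<u16> {
        if self.next_avail == self.avail_idx {
            return None; // nothing new on the available ring
        }
        let slot = usize::from(self.next_avail % self.size);
        self.next_avail = self.next_avail.wrapping_add(1);
        Some(self.avail_ring[slot])
    }

    /// Device, steps 8 and 9: record the used element, then bump the used `idx`.
    pub fn device_add_used(&mut self, head: u16, len: u32) {
        let slot = usize::from(self.used_idx % self.size);
        self.used_ring[slot] = (u32::from(head), len);
        self.used_idx = self.used_idx.wrapping_add(1);
    }

    pub fn used_idx(&self) -> u16 { self.used_idx }
    pub fn used_entry(&self, slot: usize) -> (u32, u32) { self.used_ring[slot] }
}
```

Both counters are free-running 16-bit values that are reduced modulo the queue size only when indexing the ring, which is why the sketch uses `wrapping_add`.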

A descriptor stores four fields. The first two, `addr` and `len`, point to the
data in memory to which the descriptor refers, as shown in the diagram below.
The `flags` field indicates, for example, whether the buffer is device-readable
or device-writable, or whether another descriptor is chained after this one
(VIRTQ_DESC_F_NEXT flag set). The `next` field stores the index of the next
descriptor if VIRTQ_DESC_F_NEXT is set.

![descriptor](https://raw.githubusercontent.com/rust-vmm/vm-virtio/main/virtio-queue/docs/images/descriptor.png)
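
To make the four fields concrete, here is a small sketch of a descriptor and of following `next` links through a table. `ToyDescriptor` and `walk_chain` are hypothetical names used only for illustration. The flag values are the ones defined in the specification, and the loop guard (giving up once the chain is longer than the table) is a choice made for this sketch.

```rust
pub const VIRTQ_DESC_F_NEXT: u16 = 1; // another descriptor follows via `next`
pub const VIRTQ_DESC_F_WRITE: u16 = 2; // buffer is device-writable

/// Toy descriptor with the four fields described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyDescriptor {
    pub addr: u64,
    pub len: u32,
    pub flags: u16,
    pub next: u16,
}

impl ToyDescriptor {
    pub fn has_next(&self) -> bool {
        self.flags & VIRTQ_DESC_F_NEXT != 0
    }

    pub fn is_write_only(&self) -> bool {
        self.flags & VIRTQ_DESC_F_WRITE != 0
    }
}

/// Collects the chain starting at `head`. Returns `None` if an index falls
/// outside the table or if the chain is longer than the table (a loop).
pub fn walk_chain(table: &[ToyDescriptor], head: u16) -> Option<Vec<ToyDescriptor>> {
    let mut chain = Vec::new();
    let mut index = head;
    loop {
        if chain.len() == table.len() {
            return None; // more hops than descriptors: the chain must loop
        }
        let desc = *table.get(usize::from(index))?;
        chain.push(desc);
        if !desc.has_next() {
            return Some(chain);
        }
        index = desc.next;
    }
}
```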

**Requirements for device implementation**

* Abstractions from virtio-queue such as `DescriptorChain` can be used to parse
  descriptors provided by the driver, which represent input or output memory
  areas for device I/O. A descriptor is essentially an (address, length) pair,
  which is subsequently used by the device model operation. We do not check the
  validity of the descriptors, and instead expect any validations to happen
  when the device implementation is attempting to access the corresponding
  areas. Early checks can add non-negligible additional costs, and exclusively
  relying upon them may lead to time-of-check-to-time-of-use race conditions.
* Before reading from or writing to a buffer, the device should validate that
  the buffer is device-readable or device-writable, respectively.
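
Both requirements amount to checking at access time rather than at parse time. The sketch below shows one way a device model could do that, under the assumption of a toy guest memory that is a byte vector starting at guest address 0. `ToyMemory`, `BufferDesc` and `AccessError` are hypothetical and unrelated to the memory abstractions a real VMM would use.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AccessError {
    WrongDirection,
    OutOfBounds,
}

/// Simplified descriptor: an (address, length) pair plus its direction.
#[derive(Clone, Copy, Debug)]
pub struct BufferDesc {
    pub addr: u64,
    pub len: u32,
    pub device_writable: bool,
}

/// Toy guest memory: a byte vector whose first byte is guest address 0.
pub struct ToyMemory(pub Vec<u8>);

impl ToyMemory {
    /// Bounds check performed at the moment of access.
    fn range(&self, desc: &BufferDesc) -> Result<std::ops::Range<usize>, AccessError> {
        let end = desc
            .addr
            .checked_add(u64::from(desc.len))
            .ok_or(AccessError::OutOfBounds)?;
        if end > self.0.len() as u64 {
            return Err(AccessError::OutOfBounds);
        }
        // Both values are <= the vector length, so the casts cannot truncate.
        Ok((desc.addr as usize)..(end as usize))
    }

    /// Reads a device-readable buffer.
    pub fn read(&self, desc: &BufferDesc) -> Result<Vec<u8>, AccessError> {
        if desc.device_writable {
            return Err(AccessError::WrongDirection);
        }
        let range = self.range(desc)?;
        Ok(self.0[range].to_vec())
    }

    /// Writes into a device-writable buffer; returns the number of bytes written.
    pub fn write(&mut self, desc: &BufferDesc, data: &[u8]) -> Result<usize, AccessError> {
        if !desc.device_writable {
            return Err(AccessError::WrongDirection);
        }
        let range = self.range(desc)?;
        let n = data.len().min(range.len());
        self.0[range.start..range.start + n].copy_from_slice(&data[..n]);
        Ok(n)
    }
}
```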

## Design

`QueueT` is a trait that allows different implementations for a `Queue`
object for single-threaded context and multi-threaded context. The
implementations provided in `virtio-queue` are:

1. `Queue` → it is used for the single-threaded context.
2. `QueueSync` → it is used for the multi-threaded context, and is simply
   a wrapper over an `Arc<Mutex<Queue>>`.
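
The relationship between the trait and its two implementations can be sketched as follows. The `Toy*` types are hypothetical and reduce the interface to two methods. They only illustrate how code written against the trait works with both the plain object and the `Arc<Mutex<...>>` wrapper, and do not reproduce the real `QueueT` signature.

```rust
use std::sync::{Arc, Mutex};

/// Reduced stand-in for a queue trait (two methods only).
pub trait ToyQueueT {
    fn set_size(&mut self, size: u16);
    fn size(&self) -> u16;
}

/// Single-threaded implementation: plain fields, no locking.
#[derive(Debug, Default)]
pub struct ToyQueue {
    size: u16,
}

impl ToyQueueT for ToyQueue {
    fn set_size(&mut self, size: u16) {
        self.size = size;
    }

    fn size(&self) -> u16 {
        self.size
    }
}

/// Multi-threaded implementation: clones share one queue behind a mutex.
#[derive(Clone, Debug, Default)]
pub struct ToyQueueSync {
    inner: Arc<Mutex<ToyQueue>>,
}

impl ToyQueueT for ToyQueueSync {
    fn set_size(&mut self, size: u16) {
        self.inner.lock().unwrap().set_size(size)
    }

    fn size(&self) -> u16 {
        self.inner.lock().unwrap().size()
    }
}

/// Device code can be written once against the trait.
pub fn configure<Q: ToyQueueT>(queue: &mut Q, size: u16) -> u16 {
    queue.set_size(size);
    queue.size()
}
```

Cloning the synchronized wrapper yields another handle to the same underlying queue, so a change made through one clone is visible through the others.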

Besides the above abstractions, the `virtio-queue` crate also provides the
following ones:

* `Descriptor` → mostly offers accessors for the members of the `Descriptor`.
* `DescriptorChain` → provides accessors for the `DescriptorChain`’s members
  and an `Iterator` implementation for iterating over the `DescriptorChain`.
  There is also an abstraction for iterators over just the device-readable or
  just the device-writable descriptors (`DescriptorChainRwIter`).
* `AvailIter` → a consuming iterator over all available descriptor chain
  heads in the queue.

## Save/Restore Queue

The `Queue` allows saving its state through the `state` function, which returns
a `QueueState`. `Queue` objects can be recreated from a previously saved
`QueueState` through the `try_from` conversion. The VMM should check for errors
when restoring a `Queue` from a previously saved state.
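
The reason restoring is fallible is that a saved state may come from an untrusted or corrupted source. The sketch below shows the general pattern with hypothetical `ToyQueue` and `ToyQueueState` types. The fields and the validation rule are assumptions chosen for illustration and do not reflect the real `QueueState` layout.

```rust
use std::convert::TryFrom;

/// Toy snapshot of a queue (illustrative fields only).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToyQueueState {
    pub max_size: u16,
    pub size: u16,
    pub ready: bool,
    pub next_avail: u16,
    pub next_used: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ToyQueue {
    pub max_size: u16,
    pub size: u16,
    pub ready: bool,
    pub next_avail: u16,
    pub next_used: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RestoreError {
    InvalidSize,
}

impl ToyQueue {
    /// Saves the current state.
    pub fn state(&self) -> ToyQueueState {
        ToyQueueState {
            max_size: self.max_size,
            size: self.size,
            ready: self.ready,
            next_avail: self.next_avail,
            next_used: self.next_used,
        }
    }
}

impl TryFrom<ToyQueueState> for ToyQueue {
    type Error = RestoreError;

    /// Rebuilds a queue, rejecting states that could not have been configured.
    fn try_from(state: ToyQueueState) -> Result<Self, Self::Error> {
        if state.size > state.max_size || !state.size.is_power_of_two() {
            return Err(RestoreError::InvalidSize);
        }
        Ok(ToyQueue {
            max_size: state.max_size,
            size: state.size,
            ready: state.ready,
            next_avail: state.next_avail,
            next_used: state.next_used,
        })
    }
}
```

A VMM following this pattern would propagate the error instead of unwrapping it, so that a bad snapshot aborts the restore rather than producing an inconsistent queue.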

## Notification suppression

A big part of the `virtio-queue` crate consists of the notification suppression
support. As already mentioned, the driver can send an available buffer
notification to the device when there are new entries in the available ring,
and the device can send a used buffer notification to the driver when there are
new entries in the used ring. Sending a notification each time is not always
efficient. For example, while the driver is processing the used ring, it does
not need to receive another used buffer notification. The mechanism for
suppressing notifications is detailed in the following sections of the
specification:
- [Used Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-400007),
- [Available Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-4800010).

The `Queue` abstraction proposes the following sequence of steps for
processing new available ring entries:

1. the device first disables the notifications to make the driver aware it is
   processing the available ring and does not want interruptions, by using
   `Queue::disable_notification`. If VIRTIO_F_EVENT_IDX is not negotiated,
   notifications are disabled by setting VIRTQ_USED_F_NO_NOTIFY in the `flags`
   field of the used ring. If VIRTIO_F_EVENT_IDX is negotiated, they are
   disabled by not updating the `avail_event` value, i.e. it remains set to
   the latest `idx` value of the available ring that was already notified by
   the driver.
2. the device processes the new entries by using the `AvailIter` iterator.
3. the device can now enable the notifications by using
   `Queue::enable_notification`. If VIRTIO_F_EVENT_IDX is not negotiated,
   notifications are enabled by setting the `flags` field of the used ring
   to 0. If VIRTIO_F_EVENT_IDX is negotiated, they are enabled by setting the
   `avail_event` value to the smallest `idx` value of the available ring that
   was not already notified by the driver. This way the device makes sure that
   it won’t miss any notification.

The above steps should be done in a loop to also handle the less likely case
where the driver added new entries just before we re-enabled notifications.
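
The loop can be sketched with a toy queue that simulates the race. Everything here is hypothetical: `ToyQueue` keeps pending heads in a `VecDeque`, and its `enable_notification` is assumed to return `true` when entries are already waiting, which is the signal the loop needs in order to go around again. `late_arrivals` stands in for a driver that publishes an entry just before notifications are re-enabled.

```rust
use std::collections::VecDeque;

/// Toy queue used only to illustrate the processing loop.
pub struct ToyQueue {
    pending: VecDeque<u16>,
    late_arrivals: VecDeque<u16>,
    notifications_enabled: bool,
}

impl ToyQueue {
    pub fn new(pending: Vec<u16>, late_arrivals: Vec<u16>) -> Self {
        ToyQueue {
            pending: pending.into_iter().collect(),
            late_arrivals: late_arrivals.into_iter().collect(),
            notifications_enabled: true,
        }
    }

    pub fn disable_notification(&mut self) {
        self.notifications_enabled = false;
    }

    /// Re-enables notifications and reports whether entries are already
    /// pending (assumed return convention for this sketch).
    pub fn enable_notification(&mut self) -> bool {
        // Simulate the driver publishing one entry during the race window.
        if let Some(head) = self.late_arrivals.pop_front() {
            self.pending.push_back(head);
        }
        self.notifications_enabled = true;
        !self.pending.is_empty()
    }

    pub fn pop(&mut self) -> Option<u16> {
        self.pending.pop_front()
    }

    pub fn notifications_enabled(&self) -> bool {
        self.notifications_enabled
    }
}

/// Disable, drain, re-enable; repeat while entries slipped in during the race.
pub fn process_queue(queue: &mut ToyQueue) -> Vec<u16> {
    let mut handled = Vec::new();
    loop {
        queue.disable_notification();
        while let Some(head) = queue.pop() {
            handled.push(head);
        }
        if !queue.enable_notification() {
            break;
        }
    }
    handled
}
```

Without the outer loop, the entry that arrives during the race window would stay on the ring until the driver's next notification, which may never come if the driver believes it has already notified.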

For notifications towards the driver, the `Queue` provides the
`needs_notification` method, which should be used each time the device adds a
new entry to the used ring. Depending on the `used_event` value and on the last
used value (`signalled_used`), `needs_notification` returns true to let the
device know it should send a notification to the guest.
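
When the event index feature is negotiated, the decision reduces to the wrap-safe comparison that the specification gives as the `vring_need_event` helper. The Rust port below is a sketch. The function name `need_event` and its parameter names are chosen here and are not the crate's API.

```rust
/// Returns true if moving the used index from `old_idx` to `new_idx` crossed
/// the index the other side asked to be notified about (`event_idx`).
/// All arithmetic is modulo 2^16, matching the 16-bit ring indices.
pub fn need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}
```

For example, with `old_idx = 5` and `new_idx = 6`, an `event_idx` of 5 yields `true`, while an `event_idx` of 10 yields `false`. The comparison stays correct when the indices wrap past 65535.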

## Assumptions

* We assume the users of the `Queue` implementation won’t attempt to use the
  queue before checking that the `ready` bit is set. This can be verified by
  calling `Queue::is_valid` which, besides this, also checks that the three
  queue parts are valid memory regions.
* We assume consumers will use `AvailIter::go_to_previous_position` only in
  single-threaded contexts.
* We assume the users will consume the entries from the available ring in the
  way recommended in the documentation, i.e. when the device starts processing
  the available ring entries, it disables the notifications, processes the
  entries, and then re-enables notifications.

## License

This project is licensed under either of

- [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0
- [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause)