1 files changed, 583 insertions, 0 deletions
diff --git a/pw_tokenizer/detokenization.rst b/pw_tokenizer/detokenization.rst
new file mode 100644
index 000000000..7fbefec88
--- /dev/null
+++ b/pw_tokenizer/detokenization.rst
@@ -0,0 +1,583 @@
+:tocdepth: 3
+
+.. _module-pw_tokenizer-detokenization:
+
+==============
+Detokenization
+==============
+.. pigweed-module-subpage::
+   :name: pw_tokenizer
+   :tagline: Compress strings to shrink logs by +75%
+
+Detokenization is the process of expanding a token to the string it represents
+and decoding its arguments. ``pw_tokenizer`` provides Python, C++ and
+TypeScript detokenization libraries.
+
+--------------------------------
+Example: decoding tokenized logs
+--------------------------------
+A project might tokenize its log messages with the
+:ref:`module-pw_tokenizer-base64-format`. Consider the following log file, which
+has four tokenized logs and one plain text log:
+
+.. code-block:: text
+
+   20200229 14:38:58 INF $HL2VHA==
+   20200229 14:39:00 DBG $5IhTKg==
+   20200229 14:39:20 DBG Crunching numbers to calculate probability of success
+   20200229 14:39:21 INF $EgFj8lVVAUI=
+   20200229 14:39:23 ERR $DFRDNwlOT1RfUkVBRFk=
+
+The project's log strings are stored in a database like the following:
+
+.. code-block::
+
+   1c95bd1c,          ,"Initiating retrieval process for recovery object"
+   2a5388e4,          ,"Determining optimal approach and coordinating vectors"
+   3743540c,          ,"Recovery object retrieval failed with status %s"
+   f2630112,          ,"Calculated acceptable probability of success (%.2f%%)"
+
+With this database, the detokenizing tools decode the logs as follows:
+
+.. code-block:: text
+
+   20200229 14:38:58 INF Initiating retrieval process for recovery object
+   20200229 14:39:00 DBG Determining optimal approach and coordinating vectors
+   20200229 14:39:20 DBG Crunching numbers to calculate probability of success
+   20200229 14:39:21 INF Calculated acceptable probability of success (32.33%)
+   20200229 14:39:23 ERR Recovery object retrieval failed with status NOT_READY
+
+.. note::
+
+   This example uses the :ref:`module-pw_tokenizer-base64-format`, which
+   occupies about 4/3 (133%) as much space as the default binary format when
+   encoded. For projects that wish to interleave tokenized messages with plain
+   text, using Base64 is a worthwhile tradeoff.
+
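+The following sketch shows how the Base64 strings in the example map onto the
+database entries. It is a minimal illustration using only the Python standard
+library, not ``pw_tokenizer``'s implementation. The ``DATABASE`` dictionary and
+the ``decode_prefixed_base64`` helper are hypothetical, and the sketch assumes
+that each binary message is a 4-byte little-endian token followed by the
+encoded arguments.
+
+.. code-block:: python
+
+   import base64
+   import struct
+
+   # Hypothetical in-memory copy of the example database: token -> string.
+   DATABASE = {
+       0x1C95BD1C: 'Initiating retrieval process for recovery object',
+       0x2A5388E4: 'Determining optimal approach and coordinating vectors',
+       0x3743540C: 'Recovery object retrieval failed with status %s',
+       0xF2630112: 'Calculated acceptable probability of success (%.2f%%)',
+   }
+
+
+   def decode_prefixed_base64(message: str) -> tuple[int, bytes]:
+       """Splits a "$"-prefixed Base64 message into (token, argument bytes)."""
+       if not message.startswith('$'):
+           raise ValueError('prefixed Base64 messages start with "$"')
+       binary = base64.b64decode(message[1:])
+       # Assumed layout: a little-endian uint32 token, then the arguments.
+       (token,) = struct.unpack_from('<I', binary)
+       return token, binary[4:]
+
+
+   token, args = decode_prefixed_base64('$DFRDNwlOT1RfUkVBRFk=')
+   # token is 0x3743540c; args is b'\tNOT_READY' (a length byte, then the text).
+   print(f'{token:08x}', DATABASE[token], args)
+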
+------------------------
+Detokenization in Python
+------------------------
+To detokenize in Python, import ``Detokenizer`` from the ``pw_tokenizer``
+package, and instantiate it with paths to token databases or ELF files.
+
+.. code-block:: python
+
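+   import logging
+
+   import pw_tokenizer
+
+   _LOG = logging.getLogger(__name__)
+
+   # Load tokens from a CSV database and from an ELF file.
+   detokenizer = pw_tokenizer.Detokenizer('path/to/database.csv', 'other/path.elf')
+
+   def process_log_message(log_message):
+       # log_message is a placeholder object whose payload is tokenized data.
+       result = detokenizer.detokenize(log_message.payload)
+       _LOG.info(str(result))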
+
+The ``pw_tokenizer`` package also provides the ``AutoUpdatingDetokenizer``
+class, which can be used in place of the standard ``Detokenizer``. This class
+monitors database files for changes and automatically reloads them when they
+change. This is helpful for long-running tools that use detokenization. The
+class also supports token domains for the given database files in the
+``<path>#<domain>`` format.
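+
+To illustrate the reloading idea only, the sketch below re-reads a CSV token
+database whenever the file's modification time or size changes. The
+``ReloadingCsvDatabase`` class is hypothetical and is not how
+``AutoUpdatingDetokenizer`` is implemented. It assumes the three-column CSV
+layout shown in the example database above (token, removal date, string) and
+does not model token domains.
+
+.. code-block:: python
+
+   import csv
+   import os
+   from typing import Optional
+
+
+   class ReloadingCsvDatabase:
+       """Hypothetical sketch: reloads a CSV token database when it changes."""
+
+       def __init__(self, path: str) -> None:
+           self._path = path
+           self._stamp = None  # (mtime_ns, size) observed at the last load.
+           self._tokens: dict[int, str] = {}
+
+       def _reload_if_changed(self) -> None:
+           stat = os.stat(self._path)
+           stamp = (stat.st_mtime_ns, stat.st_size)
+           if stamp == self._stamp:
+               return
+           with open(self._path, newline='') as file:
+               # Rows are: token in hex, removal date (may be blank), string.
+               self._tokens = {
+                   int(row[0], 16): row[2] for row in csv.reader(file) if row
+               }
+           self._stamp = stamp
+
+       def lookup(self, token: int) -> Optional[str]:
+           self._reload_if_changed()
+           return self._tokens.get(token)
+
+Checking the file on every lookup keeps the sketch simple; a long-running tool
+might instead poll on a timer.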
+
+For messages that are optionally tokenized and may be encoded as binary,
+Base64, or plaintext UTF-8, use
+:func:`pw_tokenizer.proto.decode_optionally_tokenized`. This will attempt to
+determine the correct method to detokenize and always provide a printable
+string.
+
+.. _module-pw_tokenizer-base64-decoding:
+
+Decoding Base64
+===============
+The Python ``Detokenizer`` class supports decoding and detokenizing prefixed
+Base64 messages with ``detokenize_base64`` and related methods.
+
+.. tip::
+   The Python detokenization tools support recursive detokenization for prefixed
+   Base64 text. Tokenized strings found in detokenized text are detokenized, so
+   prefixed Base64 messages can be passed as ``%s`` arguments.
+
+   For example, the tokenized string for "Wow!" is ``$RhYjmQ==``. This could be
+   passed as an argument to the printf-style string ``Nested message: %s``, which
+   encodes to ``$pEVTYQkkUmhZam1RPT0=``. The detokenizer would decode the message
+   as follows:
+
+   ::
+
+     "$pEVTYQkkUmhZam1RPT0=" → "Nested message: $RhYjmQ==" → "Nested message: Wow!"
+
+Base64 decoding is supported in C++ or C with the
+``pw::tokenizer::PrefixedBase64Decode`` or ``pw_tokenizer_PrefixedBase64Decode``
+functions.
+
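+The recursive detokenization described in the tip above can be sketched with
+the Python standard library. This is a hypothetical illustration, not the
+``Detokenizer`` implementation: the ``DATABASE`` dictionary holds the two
+messages from the tip, keyed by the tokens decoded from the tip's Base64
+strings, and argument handling covers only format strings with no arguments or
+a single ``%s`` (assumed to be encoded as a length byte followed by the string).
+
+.. code-block:: python
+
+   import base64
+   import re
+   import struct
+
+   # Hypothetical database for the tip's example: token -> format string.
+   DATABASE = {
+       0x615345A4: 'Nested message: %s',
+       0x99231646: 'Wow!',
+   }
+
+   # "$" followed by standard Base64: 4-character groups plus optional padding.
+   PREFIXED_BASE64 = re.compile(
+       r'\$(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?')
+
+
+   def _detokenize_one(match: re.Match) -> str:
+       original = match.group()
+       binary = base64.b64decode(original[1:])
+       if len(binary) < 4:
+           return original  # Too short to contain a token.
+       (token,) = struct.unpack_from('<I', binary)
+       fmt, args = DATABASE.get(token), binary[4:]
+       if fmt is None:
+           return original  # Unknown token: leave the text unchanged.
+       if '%s' not in fmt:
+           return fmt
+       if not args:
+           return original
+       # Single %s argument: the low 7 bits of the first byte are the length.
+       length = args[0] & 0x7F
+       return fmt % args[1:1 + length].decode('utf-8', errors='replace')
+
+
+   def detokenize_text(text: str, max_passes: int = 8) -> str:
+       """Repeatedly replaces prefixed Base64 until nothing else decodes."""
+       for _ in range(max_passes):
+           result = PREFIXED_BASE64.sub(_detokenize_one, text)
+           if result == text:
+               break
+           text = result
+       return text
+
+
+   print(detokenize_text('$pEVTYQkkUmhZam1RPT0='))  # Nested message: Wow!
+
+The ``max_passes`` limit is an arbitrary guard in this sketch against messages
+that keep expanding.
+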
+Investigating undecoded Base64 messages
+---------------------------------------
+Tokenized messages cannot be decoded if the token is not recognized. The Python
+package includes the ``parse_message`` tool, which parses tokenized Base64
+messages without looking up the token in a database. This tool attempts to guess
+the types of the arguments and displays potential ways to decode them.
+
+This tool can be used to extract argument information from an otherwise unusable
+message. It could help identify which statement in the code produced the
+message. This tool is not particularly helpful for tokenized messages without
+arguments, since all it can do is show the value of the unknown token.
+
+The tool is executed by passing Base64 tokenized messages, with or without the
+``$`` prefix, to ``pw_tokenizer.parse_message``. Pass ``-h`` or ``--help`` to
+see full usage information.
+
+Example
+^^^^^^^
+.. code-block::
+
+   $ python -m pw_tokenizer.parse_message '$329JMwA=' koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw== --specs %s %d
+
+   INF Decoding arguments for '$329JMwA='
+   INF Binary: b'\xdfoI3\x00' [df 6f 49 33 00] (5 bytes)
+   INF Token:  0x33496fdf
+   INF Args:   b'\x00' [00] (1 bytes)
+   INF Decoding with up to 8 %s or %d arguments
+   INF   Attempt 1: [%s]
+   INF   Attempt 2: [%d] 0
+
+   INF Decoding arguments for '$koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw=='
+   INF Binary: b'\x92\x84\xa5\xe7n\x13FAILED_PRECONDITION\x02OK' [92 84 a5 e7 6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (28 bytes)
+   INF Token:  0xe7a58492
+   INF Args:   b'n\x13FAILED_PRECONDITION\x02OK' [6e 13 46 41 49 4c 45 44 5f 50 52 45 43 4f 4e 44 49 54 49 4f 4e 02 4f 4b] (24 bytes)
+   INF Decoding with up to 8 %s or %d arguments
+   INF   Attempt 1: [%d %s %d %d %d] 55 FAILED_PRECONDITION 1 -40 -38
+   INF   Attempt 2: [%d %s %s] 55 FAILED_PRECONDITION OK
+
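+The attempts above come from decoding the argument bytes under different
+guesses. The function below is a hypothetical sketch, not the ``parse_message``
+source. It assumes that ``%d`` arguments are zigzag-encoded varints and that
+``%s`` arguments are a length byte followed by the string; under those
+assumptions it reproduces the values decoded for the second message above.
+
+.. code-block:: python
+
+   import base64
+   import struct
+
+
+   def decode_zigzag_varint(data: bytes, offset: int) -> tuple[int, int]:
+       """Returns (value, next_offset) for a zigzag-encoded varint."""
+       result = shift = 0
+       while True:
+           byte = data[offset]  # IndexError if the varint is truncated.
+           offset += 1
+           result |= (byte & 0x7F) << shift
+           shift += 7
+           if not byte & 0x80:
+               break
+       # Zigzag maps 0, 1, 2, 3, ... back to 0, -1, 1, -2, ...
+       return (result >> 1) ^ -(result & 1), offset
+
+
+   def decode_args(args: bytes, specs: list[str]) -> list:
+       """Decodes argument bytes for one guessed list of %d / %s specifiers."""
+       values, offset = [], 0
+       for spec in specs:
+           if spec == '%d':
+               value, offset = decode_zigzag_varint(args, offset)
+           elif spec == '%s':
+               # Assumed: the low 7 bits of the first byte give the length.
+               length = args[offset] & 0x7F
+               value = args[offset + 1:offset + 1 + length].decode(errors='replace')
+               offset += 1 + length
+           else:
+               raise ValueError(f'unsupported specifier: {spec}')
+           values.append(value)
+       if offset != len(args):
+           raise ValueError('leftover bytes: this guess does not fit')
+       return values
+
+
+   binary = base64.b64decode('koSl524TRkFJTEVEX1BSRUNPTkRJVElPTgJPSw==')
+   (token,) = struct.unpack_from('<I', binary)
+   # Prints: 0xe7a58492 [55, 'FAILED_PRECONDITION', 'OK']
+   print(hex(token), decode_args(binary[4:], ['%d', '%s', '%s']))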
+
+.. _module-pw_tokenizer-protobuf-tokenization-python:
+
+Detokenizing protobufs
+======================
+The :py:mod:`pw_tokenizer.proto` Python module defines functions that may be
+used to detokenize protobuf objects in Python. The function
+:py:func:`pw_tokenizer.proto.detokenize_fields` detokenizes all fields
+annotated as tokenized, replacing them with their detokenized version. For
+example:
+
+.. code-block:: python
+
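+   import pw_tokenizer
+   import pw_tokenizer.proto
+
+   # some_database and SomeMessage are placeholders for a token database and a
+   # protobuf message type with a field annotated as tokenized.
+   my_detokenizer = pw_tokenizer.Detokenizer(some_database)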
+
+   my_message = SomeMessage(tokenized_field=b'$YS1EMQ==')
+   pw_tokenizer.proto.detokenize_fields(my_detokenizer, my_message)
+
+   assert my_message.tokenized_field == b'The detokenized string! Cool!'
+
+Decoding optionally tokenized strings
+-------------------------------------
+The encoding used for an optionally tokenized field is not recorded in the
+protobuf. Despite this, the text can reliably be decoded. This is accomplished
+by attempting to decode the field as binary or Base64 tokenized data before
+treating it like plain text.
+
+The following diagram describes the decoding process for optionally tokenized
+fields in detail.
+
+.. mermaid::
+
+  flowchart TD
+     start([Received bytes]) --> binary
+
+     binary[Decode as<br>binary tokenized] --> binary_ok
+     binary_ok{Detokenizes<br>successfully?} -->|no| utf8
+     binary_ok -->|yes| done_binary([Display decoded binary])
+
+     utf8[Decode as UTF-8] --> utf8_ok
+     utf8_ok{Valid UTF-8?} -->|no| base64_encode
+     utf8_ok -->|yes| base64
+
+     base64_encode[Encode as<br>tokenized Base64] --> display
+     display([Display encoded Base64])
+
+     base64[Decode as<br>Base64 tokenized] --> base64_ok
+
+     base64_ok{Fully<br>or partially<br>detokenized?} -->|no| is_plain_text
+     base64_ok -->|yes| base64_results
+
+     is_plain_text{Text is<br>printable?} -->|no| base64_encode
+     is_plain_text-->|yes| plain_text
+
+     base64_results([Display decoded Base64])
+     plain_text([Display text])
+
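+The flowchart can be expressed as a short function. This is a simplified,
+hypothetical sketch rather than
+:func:`pw_tokenizer.proto.decode_optionally_tokenized`: its toy ``DATABASE``
+only contains argument-free messages, the Base64 step requires the whole text to
+be a single prefixed message (the flowchart also accepts partially detokenized
+text), and the token is assumed to be a little-endian ``uint32``.
+
+.. code-block:: python
+
+   import base64
+   import struct
+   from typing import Optional
+
+   # Hypothetical toy database of argument-free messages: token -> string.
+   DATABASE = {0x99231646: 'Wow!'}
+
+
+   def _detokenize_binary(data: bytes) -> Optional[str]:
+       # Simplification: accept only a lone 4-byte little-endian token.
+       if len(data) != 4:
+           return None
+       return DATABASE.get(struct.unpack('<I', data)[0])
+
+
+   def _detokenize_base64(text: str) -> Optional[str]:
+       # Simplification: the whole text must be one "$"-prefixed message.
+       if not text.startswith('$'):
+           return None
+       try:
+           return _detokenize_binary(base64.b64decode(text[1:], validate=True))
+       except ValueError:  # binascii.Error subclasses ValueError.
+           return None
+
+
+   def decode_optionally_tokenized(data: bytes) -> str:
+       """Follows the flowchart; always returns printable text."""
+       as_base64 = '$' + base64.b64encode(data).decode('ascii')
+       decoded = _detokenize_binary(data)
+       if decoded is not None:
+           return decoded  # Detokenized as binary.
+       try:
+           text = data.decode('utf-8')
+       except UnicodeDecodeError:
+           return as_base64  # Not UTF-8: show it as tokenized Base64.
+       decoded = _detokenize_base64(text)
+       if decoded is not None:
+           return decoded  # Detokenized as Base64.
+       return text if text.isprintable() else as_base64
+
+Because ``b'a-D1'`` is not in the toy database, it comes back as the plain text
+``a-D1`` rather than ``$YS1EMQ==``, which is the failure mode described in
+`Displaying undecoded binary as plain text instead of Base64`_.
+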
+Potential decoding problems
+---------------------------
+The decoding process for optionally tokenized fields will yield correct results
+in almost every situation. In rare circumstances it can fail, but such failures
+can be avoided with a low-overhead mitigation if desired.
+
+There are two ways in which the decoding process may fail.
+
+Accidentally interpreting plain text as tokenized binary
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If a plain-text string happens to decode as a binary tokenized message, the
+incorrect message could be displayed. This is very unlikely to occur. While many
+tokens will incidentally end up being valid UTF-8 strings, it is highly unlikely
+that a device will happen to log one of these strings as plain text. The
+overwhelming majority of these strings will be nonsense.
+
+An implementation that wishes to guard against this extremely improbable
+situation can append 0xFF (or another byte that is never valid in UTF-8) to
+binary tokenized data that happens to be valid UTF-8 (or to all binary
+tokenized messages, if desired). When decoding, if there is an extra 0xFF byte,
+it is discarded.
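+
+A sketch of this mitigation follows. The ``append_marker`` and
+``detokenize_marked`` names are hypothetical, and the ``detokenize`` callable
+stands in for whatever binary detokenizer is in use (any function that returns
+``None`` on failure). Because a genuine tokenized message could itself end in
+0xFF, the decoding side of this sketch tries the data as-is first and drops the
+trailing byte only if that fails.
+
+.. code-block:: python
+
+   from typing import Callable, Optional
+
+   MARKER = b'\xff'  # 0xFF never appears in valid UTF-8.
+
+
+   def append_marker(message: bytes, always: bool = False) -> bytes:
+       """Appends 0xFF to binary tokenized data that is valid UTF-8."""
+       try:
+           message.decode('utf-8')
+       except UnicodeDecodeError:
+           if not always:
+               return message  # Already not UTF-8; no marker needed.
+       return message + MARKER
+
+
+   def detokenize_marked(data: bytes,
+                         detokenize: Callable[[bytes], Optional[str]]) -> Optional[str]:
+       """Tries the data as-is, then without a trailing 0xFF marker."""
+       result = detokenize(data)
+       if result is None and data.endswith(MARKER):
+           result = detokenize(data[:-1])
+       return result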
+
+Displaying undecoded binary as plain text instead of Base64
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If a message fails to decode as binary tokenized and it is not valid UTF-8, it
+is displayed as tokenized Base64. This makes it easily recognizable as a
+tokenized message and makes it simple to decode later from the text output (for
+example, with an updated token database).
+
+A binary message whose token is not known may coincidentally be valid UTF-8 or
+ASCII. About 6.25% of 4-byte sequences consist only of ASCII characters: each
+byte has a 1 in 2 chance of being ASCII (0x00-0x7F), and (1/2)^4 = 1/16. When
+decoding with an out-of-date token database, some binary tokenized messages may
+therefore be displayed as plain text rather than tokenized Base64.
+
+This situation is likely to occur, but should be infrequent. Even if it does
+happen, it is not a serious issue. A very small number of strings will be
+displayed incorrectly, but these strings cannot be decoded anyway. One nonsense
+string (e.g. ``a-D1``) would be displayed instead of another (``$YS1EMQ==``).
+Updating the token database would resolve the issue, though the non-Base64 logs
+would be difficult to decode later from a log file.
+
+This situation can be avoided with the same approach described in
+`Accidentally interpreting plain text as tokenized binary`_. Appending
+an invalid UTF-8 byte prevents the undecoded binary message from being
+interpreted as plain text.
+
+---------------------
+Detokenization in C++
+---------------------
+The C++ detokenization libraries can be used in C++ or any language that can
+call into C++ with a C-linkage wrapper, such as Java or Rust. A reference
+Java Native Interface (JNI) implementation is provided.
+
+The C++ detokenization library uses binary-format token databases (created with
+``database.py create --type binary``). Read a binary format database from a
+file or include it in the source code. Pass the database array to
+``TokenDatabase::Create``, and construct a detokenizer.
+
+.. code-block:: cpp
+
+   Detokenizer detokenizer(TokenDatabase::Create(token_database_array));
+
+   std::string ProcessLog(span<uint8_t> log_data) {
+     return detokenizer.Detokenize(log_data).BestString();
+   }
+
+The ``TokenDatabase`` class verifies that its data is valid before using it. If
+the data is invalid, ``TokenDatabase::Create`` returns an empty database for
+which ``ok()`` returns false. If the token database is included in the source
+code, this check can be done at compile time.
+
+.. code-block:: cpp
+
+   // This line fails to compile with a static_assert if the database is invalid.
+   constexpr TokenDatabase kDefaultDatabase = TokenDatabase::Create<kData>();
+
+   Detokenizer OpenDatabase(std::string_view path) {
+     std::vector<uint8_t> data = ReadWholeFile(path);
+
+     TokenDatabase database = TokenDatabase::Create(data);
+
+     // This checks if the file contained a valid database. It is safe to use a
+     // TokenDatabase that failed to load (it will be empty), but it may be
+     // desirable to provide a default database or otherwise handle the error.
+     if (database.ok()) {
+       return Detokenizer(database);
+     }
+     return Detokenizer(kDefaultDatabase);
+   }
+
+----------------------------
+Detokenization in TypeScript
+----------------------------
+To detokenize in TypeScript, import ``Detokenizer`` from the ``pigweedjs``
+package, and instantiate it with a CSV token database.
+
+.. code-block:: typescript
+
+   import { pw_tokenizer, pw_hdlc } from 'pigweedjs';
+   const { Detokenizer } = pw_tokenizer;
+   const { Frame } = pw_hdlc;
+
+   const detokenizer = new Detokenizer(String(tokenCsv));
+
+   function processLog(frame: Frame) {
+     const result = detokenizer.detokenize(frame);
+     console.log(result);
+   }
+
+For messages that are encoded in Base64, use ``Detokenizer.detokenizeBase64``.
+``detokenizeBase64`` also attempts to detokenize nested Base64 tokens. There is
+also ``detokenizeUint8Array``, which works just like ``detokenize`` but takes a
+``Uint8Array`` instead of a ``Frame`` argument.
+
+.. _module-pw_tokenizer-cli-detokenizing:
+
+---------------------
+Detokenizing CLI tool
+---------------------
+``pw_tokenizer`` provides two standalone command line utilities for detokenizing
+Base64-encoded tokenized strings.
+
+* ``detokenize.py`` -- Detokenizes Base64-encoded strings in files or from
+  stdin.
+* ``serial_detokenizer.py`` -- Detokenizes Base64-encoded strings from a
+  connected serial device.
+
+If the ``pw_tokenizer`` Python package is installed, these tools may be executed
+as runnable modules. For example:
+
+.. code-block::
+
+   # Detokenize Base64-encoded strings in a file
+   python -m pw_tokenizer.detokenize -i input_file.txt
+
+   # Detokenize Base64-encoded strings in output from a serial device
+   python -m pw_tokenizer.serial_detokenizer --device /dev/ttyACM0
+
+See the ``--help`` options for these tools for full usage information.
+
+--------
+Appendix
+--------
+
+.. _module-pw_tokenizer-python-detokenization-c99-printf-notes:
+
+Python detokenization: C99 ``printf`` compatibility notes
+=========================================================
+This implementation is designed to align with the
+`C99 specification, section 7.19.6
+<https://www.dii.uchile.cl/~daespino/files/Iso_C_1999_definition.pdf>`_.
+Notably, this specification differs slightly from what most compilers
+implement, because each compiler interprets undefined behavior in its own way.
+Treat the following description as the source of truth.
+
+This implementation supports:
+
+- Overall Format: ``%[flags][width][.precision][length][specifier]``
+- Flags (Zero or More)
+   - ``-``: Left-justify within the given field width; right justification is
+     the default (see Width modifier).
+   - ``+``: Forces the result to be preceded by a plus or minus sign (``+`` or
+     ``-``) even for positive numbers. By default, only negative numbers are
+     preceded with a ``-`` sign.
+   - (space): If no sign is going to be written, a blank space is inserted
+     before the value.
+   - ``#``: Specifies an alternative print syntax should be used.
+      - Used with ``o``, ``x`` or ``X`` specifiers, the value is preceded with
+        ``0``, ``0x`` or ``0X``, respectively, for nonzero values.
+      - Used with ``a``, ``A``, ``e``, ``E``, ``f``, ``F``, ``g``, or ``G`` it
+        forces the written output to contain a decimal point even if no more
+        digits follow. By default, if no digits follow, no decimal point is
+        written.
+   - ``0``: Left-pads the number with zeroes (``0``) instead of spaces when
+     padding is specified (see width sub-specifier).
+- Width (Optional)
+   - ``(number)``: Minimum number of characters to be printed. If the value to
+     be printed is shorter than this number, the result is padded with blank
+     spaces or ``0`` if the ``0`` flag is present. The value is not truncated
+     even if the result is larger. If the value is negative and the ``0`` flag
+     is present, the ``0``\s are padded after the ``-`` symbol.
+   - ``*``: The width is not specified in the format string, but as an
+     additional integer value argument preceding the argument that has to be
+     formatted.
+- Precision (Optional)
+   - ``.(number)``
+      - For ``d``, ``i``, ``o``, ``u``, ``x``, ``X``, specifies the minimum
+        number of digits to be written. If the value to be written is shorter
+        than this number, the result is padded with leading zeros. The value is
+        not truncated even if the result is longer.
+
+        - A precision of ``0`` means that no character is written for the value
+          ``0``.
+
+      - For ``a``, ``A``, ``e``, ``E``, ``f``, and ``F``, specifies the number
+        of digits to be printed after the decimal point. By default, this is
+        ``6``.
+
+      - For ``g`` and ``G``, specifies the maximum number of significant digits
+        to be printed.
+
+      - For ``s``, specifies the maximum number of characters to be printed. By
+        default all characters are printed until the ending null character is
+        encountered.
+
+      - If the period is specified without an explicit value for precision,
+        ``0`` is assumed.
+   - ``.*``: The precision is not specified in the format string, but as an
+     additional integer value argument preceding the argument that has to be
+     formatted.
+- Length (Optional)
+   - ``hh``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is a ``signed char`` or ``unsigned char``.
+     The implementation largely ignores this modifier because it is not needed
+     by Python or by argument decoding (the argument is always encoded as at
+     least a 32-bit integer).
+   - ``h``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is a ``signed short int`` or
+     ``unsigned short int``. The implementation largely ignores this modifier
+     because it is not needed by Python or by argument decoding (the argument
+     is always encoded as at least a 32-bit integer).
+   - ``l``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is a ``signed long int`` or
+     ``unsigned long int``. Also usable with ``c`` and ``s`` to specify that
+     the arguments are encoded with ``wchar_t`` values (which are not treated
+     differently from normal ``char`` values). The implementation largely
+     ignores this modifier because it is not needed by Python or by argument
+     decoding (the argument is always encoded as at least a 32-bit integer).
+   - ``ll``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is a ``signed long long int`` or
+     ``unsigned long long int``. This is required to properly decode the
+     argument as a 64-bit integer.
+   - ``L``: Usable with ``a``, ``A``, ``e``, ``E``, ``f``, ``F``, ``g``, or
+     ``G`` conversion specifiers to convey that the argument is a
+     ``long double``. The implementation ignores this modifier because the
+     encoded floating-point value is unaffected by bit width.
+   - ``j``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is an ``intmax_t`` or ``uintmax_t``.
+   - ``z``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is a ``size_t``. This forces the argument
+     to be decoded as an unsigned integer.
+   - ``t``: Usable with ``d``, ``i``, ``o``, ``u``, ``x``, or ``X`` specifiers
+     to convey that the argument is a ``ptrdiff_t``.
+   - If a length modifier is provided for an incorrect specifier, it is ignored.
+- Specifier (Required)
+   - ``d`` / ``i``: Used for signed decimal integers.
+
+   - ``u``: Used for unsigned decimal integers.
+
+   - ``o``: Used for unsigned integers and specifies formatting as an octal
+     number.
+
+   - ``x``: Used for unsigned integers and specifies formatting as a
+     hexadecimal number using all lowercase letters.
+
+   - ``X``: Used for unsigned integers and specifies formatting as a
+     hexadecimal number using all uppercase letters.
+
+   - ``f``: Used for floating-point values and specifies to use lowercase,
+     decimal floating point formatting.
+
+     - Default precision is ``6`` decimal places unless explicitly specified.
+
+   - ``F``: Used for floating-point values and specifies to use uppercase,
+     decimal floating point formatting.
+
+     - Default precision is ``6`` decimal places unless explicitly specified.
+
+   - ``e``: Used for floating-point values and specifies to use lowercase,
+     exponential (scientific) formatting.
+
+     - Default precision is ``6`` decimal places unless explicitly specified.
+
+   - ``E``: Used for floating-point values and specifies to use uppercase,
+     exponential (scientific) formatting.
+
+     - Default precision is ``6`` decimal places unless explicitly specified.
+
+   - ``g``: Used for floating-point values and selects ``f`` or ``e``
+     formatting, depending on which would be the shortest representation.
+
+     - Precision specifies the number of significant digits, not just digits
+       after the decimal place.
+
+     - If the precision is specified as ``0``, it is interpreted to mean ``1``.
+
+     - ``e`` formatting is used if the exponent would be less than ``-4`` or
+       is greater than or equal to the precision.
+
+     - Trailing zeros are removed unless the ``#`` flag is set.
+
+     - A decimal point only appears if it is followed by a digit.
+
+     - ``NaN`` or infinities always follow ``f`` formatting.
+
+   - ``G``: Used for floating-point values and selects ``F`` or ``E``
+     formatting, depending on which would be the shortest representation.
+
+     - Precision specifies the number of significant digits, not just digits
+       after the decimal place.
+
+     - If the precision is specified as ``0``, it is interpreted to mean ``1``.
+
+     - ``E`` formatting is used if the exponent would be less than ``-4`` or
+       is greater than or equal to the precision.
+
+     - Trailing zeros are removed unless the ``#`` flag is set.
+
+     - A decimal point only appears if it is followed by a digit.
+
+     - ``NaN`` or infinities always follow ``F`` formatting.
+
+   - ``c``: Used for formatting a ``char`` value.
+
+   - ``s``: Used for formatting a string of ``char`` values.
+
+     - If width is specified, the null terminator character is included as a
+       character for width count.
+
+     - If precision is specified, no more ``char``\s than that value will be
+       written from the string (padding is used to fill additional width).
+
+   - ``p``: Used for formatting a pointer address.
+
+   - ``%``: Prints a single ``%``. Only valid as ``%%`` (supports no flags,
+     width, precision, or length modifiers).
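+
+The overall format above maps naturally onto a regular expression. The sketch
+below is a hypothetical illustration, not the decoder's actual parser: it splits
+one conversion specification into its parts, treats a bare ``.`` as precision
+``0``, rejects ``%`` with any modifiers, and omits the unsupported ``n``
+specifier.
+
+.. code-block:: python
+
+   import re
+
+   # %[flags][width][.precision][length][specifier]
+   _SPEC = re.compile(
+       r'%(?P<flags>[-+ #0]*)'
+       r'(?P<width>\*|\d+)?'
+       r'(?:\.(?P<precision>\*|\d*))?'
+       r'(?P<length>hh|h|ll|l|L|j|z|t)?'
+       r'(?P<specifier>[diouxXfFeEgGaAcsp%])')
+
+
+   def parse_spec(text: str) -> dict:
+       """Splits one conversion specification such as '%-08.3lld' into parts."""
+       match = _SPEC.fullmatch(text)
+       if match is None:
+           raise ValueError(f'not a conversion specification: {text!r}')
+       if match['specifier'] == '%' and text != '%%':
+           raise ValueError(f'{text!r}: %% accepts no flags, width, precision, or length')
+       parts = match.groupdict()
+       if parts['width'] not in (None, '*'):
+           parts['width'] = int(parts['width'])
+       if parts['precision'] == '':
+           parts['precision'] = 0  # A period with no number means precision 0.
+       elif parts['precision'] not in (None, '*'):
+           parts['precision'] = int(parts['precision'])
+       return parts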
+
+Underspecified details:
+
+- If both ``+`` and (space) flags appear, the (space) is ignored.
+- The ``+`` and (space) flags will error if used with ``c`` or ``s``.
+- The ``#`` flag will error if used with ``d``, ``i``, ``u``, ``c``, ``s``, or
+  ``p``.
+- The ``0`` flag will error if used with ``c``, ``s``, or ``p``.
+- Both ``+`` and (space) can work with the unsigned integer specifiers ``u``,
+  ``o``, ``x``, and ``X``.
+- If a length modifier is provided for an incorrect specifier, it is ignored.
+- The ``z`` length modifier decodes arguments as signed when ``d`` or ``i`` is
+  used.
+- ``p`` is implementation defined.
+
+  - This implementation prints a ``0x`` prefix followed by the pointer value
+    formatted with ``%08X``.
+
+  - ``p`` supports the ``+``, ``-``, and (space) flags, but not the ``#`` or
+    ``0`` flags.
+
+  - None of the length modifiers are usable with ``p``.
+
+  - This implementation will try to adhere to user-specified width (assuming the
+    width provided is larger than the guaranteed minimum of ``10``).
+
+  - Specifying precision for ``p`` is considered an error.
+- ``%%`` must appear without modifiers; sequences like ``%+%`` fail to decode.
+  Some C standard library implementations accept modifiers between the two
+  ``%`` characters but ignore them in the output.
+- If a width is specified with the ``0`` flag for a negative value, the padded
+  ``0``\s will appear after the ``-`` symbol.
+- A precision of ``0`` for ``d``, ``i``, ``u``, ``o``, ``x``, or ``X`` means
+  that no character is written for the value ``0``.
+- Precision cannot be specified for ``c``.
+- Using ``*`` or fixed precision with the ``s`` specifier still requires the
+  string argument to be null-terminated. This is due to argument encoding
+  happening on the C/C++ side while the precision value is not read or
+  otherwise used until decoding happens in this Python code.
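+
+Several of the integer rules above (precision as a minimum digit count, a
+precision of ``0`` writing nothing for the value ``0``, and ``0`` padding placed
+after the ``-`` sign) can be seen in a small hypothetical helper. It is a sketch,
+not the decoder's code, and it follows C99 in ignoring the ``0`` flag when a
+precision is given.
+
+.. code-block:: python
+
+   from typing import Optional
+
+
+   def format_signed_decimal(value: int, width: int = 0,
+                             precision: Optional[int] = None,
+                             zero_pad: bool = False, plus: bool = False) -> str:
+       """Hypothetical %d formatter covering a few of the rules above."""
+       if precision == 0 and value == 0:
+           digits = ''  # Precision 0 writes no characters for the value 0.
+       else:
+           digits = str(abs(value))
+           if precision is not None:
+               digits = digits.rjust(precision, '0')  # Minimum digit count.
+       sign = '-' if value < 0 else ('+' if plus else '')
+       if len(sign) + len(digits) >= width:
+           return sign + digits  # Width never truncates the value.
+       if zero_pad and precision is None:
+           # The 0 flag pads after the sign (C99 ignores it with a precision).
+           return sign + digits.rjust(width - len(sign), '0')
+       return (sign + digits).rjust(width)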
+
+Non-conformant details:
+
+- ``n`` specifier: The ``n`` specifier is not supported. Decoding happens long
+  after the device sent the message, usually on an entirely separate machine,
+  so there is no way to tell the original program how many characters have
+  been printed.