.. _module-pw_tokenizer:

============
pw_tokenizer
============
.. pigweed-module::
   :name: pw_tokenizer
   :tagline: Compress strings to shrink logs by +75%
   :status: stable
   :languages: C11, C++14, Python, Rust, TypeScript
   :code-size-impact: 50% reduction in binary log size

Logging is critical, but developers are often forced to choose between
additional logging or saving crucial flash space. The ``pw_tokenizer`` module
enables **extensive logging with substantially less memory usage** by replacing
printf-style strings with binary tokens during compilation. It is designed to
integrate easily into existing logging systems.

Although the most common application of ``pw_tokenizer`` is binary logging,
**the tokenizer is general purpose and can be used to tokenize any strings**,
with or without printf-style arguments.

Why tokenize strings?

* **Dramatically reduce binary size** by removing string literals from binaries.
* **Reduce I/O traffic, RAM, and flash usage** by sending and storing compact
  tokens instead of strings. We've seen reductions of over 50% in encoded log
  contents.
* **Reduce CPU usage** by replacing snprintf calls with simple tokenization code.
* **Remove potentially sensitive log, assert, and other strings** from binaries.

.. grid:: 1

   .. grid-item-card:: :octicon:`rocket` Get started
      :link: module-pw_tokenizer-get-started
      :link-type: ref
      :class-item: sales-pitch-cta-primary

      Integrate pw_tokenizer into your project.

.. grid:: 2

   .. grid-item-card:: :octicon:`code-square` Tokenization
      :link: module-pw_tokenizer-tokenization
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Convert strings and arguments to tokens.

   .. grid-item-card:: :octicon:`code-square` Token databases
      :link: module-pw_tokenizer-token-databases
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Store a mapping of tokens to the strings and arguments they represent.

.. grid:: 2

   .. grid-item-card:: :octicon:`code-square` Detokenization
      :link: module-pw_tokenizer-detokenization
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Expand tokens back to the strings and arguments they represent.

   .. grid-item-card:: :octicon:`info` API reference
      :link: module-pw_tokenizer-api
      :link-type: ref
      :class-item: sales-pitch-cta-secondary

      Detailed reference information about the pw_tokenizer API.


.. _module-pw_tokenizer-tokenized-logging-example:

---------------------------
Tokenized logging in action
---------------------------
Here's an example of how ``pw_tokenizer`` enables you to store
and send the same logging information using significantly fewer
resources:

.. mermaid::

   flowchart TD

     subgraph after["After: Tokenized Logs (37 bytes saved!)"]
       after_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 4 bytes stored on-device as... -->
       after_encoding["d9 28 47 8e"] -- 6 bytes sent over the wire as... -->
       after_transmission["d9 28 47 8e aa 3e"] -- Displayed in logs as... -->
       after_display["#quot;Battery Voltage: 3989 mV#quot;"]
     end

     subgraph before["Before: No Tokenization"]
       before_log["LOG(#quot;Battery Voltage: %d mV#quot;, voltage)"] -- 41 bytes stored on-device as... -->
       before_encoding["#quot;Battery Voltage: %d mV#quot;"] -- 43 bytes sent over the wire as... -->
       before_transmission["#quot;Battery Voltage: 3989 mV#quot;"] -- Displayed in logs as... -->
       before_display["#quot;Battery Voltage: 3989 mV#quot;"]
     end

     style after stroke:#00c852,stroke-width:3px
     style before stroke:#ff5252,stroke-width:3px

A quick overview of how the tokenized version works:

* You tokenize ``"Battery Voltage: %d mV"`` with a macro like
  :c:macro:`PW_TOKENIZE_STRING`. You can use :ref:`module-pw_log_tokenized`
  to handle the tokenization automatically.
* After tokenization, ``"Battery Voltage: %d mV"`` becomes ``d9 28 47 8e``.
* The first 4 bytes sent over the wire are the token for
  ``"Battery Voltage: %d mV"``. The last 2 bytes are the value of ``voltage``
  (``3989``) converted to a varint using :ref:`module-pw_varint`.
* The logs are converted back to the original, human-readable message
  via the :ref:`Detokenization API <module-pw_tokenizer-detokenization>`
  and a :ref:`token database <module-pw_tokenizer-token-databases>`.
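
The byte layout described above can be reproduced with a short, standalone
Python sketch. It does not use the ``pw_tokenizer`` or ``pw_varint`` APIs. The
helper names, the one-entry ``TOY_DATABASE``, and the hard-coded token bytes
(copied from the diagram rather than computed by hashing) are all hypothetical
and exist only for illustration. The sketch also assumes the integer argument
is ZigZag-encoded before the varint step, which is what reproduces the
``aa 3e`` bytes shown in the diagram:

.. code-block:: python

   # Illustrative only: hand-rolled helpers, not the pw_tokenizer API.

   def zigzag_encode(value: int) -> int:
       """Map a signed 32-bit integer to an unsigned one (0, -1, 1 -> 0, 1, 2)."""
       return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF

   def zigzag_decode(value: int) -> int:
       """Invert zigzag_encode."""
       return (value >> 1) ^ -(value & 1)

   def varint_encode(value: int) -> bytes:
       """Encode an unsigned integer as a base-128 varint, low 7 bits first."""
       out = bytearray()
       while True:
           byte = value & 0x7F
           value >>= 7
           if value:
               out.append(byte | 0x80)  # More bytes follow.
           else:
               out.append(byte)
               return bytes(out)

   def varint_decode(data: bytes) -> int:
       """Decode a single base-128 varint from the start of data."""
       result = 0
       for index, byte in enumerate(data):
           result |= (byte & 0x7F) << (7 * index)
           if not byte & 0x80:
               break
       return result

   # Token bytes taken verbatim from the diagram (assumed, not computed here).
   TOKEN = bytes.fromhex("d928478e")

   # Hypothetical stand-in for a token database: token bytes -> format string.
   TOY_DATABASE = {TOKEN: "Battery Voltage: %d mV"}

   def encode_log(token: bytes, arg: int) -> bytes:
       """Build the wire message: 4 token bytes followed by the varint argument."""
       return token + varint_encode(zigzag_encode(arg))

   def decode_log(message: bytes) -> str:
       """Look up the format string and substitute the decoded argument."""
       fmt = TOY_DATABASE[message[:4]]
       arg = zigzag_decode(varint_decode(message[4:]))
       return fmt % arg

   message = encode_log(TOKEN, 3989)
   print(message.hex(" "))     # d9 28 47 8e aa 3e
   print(decode_log(message))  # Battery Voltage: 3989 mV

In a real project the token is computed from the string at compile time and
the lookup is handled by the detokenization tooling, so none of these helpers
need to be written by hand.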

.. toctree::
   :hidden:
   :maxdepth: 1

   Get started <get_started>
   tokenization
   token_databases
   detokenization
   API reference <api>