Diffstat (limited to 'docs/extensions/api.md')
-rw-r--r-- | docs/extensions/api.md | 886 |
1 files changed, 886 insertions, 0 deletions
diff --git a/docs/extensions/api.md b/docs/extensions/api.md new file mode 100644 index 0000000..d7eb689 --- /dev/null +++ b/docs/extensions/api.md @@ -0,0 +1,886 @@ +title: Extensions API + +# Writing Extensions for Python-Markdown + +Python-Markdown includes an API for extension writers to plug their own custom functionality and syntax into the +parser. An extension will patch into one or more stages of the parser: + +* [*Preprocessors*](#preprocessors) alter the source before it is passed to the parser. +* [*Block Processors*](#blockprocessors) work with blocks of text separated by blank lines. +* [*Tree Processors*](#treeprocessors) modify the constructed ElementTree. +* [*Inline Processors*](#inlineprocessors) are common tree processors for inline elements, such as `*strong*`. +* [*Postprocessors*](#postprocessors) munge the output of the parser just before it is returned. + +The parser loads text, applies the preprocessors, creates and builds an [ElementTree][ElementTree] object from the +block processors and inline processors, renders the ElementTree object as Unicode text, and then applies the +postprocessors. + +There are classes and helpers provided to ease writing your extension. Each part of the API is discussed in its +respective section below. Additionally, you can walk through the [Tutorial on Writing Extensions][tutorial]; look at +some of the [Available Extensions][] and their [source code][extension source]. As always, you may report bugs, ask +for help, and discuss various other issues on the [bug tracker]. + +## Phases of processing {: #stages } + +### Preprocessors {: #preprocessors } + +Preprocessors munge the source text before it is passed to the Markdown parser. This is an excellent place to clean up +bad characters or to extract portions for later processing that the parser may otherwise choke on. 
+ +Preprocessors inherit from `markdown.preprocessors.Preprocessor` and implement a `run` method, which takes a single +parameter `lines`. This parameter is the entire source text stored as a list of Unicode strings, one per line. `run` +should return its processed list of Unicode strings, one per line. + +#### Example + +This simple example removes any lines with 'NO RENDER' before processing: + +```python +from markdown.preprocessors import Preprocessor +import re + +class NoRender(Preprocessor): + """ Skip any line with words 'NO RENDER' in it. """ + def run(self, lines): + new_lines = [] + for line in lines: + m = re.search("NO RENDER", line) + if not m: + # any line without NO RENDER is passed through + new_lines.append(line) + return new_lines +``` + +#### Usages + +Some preprocessors in the Markdown source tree include: + +| Class | Kind | Description | +| ------------------------------|-----------|------------------------------------------------- | +| [`NormalizeWhiteSpace`][c1] | built-in | Normalizes whitespace by expanding tabs, fixing `\r` line endings, etc. 
| +| [`HtmlBlockPreprocessor`][c2] | built-in | Removes html blocks from the text and stores them for later processing | +| [`ReferencePreprocessor`][c3] | built-in | Removes reference definitions from text and stores for later processing | +| [`MetaPreprocessor`][c4] | extension | Strips and records meta data at top of documents | +| [`FootnotesPreprocessor`][c5] | extension | Removes footnote blocks from the text and stores them for later processing | + +[c1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/preprocessors.py +[c2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/preprocessors.py +[c3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/preprocessors.py +[c4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/meta.py +[c5]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py + +### Block Processors {: #blockprocessors } + +A block processor parses blocks of text and adds new elements to the `ElementTree`. Blocks of text, separated from +other text by blank lines, may have a different syntax and produce a differently structured tree than other Markdown. +Block processors excel at code formatting, equation layouts, and tables. + +Block processors inherit from `markdown.blockprocessors.BlockProcessor`, are passed `md.parser` on initialization, and +implement both the `test` and `run` methods: + +* `test(self, parent, block)` takes two parameters: `parent` is the parent `ElementTree` element and `block` is a + single, multi-line, Unicode string of the current block. `test`, often a regular expression match, returns a true + value if the block processor's `run` method should be called to process starting at that block. +* `run(self, parent, blocks)` has the same `parent` parameter as `test`; and `blocks` is the list of all remaining + blocks in the document, starting with the `block` passed to `test`. 
`run` may return `False` (not `None`) to signal + failure, meaning that it did not process the blocks after all. On success, `run` is expected to `pop` one or more + blocks from the front of `blocks` and attach new nodes to `parent`. + +Crafting block processors is more involved and flexible than the other processors, involving controlling recursive +parsing of the block's contents and managing state across invocations. For example, a blank line is allowed in +indented code, so the second invocation of the code block processor appends to the element tree generated by the +previous call. Other block processors may insert new text into the `blocks` list, signal to future calls of themselves, +and more. + +To make writing these complex beasts more tractable, three convenience functions have been provided by the +`BlockProcessor` parent class: + +* `lastChild(parent)` returns the last child of the given element or `None` if it has no children. +* `detab(text)` removes one level of indent (four spaces by default) from the front of each line of the given + multi-line text string, until a non-blank line is indented less. +* `looseDetab(text, level)` removes multiple levels + of indent from the front of each line of `text` but does not affect lines indented less. + +Also, `BlockProcessor` provides the fields `self.tab_length`, the tab length (default 4), and `self.parser`, the +current `BlockParser` instance. + +#### BlockParser + +`BlockParser`, not to be confused with `BlockProcessor`, is the class used by Markdown to cycle through all the +registered block processors. You should never need to create your own instance; use `self.parser` instead. + +The `BlockParser` instance provides a stack of strings for its current state, which your processor can push to with +`self.parser.state.set(state)`, pop with `self.parser.state.reset()`, or check the top state of with +`self.parser.state.isstate(state)`. Be sure your code pops the states it pushes. 
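
The push/pop discipline described above can be sketched without the library. In this minimal illustration, `StateStack`, `parse_nested`, and `describe` are hypothetical names invented for the example; `StateStack` is only a stand-in that mirrors the `set`, `reset`, and `isstate` operations, not the parser's own class. The point is the `try`/`finally`, which pops the state even when processing fails:

```python
class StateStack(list):
    """Hypothetical stand-in mirroring the set/reset/isstate operations."""

    def set(self, state):
        # Push a new state onto the stack.
        self.append(state)

    def reset(self):
        # Pop the most recently pushed state.
        self.pop()

    def isstate(self, state):
        # True only if `state` is on top of the stack.
        return bool(self) and self[-1] == state


def parse_nested(stack, blocks, handler):
    """Push a state, process `blocks`, and always pop the state again."""
    stack.set('list')
    try:
        return [handler(stack, block) for block in blocks]
    finally:
        # Runs even if `handler` raises, so the stack never leaks a state.
        stack.reset()


def describe(stack, block):
    # A handler can change its behavior based on the current top state.
    prefix = 'item: ' if stack.isstate('list') else ''
    return prefix + block


stack = StateStack()
out = parse_nested(stack, ['one', 'two'], describe)
```

In a real block processor the same pattern would wrap the recursive parsing call inside `run`, so that a state pushed for nested content is gone again before `run` returns.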
+ +The `BlockParser` instance can also be called recursively, that is, to process blocks from within your block +processor. There are three methods: + +* `parseDocument(lines)` parses a list of lines, each a single-line Unicode string, returning a complete + `ElementTree`. +* `parseChunk(parent, text)` parses a single, multi-line, possibly multi-block, Unicode string `text` and attaches the + resulting tree to `parent`. +* `parseBlocks(parent, blocks)` takes a list of `blocks`, each a multi-line Unicode string without blank lines, and + attaches the resulting tree to `parent`. + +For perspective, Markdown calls `parseDocument` which calls `parseChunk` which calls `parseBlocks` which calls your +block processor, which, in turn, might call one of these routines. + +#### Example + +This example calls out important paragraphs by giving them a border. It looks for a fence line of exclamation points +before and after and renders the fenced blocks into a new, styled `div`. If it does not find the ending fence line, +it does nothing. + +Our code, like most block processors, is longer than other examples: + +```python +def test_block_processor(): + class BoxBlockProcessor(BlockProcessor): + RE_FENCE_START = r'^ *!{3,} *\n' # start line, e.g., ` !!!! 
` + RE_FENCE_END = r'\n *!{3,}\s*$' # last non-blank line, e.g, '!!!\n \n\n' + + def test(self, parent, block): + return re.match(self.RE_FENCE_START, block) + + def run(self, parent, blocks): + original_block = blocks[0] + blocks[0] = re.sub(self.RE_FENCE_START, '', blocks[0]) + + # Find block with ending fence + for block_num, block in enumerate(blocks): + if re.search(self.RE_FENCE_END, block): + # remove fence + blocks[block_num] = re.sub(self.RE_FENCE_END, '', block) + # render fenced area inside a new div + e = etree.SubElement(parent, 'div') + e.set('style', 'display: inline-block; border: 1px solid red;') + self.parser.parseBlocks(e, blocks[0:block_num + 1]) + # remove used blocks + for i in range(0, block_num + 1): + blocks.pop(0) + return True # or could have had no return statement + # No closing marker! Restore and do nothing + blocks[0] = original_block + return False # equivalent to our test() routine returning False + + class BoxExtension(Extension): + def extendMarkdown(self, md): + md.parser.blockprocessors.register(BoxBlockProcessor(md.parser), 'box', 175) +``` + +Start with this example input: + +``` text +A regular paragraph of text. + +!!!!! +First paragraph of wrapped text. + +Second Paragraph of **wrapped** text. +!!!!! + +Another regular paragraph of text. +``` + +The fenced text adds one node with two children to the tree: + +* `div`, with a `style` attribute. It renders as + `<div style="display: inline-block; border: 1px solid red;">...</div>` + * `p` with text `First paragraph of wrapped text.` + * `p` with text `Second Paragraph of **wrapped** text`. The conversion to a `<strong>` tag will happen when + running the inline processors, which will happen after all of the block processors have completed. + +The example output might display as follows: + +!!! 
note "" + <p>A regular paragraph of text.</p> + <div style="display: inline-block; border: 1px solid red;"> + <p>First paragraph of wrapped text.</p> + <p>Second Paragraph of **wrapped** text.</p> + </div> + <p>Another regular paragraph of text.</p> + +#### Usages + +Some block processors in the Markdown source tree include: + +| Class | Kind | Description | +| ----------------------------|-----------|---------------------------------------------| +| [`HashHeaderProcessor`][b1] | built-in | Title hashes (`#`), which may split blocks | +| [`HRProcessor`][b2] | built-in | Horizontal lines, e.g., `---` | +| [`OListProcessor`][b3] | built-in | Ordered lists; complex and using `state` | +| [`Admonition`][b4] | extension | Render each [Admonition][] in a new `div` | + +[b1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/blockprocessors.py +[b2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/blockprocessors.py +[b3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/blockprocessors.py +[Admonition]: https://python-markdown.github.io/extensions/admonition/ +[b4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/admonition.py + +### Tree processors {: #treeprocessors } + +Tree processors manipulate the tree created by block processors. They can even create an entirely new ElementTree +object. This is an excellent place for creating summaries, adding collected references, or last minute adjustments. + +A tree processor must inherit from `markdown.treeprocessors.Treeprocessor` (note the capitalization). A tree processor +must implement a `run` method which takes a single argument `root`. In most cases `root` would be an +`xml.etree.ElementTree.Element` instance; however, in rare cases it could be some other type of ElementTree object. 
+The `run` method may return `None`, in which case the (possibly modified) original `root` object is used, or it may +return an entirely new `Element` object, which will replace the existing `root` object and all of its children. It is +generally preferred to modify `root` in place and return `None`, which avoids creating multiple copies of the entire +document tree in memory. + +For specifics on manipulating the ElementTree, see [Working with the ElementTree][workingwithetree] below. + +#### Example + +A pseudo example: + +```python +from markdown.treeprocessors import Treeprocessor + +class MyTreeprocessor(Treeprocessor): + def run(self, root): + root.text = 'modified content' + # No return statement is same as `return None` +``` + +#### Usages + +The core `InlineProcessor` class is a tree processor. It walks the tree, matches patterns, and splits and creates +nodes on matches. + +Additional tree processors in the Markdown source tree include: + +| Class | Kind | Description | +| ----------------------------------|-----------|---------------------------------------------------------------| +| [`PrettifyTreeprocessor`][e1] | built-in | Add line breaks to the html document | +| [`TocTreeprocessor`][e2] | extension | Builds a [table of contents][] from the finished tree | +| [`FootnoteTreeprocessor`][e3] | extension | Create [footnote][] div at end of document | +| [`FootnotePostTreeprocessor`][e4] | extension | Amend div created by `FootnoteTreeprocessor` with duplicates | + +[e1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/treeprocessors.py +[e2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/toc.py +[e3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py +[e4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py +[table of contents]: https://python-markdown.github.io/extensions/toc/ +[footnote]: 
https://python-markdown.github.io/extensions/footnotes/ + +### Inline Processors {: #inlineprocessors } + +Inline processors, previously called inline patterns, are used to add formatting, such as `**emphasis**`, by replacing +a matched pattern with a new element tree node. This is an excellent place for adding new syntax for inline tags. Inline +processor code is often quite short. + +Inline processors inherit from `InlineProcessor`, are initialized, and implement `handleMatch`: + +* `__init__(self, pattern, md=None)` is the inherited constructor. You do not need to implement your own. + * `pattern` is the regular expression string that must match the code block in order for the `handleMatch` method + to be called. + * `md`, an optional parameter, is a pointer to the instance of `markdown.Markdown` and is available as `self.md` + on the `InlineProcessor` instance. + +* `handleMatch(self, m, data)` must be implemented in all `InlineProcessor` subclasses. + * `m` is the regular expression [match object][] found by the `pattern` passed to `__init__`. + * `data` is a single, multi-line, Unicode string containing the entire block of text around the pattern. A block + is text set apart by blank lines. + * Returns either `(None, None, None)`, indicating the provided match was rejected, or `(el, start, end)`, if the + match was successfully processed. On success, `el` is the element being added to the tree, and `start` and `end` are + indexes in `data` that were "consumed" by the pattern. The "consumed" span will be replaced by a placeholder. + The same inline processor may be called several times on the same block. + +Inline Processors can define the property `ANCESTOR_EXCLUDES` which is either a list or tuple of undesirable ancestors. +The processor will be skipped if it would cause the content to be a descendant of one of the listed tag names. 
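
The return contract of `handleMatch` described above can be sketched with the standard library alone. This is a rough illustration, not library code: `INS_PATTERN` and `handle_match` are hypothetical names, and the function is written standalone so it runs without Python-Markdown installed. In a real extension the same body would live in the `handleMatch` method of an `InlineProcessor` subclass.

```python
import re
import xml.etree.ElementTree as etree

# Hypothetical syntax for this sketch: ++text++ becomes <ins>text</ins>.
INS_PATTERN = re.compile(r'\+\+(.*?)\+\+')


def handle_match(m, data):
    """Follow the (el, start, end) / (None, None, None) return contract."""
    content = m.group(1)
    if not content.strip():
        # Reject the match: nothing in `data` is consumed.
        return None, None, None
    el = etree.Element('ins')
    el.text = content
    # Accept the match: report the span of `data` that was consumed.
    return el, m.start(0), m.end(0)


data = 'Keep ++this++ but not ++ ++.'
results = [handle_match(m, data) for m in INS_PATTERN.finditer(data)]
```

The first match is accepted and reports the span covering `++this++`; the second contains only whitespace, so it is rejected and its text would be left untouched.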
+ +##### Convenience Classes + +Convenience subclasses of `InlineProcessor` are provided for common operations: + +* [`SimpleTextInlineProcessor`][i1] returns the text of `group(1)` of the match. +* [`SubstituteTagInlineProcessor`][i4] is initialized as `SubstituteTagInlineProcessor(pattern, tag)`. It returns a + new element `tag` whenever `pattern` is matched. +* [`SimpleTagInlineProcessor`][i3] is initialized as `SimpleTagInlineProcessor(pattern, tag)`. It returns an element + `tag` with a text field of `group(2)` of the match. + +##### Example + +This example changes `--strike--` to `<del>strike</del>`. + +```python +from markdown.inlinepatterns import InlineProcessor +from markdown.extensions import Extension +import xml.etree.ElementTree as etree + + +class DelInlineProcessor(InlineProcessor): + def handleMatch(self, m, data): + el = etree.Element('del') + el.text = m.group(1) + return el, m.start(0), m.end(0) + +class DelExtension(Extension): + def extendMarkdown(self, md): + DEL_PATTERN = r'--(.*?)--' # like --del-- + md.inlinePatterns.register(DelInlineProcessor(DEL_PATTERN, md), 'del', 175) +``` + +Use this input example: + +``` text +First line of the block. +This is --strike one--. +This is --strike two--. +End of the block. +``` + +The example output might display as follows: + +!!! note "" + <p>First line of the block. + This is <del>strike one</del>. + This is <del>strike two</del>. + End of the block.</p> + +* On the first call to `handleMatch` + * `m` will be the match for `--strike one--` + * `data` will be the string: + `First line of the block.\nThis is --strike one--.\nThis is --strike two--.\nEnd of the block.` + + Because the match was successful, the region between the returned `start` and `end` is replaced with a + placeholder token and the new element is added to the tree. 
+ +* On the second call to `handleMatch` + * `m` will be the match for `--strike two--` + * `data` will be the string + `First line of the block.\nThis is klzzwxh:0000.\nThis is --strike two--.\nEnd of the block.` + +Note the placeholder token `klzzwxh:0000`. This allows the regular expression to be run against the entire block, +not just the text contained in an individual element. The placeholders will later be swapped back out for the +actual elements by the parser. + +Actually, it would not be necessary to create the above inline processor. The fact is, that example is not very DRY +(Don't Repeat Yourself). A pattern for `**strong**` text would be almost identical, with the exception that it would +create a `strong` element. Therefore, Markdown provides a number of generic `InlineProcessor` subclasses that can +provide some common functionality. For example, strike could be implemented with an instance of the +`SimpleTagInlineProcessor` class as demonstrated below. Feel free to use or extend any of the `InlineProcessor` +subclasses found at `markdown.inlinepatterns`. 
+ +```python +from markdown.inlinepatterns import SimpleTagInlineProcessor +from markdown.extensions import Extension + +class DelExtension(Extension): + def extendMarkdown(self, md): + md.inlinePatterns.register(SimpleTagInlineProcessor(r'()--(.*?)--', 'del'), 'del', 175) +``` + + +##### Usages + +Here are some convenience functions and other examples: + +| Class | Kind | Description | +| ---------------------------------|-----------|---------------------------------------------------------------| +| [`AsteriskProcessor`][i5] | built-in | Emphasis processor for handling strong and em matches inside asterisks | +| [`AbbrInlineProcessor`][i6] | extension | Apply tag to abbreviation registered by preprocessor | +| [`WikiLinksInlineProcessor`][i7] | extension | Link `[[article names]]` to wiki given in metadata | +| [`FootnoteInlineProcessor`][i8] | extension | Replaces footnote in text with link to footnote div at bottom | + +[i1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py +[i2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py +[i3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py +[i4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py +[i5]: https://github.com/Python-Markdown/markdown/blob/master/markdown/inlinepatterns.py +[i6]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/abbr.py +[i7]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/wikilinks.py +[i8]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py + +### Patterns + +In version 3.0, a new, more flexible inline processor was added, `markdown.inlinepatterns.InlineProcessor`. The +original inline patterns, which inherit from `markdown.inlinepatterns.Pattern` or one of its children are still +supported, though users are encouraged to migrate. 
+ +#### Comparison with new `InlineProcessor` + +The new `InlineProcessor` provides two major enhancements to `Patterns`: + +1. Inline Processors no longer need to match the entire block, so regular expressions no longer need to start with + `r'^(.*?)'` and end with `r'(.*)$'`. This runs faster. The returned [match object][] will only contain what is + explicitly matched in the pattern, and extension pattern groups now start with `m.group(1)`. + +2. The `handleMatch` method now takes an additional input called `data`, which is the entire block under analysis, + not just what is matched with the specified pattern. The method now returns the element *and* the indexes relative + to `data` that the returned element is replacing (usually `m.start(0)` and `m.end(0)`). If the boundaries are + returned as `None`, it is assumed that the match did not take place, and nothing will be altered in `data`. + + This allows handling of more complex constructs than regular expressions can handle, e.g., matching nested + brackets, and explicit control of the span "consumed" by the processor. + +#### Inline Patterns + +Inline Patterns can implement inline HTML element syntax for Markdown such as `*emphasis*` or +`[links](http://example.com)`. Pattern objects should be instances of classes that inherit from +`markdown.inlinepatterns.Pattern` or one of its children. Each pattern object uses a single regular expression and +must have the following methods: + +* **`getCompiledRegExp()`**: + + Returns a compiled regular expression. + +* **`handleMatch(m)`**: + + Accepts a match object and returns an ElementTree element or a plain Unicode string. + +Inline Patterns can define the property `ANCESTOR_EXCLUDES` which is either a list or tuple of undesirable ancestors. +The pattern will be skipped if it would cause the content to be a descendant of one of the listed tag names. + +Note that any regular expression returned by `getCompiledRegExp` must capture the whole block. 
Therefore, they should +all start with `r'^(.*?)'` and end with `r'(.*)$'`. When using the default `getCompiledRegExp()` method provided in +the `Pattern` class, you can pass in a regular expression without that and `getCompiledRegExp` will wrap your expression for +you and set the `re.DOTALL` and `re.UNICODE` flags. This means that the first group of your match will be `m.group(2)` +as `m.group(1)` will match everything before the pattern. + +For an example, consider this simplified emphasis pattern: + +```python +from markdown.inlinepatterns import Pattern +import xml.etree.ElementTree as etree + +class EmphasisPattern(Pattern): + def handleMatch(self, m): + el = etree.Element('em') + el.text = m.group(2) + return el +``` + +As discussed in [Integrating Your Code Into Markdown][], an instance of this class will need to be provided to +Markdown. That instance would be created like so: + +```python +# an oversimplified regex +MYPATTERN = r'\*([^*]+)\*' +# pass in pattern and create instance +emphasis = EmphasisPattern(MYPATTERN) +``` + +### Postprocessors {: #postprocessors } + +Postprocessors munge the document after the ElementTree has been serialized into a string. Postprocessors should be +used to work with the text just before output. Usually, they are used to add back sections that were extracted in a +preprocessor, fix up outgoing encodings, or wrap the whole document. + +Postprocessors inherit from `markdown.postprocessors.Postprocessor` and implement a `run` method which takes a single +parameter `text`, the entire HTML document as a single Unicode string. `run` should return a single Unicode string +ready for output. Note that preprocessors use a list of lines while postprocessors use a single multi-line string. + +#### Example + +Here is a simple example that changes the output to one big page showing the raw html. 
+ +```python +from markdown.postprocessors import Postprocessor +import re + +class ShowActualHtmlPostprocessor(Postprocessor): + """ Wrap entire output in <pre> tags as a diagnostic. """ + def run(self, text): + # escape each '<' so the browser shows the markup instead of rendering it + return '<pre>\n' + re.sub('<', '&lt;', text) + '</pre>\n' +``` + +#### Usages + +Some postprocessors in the Markdown source tree include: + +| Class | Kind | Description | +| ------------------------------|-----------|----------------------------------------------------| +| [`raw_html`][p1] | built-in | Restore raw html from `htmlStash`, stored by `HtmlBlockPreprocessor`, and code highlighters | +| [`amp_substitute`][p2] | built-in | Convert ampersand substitutes to `&`; used in links | +| [`unescape`][p3] | built-in | Convert some escaped characters back from integers; used in links | +| [`FootnotePostProcessor`][p4] | extension | Replace footnote placeholders with html entities; as set by other stages | + + [p1]: https://github.com/Python-Markdown/markdown/blob/master/markdown/postprocessors.py + [p2]: https://github.com/Python-Markdown/markdown/blob/master/markdown/postprocessors.py + [p3]: https://github.com/Python-Markdown/markdown/blob/master/markdown/postprocessors.py + [p4]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py + + +## Working with the ElementTree {: #working_with_et } + +As mentioned, the Markdown parser converts a source document to an [ElementTree][ElementTree] object before +serializing that back to Unicode text. Markdown has provided some helpers to ease that manipulation within the context +of the Markdown module. + +First, import the ElementTree module: + +```python +import xml.etree.ElementTree as etree +``` +Sometimes you may want text inserted into an element to be parsed by [Inline Patterns][]. In such a situation, simply +insert the text as you normally would and the text will be automatically run through the Inline Patterns. 
However, if +you do *not* want some text to be parsed by Inline Patterns, then insert the text as an `AtomicString`. + +```python +from markdown.util import AtomicString +some_element.text = AtomicString(some_text) +``` + +Here's a basic example which creates an HTML table (note that the contents of the second cell (`td2`) will be run +through Inline Patterns later): + +```python +table = etree.Element("table") +table.set("cellpadding", "2") # Set cellpadding to 2 +tr = etree.SubElement(table, "tr") # Add child tr to table +td1 = etree.SubElement(tr, "td") # Add child td1 to tr +td1.text = AtomicString("Cell content") # Add plain text content (AtomicString imported above) +td2 = etree.SubElement(tr, "td") # Add second td to tr +td2.text = "*text* with **inline** formatting." # Add markup text +table.tail = "Text after table" # Add text after table +``` + +You can also manipulate an existing tree. Consider the following example which adds a `class` attribute to `<a>` +elements: + +```python +def set_link_class(self, element): + for child in element: + if child.tag == "a": + child.set("class", "myclass") # set the class attribute + self.set_link_class(child) # run recursively on children (a method, so call it via self) +``` + +For more information about working with ElementTree see the [ElementTree +Documentation][ElementTree]. + +## Working with Raw HTML {: #working_with_raw_html } + +Occasionally an extension may need to call out to a third party library which returns a pre-made string +of raw HTML that needs to be inserted into the document unmodified. Raw strings can be stashed for later +retrieval using an `htmlStash` instance, rather than converting them into `ElementTree` objects. A raw string +(which may or may not be raw HTML) passed to `self.md.htmlStash.store()` will be saved to the stash and a +placeholder string will be returned which should be inserted into the tree instead. After the tree is +serialized, a postprocessor will replace the placeholder with the raw string. 
This prevents subsequent +processing steps from modifying the HTML data. For example, + +```python +html = "<p>This is some <em>raw</em> HTML data</p>" +el = etree.Element("div") +el.text = self.md.htmlStash.store(html) +``` + +For the global `htmlStash` instance to be available from a processor, the `markdown.Markdown` instance must +be passed to the processor from [extendMarkdown](#extendmarkdown) and will be available as `self.md.htmlStash`. + +## Integrating Your Code Into Markdown {: #integrating_into_markdown } + +Once you have the various pieces of your extension built, you need to tell Markdown about them and ensure that they +are run in the proper sequence. Markdown accepts an `Extension` instance for each extension. Therefore, you will need +to define a class that extends `markdown.extensions.Extension` and over-rides the `extendMarkdown` method. Within this +class you will manage configuration options for your extension and attach the various processors and patterns to the +Markdown instance. + +It is important to note that the order of the various processors and patterns matters. For example, if we replace +`http://...` links with `<a>` elements, and *then* try to deal with inline HTML, we will end up with a mess. +Therefore, the various types of processors and patterns are stored within an instance of the `markdown.Markdown` class +in a [Registry][]. Your `Extension` class will need to manipulate those registries appropriately. You may `register` +instances of your processors and patterns with an appropriate priority, `deregister` built-in instances, or replace a +built-in instance with your own. + +### `extendMarkdown` {: #extendmarkdown } + +The `extendMarkdown` method of a `markdown.extensions.Extension` class accepts one argument: + +* **`md`**: + + A pointer to the instance of the `markdown.Markdown` class. You should use this to access the + [Registries][Registry] of processors and patterns. 
They are found under the following attributes: + + * `md.preprocessors` + * `md.inlinePatterns` + * `md.parser.blockprocessors` + * `md.treeprocessors` + * `md.postprocessors` + + Some other things you may want to access on the `markdown.Markdown` instance are: + + * `md.htmlStash` + * `md.output_formats` + * `md.set_output_format()` + * `md.output_format` + * `md.serializer` + * `md.registerExtension()` + * `md.tab_length` + * `md.block_level_elements` + * `md.isBlockLevel()` + +!!! Warning + With access to the above items, theoretically you have the option to change anything through various + [monkey_patching][] techniques. However, you should be aware that the various undocumented parts of Markdown may + change without notice and your monkey_patches may break with a new release. Therefore, what you really should be + doing is inserting processors and patterns into the Markdown pipeline. Consider yourself warned! + +[monkey_patching]: https://en.wikipedia.org/wiki/Monkey_patch + +A simple example: + +```python +from markdown.extensions import Extension + +class MyExtension(Extension): + def extendMarkdown(self, md): + # Register instance of 'mypattern' with a priority of 175 + md.inlinePatterns.register(MyPattern(md), 'mypattern', 175) +``` + +### registerExtension {: #registerextension } + +Some extensions may need to have their state reset between multiple runs of the `markdown.Markdown` class. For +example, consider the following use of the [Footnotes][] extension: + +```python +md = markdown.Markdown(extensions=['footnotes']) +html1 = md.convert(text_with_footnote) +md.reset() +html2 = md.convert(text_without_footnote) +``` + +Without calling `reset`, the footnote definitions from the first document will be inserted into the second document as +they are still stored within the class instance. Therefore the `Extension` class needs to define a `reset` method that +will reset the state of the extension (i.e.: `self.footnotes = {}`). 
However, as many extensions do not have a need +for `reset`, `reset` is only called on extensions that are registered. + +To register an extension, call `md.registerExtension` from within your `extendMarkdown` method: + +```python +def extendMarkdown(self, md): + md.registerExtension(self) + # insert processors and patterns here +``` + +Then, each time `reset` is called on the `markdown.Markdown` instance, the `reset` method of each registered extension +will be called as well. You should also note that `reset` will be called on each registered extension after it is +initialized the first time. Keep that in mind when overriding the extension's `reset` method. + +### Configuration Settings {: #configsettings } + +If an extension uses any parameters that the user may want to change, those parameters should be stored in +`self.config` of your `markdown.extensions.Extension` class in the following format: + +```python +class MyExtension(markdown.extensions.Extension): + def __init__(self, **kwargs): + self.config = { + 'option1': ['value1', 'description1'], + 'option2': ['value2', 'description2'] + } + super(MyExtension, self).__init__(**kwargs) +``` + +When implemented this way, the configuration parameters can be overridden at run time (thus the call to `super`). For +example: + +```python +markdown.Markdown(extensions=[MyExtension(option1='other value')]) +``` + +Note that if a keyword is passed in that is not already defined in `self.config`, then a `KeyError` is raised. + +The `markdown.extensions.Extension` class and its subclasses have the following methods available to assist in working +with configuration settings: + +* **`getConfig(key [, default])`**: + + Returns the stored value for the given `key` or `default` if the `key` does not exist. If `default` is not given, + it defaults to an empty string. + +* **`getConfigs()`**: + + Returns a dict of all key/value pairs. + +* **`getConfigInfo()`**: + + Returns all configuration descriptions as a list of tuples.
+ +* **`setConfig(key, value)`**: + + Sets a configuration setting for `key` with the given `value`. If `key` is unknown, a `KeyError` is raised. If the + previous value of `key` was a Boolean value, then `value` is converted to a Boolean value. If the previous value + of `key` is `None`, then `value` is converted to a Boolean value except when it is `None`. No conversion takes + place when the previous value of `key` is a string. + +* **`setConfigs(items)`**: + + Sets multiple configuration settings given a dict of key/value pairs. + +### Naming an Extension { #naming_an_extension } + +As noted in the [library reference], an instance of an extension can be passed directly to `markdown.Markdown`. In +fact, this is the preferred way to use third-party extensions. + +For example: + +```python +import markdown +from path.to.module import MyExtension +md = markdown.Markdown(extensions=[MyExtension(option='value')]) +``` + +However, Markdown also accepts "named" third-party extensions for those occasions when it is impractical to import an +extension directly (from the command line or from within templates). A "name" can either be a registered [entry +point](#entry_point) or a string using Python's [dot notation](#dot_notation). + +#### Entry Point { #entry_point } + +[Entry points] are defined in a Python package's `setup.py` script. The script must use [setuptools] to support entry +points. Python-Markdown extensions must be assigned to the `markdown.extensions` group. An entry point definition +might look like this: + +```python +from setuptools import setup + +setup( + # ...
+ entry_points={ + 'markdown.extensions': ['myextension = path.to.module:MyExtension'] + } +) +``` + +After a user installs your extension using the above script, they could then call the extension using the +`myextension` string name like this: + +```python +markdown.markdown(text, extensions=['myextension']) +``` + +Note that if two or more entry points within the same group are assigned the same name, Python-Markdown will only ever +use the first one found and ignore all others. Therefore, be sure to give your extension a unique name. + +For more information on writing `setup.py` scripts, see the Python documentation on [Packaging and Distributing +Projects]. + +#### Dot Notation { #dot_notation } + +If an extension does not have a registered entry point, Python's dot notation may be used instead. The extension must +be installed as a Python module on your PYTHONPATH. Generally, a class should be specified in the name. The class must +be at the end of the name and be separated by a colon from the module. + +Therefore, if you were to import the class like this: + +```python +from path.to.module import MyExtension +``` + +Then the extension can be loaded as follows: + +```python +markdown.markdown(text, extensions=['path.to.module:MyExtension']) +``` + +You do not need to do anything special to support this feature. As long as your extension class is able to be +imported, a user can include it with the above syntax. + +The above two methods are especially useful if you need to implement a large number of extensions with more than one +residing in a module. However, if you do not want to require that your users include the class name in their string, +you must define only one extension per module and that module must contain a module-level function called +`makeExtension` that accepts `**kwargs` and returns an extension instance. + +For example: + +```python +class MyExtension(markdown.extensions.Extension): + # Define extension here...
+ +def makeExtension(**kwargs): + return MyExtension(**kwargs) +``` + +When `markdown.Markdown` is passed the "name" of your extension as a dot notation string that does not include a class +(for example `path.to.module`), it will import the module and call the `makeExtension` function to instantiate your +extension. + +## Registries + +The `markdown.util.Registry` class is a priority-sorted registry which Markdown uses internally to determine the +processing order of its various processors and patterns. + +A `Registry` instance provides two public methods to alter the data of the registry: `register` and `deregister`. Use +`register` to add items and `deregister` to remove items. See each method for specifics. + +When registering an item, a "name" and a "priority" must be provided. All items are automatically sorted by the value +of the "priority" parameter such that the item with the highest value will be processed first. The "name" is used to +remove (`deregister`) and get items. + +A `Registry` instance is like a list (which maintains order) when reading data. You may iterate over the items, get an +item, and get a count (length) of all items. You may also check that the registry contains an item. + +When getting an item you may use either the index of the item or the string-based "name". For example: + +```python +registry = Registry() +registry.register(SomeItem(), 'itemname', 20) +# Get the item by index +item = registry[0] +# Get the item by name +item = registry['itemname'] +``` + +When checking that the registry contains an item, you may use either the string-based "name", or a reference to the +actual item.
For example: + +```python +someitem = SomeItem() +registry.register(someitem, 'itemname', 20) +# Contains the name +assert 'itemname' in registry +# Contains the item instance +assert someitem in registry +``` + +`markdown.util.Registry` has the following methods: + +### `Registry.register(self, item, name, priority)` {: #registry.register data-toc-label='Registry.register'} + +: Add an item to the registry with the given name and priority. + + Parameters: + + * `item`: The item being registered. + * `name`: A string used to reference the item. + * `priority`: An integer or float used to sort against all items. + + If an item is registered with a "name" which already exists, the existing item is replaced with the new item. + Be careful as the old item is lost with no way to recover it. The new item will be sorted according to its + priority and will **not** retain the position of the old item. + +### `Registry.deregister(self, name, strict=True)` {: #registry.deregister data-toc-label='Registry.deregister'} + +: Remove an item from the registry. + + Set `strict=False` to fail silently. + +### `Registry.get_index_for_name(self, name)` {: #registry.get_index_for_name data-toc-label='Registry.get_index_for_name'} + +: Return the index of the given `name`. 
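The three methods above can be exercised together in a short session. This is a minimal sketch, assuming Python-Markdown is installed. The item values and names (`'low-item'`, `'mid-item'`, and so on) are placeholders invented for illustration; in a real extension the items would be processor or pattern instances.

```python
from markdown.util import Registry

registry = Registry()
# Registration order does not matter; only the priority does.
registry.register('low', 'low-item', 10)
registry.register('high', 'high-item', 30)
registry.register('mid', 'mid-item', 20)

# Iteration yields items from highest to lowest priority.
ordered = list(registry)
mid_index = registry.get_index_for_name('mid-item')

# Registering under an existing name replaces the old item, and the
# new item is sorted by its own priority (40 puts it first).
registry.register('mid-v2', 'mid-item', 40)
first_after_replace = registry[0]
count_after_replace = len(registry)

# Remove an item by name.
registry.deregister('low-item')
low_still_present = 'low-item' in registry

# With strict=False, deregistering an unknown name is silently ignored.
registry.deregister('no-such-item', strict=False)
```

After this runs, `ordered` is `['high', 'mid', 'low']` and the replaced item sits at index 0, which shows that a replacement does not inherit the position of the item it displaced.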
+ +[match object]: https://docs.python.org/3/library/re.html#match-objects +[bug tracker]: https://github.com/Python-Markdown/markdown/issues +[extension source]: https://github.com/Python-Markdown/markdown/tree/master/markdown/extensions +[tutorial]: https://github.com/Python-Markdown/markdown/wiki/Tutorial-1---Writing-Extensions-for-Python-Markdown +[workingwithetree]: #working_with_et +[Integrating your code into Markdown]: #integrating_into_markdown +[extendMarkdown]: #extendmarkdown +[Registry]: #registry +[registerExtension]: #registerextension +[Config Settings]: #configsettings +[makeExtension]: #makeextension +[ElementTree]: https://docs.python.org/3/library/xml.etree.elementtree.html +[Available Extensions]: index.md +[Footnotes]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/footnotes.py +[Definition Lists]: https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/definition_lists +[library reference]: ../reference.md +[setuptools]: https://packaging.python.org/key_projects/#setuptools +[Entry points]: https://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins +[Packaging and Distributing Projects]: https://packaging.python.org/tutorials/distributing-packages/ |