Markdown: A Comprehensive Reference

Roland Russwurm

Origins, philosophy, AI adoption, syntax, and tool-by-tool support


Part 1: Origins and History

The Birth of Markdown (2004)

Markdown was created in 2004 by John Gruber, with significant input from Aaron Swartz. Gruber, a writer and software developer best known for the Daring Fireball blog, designed Markdown with one deliberate goal: to allow people to write using an easy-to-read, easy-to-write plain-text format that could be converted to structurally valid HTML.

The central design philosophy, in Gruber's own words, was that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it has been marked up with tags or formatting instructions. This is the insight that distinguishes Markdown from earlier markup languages: it is designed not to make documents look a certain way, but to keep them readable in their unrendered form.

What Came Before

To understand why Markdown was such a leap forward, it helps to remember what existed before it.

  • HTML (early 1990s): The native markup of the web, but verbose. Writing <p>This is <strong>important</strong>.</p> is tedious for prose, and the source is hard to read.
  • SGML / XML: Even more verbose, with stricter structural requirements. Aimed at machines as much as humans.
  • LaTeX (1984): Powerful for academic and scientific writing, but complex, with a steep learning curve oriented toward print typography.
  • reStructuredText (2002): Created for the Python documentation ecosystem. More structured than Markdown, but with stricter rules and a less friendly learning curve.
  • Textile (2002): A predecessor in spirit, used by Movable Type and other early blogging platforms.
  • Setext / "structured text": An older plaintext convention that inspired Markdown's underline-style headings.
  • Email plain-text conventions: Since the 1980s people had been using *asterisks* for emphasis, _underscores_ for italics, > for quotation, and --- for horizontal separators. Markdown formalized these long-standing conventions.

Gruber's contribution was less about inventing new syntax and more about codifying existing plaintext conventions into a coherent specification that mapped cleanly onto HTML.

The Aaron Swartz Connection

Aaron Swartz — the prodigious programmer and activist who later co-founded Reddit — collaborated with Gruber on the syntax. Swartz had been working on a project called atx, which influenced the heading syntax (# Heading). His fingerprints are on several design decisions, particularly around link syntax and the choice to make the format "forgiving."

Fragmentation and the CommonMark Effort

Markdown's original specification was famously imprecise. Gruber published a syntax description and a Perl reference implementation (Markdown.pl), but the specification left many edge cases undefined. By 2012, dozens of mutually incompatible Markdown processors existed, each interpreting tricky cases differently.

In 2014, Jeff Atwood, John MacFarlane, and others launched CommonMark — an effort to produce a strongly defined, highly compatible specification of Markdown. CommonMark is now the closest thing to a true standard, and it forms the basis of GitHub Flavored Markdown, GitLab Flavored Markdown, Reddit's renderer, Stack Overflow's renderer, and many others.

Gruber notably refused to bless CommonMark and objected to it using the "Markdown" name, leading to its rebranding from "Standard Markdown" to "CommonMark." This split — a creator who refuses to standardize, an ecosystem that did it anyway — is one of the more curious episodes in software history, and it's why "Markdown" today is best understood as a family of dialects rather than a single language.


Part 2: Why Markdown Dominates AI

The relationship between Markdown and large language models is unusually deep, and not accidental. Several converging factors explain why nearly every modern LLM produces Markdown by default.

1. Training Data Gravity

LLMs learn from text. The largest, highest-quality public text corpora — GitHub READMEs, Stack Overflow answers, Reddit posts, technical documentation, Jupyter notebooks, blog posts, much of arXiv-adjacent commentary — are saturated with Markdown. Training on these corpora teaches models that structured human knowledge looks like Markdown. When a model "wants" to convey structured information, Markdown is the path of least resistance.

2. Token Efficiency

Markdown is extraordinarily token-efficient compared to HTML or XML. Compare:

HTML: multiple tokens, redundant tag structure

<h2>Introduction</h2>

Markdown: typically two or three tokens

## Introduction

Across long documents the savings compound dramatically. For models priced and rate-limited per token, Markdown is simply cheaper to produce and consume — both for the model provider and the end user.
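The comparison above can be made concrete with a rough count. The sketch below is only a crude proxy: it counts runs of word characters and individual symbols, which is not how any production tokenizer works, and the function name rough_token_count is a hypothetical helper introduced for illustration. Real subword tokenizers will give different absolute numbers, though the direction of the difference should be similar.

```python
import re


def rough_token_count(text: str) -> int:
    """Crude proxy for token count: word runs plus individual symbols.

    This is an assumption for illustration, not a real LLM tokenizer.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


html = "<h2>Introduction</h2>"
markdown = "## Introduction"

print(rough_token_count(html), rough_token_count(markdown))  # 8 3
```

Under this proxy the HTML heading costs more than twice as many units as the Markdown one, and most of the extra cost is the closing tag repeating information the opening tag already gave.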

3. Streaming-Friendly

Markdown can be generated and rendered incrementally. As the model emits one token at a time, the output remains valid (or at worst gracefully degraded). HTML requires balanced opening and closing tags, so partial output is often invalid until completion. Markdown fails gracefully — a half-finished bullet list still looks like a bullet list.
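To illustrate the point about partial output, here is a minimal sketch that cuts the same list off mid-stream in both formats. It uses Python's standard html.parser to report tags that are still open; the class and function names are hypothetical, and the tracker is deliberately naive (it would, for example, treat void elements such as br as unclosed).

```python
from html.parser import HTMLParser


class OpenTagTracker(HTMLParser):
    """Records tags that have been opened but not yet closed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the most recent open tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


def unclosed_tags(fragment: str) -> list:
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return tracker.open_tags


# The same two-item list, interrupted in the middle of the second item.
partial_html = "<ul><li>Apple</li><li>Ban"
partial_markdown = "- Apple\n- Ban"

print(unclosed_tags(partial_html))  # ['ul', 'li']
print(all(line.startswith("- ") for line in partial_markdown.splitlines()))  # True
```

The truncated HTML leaves two elements dangling, while every line of the truncated Markdown is already a complete list item as far as a renderer is concerned.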

4. Dual Readability

Markdown is readable as plaintext when rendering is unavailable (logs, terminals, debugging) and renders beautifully when a renderer is present. This duality is invaluable for AI systems that may be embedded in many environments — a CLI, a web chat, an email summary, a voice assistant's transcript.

5. Structural Information Without Visual Commitment

LLMs benefit from being able to convey structural information — what is a heading, what is code, what is a list — without committing to visual formatting. Markdown lets the model communicate "this is a heading" while delegating "what does a heading look like" to the renderer. The model and the UI agree on structure; the UI owns presentation.

6. The Tooling Feedback Loop

Every major LLM frontend — ChatGPT, Claude, Gemini, Copilot, Cursor, and countless others — renders Markdown by default. This created a feedback loop: models produce Markdown, frontends render it, users expect it, training data keeps reinforcing it, and the convention solidifies. Markdown became the de facto serialization format for human-AI communication.

7. Code Affinity

A huge fraction of LLM use is code-adjacent. Fenced code blocks with language hints (like ```python) are perfectly suited for separating prose from code, which is exactly what programmers were already doing on GitHub and Stack Overflow.


Part 3: Default Markdown Syntax

What follows is the syntax common to the original Gruber specification and (with minor refinements) CommonMark. This is the lingua franca that works essentially everywhere.

Paragraphs and Line Breaks

A paragraph is one or more consecutive lines of text separated from neighbors by a blank line.

This is the first paragraph.

This is the second paragraph.

A single newline inside a paragraph is treated as a space. To force a hard line break, end a line with two or more trailing spaces, or — in CommonMark — a backslash:

First line.  
Second line, forced break.

First line.\
Second line, also a forced break.

Headings

Two styles are supported.

ATX style (using #):

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6

Setext style (using underlines, only for H1 and H2):

Heading 1
=========

Heading 2
---------

ATX is more common today because it scales to all six heading levels and is easier to type.
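As a rough illustration of how the two heading styles map to levels, the following sketch extracts (level, text) pairs from a document. It is a simplification under stated assumptions: it ignores optional closing # sequences, does not skip headings inside code blocks, and the name extract_headings is hypothetical rather than part of any Markdown library.

```python
import re

# One to six # characters, required whitespace, then the heading text.
ATX = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def extract_headings(source: str) -> list:
    """Return (level, text) for ATX and Setext headings in a document."""
    headings = []
    lines = source.splitlines()
    for i, line in enumerate(lines):
        match = ATX.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
            continue
        # Setext: a non-blank line followed by a line made only of = or -.
        if line.strip() and i + 1 < len(lines):
            underline = lines[i + 1].strip()
            if underline and set(underline) == {"="}:
                headings.append((1, line.strip()))
            elif underline and set(underline) == {"-"}:
                headings.append((2, line.strip()))
    return headings


sample = "# Title\n\nIntro\n\nSection\n-------\n\n### Detail\n"
print(extract_headings(sample))  # [(1, 'Title'), (2, 'Section'), (3, 'Detail')]
```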

Emphasis

*italic* or _italic_
**bold** or __bold__
***bold italic*** or ___bold italic___

The asterisk variants are generally recommended because underscores have edge cases inside words (snake_case_words) that some older renderers handle inconsistently.

Lists

Unordered lists use -, *, or +:

- Apple
- Banana
- Cherry

Ordered lists use any number followed by a period. The actual numbering is determined by the renderer; only the first number influences the start value:

1. First
2. Second
3. Third

1. First
1. Second
1. Third

Both render identically as 1, 2, 3. This is intentional: it lets you reorder list items without renumbering.
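The numbering rule can be sketched in a few lines. This is an illustrative model of the behavior described above (the first number sets the start value, later numbers are ignored), not the code of any particular renderer; rendered_numbers is a hypothetical name, and only the "number followed by a period" form is handled.

```python
import re

ITEM = re.compile(r"^(\d+)\.\s+(.*)$")


def rendered_numbers(list_source: str) -> list:
    """Return (displayed_number, text) for each ordered-list item."""
    items = []
    start = None
    for line in list_source.splitlines():
        match = ITEM.match(line)
        if not match:
            continue
        if start is None:
            # Only the first item's number is consulted.
            start = int(match.group(1))
        items.append((start + len(items), match.group(2)))
    return items


print(rendered_numbers("1. First\n1. Second\n1. Third"))
# [(1, 'First'), (2, 'Second'), (3, 'Third')]
```

Starting a list with a different number, such as 7., shifts the whole sequence in renderers that honor the start value.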

Nested lists use indentation (typically two or four spaces, depending on dialect).

Links

Inline:

[link text](https://example.com)
[link with title](https://example.com "Hover title")

Reference style (useful when the same URL appears many times):

[link text][1]
[another][gruber]

[1]: https://example.com
[gruber]: https://daringfireball.net "Daring Fireball"

Automatic links:

<https://example.com>
<email@example.com>

Images

Same syntax as links, with a leading !:

![alt text](image.png)
![alt text](image.png "Title")

The alt text is read by screen readers and shown when the image fails to load — it is not optional in good practice.

Code

Inline code uses single backticks:

Use the `printf` function.

If your code contains backticks, use double or triple backticks as the fence, with a space buffer:

`` use `backticks` here ``

Code blocks in original Markdown use four-space indentation:

    function hello() {
        return "world";
    }

(Fenced code blocks with triple backticks are a near-universal extension — covered in Part 4.)

Blockquotes

> This is a quotation.
> It continues on this line.
>
> > Nested blockquotes work too.

A blockquote can contain any other Markdown elements — lists, code blocks, headings — by prefixing each line with >.

Horizontal Rules

Three or more hyphens, asterisks, or underscores on a line by themselves:

---
***
___

Inline HTML

Original Markdown allows raw HTML to pass through. CommonMark has more nuanced rules. In practice, most modern renderers either pass HTML through, strip it for security, or sanitize it via a whitelist (this is what GitHub and Stack Overflow do).

Escaping Special Characters

Use a backslash to escape Markdown's special characters:

\*not italic\*

The full escapable set in original Markdown: \ ` * _ { } [ ] ( ) # + - . ! (CommonMark extends this to any ASCII punctuation character.)
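When inserting untrusted or literal text into a Markdown document, escaping can be automated. The sketch below escapes the characters from the original Markdown syntax description (backslash, backtick, asterisk, underscore, braces, brackets, parentheses, #, +, -, period, and !). The function name escape_markdown is hypothetical, and the approach is intentionally blunt: it escapes every occurrence, which is safe but produces noisier source than strictly necessary.

```python
import re

# Characters that original Markdown allows to be backslash-escaped.
_SPECIAL = re.compile(r"([\\`*_{}\[\]()#+\-.!])")


def escape_markdown(text: str) -> str:
    """Prefix every Markdown special character with a backslash."""
    return _SPECIAL.sub(r"\\\1", text)


print(escape_markdown("*not italic*"))  # \*not italic\*
```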


Part 4: Extensions Beyond the Core

Most modern Markdown processors implement extensions on top of the core syntax. The following are the most widespread.

Fenced Code Blocks

Three backticks (or three tildes) open and close a code block, with an optional language identifier for syntax highlighting:

```python
def hello():
    return "world"
```

This is more convenient than four-space indentation and is supported by GitHub Flavored Markdown, CommonMark, and almost every other modern dialect. Fenced code blocks are arguably the single most useful extension and are universally treated as "core" today even though they're technically not in original Markdown.

Tables

GitHub-style pipe tables:

| Column A | Column B |
|----------|----------|
| Cell 1   | Cell 2   |
| Cell 3   | Cell 4   |

Alignment is controlled by colons in the separator row:

| Left | Center | Right |
|:-----|:------:|------:|
| a    | b      | c     |

Pipe tables are part of GFM, GitLab, Pandoc, MultiMarkdown, and Markdown Extra. They are not in original Markdown or strict CommonMark — but in practice every realistic target supports them.
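Pipe tables are tedious to align by hand, so they are often generated. The following is a minimal sketch of such a generator, assuming rectangular input (every row has as many cells as there are headers); pipe_table and its align parameter are hypothetical names, with align taking "left", "center", "right", or None per column.

```python
def pipe_table(headers, rows, align=None):
    """Render a GFM-style pipe table as a string."""
    align = align or [None] * len(headers)
    # Escape literal pipes so they do not split cells.
    table = [[str(cell).replace("|", r"\|") for cell in row]
             for row in [headers, *rows]]
    # Each column is at least three characters wide for the separator row.
    widths = [max(3, *(len(row[i]) for row in table))
              for i in range(len(headers))]

    def separator(width, how):
        if how == "left":
            return ":" + "-" * (width - 1)
        if how == "center":
            return ":" + "-" * (width - 2) + ":"
        if how == "right":
            return "-" * (width - 1) + ":"
        return "-" * width

    def format_row(cells):
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    lines = [format_row(table[0])]
    lines.append("| " + " | ".join(separator(w, a) for w, a in zip(widths, align)) + " |")
    lines.extend(format_row(row) for row in table[1:])
    return "\n".join(lines)


print(pipe_table(["Left", "Center", "Right"], [["a", "b", "c"]],
                 align=["left", "center", "right"]))
```

The padding is purely cosmetic: renderers ignore extra spaces inside cells, but aligned source keeps the table readable as plain text.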

Strikethrough

~~deleted text~~

Supported by GFM, GitLab, Reddit, Discord (which uses double tildes), and many others.

Task Lists

- [x] Done
- [ ] Not done
- [ ] Also not done

Supported by GFM, GitLab, Obsidian, Joplin, Bear, and most modern note-taking tools.
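Because task lists are plain text, they are easy to process with scripts. As a small sketch (parse_tasks is a hypothetical helper, and the pattern only covers the common single-line form), the following extracts the completion state and label of each item:

```python
import re

# A list marker, then [ ] or [x], then the task text.
TASK = re.compile(r"^\s*[-*+]\s+\[([ xX])\]\s+(.*)$")


def parse_tasks(source: str) -> list:
    """Return (is_done, text) for each task-list item."""
    tasks = []
    for line in source.splitlines():
        match = TASK.match(line)
        if match:
            tasks.append((match.group(1).lower() == "x", match.group(2)))
    return tasks


todo = "- [x] Done\n- [ ] Not done\n- [ ] Also not done"
print(parse_tasks(todo))
# [(True, 'Done'), (False, 'Not done'), (False, 'Also not done')]
```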

Footnotes

Here is some text with a footnote.[^1]

[^1]: This is the footnote content.

Supported by Pandoc, MultiMarkdown, Markdown Extra, GitHub, and Obsidian.

Definition Lists

Term 1
:   Definition 1

Term 2
:   Definition 2

Supported by Pandoc, MultiMarkdown, Markdown Extra, and Kramdown.

Heading IDs and Anchors

Pandoc / Markdown Extra style explicit IDs:

## My Heading {#my-heading}

GitHub auto-generates IDs from heading text by lowercasing, replacing spaces with hyphens, and stripping punctuation.
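The three steps just described can be approximated as follows. This is a sketch of the rule as stated here, not GitHub's actual implementation: the real algorithm has additional details (for example, de-duplicating repeated headings with numeric suffixes) that are not modeled, and github_style_slug is a hypothetical name.

```python
import re


def github_style_slug(heading: str) -> str:
    """Approximate a GitHub-style anchor ID for a heading."""
    slug = heading.strip().lower()
    # Strip punctuation, keeping word characters, hyphens, and spaces.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


print(github_style_slug("My Heading"))          # my-heading
print(github_style_slug("What's New in 2.0?"))  # whats-new-in-20
```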

Math

LaTeX-style math, using $...$ for inline and $$...$$ for display:

The Pythagorean theorem: $a^2 + b^2 = c^2$.

$$
\int_0^\infty e^{-x}\, dx = 1
$$

Supported by Pandoc, Jupyter, Obsidian, Notion (block-level), GitHub, GitLab, MkDocs (with KaTeX/MathJax extension), and most academic-leaning tools. Not supported by Discord or Slack.

Mermaid Diagrams

A fenced code block with mermaid as the language:

```mermaid
graph TD
    A[Start] --> B[Process]
    B --> C[End]
```

Supported natively by GitHub, GitLab, Obsidian, MkDocs (with the right plugin), Notion, Joplin, and many others.

Admonitions / Callouts

There are several competing dialects.

MkDocs / Material style:

!!! note "Optional title"
    This is a note.

Obsidian and GitHub style (using blockquote syntax):

> [!NOTE]
> This is a note.

> [!WARNING]
> Be careful.

GitHub supports five types: NOTE, TIP, IMPORTANT, WARNING, CAUTION. Obsidian supports a wider set including INFO, EXAMPLE, QUOTE, ABSTRACT, and others.

Front Matter (YAML / TOML)

Used by static site generators to attach metadata to documents:

---
title: My Post
date: 2025-01-15
tags: [markdown, writing]
draft: false
---

The post content begins here.

Hugo, Jekyll, MkDocs, Obsidian, Pandoc, Eleventy, and most static site generators all read front matter, though they differ on which keys are meaningful.
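To show how a tool might separate front matter from content, here is a minimal sketch using only the standard library. It assumes YAML-style --- delimiters and flat key: value pairs, and it keeps every value as a raw string; a real implementation would hand the metadata block to a proper YAML or TOML parser. The name split_front_matter is hypothetical.

```python
def split_front_matter(document: str):
    """Return (metadata_dict, body) for a document with --- front matter."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        # No closing delimiter: treat the whole file as content.
        return {}, document
    metadata = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return metadata, body


post = (
    "---\n"
    "title: My Post\n"
    "date: 2025-01-15\n"
    "tags: [markdown, writing]\n"
    "draft: false\n"
    "---\n"
    "\n"
    "The post content begins here.\n"
)
meta, body = split_front_matter(post)
print(meta["title"], "|", body)  # My Post | The post content begins here.
```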

Wikilinks

Obsidian / Roam style:

[[Page Name]]
[[Page Name|display text]]
[[Page Name#Heading]]
[[Page Name#^block-id]]

Supported by Obsidian, Roam Research, Logseq, Foam, Dendron, MediaWiki (its own dialect), and (with plugins) MkDocs.
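The four wikilink forms shown above share one shape: a page name, an optional #heading or #^block-id anchor, and an optional |display text. A rough sketch of a parser for that shape follows; the regular expression and the parse_wikilinks name are assumptions for illustration and do not reproduce any tool's exact grammar (for instance, embeds written as ![[Page]] would also match).

```python
import re

WIKILINK = re.compile(
    r"\[\[([^\[\]|#]+)"          # page name
    r"(?:#(\^?[^\[\]|]+))?"      # optional #heading or #^block-id
    r"(?:\|([^\[\]]+))?\]\]"     # optional |display text
)


def parse_wikilinks(text: str) -> list:
    """Return one dict per wikilink found in the text."""
    links = []
    for page, anchor, display in WIKILINK.findall(text):
        is_block = anchor.startswith("^")
        links.append({
            "page": page.strip(),
            "heading": None if is_block or not anchor else anchor,
            "block": anchor[1:] if is_block else None,
            "display": display or None,
        })
    return links


note = "See [[Page Name|display text]] and [[Page Name#^block-id]]."
for link in parse_wikilinks(note):
    print(link)
```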

Highlighting

==highlighted text==

Supported by Obsidian, Pandoc (with the right extension), and several other tools. Not in GFM or CommonMark.

Subscript and Superscript

H~2~O
E = mc^2^

Supported by Pandoc and MultiMarkdown. Reddit supports ^ for superscript with a slightly different syntax.

Emoji Shortcodes

:smile: :heart: :rocket:

Supported by GitHub, GitLab, Discord, Slack, Mattermost, and most chat platforms. Not standard Markdown.

Spoilers

||hidden text||      (Discord)
>!hidden text!<      (Reddit)

Platform-specific.

Block References

Obsidian-style:

This is some text. ^block-id

Reference: [[Page Name#^block-id]]

A power-user feature in Obsidian, Logseq, and Roam Research.


Part 5: Tool Support Matrix

Different platforms support different subsets. Here is a practical map.

CommonMark

The reference standard. Implements the core Gruber syntax with all ambiguities resolved. Includes fenced code blocks. Does not include tables, strikethrough, task lists, footnotes, or math by default. Tools that say "CommonMark" without further qualification typically mean "CommonMark plus our extensions."

GitHub Flavored Markdown (GFM)

A formal superset of CommonMark, and the most widely targeted dialect. Adds:

  • Tables
  • Task lists
  • Strikethrough (~~)
  • Autolinked URLs (URLs without <> around them auto-link)
  • Disallowed raw HTML (a small set of dangerous tags is stripped)
  • Footnotes
  • Math via $...$ and $$...$$
  • Mermaid and other diagrams
  • Admonition callouts (> [!NOTE] etc.)
  • Auto-generated heading IDs and anchor links
  • Issue/PR autolinks (#123, user/repo#456)
  • Username autolinks (@username)
  • Emoji shortcodes (:smile:)

GFM is what you get when you write a README.md on GitHub.

GitLab Flavored Markdown

A superset of GFM. Adds, on top of GFM:

  • Math (KaTeX)
  • Mermaid, PlantUML, Kroki diagrams
  • Multi-line blockquotes with >>>
  • Issue/MR autolinking with GitLab semantics (#123, !456, &789)
  • Wiki-style links inside wikis
  • Inline math with $`...`$ (alternate syntax)
  • Color chips (e.g. `#F00` rendered as a swatch)

Pandoc Markdown

The most feature-rich Markdown variant. Pandoc is a universal document converter, and its Markdown dialect is a superset of nearly everything. Adds:

  • Tables in multiple styles: pipe, grid, simple, multiline
  • Definition lists
  • Footnotes
  • Math (LaTeX)
  • Citations ([@key] with bibliography integration)
  • Fenced divs and bracketed spans (with attributes)
  • Line blocks (preserve line breaks for poetry, addresses)
  • Subscript and superscript
  • Smart punctuation (curly quotes, em-dashes)
  • Inline raw LaTeX, HTML, RTF, etc.
  • Heading IDs and arbitrary attributes
  • Numbered example lists
  • Table captions

Pandoc is the tool of choice when Markdown needs to round-trip into LaTeX, DOCX, EPUB, or other formats.

MultiMarkdown (MMD)

An older extended dialect by Fletcher Penney, predating GFM. Adds tables, definition lists, footnotes, citations, math, and metadata blocks. Less common today but historically influential — many Pandoc and Markdown Extra features trace back to MMD.

PHP Markdown Extra

By Michel Fortin. Added tables, definition lists, footnotes, abbreviations, fenced code blocks, and inline attributes — many features that later spread to other dialects. Still used in some PHP-based static sites and CMSes.

Obsidian Markdown

Based on CommonMark + GFM, with substantial additions oriented around personal knowledge management:

  • Wikilinks [[Page]]
  • Block references ^block-id
  • Embeds ![[Page]] (transclude another note)
  • Callouts via blockquote syntax (> [!note])
  • Highlights ==text==
  • Internal tags #tag
  • Frontmatter YAML
  • Math (MathJax)
  • Mermaid
  • Plugin-defined syntax (Templater, Dataview queries, etc.)

Notion

Notion accepts Markdown shortcuts on input but stores documents in a proprietary block format. It is not a Markdown editor in the strict sense. Supported on input: headings, lists, bold, italic, inline code, code blocks, quotes, dividers. Tables, math, toggles, callouts, and databases use Notion-specific blocks. Export to Markdown is lossy.

Discord

A small, security-conscious subset:

  • Bold (**text**), italic (*text* or _text_), underline (__text__)
  • Strikethrough (~~text~~)
  • Inline code, fenced code blocks (with syntax highlighting)
  • Block quotes (> for one line, >>> for multi-line)
  • Spoilers (||text||)
  • Headings (#, ##, ###)
  • Bullet and numbered lists
  • Masked links [text](url): only in embeds, slash command responses, and a few other contexts, not in regular messages

No tables, no images via Markdown, no HTML, no math.

Slack (mrkdwn)

Slack uses a variant called mrkdwn with significant differences from standard Markdown:

  • *bold* (single asterisks, not double)
  • _italic_
  • ~strikethrough~ (single tilde)
  • `code` and ```block```
  • > quote
  • Links use a unique syntax: <url|display> rather than [display](url)
  • No headings, no images via Markdown, limited list support

Slack's Block Kit (a JSON-based composition format) is the recommended path for richer formatting.
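For simple messages, the differences listed above are mechanical enough to translate with a few substitutions. The sketch below converts standard bold, italic, strikethrough, and inline links into their mrkdwn forms, using Slack's <url|text> link shape. It is an illustration only: to_slack_mrkdwn is a hypothetical helper, it does not protect code spans or handle nested emphasis, and anything beyond these four constructs passes through unchanged.

```python
import re


def to_slack_mrkdwn(markdown: str) -> str:
    """Translate a small subset of standard Markdown to Slack mrkdwn."""
    # [text](url) -> <url|text>
    text = re.sub(r"\[([^\]]+)\]\(([^)\s]+)\)", r"<\2|\1>", markdown)
    # Park **bold** behind a placeholder so it is not mistaken for italics.
    text = re.sub(r"\*\*(.+?)\*\*", "\x00\\1\x00", text)
    # *italic* -> _italic_
    text = re.sub(r"\*(.+?)\*", r"_\1_", text)
    # Restore bold using Slack's single asterisks.
    text = text.replace("\x00", "*")
    # ~~strike~~ -> ~strike~
    return re.sub(r"~~(.+?)~~", r"~\1~", text)


message = "**bold** and *italic* and ~~gone~~, see [docs](https://example.com)"
print(to_slack_mrkdwn(message))
# *bold* and _italic_ and ~gone~, see <https://example.com|docs>
```

The placeholder step matters because a single asterisk means italic in standard Markdown but bold in mrkdwn, so the two conversions cannot be done in one naive pass.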

Reddit

Reddit uses a CommonMark-based renderer (since 2018) with extensions:

  • Tables
  • Strikethrough
  • Superscript via ^
  • Spoilers via >!text!<
  • Username and subreddit autolinks (/u/name, /r/name)
  • Some old-Reddit quirks remain in legacy threads

Stack Overflow / Stack Exchange

Uses CommonMark with extensions:

  • Tables
  • Fenced code blocks with syntax highlighting (Prettify / highlight.js)
  • HTML allowed via a whitelist
  • KaTeX math on math-tagged sites only (Math Stack Exchange, Cross Validated, etc.)
  • Special quoting and Q&A formatting conventions

Jupyter Notebooks

Markdown cells use a CommonMark-based renderer with:

  • LaTeX math ($...$, $$...$$)
  • HTML passthrough
  • Embedded images (including base64-encoded inline)
  • GFM-style tables
  • Custom widget rendering depending on the frontend (JupyterLab, classic Notebook, VS Code)

MkDocs (especially Material theme)

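Python-Markdown plus extensions. Python-Markdown follows Gruber's original syntax rather than CommonMark, so some edge cases (notably nested-list indentation, which expects four spaces) differ from CommonMark renderers. The Material theme adds (via its pymdownx extensions):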

  • Admonitions (!!! note)
  • Tabbed content
  • Annotations
  • Task lists
  • Footnotes
  • Math (KaTeX or MathJax)
  • Mermaid
  • Code highlighting and code annotations
  • SuperFences (more flexible fenced blocks)
  • Critic markup ({++added++}, {--removed--})

Hugo, Jekyll, and Other Static Site Generators

  • Hugo uses Goldmark (CommonMark + GFM extensions + custom shortcodes). Shortcodes look like {{< youtube id >}} and let authors invoke template snippets from inside Markdown.
  • Jekyll uses Kramdown by default (a Ruby implementation with attribute lists, footnotes, definition lists, math).
  • Eleventy is renderer-agnostic but typically uses markdown-it.
  • Astro, Next.js, and Docusaurus use various JavaScript Markdown libraries — most commonly MDX, which extends Markdown with embedded JSX components.

markdown-it

A popular JavaScript implementation. CommonMark-strict by default, with a rich plugin ecosystem. Powers VS Code's Markdown preview, Eleventy's default renderer, and many web-based editors.

Visual Studio Code

The built-in preview uses markdown-it with GFM-flavored extensions. Supports syntax highlighting in fenced blocks via the editor's language services. Recent versions also render KaTeX math in the preview without an extension; older versions needed one.

Discord, Slack, Teams, and Other Chat Platforms — Quick Comparison

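| Feature       | Discord   | Slack | Teams       | Reddit        |
|---------------|-----------|-------|-------------|---------------|
| Bold          | **x**     | *x*   | **x**       | **x**         |
| Italic        | *x* / _x_ | _x_   | *x* / _x_   | *x*           |
| Strikethrough | ~~x~~     | ~x~   | ~~x~~       | ~~x~~         |
| Inline code   | `x`       | `x`   | `x`         | `x`           |
| Code block    | ```       | ```   | ```         | indent or ``` |
| Headings      | yes       | no    | yes         | yes           |
| Tables        | no        | no    | limited     | yes           |
| Math          | no        | no    | yes (LaTeX) | no            |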

Part 6: Practical Guidance

Which Dialect to Write In

If you don't know where your Markdown will be rendered, target CommonMark plus the GFM extensions for tables, task lists, fenced code, and strikethrough. This subset is supported almost universally and degrades gracefully where it isn't.

If you are writing for a specific tool, learn its dialect. Pandoc users get the most expressive power. GitHub users get the deepest tooling integration. Obsidian users get a knowledge graph. Slack users get... a strange but functional subset.

Common Pitfalls

  • Underscores inside identifiers (some_variable_name) can be misinterpreted as italics in some older renderers. CommonMark fixes this; older renderers may not.
  • Trailing spaces for line breaks are invisible and easy to lose during copy-paste. Prefer paragraph breaks or backslash line breaks where possible.
  • Indentation in lists matters and varies by renderer. CommonMark requires alignment with the first non-whitespace character of the list item content; older Markdown was more forgiving.
  • HTML inside Markdown is not universally supported. Some renderers strip it, some pass it through, some sanitize it via whitelist.
  • Smart punctuation (curly quotes, em-dashes) is a renderer setting in Pandoc and others. Don't rely on it being on or off in unfamiliar environments.
  • Tabs vs. spaces — most renderers expand a tab to four spaces, but some use eight, and a few don't expand at all. Stick to spaces.
  • Lazy continuation of blockquotes and lists differs across dialects. When in doubt, prefix every line of the continuation explicitly.
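Two of these pitfalls, invisible trailing-space line breaks and tab characters, are easy to detect mechanically. The following is a minimal lint sketch; lint_markdown and its message strings are hypothetical, and it makes no attempt to skip code blocks, where tabs and trailing spaces may be intentional.

```python
def lint_markdown(source: str) -> list:
    """Return (line_number, message) pairs for two common pitfalls."""
    warnings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            warnings.append((number, "tab character"))
        # Two or more trailing spaces on a non-blank line force a hard break.
        if line.endswith("  ") and line.strip():
            warnings.append((number, "trailing-space line break"))
    return warnings


draft = "First line.  \nSecond line.\n\t- indented with a tab"
print(lint_markdown(draft))
# [(1, 'trailing-space line break'), (3, 'tab character')]
```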

Markdown for AI Prompts

When writing prompts for LLMs, Markdown structure helps the model attend to your intent. Headings demarcate sections. Code fences contain literal content. Lists enumerate constraints. Bold draws attention. The model has seen millions of well-structured Markdown documents during training and treats Markdown as a strong signal of where it is in a document and what kind of content goes where.

A prompt that uses ## Task, ## Constraints, ## Examples will often outperform an unstructured wall of text with the same content, because the model has seen this pattern many times in instruction-style training data.
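One way to apply this consistently is to assemble prompts from parts rather than writing them freehand. The sketch below builds the three-section layout mentioned above; build_prompt, its parameters, and the Input/Output labels are hypothetical choices for illustration, not a format any model requires.

```python
def build_prompt(task: str, constraints: list, examples: list) -> str:
    """Assemble a Markdown-structured prompt with three sections."""
    parts = ["## Task", task.strip(), "", "## Constraints"]
    parts += [f"- {constraint}" for constraint in constraints]
    parts += ["", "## Examples"]
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    return "\n".join(parts).rstrip() + "\n"


prompt = build_prompt(
    "Summarize the text.",
    ["Be brief", "Use plain English"],
    [("long text", "short summary")],
)
print(prompt)
```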


Part 7: The Road Ahead

Markdown is now over twenty years old, and despite many attempts to displace it, it continues to grow in importance. It is the de facto interchange format for technical writing, the native output of nearly every LLM, and the input format for an enormous fraction of static sites and documentation systems.

The interesting open questions:

  • Will a "CommonMark 2" ever consolidate the most popular extensions (tables, task lists, footnotes, math, callouts) into the core spec? There is no current effort to do so, but pressure builds with every new dialect.
  • Will the proliferation of LLM-generated Markdown push tools toward stricter compliance, or toward further fragmentation as each AI vendor extends the format in proprietary ways?
  • How will rich block-based editors (Notion, Craft, Roam, Tana) influence what people expect Markdown to be able to express? The block paradigm is in many ways incompatible with Markdown's flat-text model.
  • Will MDX and its descendants — Markdown plus embedded components — become the next default, displacing plain Markdown for documentation that needs interactivity?

For now, Markdown's success is its own best argument: the simplest format that does the job tends to win, and Markdown has done the job for two decades and counting.


Appendix: A Cheat Sheet

| What you want           | What you write            |
|-------------------------|---------------------------|
| Heading                 | # H1, ## H2, … ###### H6  |
| Bold                    | **bold**                  |
| Italic                  | *italic*                  |
| Bold italic             | ***both***                |
| Strikethrough (GFM)     | ~~strike~~                |
| Inline code             | `code`                    |
| Code block              | ```lang … ```             |
| Blockquote              | > quote                   |
| Unordered list          | - item                    |
| Ordered list            | 1. item                   |
| Task list (GFM)         | - [x] done / - [ ] todo   |
| Link                    | [text](url)               |
| Image                   | ![alt](url)               |
| Horizontal rule         | ---                       |
| Footnote                | text[^1] and [^1]: note   |
| Math inline             | $x^2$                     |
| Math block              | $$ ... $$                 |
| Heading anchor (Pandoc) | ## Title {#my-id}         |
| Callout (GitHub)        | > [!NOTE]                 |
| Front matter (YAML)     | --- … --- at file top     |
| Mermaid diagram         | ```mermaid … ```          |