ASF-Pelican build process

In 2019 Infra created ASF-Pelican as a structure and template for projects to use to build their websites, and for the ASF's own website.

In 2024, Infra moved from ASF-Pelican to the ASF Infrastructure Pelican Action GitHub Action to perform the same functions without being closely tied to BuildBot. The repository for this GHA is github.com/apache/infrastructure-actions/tree/main/pelican.

For websites using the ASF-Pelican template, configure the build using the pelicanconf.py settings.

Pelican theme

# Theme
THEME = './theme/apache'

See ASF-Pelican theme for details about the ASF Theme.

Note: the following material is under review and will have an update soon.

Plugins

ASF-Pelican enhances the Pelican environment with plugins. Our environment has its own copy of the asf plugins, and the pelican-build.py script provides pelican-gfm.

# Pelican Plugins
# pelican-gfm is installed in the buildbot as part of build_pelican.py. It is an ASF Infra custom plugin.
# other plugins are discoverable and can be installed via pip by mentioning them in requirements.txt
# You can find plugins here: https://github.com/pelican-plugins
# Plugins that are custom for this site are found in PLUGIN_PATHS.
PLUGIN_PATHS = ['./theme/plugins']
PLUGINS = ['asfgenid', 'asfdata', 'pelican-gfm', 'asfreader']

Data Model. The asfdata.py plugin builds a metadata model that is shared with every page.
GFM Content. The pelican-gfm plugin reads .md, .markdown, .mkd, and .mdown files and converts the GFM Markdown into HTML.
EZMD Content. The asfreader.py plugin reads .ezmd files, injects data, translates ezt, and converts the GFM Markdown into HTML.
Generate ID. The asfgenid.py plugin performs a number of enhancements to the HTML.

See ASF-Pelican build process for the steps signaled. See plugins for the Python code.

Tree structure

Pages and static content are stored in the same tree. Generated content is output with the same relative path, except with an html extension. These are the necessary settings:

PATH = 'content'
# Save pages using full directory preservation
PAGE_PATHS = ['.']
# Path with no extension
PATH_METADATA = r'(?P<path_no_ext>.*)\..*'
# We are not slugifying any pages
ARTICLE_URL = ARTICLE_SAVE_AS = PAGE_URL = PAGE_SAVE_AS = '{path_no_ext}.html'
# We want to serve our static files mixed with content
STATIC_PATHS = ['.']
# we want any HTML to be served as-is
READERS = {'html': None}
# ignore README.md files in the content tree and the interviews and include folders
IGNORE_FILES = ['README.md','interviews','include']
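
As a rough illustration of how the path settings above fit together, the sketch below applies the PATH_METADATA pattern to a few source paths and fills in the '{path_no_ext}.html' template. The helper name output_path and the sample paths are hypothetical, and Pelican's own handling does more than this.

```python
import re

# Same pattern as PATH_METADATA, written as a raw string.
PATH_METADATA = r'(?P<path_no_ext>.*)\..*'
# Same template as PAGE_SAVE_AS / ARTICLE_SAVE_AS.
SAVE_AS = '{path_no_ext}.html'


def output_path(source_path):
    """Hypothetical helper: map a source path to its generated output path."""
    match = re.match(PATH_METADATA, source_path)
    if match is None:
        # No extension, so there is no path_no_ext metadata to use.
        return None
    return SAVE_AS.format(**match.groupdict())


# Sample paths are made up for illustration.
for path in ('index.md', 'foundation/board/calendar.ezmd', 'README'):
    print(path, '->', output_path(path))
```

Because the first group is greedy, only the final extension is dropped, so the directory layout under content is preserved in the output.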

Process

Pelican uses signals as it goes through the process of reading and generating content. It processes pages in no particular order.

Our plugins provide the following activity:

Pelican Signal	Step	GFM Content	EZMD Content	Description
Initialization	Data Model			Read data sources
Reader	Class	GFMReader	ASFReader(GFMReader)	Pelican Reader class
	Read	read_source	super.read_source	Read page source and metadata
	Model Metadata		add_data	Add asf data to the model and expand any `[{ reference }]`
	Translate		ezt	ezt template translation
	Render GFM	render	super.render	Render GFM/HTML into HTML
Content	Generate ID	generate_id	generate_id	Perform ASF specific HTML enhancements
Generator	Template	translate	translate	Create output HTML by pushing the generated content and metadata through the theme's templates

See local builds for how to install ASF-Pelican on your system.

Data model

ezmd templates use a shared data model to generate content. There are three types of data:

When referenced	Data type
EZMD Reader, Content, Generator	Constants - either integer or string values
EZMD Reader	Sequences - arrays of objects with attributes where an attribute may be another sequence
EZMD Reader	Dictionaries - key-value maps where the value may be another dictionary

The constants are also available to the asfgenid.py plugin and the theme's templates.

There are examples of how to inject shared metadata below. See the metadata model for how asfdata.py works to populate the shared metadata.

Read source

The system uses the read_source method to open a file and convert it into a metadata dictionary and text.

Example:

Title: ASF Export Classifications and Source Links
license: https://www.apache.org/licenses/LICENSE-2.0
asf_headings: False

#### ASF Project
...

The first three lines specify three metadata key-value pairs. There is a blank line and the rest is the text.

Code from pelican-gfm with some parts elided.

    def read_source(self, source_path):
        "Read metadata and content from the source."
        ...
        # Fetch the source content, with a few appropriate tweaks
        with pelican.utils.pelican_open(source_path) as text:

            # Extract the metadata from the header of the text
            lines = text.splitlines()
            for i in range(len(lines)):
                line = lines[i]
                match = GFMReader.RE_METADATA.match(line)
                if match:
                    name = match.group(1).strip().lower()
                    ...
                    metadata[name] = value
                elif not line.strip():
                    # blank line
                    continue
                else:
                    # reached actual content
                    break
            ...
            # Reassemble content, minus the metadata
            text = '\n'.join(lines[i:])

            return text, metadata
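
To see the header parsing in isolation, here is a minimal, self-contained sketch of the same loop run against the example page above. The regular expression RE_METADATA and the function name split_header are assumptions made for illustration; the plugin's actual pattern and the elided steps are not reproduced here.

```python
import re

# Assumed pattern for "Key: value" header lines; the plugin's real
# RE_METADATA is not shown in the excerpt above.
RE_METADATA = re.compile(r'^([A-Za-z][A-Za-z0-9_-]*):\s*(.*)$')


def split_header(text):
    """Hypothetical helper: split page source into (body, metadata)."""
    metadata = {}
    lines = text.splitlines()
    body_start = len(lines)  # stays here if the file is header-only
    for i, line in enumerate(lines):
        match = RE_METADATA.match(line)
        if match:
            # Metadata keys are lower-cased, as in the excerpt.
            metadata[match.group(1).strip().lower()] = match.group(2).strip()
        elif not line.strip():
            continue  # blank line
        else:
            body_start = i  # reached actual content
            break
    return '\n'.join(lines[body_start:]), metadata


source = """Title: ASF Export Classifications and Source Links
license: https://www.apache.org/licenses/LICENSE-2.0
asf_headings: False

#### ASF Project
"""
body, metadata = split_header(source)
print(metadata)
print(body)
```

Note that every metadata value arrives as a string, so a setting such as asf_headings is the text 'False' rather than a boolean.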

Model Metadata

In asfreader.py we extend EZT syntax to do metadata substitution prior to EZT translation. This allows for a more natural and direct representation than with EZT sequences.

Examples

|  |  |  |
|-----------|-----------|-------------|
| [{ board[0].name }] | [{ board[1].name }] | [{ board[2].name }] |
| [{ board[3].name }] | [{ board[4].name }] | [{ board[5].name }] |
| [{ board[6].name }] | [{ board[7].name }] | [{ board[8].name }] |

| Office    | Individual  |
|-----------|-------------|
| Board Chair |  [{ ci[boardchair][roster] }] |
| Vice Chair |  [{ ci[vicechair][roster] }] |
| President |  [{ ci[president][roster] }] |
| Exec. V.P |  [{ ci[execvp][roster] }] |
| [[]Treasurer](https://treasurer.apache.org/) |  [{ ci[treasurer][roster] }] |
| Assistant Treasurer |  [{ ci[assistanttreasurer][roster] }] |
| Secretary |  [{ ci[secretary][roster] }] |
| Assistant Secretary |  [{ ci[assistantsecretary][roster] }] |
| V.P., [[]Legal Affairs](/legal/) |  [{ ci[legal][chair] }] |
| Assistant V.P., [[]Legal Affairs](/legal/) |  [{ ci[assistantvplegalaffairs][roster] }] |

- All volunteer community
- [{ code_lines }]+ lines of code in&nbsp;stewardship
- [{ code_changed }]+ lines of code&nbsp;changed
- [{ code_commits }]+ code commits
- [{ asf_members }] individual ASF&nbsp;Members
- [{ asf_committers }]+ Apache Committers
- [{ asf_contributors }]+ code contributors
- [{ asf_people }]+ people involved in our&nbsp;communities

EZMD Reader

The asfreader.py plugin is responsible for reading the source, adding metadata, translating ezt, and rendering GFM.

    def add_data(self, text, metadata):
        "Mix in ASF data as metadata"

        asf_metadata = self.settings.get('ASF_DATA', { }).get('metadata')
        if asf_metadata:
            metadata.update(asf_metadata)
            # insert any direct references
            m = 1
            while m:
                m = METADATA_RE.search(text)
                if m:
                    this_data = m.group(1).strip()
                    format_string = '{{{0}}}'.format(this_data)
                    try:
                        new_string = format_string.format(**metadata)
                        print(f'{{{{{m.group(1)}}}}} -> {new_string}')
                    except Exception:
                        # the data expression was not found
                        new_string = format_string
                        print(f'{{{{{m.group(1)}}}}} is not found')
                    text = re.sub(METADATA_RE, new_string, text, count=1)
        return text, metadata
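
The substitution relies on Python's str.format field syntax, which already supports index, key, and attribute lookups. The sketch below shows that mechanism on its own. The pattern METADATA_RE, the function expand_references, and the sample data are assumptions for illustration, since the plugin's own pattern is not shown above. Unlike the excerpt, this sketch leaves an unknown reference exactly as written.

```python
import re
from types import SimpleNamespace

# Assumed pattern for "[{ reference }]"; the plugin's METADATA_RE is not
# shown in the excerpt above.
METADATA_RE = re.compile(r'\[\{\s*([^{}]+?)\s*\}\]')


def expand_references(text, metadata):
    """Hypothetical helper: replace each [{ reference }] with its value."""
    def replace(match):
        # 'board[1].name' becomes the format string '{board[1].name}'
        format_string = '{{{0}}}'.format(match.group(1))
        try:
            return format_string.format(**metadata)
        except Exception:
            # Unknown reference: leave the original text untouched.
            return match.group(0)
    return METADATA_RE.sub(replace, text)


# Made-up sample data standing in for the shared model.
metadata = {
    'code_commits': '4,000,000',
    'board': [SimpleNamespace(name='Director A'), SimpleNamespace(name='Director B')],
    'ci': {'treasurer': {'roster': 'Treasurer T'}},
}
text = '| [{ board[1].name }] | [{ ci[treasurer][roster] }] | [{ code_commits }]+ | [{ missing }] |'
print(expand_references(text, metadata))
```

Using a replacement function rather than a replacement string means that backslashes in the substituted data are inserted literally.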

EZT Translation

ezmd page files are ezt templates that create Markdown and HTML output. See EZT Syntax for the directives.

EZT Examples

Project list:

| Office    | Individual  |
|-----------|-------------|[for projects]
| V.P., [if-any projects.site][[][end]Apache [projects.display_name][if-any projects.site]]([projects.site])[end] | [projects.chair] |[end]

Featured projects:

[for featured_projs]<li [if-index featured_projs first]class="active"[end]>
     <a href="#[featured_projs.key_id]" data-toggle="tab">[featured_projs.display_name]</a>
</li>[end]

Insert a file as-is into the output:

Title: Apache Download Mirrors

[insertfile "include/closer.ezt"]

EZT Code

Code from asfreader.py

            # prepare text as an ezt template
            # compress_whitespace=0 is required as blank lines and indentation have meaning in markdown
            template = ezt.Template(compress_whitespace=0)
            reader = ASFTemplateReader(source_path, text)
            template.parse(reader, base_format=ezt.FORMAT_HTML)
            assert template
            # generate content from ezt template with metadata
            fp = io.StringIO()
            template.generate(fp, metadata)

Render GFM

Content is in GitHub Flavored Markdown (GFM).

ASF-Pelican uses a version of cmark-gfm by GitHub through the pelican-gfm plugin created by Apache Infra.

- Mastering Markdown
- Detailed Specification with many examples
- Many projects used the Apache CMS for their websites. Here are some differences from its markdown.pl:
  - HTML Blocks
    - Make sure the first line of your HTML block starts in column one.
    - A blank line terminates an HTML block.
      - The exception to this rule is for style, pre, and script.
    - Markdown content within an HTML block
  - Autolinks
    - www
    - url
    - email
  - Disallowed HTML: the tagfilter extension disables certain HTML. The asfgenid plugin re-enables script, style, and iframe HTML.
- Examples

Pelican GFM

The pelican-gfm plugin reads the content file and renders it to HTML.

From asfreader.py:

            # Render the markdown into HTML
            content = super().render(fp.getvalue().encode('utf-8')).decode('utf-8')
            assert content

From pelican-gfm:

    def render(self, text):
      "Use cmark-gfm to render the Markdown into an HTML fragment."

      parser = F_cmark_parser_new(OPTS)
      assert parser
      for name in EXTENSIONS:
        ext = F_cmark_find_syntax_extension(name.encode('utf-8'))
        assert ext
        rv = F_cmark_parser_attach_syntax_extension(parser, ext)
        assert rv
      exts = F_cmark_parser_get_syntax_extensions(parser)
      F_cmark_parser_feed(parser, text, len(text))
      doc = F_cmark_parser_finish(parser)
      assert doc

      output = F_cmark_render_html(doc, OPTS, exts)

      F_cmark_parser_free(parser)
      F_cmark_node_free(doc)

      return output

Generate ID

We use the asfgenid plugin to modify the generated content in ways that mimic the Markdown extensions in the Apache CMS. Many of these ASF-specific enhancements are controlled by the ASF_GENID dictionary in the Pelican settings.

ASF_GENID key	default	process	page override
unsafe_tags	True	fix up script, style, and iframe HTML tags that the GFM tagfilter extension marks as unsafe
-	-	convert HTML into beautiful soup
metadata	True	`{{ metadata }}` include data in the HTML
-	True	inventory of all ID attributes; duplicates are invalid
elements	True	find all `{#id}` and `{.class}` texts and assign attributes
headings	True	assign IDs to all headings w/o IDs already present or assigned with `{#id}` text	asf_headings
headings_re	`r'^h[1-6]'`	regex for finding headings that require IDs
tables	True	tables with a class attribute are assigned `class=table`
toc	True	generate a table of contents if [TOC] is found. If this is set to False then the `toc.py` plugin may be used.
toc_headers	`r'h[1-6]'`	headings to include in the [TOC]
-	-	convert beautiful soup back into HTML.

# Configure the asfgenid plugin
ASF_GENID = {
    'metadata': True,
    'elements': True,
    'headings': True,
    'headings_re': r'^h[1-4]',
    'permalinks': True,
    'toc': True,
    'toc_headers': r"h[1-4]",
    'tables': True,
    'debug': False
}
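
The unsafe_tags step can be pictured as undoing part of what the tagfilter extension did. In GFM, tagfilter replaces the leading < of certain tags with &lt;. The sketch below reverses that for script, style, and iframe only. The pattern UNSAFE_RE and the function restore_unsafe_tags are assumptions for illustration and are not taken from asfgenid.py.

```python
import re

# Assumed pattern: an escaped opening or closing script/style/iframe tag.
UNSAFE_RE = re.compile(r'&lt;(/?)(script|style|iframe)\b', re.IGNORECASE)


def restore_unsafe_tags(html):
    """Hypothetical helper: turn '&lt;script' back into '<script'."""
    return UNSAFE_RE.sub(r'<\1\2', html)


filtered = '&lt;script>var a = 1;&lt;/script> &lt;title>kept escaped&lt;/title>'
print(restore_unsafe_tags(filtered))
```

A string-level fix like this would also restore text that an author escaped on purpose, which is one reason to treat it as a sketch rather than as the plugin's behavior.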

Element examples

Set the heading ID and permalink to #what:

## What is the Apache Software Foundation?  {#what}

The Apache Software Foundation (ASF) is a non-profit 501(c)(3) corporation,
incorporated in Delaware, USA, in June of 1999. The ASF is a natural
outgrowth of The Apache Group, which
formed in 1995 to develop the Apache HTTP Server.

Set the class of an image to float-right:

![Logo](images/logo.svg) {.float-right}

An HTML fragment can serve a similar purpose:

<div class="pull-right" style="float:right; border-style:dotted; width:200px; padding:5px; margin:5px">

SEE INSTEAD: [Trademark Resources Site Map][resources].

</div>
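
A minimal sketch of the idea behind the elements step: find a trailing {#id} or {.class} annotation in an element's text, strip it, and record the attribute to assign. The pattern ELEMENT_RE and the function parse_annotation are hypothetical and simpler than whatever asfgenid.py uses, and the plugin works on the parsed HTML rather than on raw strings.

```python
import re

# Assumed pattern: "{#some-id}" or "{.some-class}" at the end of the text.
ELEMENT_RE = re.compile(r'\s*\{\s*(?P<kind>[#.])(?P<value>[-._:a-zA-Z0-9]+)\s*\}\s*$')


def parse_annotation(text):
    """Hypothetical helper: return (clean_text, attribute_name, value)."""
    match = ELEMENT_RE.search(text)
    if match is None:
        return text, None, None
    attribute = 'id' if match.group('kind') == '#' else 'class'
    # Drop the annotation and the whitespace in front of it.
    return text[:match.start()], attribute, match.group('value')


print(parse_annotation('What is the Apache Software Foundation?  {#what}'))
print(parse_annotation('Logo {.float-right}'))
print(parse_annotation('No annotation here'))
```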

Heading code

Code from asfgenid.py uses BeautifulSoup 4 to manipulate the rendered HTML. Here is an example:

# from Apache CMS markdown/extensions/headerid.py - slugify in the same way as the Apache CMS
def slugify(value, separator):
    """ Slugify a string, to make it URL friendly. """
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = re.sub('[^\\w\\s-]', '', value.decode('ascii')).strip().lower()
    return re.sub('[%s\\s]+' % separator, separator, value)

...

# append a permalink
def permalink(soup, mod_element):
    new_tag = soup.new_tag('a', href='#' + mod_element['id'])
    new_tag['class'] = 'headerlink'
    new_tag['title'] = 'Permalink'
    new_tag.string = LINK_CHAR
    mod_element.append(new_tag)

...

# generate ID for a heading
def headingid_transform(ids, soup, tag, permalinks, perma_set):
    new_string = tag.string
    if not new_string:
        # roll up strings if no immediate string
        new_string = tag.find_all(
            text=lambda t: not isinstance(t, Comment),
            recursive=True)
        new_string = ''.join(new_string)

    # don't have an id create it from text
    new_id = slugify(new_string, '-')
    tag['id'] = unique(new_id, ids)
    if permalinks:
        permalink(soup, tag)
        # inform if there is a duplicate permalink
        unique(tag['id'], perma_set)

...

    # step 6 - find all headings w/o ids already present or assigned with {#id} text
    if asf_headings == 'True':
        if asf_genid['debug']:
            print(f'headings: {content.relative_source_path}')
        # Find heading tags
        HEADING_RE = re.compile(asf_genid['headings_re'])
        for tag in soup.findAll(HEADING_RE, id=False):
            headingid_transform(ids, soup, tag, asf_genid['permalinks'], permalinks)
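
To try the slug generation outside the plugin, the sketch below repeats slugify from the excerpt with the imports it needs and pairs it with a simple duplicate check. The unique function here is a hypothetical stand-in: the plugin's own unique is not shown above and may number duplicates differently. The sample headings are made up.

```python
import re
import unicodedata


def slugify(value, separator):
    """Slugify a string the same way as the excerpt above."""
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = re.sub(r'[^\w\s-]', '', value.decode('ascii')).strip().lower()
    return re.sub(r'[%s\s]+' % separator, separator, value)


def unique(new_id, ids):
    """Hypothetical stand-in for the plugin's unique(): add a numeric
    suffix until the ID is unused, then record it."""
    candidate, count = new_id, 1
    while candidate in ids:
        candidate = f'{new_id}_{count}'
        count += 1
    ids.add(candidate)
    return candidate


ids = set()
for heading in ('Tree structure', 'Data model', 'Data model', 'Café & Crème'):
    print(heading, '->', unique(slugify(heading, '-'), ids))
```

The NFKD normalization splits accented characters into a base letter plus a combining mark, so encoding to ASCII with errors ignored keeps the base letter and drops the accent.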

Copyright 2025, The Apache Software Foundation, Licensed under the Apache License, Version 2.0.
Apache® and the Apache feather logo are trademarks of The Apache Software Foundation.