
Parser

This class provides a specialized HTML parser that constructs a tree structure of elements while tracking their positions within the source document. It manages the nesting of tags, handles self-closing and void elements, and performs validation to ensure end tags match their corresponding start tags. The parser maintains a reference to the root element and provides utilities for formatting error messages with precise line and column information.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| root | [RootElement](rootelement.md?sid=django_test_html_rootelement) = RootElement() | The top-level RootElement instance that contains all parsed HTML elements and text data. |
| open_tags | list = [] | The stack of currently open Element objects, used to track nesting and to determine the parent for new tags or text. |
| element_positions | dict = {} | A mapping from Element objects to the line and column position recorded for each during parsing, used for error reporting and formatting. |
| current | Element or RootElement | The currently active element: the last open tag, or the root element when no tags are open. |

Constructor

Signature

def Parser() -> null

Methods


error()

@classmethod
def error(
msg: string
)

Raises an HTMLParseError with the specified message and the current line and column position of the parser.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| msg | string | The error message text describing the parsing failure. |

format_position()

@classmethod
def format_position(
position: tuple|object = null,
element: [Element](element.md?sid=django_test_html_element) = null
) -> string

Formats a human-readable string indicating the line and column location for a specific element or the current parser state.

Parameters

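| Name | Type | Description |
| --- | --- | --- |
| position | tuple or object = null | The position to format. When omitted, the position stored for `element` is used, or the parser's current position if no element is given. |
| element | [Element](element.md?sid=django_test_html_element) = null | The element whose stored position should be retrieved and formatted. |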

Returns

| Type | Description |
| --- | --- |
| string | A string of the form 'Line %d, Column %d' giving the source location. |
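The fallback order described above can be sketched as a standalone function. This is a minimal illustration, not the class's actual code: the `element_positions` and `current_position` parameters are hypothetical stand-ins for parser state, and only tuple positions are handled.

```python
def format_position(position=None, element=None,
                    element_positions=None, current_position=(1, 0)):
    """Hypothetical stand-in for the documented method (tuple positions only)."""
    # Prefer an explicit position; otherwise look up the element's stored one.
    if position is None and element is not None:
        position = element_positions[element]
    # With neither given, fall back to the parser's current position.
    if position is None:
        position = current_position
    return "Line %d, Column %d" % tuple(position)


explicit = format_position((3, 14))
looked_up = format_position(element="p", element_positions={"p": (2, 5)})
fallback = format_position()
```

Here `explicit` is `'Line 3, Column 14'`, `looked_up` is `'Line 2, Column 5'`, and `fallback` uses the assumed default position.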

current()

@classmethod
def current() -> Element|RootElement

Retrieves the currently active element being parsed, which is either the last open tag or the root element.

Returns

| Type | Description |
| --- | --- |
| Element or RootElement | The last open element, or the root element if no tags are open. |

handle_startendtag()

@classmethod
def handle_startendtag(
tag: string,
attrs: list
)

Processes self-closing tags by triggering start tag logic and conditionally triggering end tag logic for non-void elements.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| tag | string | The name of the HTML tag encountered. |
| attrs | list | A list of (name, value) pairs representing the tag's attributes. |
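The void-element rule can be illustrated with a small subclass of the standard library's `html.parser.HTMLParser` that only records events. The `EventRecorder` class and the `VOID_TAGS` set are assumptions made for this sketch (the set is an illustrative subset, not the list this class uses).

```python
from html.parser import HTMLParser

# Illustrative subset of void elements; an assumption for this sketch.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class EventRecorder(HTMLParser):
    """Hypothetical helper that logs start/end events instead of building a tree."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Always run the start-tag logic.
        self.handle_starttag(tag, attrs)
        # Void elements never have an end tag, so only close non-void ones.
        if tag not in VOID_TAGS:
            self.handle_endtag(tag)


recorder = EventRecorder()
recorder.feed("<div/><br/>")
recorder.close()
```

After parsing, `recorder.events` holds a start and an end event for `div` but only a start event for `br`.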

handle_starttag()

@classmethod
def handle_starttag(
tag: string,
attrs: list
)

Creates a new Element, appends it to the current parent, and tracks its position; non-void elements are added to the open tags stack.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| tag | string | The name of the HTML tag to initialize. |
| attrs | list | The raw attribute pairs to be normalized and stored with the element. |

handle_endtag()

@classmethod
def handle_endtag(
tag: string
)

Closes the specified tag by popping elements from the stack until a match is found, raising an error if the tag is unexpected.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| tag | string | The name of the closing tag to match against the open tags stack. |
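The pop-until-match behavior can be sketched with a plain list of tag names standing in for the stack of open elements. `close_tag` and `UnexpectedEndTag` are hypothetical names introduced for this example; the documented method raises through `error()` instead.

```python
class UnexpectedEndTag(Exception):
    """Hypothetical stand-in for the HTMLParseError raised by the parser."""


def close_tag(open_tags, tag):
    """Pop names from open_tags until one equal to `tag` has been removed."""
    while open_tags:
        if open_tags.pop() == tag:
            return
    # The stack ran out without finding a matching start tag.
    raise UnexpectedEndTag("Unexpected end tag `%s`" % tag)


stack = ["div", "ul", "li"]
close_tag(stack, "ul")  # pops "li" (implicitly closed), then "ul"

try:
    close_tag(["div"], "span")
    mismatch_detected = False
except UnexpectedEndTag:
    mismatch_detected = True
```

In this sketch, closing `ul` also discards the still-open `li`, leaving `stack` as `['div']`, while an end tag with no matching start tag raises.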

handle_data()

@classmethod
def handle_data(
data: string
)

Appends raw text data to the currently active element in the parse tree.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| data | string | The text content found between HTML tags. |
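Text lands on whichever element is current, which ties `handle_data()` to the open tags stack maintained by `handle_starttag()`. The sketch below shows that interaction on top of the standard library's `html.parser.HTMLParser`. The `Node` and `TreeSketch` classes are hypothetical stand-ins, attribute normalization and void elements are omitted, and end tags are assumed to be well formed.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical stand-in for Element/RootElement: a name plus children."""

    def __init__(self, name):
        self.name = name
        self.children = []


class TreeSketch(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node(None)
        self.open_tags = []
        self.element_positions = {}

    @property
    def current(self):
        # The innermost open element, or the root when nothing is open.
        return self.open_tags[-1] if self.open_tags else self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.current.children.append(node)  # attach to the current parent
        self.open_tags.append(node)         # it becomes the new current element
        self.element_positions[node] = self.getpos()  # (line, column) of the tag

    def handle_endtag(self, tag):
        # Simplified: assumes the end tag matches the innermost open tag.
        self.open_tags.pop()

    def handle_data(self, data):
        self.current.children.append(data)


parser = TreeSketch()
parser.feed("<p>Hello <b>world</b></p>")
parser.close()

p = parser.root.children[0]
b = p.children[1]
```

The paragraph node ends up with the string `'Hello '` followed by the `b` node, and `'world'` is stored under `b` because it was the current element when that text arrived.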