This class provides a specialized HTML parser that constructs a tree structure of elements while tracking their positions within the source document. It manages the nesting of tags, handles self-closing and void elements, and performs validation to ensure end tags match their corresponding start tags. The parser maintains a reference to the root element and provides utilities for formatting error messages with precise line and column information.
Attributes
| Attribute | Type | Description |
|---|---|---|
| root | [RootElement](rootelement.md?sid=django_test_html_rootelement) = RootElement() | The top-level RootElement instance that serves as the container for all parsed HTML elements and data. |
| open_tags | list = [] | A list of currently open Element objects used to track the nesting structure and determine the parent for new data or tags. |
| element_positions | dict = {} | A dictionary mapping Element objects to their respective line and column positions captured during parsing for error reporting and formatting. |
| current | `Element \| RootElement` | The currently active element: the last entry in open_tags, or root when no tags are open. |
Constructor
Signature
Methods
error()
@classmethod
def error(
msg: string
)
Raises an HTMLParseError with the specified message and the current line and column position of the parser.
Parameters
| Name | Type | Description |
|---|---|---|
| msg | string | The error message text describing the parsing failure |
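As a rough illustration of how an error can carry the parser's position, the sketch below subclasses the standard library's `html.parser.HTMLParser` and reads its `getpos()` method. `SketchParseError` and `SketchParser` are hypothetical stand-ins for this example, not the real HTMLParseError or this class.

```python
from html.parser import HTMLParser


class SketchParseError(Exception):
    """Hypothetical stand-in for HTMLParseError."""


class SketchParser(HTMLParser):
    def error(self, msg):
        # getpos() is the stdlib HTMLParser method returning (lineno, offset).
        raise SketchParseError(msg, self.getpos())


parser = SketchParser()
parser.feed("<p>first line</p>\n<p>second")

try:
    parser.error("Unexpected content")
except SketchParseError as exc:
    # The exception carries the message and a (line, column) tuple.
    message, position = exc.args
```

Attaching the position to the exception lets the caller decide how to present it, for example through a formatting helper.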
format_position()
@classmethod
def format_position(
position: tuple|object = null,
element: [Element](element.md?sid=django_test_html_element) = null
) -> string
Formats a human-readable string indicating the line and column location for a specific element or the current parser state.
Parameters
| Name | Type | Description |
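|---|---|---|
| position | `tuple \| object` = null | An explicit source position to format. When omitted, the position stored for `element` is used if one is given, otherwise the parser's current position. |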
| element | [Element](element.md?sid=django_test_html_element) = null | The element whose stored position should be retrieved and formatted |
Returns
| Type | Description |
|---|---|
| string | A formatted string in the format 'Line %d, Column %d' representing the source location |
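The lookup order described above can be sketched as a standalone function. This is a minimal sketch under stated assumptions: the `object` form of `position` is assumed to expose `lineno` and `offset` attributes, and `SourcePosition`, `element_positions`, and `fallback` are hypothetical names that stand in for parser state.

```python
class SourcePosition:
    """Hypothetical position object with lineno/offset attributes."""

    def __init__(self, lineno, offset):
        self.lineno = lineno
        self.offset = offset


def format_position(position=None, element=None, element_positions=None,
                    fallback=(1, 0)):
    element_positions = element_positions or {}
    # Prefer an explicit position, then the position stored for the element.
    if position is None and element is not None:
        position = element_positions[element]
    # Otherwise fall back to the "current" parser position (assumed here).
    if position is None:
        position = fallback
    # Accept objects carrying lineno/offset as well as plain tuples.
    if hasattr(position, "lineno"):
        position = (position.lineno, position.offset)
    return "Line %d, Column %d" % position


from_tuple = format_position((3, 14))
from_object = format_position(SourcePosition(2, 5))
from_element = format_position(element="div", element_positions={"div": (7, 0)})
```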
current()
@classmethod
def current() -> Element|RootElement
Retrieves the currently active element being parsed, which is either the last open tag or the root element.
Returns
| Type | Description |
|---|---|
| `Element \| RootElement` | The last element in open_tags, or the root element when no tags are open. |
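The "last open tag, else root" rule is small enough to show directly. In this sketch `SketchState` is a hypothetical container, plain lists and strings stand in for RootElement and Element, and exposing `current` as a property is an assumption made for the example.

```python
class SketchState:
    """Hypothetical holder for the two pieces of state that `current` reads."""

    def __init__(self):
        self.root = []        # stand-in for the RootElement container
        self.open_tags = []   # stack of currently open elements

    @property
    def current(self):
        # The innermost open element wins; with nothing open, fall back to root.
        return self.open_tags[-1] if self.open_tags else self.root


state = SketchState()
before = state.current          # nothing is open yet, so this is the root
state.open_tags.append("div")   # a string stands in for an Element
after = state.current
```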
handle_startendtag()
@classmethod
def handle_startendtag(
tag: string,
attrs: list
)
Processes self-closing tags by triggering start tag logic and conditionally triggering end tag logic for non-void elements.
Parameters
| Name | Type | Description |
|---|---|---|
| tag | string | The name of the HTML tag encountered |
| attrs | list | A list of (name, value) pairs representing the tag's attributes |
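To make the void versus non-void distinction concrete, the sketch below records the events a self-closing tag produces. It builds on the standard library's `html.parser.HTMLParser`; `VOID_TAGS` is an illustrative subset chosen for this example, not the class's actual list of void elements.

```python
from html.parser import HTMLParser

# Illustrative subset only; the real set of void elements is longer.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class SketchParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_startendtag(self, tag, attrs):
        # Always run the start-tag logic.
        self.handle_starttag(tag, attrs)
        # Void elements can never have content, so they get no end event.
        if tag not in VOID_TAGS:
            self.handle_endtag(tag)


parser = SketchParser()
parser.feed("<br/><span/>")
parser.close()
events = parser.events
```

Here `<br/>` yields only a start event, while `<span/>` is treated as an opened and immediately closed element.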
handle_starttag()
@classmethod
def handle_starttag(
tag: string,
attrs: list
)
Creates a new Element, appends it to the current parent, and tracks its position; non-void elements are added to the open tags stack.
Parameters
| Name | Type | Description |
|---|---|---|
| tag | string | The name of the HTML tag to initialize |
| attrs | list | The raw attribute pairs to be normalized and stored with the element |
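The start-tag steps described above (create, attach to the current parent, record the position, push if non-void) can be sketched as follows. `Node`, `SketchParser`, and `VOID_TAGS` are hypothetical names for this example, and the attribute normalization step is not reproduced: the raw pairs are stored unchanged.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "input"}  # illustrative subset only


class Node:
    """Hypothetical stand-in for Element / RootElement."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.children = []


class SketchParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node(None, [])
        self.open_tags = []
        self.element_positions = {}

    @property
    def current(self):
        return self.open_tags[-1] if self.open_tags else self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, list(attrs))        # normalization omitted in this sketch
        self.current.children.append(node)   # attach to the active parent
        if tag not in VOID_TAGS:
            self.open_tags.append(node)      # only non-void tags stay open
        # getpos() is the stdlib method returning (lineno, offset).
        self.element_positions[node] = self.getpos()


parser = SketchParser()
parser.feed('<div id="a"><img src="x.png"><p>')

div = parser.root.children[0]
child_names = [child.name for child in div.children]
open_names = [node.name for node in parser.open_tags]
```

Because `img` is never pushed onto the stack, the following `<p>` becomes a sibling of the image under `div` rather than a child of it.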
handle_endtag()
@classmethod
def handle_endtag(
tag: string
)
Closes the specified tag by popping elements from the stack until a match is found, raising an error if the tag is unexpected.
Parameters
| Name | Type | Description |
|---|---|---|
| tag | string | The name of the closing tag to match against the open tags stack |
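One plausible reading of the pop-until-match behavior is sketched below on a plain list of tag names. `close_tag` and `UnexpectedEndTag` are hypothetical names for this example; the real method works on Element objects and reports errors through the parser.

```python
class UnexpectedEndTag(Exception):
    """Hypothetical error type for this sketch."""


def close_tag(open_tags, tag):
    """Pop names off the stack until `tag` is found; raise if it never is."""
    if not open_tags:
        raise UnexpectedEndTag("Unexpected end tag `%s`" % tag)
    name = open_tags.pop()
    while name != tag:
        # Running out of open tags means the end tag had no matching start tag.
        if not open_tags:
            raise UnexpectedEndTag("Unexpected end tag `%s`" % tag)
        name = open_tags.pop()


stack = ["div", "ul", "li"]
close_tag(stack, "ul")  # implicitly closes the inner "li" as well

try:
    close_tag(["div"], "table")
    raised = False
except UnexpectedEndTag:
    raised = True
```

Popping past unmatched inner tags means an unclosed `<li>` is tolerated, while an end tag with no matching start tag anywhere on the stack is reported as an error.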
handle_data()
@classmethod
def handle_data(
data: string
)
Appends raw text data to the currently active element in the parse tree.
Parameters
| Name | Type | Description |
|---|---|---|
| data | string | The text content found between HTML tags |
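The sketch below shows text landing on whichever element is active when it is encountered. It is a simplified model built on the standard library's `html.parser.HTMLParser`: dictionaries stand in for Element objects, `SketchParser` is a hypothetical name, and the end-tag handler assumes well-formed input.

```python
from html.parser import HTMLParser


class SketchParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = {"name": None, "children": []}
        self.open_tags = []

    @property
    def current(self):
        return self.open_tags[-1] if self.open_tags else self.root

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "children": []}
        self.current["children"].append(node)
        self.open_tags.append(node)

    def handle_endtag(self, tag):
        # Simplified: assumes every end tag matches the innermost open tag.
        self.open_tags.pop()

    def handle_data(self, data):
        # Text becomes a child of whichever element is currently active.
        self.current["children"].append(data)


parser = SketchParser()
parser.feed("<p>Hello <b>world</b></p>")
parser.close()

paragraph = parser.root["children"][0]
bold = paragraph["children"][1]
```

Text and child elements share one ordered list, so the original interleaving of "Hello " and the `<b>` element is preserved.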