
parse_html

Take a string that contains HTML and turn it into a Python object structure that can be easily compared with other HTML for semantic equivalence. Syntactic differences, such as which quotation marks are used around attribute values, are ignored.

def parse_html(
    html: str
) -> DocumentNode


Parameters

| Name | Type | Description |
| --- | --- | --- |
| html | str | The raw HTML string to parse into a semantic object structure. |

Returns

| Type | Description |
| --- | --- |
| DocumentNode | The finalized document object, or its single top-level child element, representing the parsed HTML structure. |
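
Example

To illustrate what comparing HTML on semantic equivalence can mean, here is a minimal sketch built only on the standard library's `html.parser`. It is not the implementation behind `parse_html`: the names `Node`, `_TreeBuilder`, and `parse_html_sketch` are hypothetical, and the sketch assumes every start tag has a matching end tag (unclosed void elements such as `<br>` are not handled).

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical element node; a stand-in, not the real DocumentNode."""

    def __init__(self, name, attrs=()):
        self.name = name
        # Sort by attribute name so source order does not affect equality.
        self.attrs = sorted(attrs, key=lambda pair: pair[0])
        self.children = []

    def __eq__(self, other):
        return (
            isinstance(other, Node)
            and self.name == other.name
            and self.attrs == other.attrs
            and self.children == other.children
        )


class _TreeBuilder(HTMLParser):
    """Builds a Node tree; HTMLParser already lowercases tag and attribute
    names and strips the quotes around attribute values."""

    def __init__(self):
        super().__init__()
        self.root = Node(None)
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumption: tags are properly nested, so popping once is enough.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Collapse runs of whitespace and drop whitespace-only text.
        text = " ".join(data.split())
        if text:
            self.stack[-1].children.append(text)


def parse_html_sketch(html: str) -> Node:
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    root = builder.root
    # Mirror the documented return value: unwrap a single top-level element.
    if len(root.children) == 1 and isinstance(root.children[0], Node):
        return root.children[0]
    return root


# Different quoting, attribute order, and tag case still compare equal.
a = parse_html_sketch('<p class="note" id=\'x\'>Hi</p>')
b = parse_html_sketch("<P id=\"x\" class='note'>Hi</P>")
assert a == b
```

In this sketch, two inputs compare equal when their element names, attributes, and children match after normalization. The real function may normalize more or less than this.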