
TruncateCharsHTMLParser

This class is a specialized HTML parser that truncates content to a given number of characters while preserving the HTML structure. It counts the characters it has processed and inserts a replacement string once the length limit is reached. Character references are converted during parsing when convert_charrefs is enabled, and the text it emits is escaped.
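The behavior described above can be sketched with Python's standard html.parser module. The class below, SketchTruncateCharsParser, is a hypothetical stand-in written for illustration, not the documented implementation. Its truncate() helper, its open_tags list, and the partial VOID_TAGS set are assumptions. It always converts character references, and it counts only text characters against the limit.

```python
from html import escape
from html.parser import HTMLParser

# Partial list of elements that never take a closing tag (assumption for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TruncationCompleted(Exception):
    """Signals that the character budget has been used up."""


class SketchTruncateCharsParser(HTMLParser):
    """Hypothetical stand-in for illustration; not the documented class."""

    def __init__(self, length, replacement):
        super().__init__(convert_charrefs=True)
        self.length = length
        self.replacement = replacement
        self.processed_chars = 0
        self.open_tags = []  # tags still waiting for their closing tag
        self.output = []     # pieces of the truncated document

    def handle_starttag(self, tag, attrs):
        self.output.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit as written, nothing to close.
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self.open_tags.remove(tag)
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        remaining = self.length - self.processed_chars
        self.processed_chars += len(data)
        if len(data) > remaining:
            # Cut first, then escape, then append the replacement string.
            self.output.append(escape(data[:remaining]) + self.replacement)
            raise TruncationCompleted
        self.output.append(escape(data))

    def truncate(self, html):
        try:
            self.feed(html)
            self.close()
        except TruncationCompleted:
            # Close whatever is still open, innermost element first.
            self.output.extend(f"</{tag}>" for tag in reversed(self.open_tags))
        return "".join(self.output)


print(SketchTruncateCharsParser(8, "...").truncate("<p>Hello <b>world</b></p>"))
# <p>Hello <b>wo...</b></p>
```

Raising an exception from handle_data is what lets the sketch stop parsing early. The tags that are still open at that point are then closed so the truncated fragment stays well-formed.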

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| length | int | The maximum number of characters allowed in the truncated output. |
| processed_chars | int = 0 | A running count of the characters encountered during parsing, used to determine when the truncation limit is reached. |

Constructor

Signature

def TruncateCharsHTMLParser(
    length: int,
    replacement: str,
    convert_charrefs: bool = True
) -> None

Parameters

| Name | Type | Description |
| --- | --- | --- |
| length | int | The maximum number of characters allowed before truncation occurs. |
| replacement | str | The string appended to the content if it exceeds the specified length. |
| convert_charrefs | bool = True | Whether character references are converted during parsing. |
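To illustrate what convert_charrefs changes, the sketch below uses Python's standard html.parser.HTMLParser, which has a parameter of the same name. It is an assumption that this class forwards the flag to that base parser. Collector and collect() are hypothetical names introduced only for this example.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Hypothetical helper that records the parser callbacks it receives."""

    def __init__(self, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        # Only called for named references when convert_charrefs is False.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Only called for numeric references when convert_charrefs is False.
        self.events.append(("charref", name))


def collect(html, convert_charrefs):
    parser = Collector(convert_charrefs=convert_charrefs)
    parser.feed(html)
    parser.close()
    return parser.events


print(collect("a &amp; b", convert_charrefs=True))
# [('data', 'a & b')]
print(collect("a &amp; b", convert_charrefs=False))
```

With conversion enabled, a reference such as &amp;amp; reaches handle_data as a single decoded character, so it can be counted as one character. With conversion disabled, the standard parser reports references through separate callbacks.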


Methods


process()

def process(
    data: str
) -> tuple

Processes a chunk of text data, tracking character counts and raising a TruncationCompleted exception if the limit is reached.

Parameters

| Name | Type | Description |
| --- | --- | --- |
| data | str | The raw text segment to be processed and counted against the truncation limit. |

Returns

| Type | Description |
| --- | --- |
| tuple | A tuple containing the original data chunk and the escaped, potentially truncated output string. |
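The shape of the return value can be sketched as follows. ChunkCounter is a hypothetical class written for illustration, not the documented one. The exact condition under which the documented method raises is not stated here, so this sketch assumes it raises when the budget is already exhausted before a chunk arrives. It also assumes the chunk is cut before it is escaped.

```python
from html import escape


class TruncationCompleted(Exception):
    """Signals that no character budget is left (name taken from the text above)."""


class ChunkCounter:
    """Hypothetical illustration of per-chunk counting; not the documented class."""

    def __init__(self, length):
        self.length = length
        self.processed_chars = 0

    def process(self, data):
        remaining = self.length - self.processed_chars
        if remaining <= 0:
            # Assumed trigger: the limit was reached by earlier chunks.
            raise TruncationCompleted
        self.processed_chars += len(data)
        # Cut on raw characters first, then escape the kept part.
        return data, escape(data[:remaining])


counter = ChunkCounter(length=5)
data, output = counter.process("Tom & Jerry")
print(data)    # Tom & Jerry
print(output)  # Tom &amp;
```

Cutting before escaping means "&" counts as one character against the limit, even though it occupies five characters in the escaped output.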