TruncateCharsHTMLParser
This class provides specialized HTML parsing to truncate content to a specific character count while preserving the HTML structure. It tracks the number of characters processed and inserts the replacement string once the length limit is reached. Truncation is applied to text data, with character references optionally converted during parsing and the output text escaped.
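
The overall behavior described above can be illustrated with a small stand-alone sketch built on the standard library's `html.parser.HTMLParser`. It is not the documented class: `TruncateSketch`, `truncate_html`, `_Done`, and the `VOID` set are hypothetical names, and the sketch assumes that only text data counts toward the limit and that the replacement string does not consume any of the budget.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a small, incomplete set of void elements, enough for the demo.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class _Done(Exception):
    """Hypothetical signal used to stop parsing once the limit is hit."""


class TruncateSketch(HTMLParser):
    def __init__(self, length, replacement):
        super().__init__(convert_charrefs=True)
        self.remaining = length      # text characters still allowed
        self.replacement = replacement
        self.open_tags = []          # stack of tags that still need closing
        self.output = []

    def handle_starttag(self, tag, attrs):
        # Tags are copied through verbatim and do not use up the budget.
        self.output.append(self.get_starttag_text())
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close everything back to (and including) the matching tag.
            while self.open_tags:
                opened = self.open_tags.pop()
                self.output.append(f"</{opened}>")
                if opened == tag:
                    break

    def handle_data(self, data):
        if len(data) > self.remaining:
            kept = escape(data[: self.remaining])
            self.output.append(kept + self.replacement)
            raise _Done
        self.remaining -= len(data)
        self.output.append(escape(data))


def truncate_html(html, length, replacement="..."):
    parser = TruncateSketch(length, replacement)
    try:
        parser.feed(html)
        parser.close()
    except _Done:
        pass
    # Close any tags left open at the point of truncation.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.output) + closing


short = truncate_html("<p>Hi</p>", 8)
cut = truncate_html("<p>Hello <b>world</b>!</p>", 8)
print(cut)  # <p>Hello <b>wo...</b></p>
```

Stopping via an exception keeps the handler code simple: once the budget is spent, nothing after that point in the input needs to be parsed, and the stack of open tags is all that is required to produce well-formed output.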
Attributes
| Attribute | Type | Description |
|---|---|---|
| length | int | The maximum number of characters allowed in the truncated output. |
| processed_chars | int = 0 | Running count of characters encountered during parsing, used to determine when the truncation limit is reached. |
Constructor
Signature
def TruncateCharsHTMLParser(
length: int,
replacement: str,
convert_charrefs: bool = True
) -> None
Parameters
| Name | Type | Description |
|---|---|---|
| length | int | The maximum number of characters allowed before truncation. |
| replacement | str | The string to append when truncation occurs. |
| convert_charrefs | bool = True | Whether to convert character references during parsing. |
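
To make the effect of `convert_charrefs` concrete, here is a minimal sketch using a hypothetical stand-in class (`ConstructorSketch`, not the documented class) that stores the same three constructor arguments and forwards `convert_charrefs` to the standard library's `HTMLParser`. The `chunks` list is an illustration-only attribute that records what reaches `handle_data`.

```python
from html.parser import HTMLParser


class ConstructorSketch(HTMLParser):
    """Hypothetical stand-in mirroring the documented constructor arguments."""

    def __init__(self, length, replacement, convert_charrefs=True):
        # HTMLParser takes convert_charrefs as a keyword-only argument.
        super().__init__(convert_charrefs=convert_charrefs)
        self.length = length            # maximum characters allowed
        self.replacement = replacement  # appended when truncation occurs
        self.processed_chars = 0        # running count of characters seen
        self.chunks = []                # demo only: text passed to handle_data

    def handle_data(self, data):
        self.processed_chars += len(data)
        self.chunks.append(data)


converting = ConstructorSketch(length=10, replacement="...")
converting.feed("<p>a &amp; b</p>")
converting.close()

raw = ConstructorSketch(length=10, replacement="...", convert_charrefs=False)
raw.feed("<p>a &amp; b</p>")
raw.close()

print(converting.chunks)  # ['a & b']
print(raw.chunks)         # ['a ', ' b']
```

With conversion enabled, `&amp;` arrives as a single `&` inside one text chunk, so it counts as one character. With conversion disabled, the reference is routed to `handle_entityref` instead and the surrounding text arrives in separate chunks.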
Methods
process()
def process(
data: str
) -> tuple
Processes a chunk of text data, tracking character counts and raising a TruncationCompleted exception if the limit is reached.
Parameters
| Name | Type | Description |
|---|---|---|
| data | str | The raw text segment to be processed and counted against the truncation limit. |
Returns
| Type | Description |
|---|---|
| tuple | A tuple containing the original data chunk and the escaped, potentially truncated output string. |
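
The shape of the return value can be sketched as follows. `ProcessSketch` is a hypothetical stand-in that models only the counting and escaping. The exact condition under which the real method raises is not spelled out above, so this sketch assumes it raises once no character budget remains before the chunk is processed.

```python
from html import escape


class TruncationCompleted(Exception):
    """Signals that the character limit has been reached."""


class ProcessSketch:
    """Hypothetical stand-in modelling only the counting logic."""

    def __init__(self, length):
        self.length = length
        self.processed_chars = 0

    def process(self, data):
        # Budget left before this chunk is counted.
        remaining = self.length - self.processed_chars
        self.processed_chars += len(data)
        if remaining <= 0:
            # Assumption: nothing more may be emitted, so stop here.
            raise TruncationCompleted
        # Truncate to the remaining budget first, then escape.
        output = escape(data[:remaining])
        return data, output


sketch = ProcessSketch(length=5)
first = sketch.process("a<b")    # fits within the limit
second = sketch.process("cdef")  # only two characters of budget remain

try:
    sketch.process("more")
    raised = False
except TruncationCompleted:
    raised = True

print(first)   # ('a<b', 'a&lt;b')
print(second)  # ('cdef', 'cd')
```

Returning the original chunk alongside the escaped output lets the caller measure length on the unescaped text, so `<` counts as one character even though it is emitted as `&lt;`.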