
TruncateWordsHTMLParser

This class is an HTML parser that truncates content to a word limit. It splits text data into individual words, joins the leading words that fit within the remaining word limit, and HTML-escapes the result so it is safe to render.

Methods


process()

@classmethod
def process(
data: string
) -> tuple

Splits the input text into individual words and truncates the content based on the remaining word count allowed for the HTML document.

Parameters

Name | Type | Description
data | string | The raw text content extracted from an HTML node to be processed and truncated.

Returns

Type | Description
tuple | A tuple containing the list of all split words and the HTML-escaped string of the truncated text.
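
The split, join, and escape steps described above can be sketched as a standalone function. This is a minimal illustration, not the class's actual implementation: the function name `process_words`, the explicit `remaining` parameter, and the whitespace-based splitting are all assumptions, since this page does not show how the parser stores its remaining word count or which splitting rule it applies.

```python
from html import escape


def process_words(data: str, remaining: int) -> tuple[list[str], str]:
    """Hypothetical stand-in for process(); `remaining` is passed explicitly."""
    # Split on runs of whitespace (assumed; the real splitting rule may differ).
    words = data.split()
    # Keep at most `remaining` words, then escape the joined text for HTML.
    output = escape(" ".join(words[:remaining]))
    return words, output


words, output = process_words("Fish & chips are <great> today", remaining=3)
print(words)   # ['Fish', '&', 'chips', 'are', '<great>', 'today']
print(output)  # Fish &amp; chips
```

Returning the full word list alongside the truncated string lets a caller compare `len(words)` with the remaining limit to tell whether the text was actually cut short.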