wetsuite.helpers.split.Fragments_HTML_Fallback

class documentation

class Fragments_HTML_Fallback(Fragments):

Constructor: Fragments_HTML_Fallback(docbytes, debug)

Extract text from HTML from a non-specific source into fragments.

Method	`__init__`	Pass the document bytestring in. Nothing is parsed yet; you call accepts(), then suitableness(), then possibly fragments() -- see the example use in decide().
Method	`accepts`	Whether we would consider parsing this document at all. Often this comes down to "is this the right file type?".
Method	`fragments`	No metadata at all, just text split by
Method	`suitableness`	Mostly just says we're a bad example but we'll try anyway; our accepts() is the real filter here.
Instance Variable	`docbytes`	Undocumented
Instance Variable	`etree`	Undocumented

Inherited from Fragments:

Instance Variable	`debug`	Undocumented

def __init__(self, docbytes, debug=False):

overrides wetsuite.helpers.split.Fragments.__init__

Pass the document bytestring in. Nothing is parsed yet; you call accepts(), then suitableness(), then possibly fragments() -- see the example use in decide().
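The call order described above can be sketched as follows. This is a minimal, self-contained illustration and not the wetsuite implementation: `StandInHTMLFallback`, `_TextCollector` and `split_if_accepted` are hypothetical names, and the HTML check, the value returned by suitableness() and the shape of what fragments() returns are all assumptions made for the example. Only the method names and the accepts(), suitableness(), fragments() sequence come from the documentation.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.texts.append(stripped)


class StandInHTMLFallback:
    """Hypothetical stand-in that mirrors only the documented method names."""

    def __init__(self, docbytes, debug=False):
        # Only store the input; nothing is parsed at construction time.
        self.docbytes = docbytes
        self.debug = debug

    def accepts(self):
        # Assumed check: does the start of the document look like HTML at all?
        head = self.docbytes[:1000].lower()
        return b"<html" in head or b"<!doctype html" in head

    def suitableness(self):
        # Assumed: a plain number; the value and its meaning are made up here.
        return 500

    def fragments(self):
        # Assumed: a flat list of text strings, one per text node.
        parser = _TextCollector()
        parser.feed(self.docbytes.decode("utf-8", errors="replace"))
        parser.close()
        return parser.texts


def split_if_accepted(docbytes):
    """Follow the documented order: accepts(), then suitableness(), then fragments()."""
    splitter = StandInHTMLFallback(docbytes)
    if not splitter.accepts():
        return None
    score = splitter.suitableness()
    return score, splitter.fragments()
```

With this stand-in, `split_if_accepted(b"<html><body><p>Hello</p></body></html>")` returns the made-up score together with the extracted text, while input that does not look like HTML returns `None` without any parsing.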

def accepts(self):

overrides wetsuite.helpers.split.Fragments.accepts

Whether we would consider parsing this document at all. Often this comes down to "is this the right file type?".

def fragments(self):

overrides wetsuite.helpers.split.Fragments.fragments

No metadata at all, just text split by

def suitableness(self):

overrides wetsuite.helpers.split.Fragments.suitableness

Mostly just says we're a bad example but we'll try anyway; our accepts() is the real filter here.

docbytes =

overrides wetsuite.helpers.split.Fragments.docbytes

Undocumented

etree =

Undocumented
