class Fragments:
Known subclasses:
wetsuite.helpers.split.Fragments_HTML_BUS_kamer
wetsuite.helpers.split.Fragments_HTML_CVDR
wetsuite.helpers.split.Fragments_HTML_Fallback
wetsuite.helpers.split.Fragments_HTML_Geschillencommissie
wetsuite.helpers.split.Fragments_HTML_OP_Bgr
wetsuite.helpers.split.Fragments_HTML_OP_Gmb
wetsuite.helpers.split.Fragments_HTML_OP_Prb
wetsuite.helpers.split.Fragments_HTML_OP_Stb
wetsuite.helpers.split.Fragments_HTML_OP_Stcrt
wetsuite.helpers.split.Fragments_HTML_OP_Trb
wetsuite.helpers.split.Fragments_HTML_OP_Wsb
wetsuite.helpers.split.Fragments_HTML_Tuchtrecht
wetsuite.helpers.split.Fragments_PDF_Fallback
wetsuite.helpers.split.Fragments_XML_BUS_Kamer
wetsuite.helpers.split.Fragments_XML_BWB
wetsuite.helpers.split.Fragments_XML_CVDR
wetsuite.helpers.split.Fragments_XML_Fallback
wetsuite.helpers.split.Fragments_XML_OP_Bgr
wetsuite.helpers.split.Fragments_XML_OP_Gmb
wetsuite.helpers.split.Fragments_XML_OP_Handelingen
wetsuite.helpers.split.Fragments_XML_OP_Prb
wetsuite.helpers.split.Fragments_XML_OP_Stb
wetsuite.helpers.split.Fragments_XML_OP_Stcrt
wetsuite.helpers.split.Fragments_XML_OP_Trb
wetsuite.helpers.split.Fragments_XML_OP_Wsb
wetsuite.helpers.split.Fragments_XML_Rechtspraak
Abstract-ish base class. It mainly exists to describe what a subclass must implement in order to split a document into fragments.
Method | __init__ | Hand the document bytestring into this. Nothing happens yet: you call accepts(), then suitableness(), then possibly fragments(); see decide() for example use.
Method | accepts | Whether we would consider parsing the document at all. Often, "is this the right file type?".
Method | fragments | Yields a tuple for each fragment.
Method | suitableness | How suitable this class is for the document, as an integer; lower numbers mean more specific.
Instance Variable | debug | Undocumented
Instance Variable | docbytes | Undocumented
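As a rough illustration of that contract, the sketch below defines a stand-in base class and one hypothetical subclass for plain-text documents. The class names `FragmentsSketch` and `Fragments_Text_Example`, the `debug=False` default, the blank-line splitting rule, and the `(index, text)` shape of the yielded tuples are all assumptions made for this example; the real subclasses in wetsuite.helpers.split yield their own tuple layout.

```python
from typing import Iterator, Tuple


class FragmentsSketch:
    """Hypothetical stand-in for the Fragments base class (not the wetsuite code)."""

    def __init__(self, docbytes: bytes, debug: bool = False):
        # Only store the input; no parsing happens at construction time.
        self.docbytes = docbytes
        self.debug = debug

    def accepts(self) -> bool:
        raise NotImplementedError

    def suitableness(self) -> int:
        raise NotImplementedError

    def fragments(self) -> Iterator[Tuple]:
        raise NotImplementedError


class Fragments_Text_Example(FragmentsSketch):
    """Hypothetical subclass: splits plain UTF-8 text on blank lines."""

    def accepts(self) -> bool:
        # Accept anything that decodes as UTF-8.
        try:
            self.docbytes.decode("utf-8")
            return True
        except UnicodeDecodeError:
            return False

    def suitableness(self) -> int:
        # A generic handler, so it reports a high (less specific) number.
        return 500

    def fragments(self) -> Iterator[Tuple[int, str]]:
        # Assumed tuple layout for this sketch: (fragment index, fragment text).
        text = self.docbytes.decode("utf-8")
        chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
        for i, chunk in enumerate(chunks):
            yield (i, chunk)
```

For example, `list(Fragments_Text_Example(b"First.\n\nSecond.").fragments())` gives `[(0, 'First.'), (1, 'Second.')]`.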
Method __init__
Overridden in: every known subclass listed above.
Hand the document bytestring into this. Nothing happens yet: you call accepts(), then suitableness(), then possibly fragments(); see decide() for example use.
Parameters
docbytes: bytes | The document, as a bytestring.
debug: bool | Undocumented
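The call order described above can be sketched as follows. `pick_fragmenter` is a hypothetical helper, not the library's decide(): it constructs each candidate (which does no parsing), drops those whose accepts() is false, and keeps the one reporting the lowest suitableness(), all before fragments() is ever called. The `Toy*` classes and their `magic`/`score` attributes are invented stand-ins for real subclasses.

```python
from typing import Iterator, List, Optional, Tuple, Type


class ToyFragments:
    """Hypothetical stand-in for a Fragments subclass; not wetsuite code."""

    magic = b""   # byte prefix this toy class accepts
    score = 500   # value its suitableness() reports

    def __init__(self, docbytes: bytes, debug: bool = False):
        self.docbytes = docbytes  # stored only; nothing is parsed yet
        self.debug = debug

    def accepts(self) -> bool:
        return self.docbytes.startswith(self.magic)

    def suitableness(self) -> int:
        return self.score

    def fragments(self) -> Iterator[Tuple[str, bytes]]:
        # Toy behaviour: the whole document as a single fragment.
        yield (type(self).__name__, self.docbytes)


class ToyPDFFallback(ToyFragments):
    magic, score = b"%PDF", 500


class ToyPDFSpecific(ToyFragments):
    magic, score = b"%PDF", 50


class ToyXMLFallback(ToyFragments):
    magic, score = b"<?xml", 500


def pick_fragmenter(docbytes: bytes,
                    candidates: List[Type[ToyFragments]]) -> Optional[ToyFragments]:
    """Hypothetical decide()-like helper: accepts() first, then lowest suitableness() wins."""
    instances = (cls(docbytes) for cls in candidates)
    accepting = [inst for inst in instances if inst.accepts()]
    if not accepting:
        return None
    return min(accepting, key=lambda inst: inst.suitableness())
```

With `[ToyXMLFallback, ToyPDFFallback, ToyPDFSpecific]` and a document starting with `%PDF`, both PDF classes accept it and `ToyPDFSpecific` is chosen because 50 is lower than 500.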
Method accepts
Overridden in: every known subclass listed above.
Whether we would consider parsing the document at all. Often this amounts to "is this the right file type?".
Returns
bool | True if this class is willing to try parsing the document.
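Since accepts() is often just a file-type check, one way to write it is to look at the leading bytes. The sketch below is hypothetical, not wetsuite's code: `sniff_type` and `PDFOnlyExample` are invented names, and the rules (PDFs start with `%PDF-`, XML with an `<?xml` declaration, HTML with a doctype or `<html`, after an optional UTF-8 byte-order mark and whitespace) are simplifying assumptions; real documents can be messier.

```python
import codecs


def sniff_type(docbytes: bytes) -> str:
    """Hypothetical file-type sniffer of the kind an accepts() might use."""
    head = docbytes[:1024]
    # Drop a UTF-8 byte-order mark and leading whitespace before comparing.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    head = head.lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(b"<?xml"):
        return "xml"
    if head[:15].lower().startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


class PDFOnlyExample:
    """Hypothetical class whose accepts() is purely a file-type check."""

    def __init__(self, docbytes: bytes, debug: bool = False):
        self.docbytes = docbytes
        self.debug = debug

    def accepts(self) -> bool:
        return sniff_type(self.docbytes) == "pdf"
```

Keeping accepts() this cheap means it can be called on every candidate class before any real parsing is attempted.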
Method fragments
Overridden in: every known subclass listed above except wetsuite.helpers.split.Fragments_HTML_Geschillencommissie and wetsuite.helpers.split.Fragments_HTML_Tuchtrecht.
Yields a tuple for each fragment.
Method suitableness
Overridden in: every known subclass listed above.
Reports how suitable this class is for the document, as an integer where lower means more specific (and so preferred). For example:
- 5: I recognize this as a PDF, from OP, and specifically Stcrt, so I probably know how to extract the text fairly well.
- 50: I recognize this as a PDF from OP, so I may do better than something entirely generic.
- 500: I recognize this as a PDF, and I will do something generic (because I am a fallback for PDFs).
- 5000: I recognize this as a PDF, but I am a specific parser, and it is probably a bad idea for me to do something generic.

The idea is that, given several of these classes, we can find the one that says it is most specific to this document.
Returns
int | The suitableness score; lower is more specific.
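To show why the scale puts 5000 above 500, the sketch below pits a hypothetical Stcrt-specific PDF scorer against a generic PDF fallback. The function names and the detection rule (a naive search for the bytes `Staatscourant`, which would rarely work on real, compressed PDF streams) are invented for illustration; only the 5 / 500 / 5000 values and the lowest-wins reading come from the text above.

```python
from typing import Callable, Dict


def looks_like_pdf(docbytes: bytes) -> bool:
    # Stand-in for the accepts() step.
    return docbytes.lstrip().startswith(b"%PDF-")


def stcrt_pdf_score(docbytes: bytes) -> int:
    """Hypothetical suitableness() for a parser specific to Stcrt PDFs."""
    # Assumed detection rule, invented for this illustration.
    if b"Staatscourant" in docbytes:
        return 5      # PDF, from OP, specifically Stcrt: confident
    return 5000       # a PDF, but a specific parser should not go generic


def pdf_fallback_score(docbytes: bytes) -> int:
    """Hypothetical suitableness() for a generic PDF fallback."""
    return 500        # PDF, handled generically


def best_scorer(docbytes: bytes, scorers: Dict[str, Callable[[bytes], int]]) -> str:
    if not looks_like_pdf(docbytes):
        raise ValueError("not a PDF")
    # Lowest score is the most specific claim, so it wins.
    return min(scorers, key=lambda name: scorers[name](docbytes))
```

For a Stcrt document the specific parser wins (5 < 500); for any other PDF its 5000 loses to the fallback's 500, so the generic handler is used instead of a specific parser guessing.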
Overridden in:
wetsuite.helpers.split.Fragments_HTML_BUS_kamer
wetsuite.helpers.split.Fragments_HTML_Fallback
wetsuite.helpers.split.Fragments_HTML_OP_Bgr
wetsuite.helpers.split.Fragments_HTML_OP_Gmb
wetsuite.helpers.split.Fragments_HTML_OP_Prb
wetsuite.helpers.split.Fragments_HTML_OP_Stb
wetsuite.helpers.split.Fragments_HTML_OP_Stcrt
wetsuite.helpers.split.Fragments_HTML_OP_Trb
wetsuite.helpers.split.Fragments_HTML_OP_Wsb
wetsuite.helpers.split.Fragments_XML_BUS_Kamer
wetsuite.helpers.split.Fragments_XML_OP_Handelingen
Undocumented