class Fragments:
Known subclasses: wetsuite.helpers.split.Fragments_HTML_CVDR, wetsuite.helpers.split.Fragments_HTML_Fallback, wetsuite.helpers.split.Fragments_HTML_Geschillencommissie, wetsuite.helpers.split.Fragments_HTML_OP_Bgr, wetsuite.helpers.split.Fragments_HTML_OP_Gmb, wetsuite.helpers.split.Fragments_HTML_OP_Kamer, wetsuite.helpers.split.Fragments_HTML_OP_Prb, wetsuite.helpers.split.Fragments_HTML_OP_Stb, wetsuite.helpers.split.Fragments_HTML_OP_Stcrt, wetsuite.helpers.split.Fragments_HTML_OP_Trb, wetsuite.helpers.split.Fragments_HTML_OP_Wsb, wetsuite.helpers.split.Fragments_HTML_Tuchtrecht, wetsuite.helpers.split.Fragments_PDF_Fallback, wetsuite.helpers.split.Fragments_XML_BWB, wetsuite.helpers.split.Fragments_XML_CVDR, wetsuite.helpers.split.Fragments_XML_Fallback, wetsuite.helpers.split.Fragments_XML_OP_Bgr, wetsuite.helpers.split.Fragments_XML_OP_Gmb, wetsuite.helpers.split.Fragments_XML_OP_Handelingen, wetsuite.helpers.split.Fragments_XML_OP_Kamer, wetsuite.helpers.split.Fragments_XML_OP_Prb, wetsuite.helpers.split.Fragments_XML_OP_Stb, wetsuite.helpers.split.Fragments_XML_OP_Stcrt, wetsuite.helpers.split.Fragments_XML_OP_Trb, wetsuite.helpers.split.Fragments_XML_OP_Wsb, wetsuite.helpers.split.Fragments_XML_Rechtspraak
Constructor: Fragments(docbytes, debug)
Abstract-ish base class that describes what implementing subclasses are expected to do.
| Kind | Name | Summary |
| --- | --- | --- |
| Method | __init__ | Hand the document bytestring into this. Nothing happens yet; you call accepts(), then suitableness(), then possibly fragments(). See example use in decide(). |
| Method | accepts | Whether we would consider parsing this document at all. Often, "is this the right file type?" |
| Method | fragments | Yields a tuple for each fragment. |
| Method | suitableness | How suitable this class is for the document; lower numbers mean more specific. |
| Instance Variable | debug | Undocumented |
| Instance Variable | docbytes | Undocumented |
__init__(docbytes, debug)
Overridden in: wetsuite.helpers.split.Fragments_HTML_CVDR, wetsuite.helpers.split.Fragments_HTML_Fallback, wetsuite.helpers.split.Fragments_HTML_Geschillencommissie, wetsuite.helpers.split.Fragments_HTML_OP_Bgr, wetsuite.helpers.split.Fragments_HTML_OP_Gmb, wetsuite.helpers.split.Fragments_HTML_OP_Kamer, wetsuite.helpers.split.Fragments_HTML_OP_Prb, wetsuite.helpers.split.Fragments_HTML_OP_Stb, wetsuite.helpers.split.Fragments_HTML_OP_Stcrt, wetsuite.helpers.split.Fragments_HTML_OP_Trb, wetsuite.helpers.split.Fragments_HTML_OP_Wsb, wetsuite.helpers.split.Fragments_HTML_Tuchtrecht, wetsuite.helpers.split.Fragments_PDF_Fallback, wetsuite.helpers.split.Fragments_XML_BWB, wetsuite.helpers.split.Fragments_XML_CVDR, wetsuite.helpers.split.Fragments_XML_Fallback, wetsuite.helpers.split.Fragments_XML_OP_Bgr, wetsuite.helpers.split.Fragments_XML_OP_Gmb, wetsuite.helpers.split.Fragments_XML_OP_Handelingen, wetsuite.helpers.split.Fragments_XML_OP_Kamer, wetsuite.helpers.split.Fragments_XML_OP_Prb, wetsuite.helpers.split.Fragments_XML_OP_Stb, wetsuite.helpers.split.Fragments_XML_OP_Stcrt, wetsuite.helpers.split.Fragments_XML_OP_Trb, wetsuite.helpers.split.Fragments_XML_OP_Wsb, wetsuite.helpers.split.Fragments_XML_Rechtspraak

Hand the document bytestring into this. Nothing happens yet; you call accepts(), then suitableness(), then possibly fragments(). See example use in decide().
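To make the call order concrete, here is a minimal sketch of a class following this interface and a tiny driver that calls accepts(), then suitableness(), then fragments(). The class `PlainTextFragments`, the function `split_document`, the `debug=False` default, and the `(index, text)` tuple layout are all hypothetical illustrations, not wetsuite code; the real driver is decide(), which is not reproduced here.

```python
class PlainTextFragments:
    """Hypothetical example class following the described interface; not part of wetsuite."""

    def __init__(self, docbytes, debug=False):
        # Nothing happens yet: only keep the input around.
        self.docbytes = docbytes
        self.debug = debug

    def accepts(self):
        # Would we consider parsing this at all? Here: "does it decode as UTF-8 text?"
        try:
            self.docbytes.decode("utf-8")
        except UnicodeDecodeError:
            return False
        return True

    def suitableness(self):
        # A generic fallback, in the spirit of the 500 tier described under suitableness().
        return 500

    def fragments(self):
        # Assumed tuple layout: (index, text), one per blank-line-separated paragraph.
        text = self.docbytes.decode("utf-8")
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for index, paragraph in enumerate(paragraphs):
            yield (index, paragraph)


def split_document(docbytes):
    """Hypothetical driver: accepts(), then suitableness(), then possibly fragments()."""
    frag = PlainTextFragments(docbytes)
    if not frag.accepts():
        return None
    score = frag.suitableness()
    return score, list(frag.fragments())
```

For example, `split_document(b"First para.\n\nSecond para.")` gives `(500, [(0, 'First para.'), (1, 'Second para.')])`. Keeping the constructor free of work means a caller can cheaply reject many candidate classes before any of them does real parsing.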
accepts()
Overridden in: wetsuite.helpers.split.Fragments_HTML_CVDR, wetsuite.helpers.split.Fragments_HTML_Fallback, wetsuite.helpers.split.Fragments_HTML_Geschillencommissie, wetsuite.helpers.split.Fragments_HTML_OP_Bgr, wetsuite.helpers.split.Fragments_HTML_OP_Gmb, wetsuite.helpers.split.Fragments_HTML_OP_Kamer, wetsuite.helpers.split.Fragments_HTML_OP_Prb, wetsuite.helpers.split.Fragments_HTML_OP_Stb, wetsuite.helpers.split.Fragments_HTML_OP_Stcrt, wetsuite.helpers.split.Fragments_HTML_OP_Trb, wetsuite.helpers.split.Fragments_HTML_OP_Wsb, wetsuite.helpers.split.Fragments_HTML_Tuchtrecht, wetsuite.helpers.split.Fragments_PDF_Fallback, wetsuite.helpers.split.Fragments_XML_BWB, wetsuite.helpers.split.Fragments_XML_CVDR, wetsuite.helpers.split.Fragments_XML_Fallback, wetsuite.helpers.split.Fragments_XML_OP_Bgr, wetsuite.helpers.split.Fragments_XML_OP_Gmb, wetsuite.helpers.split.Fragments_XML_OP_Handelingen, wetsuite.helpers.split.Fragments_XML_OP_Kamer, wetsuite.helpers.split.Fragments_XML_OP_Prb, wetsuite.helpers.split.Fragments_XML_OP_Stb, wetsuite.helpers.split.Fragments_XML_OP_Stcrt, wetsuite.helpers.split.Fragments_XML_OP_Trb, wetsuite.helpers.split.Fragments_XML_OP_Wsb, wetsuite.helpers.split.Fragments_XML_Rechtspraak

Whether we would consider parsing this document at all. Often this amounts to "is this the right file type?"
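Since accepts() is often a file-type check, a cheap look at the first bytes is usually enough. The sketch below is a crude, hypothetical sniffer (`sniff_type` and `PdfOnly` are illustrative names, not wetsuite API); it relies only on the standard `%PDF-` header and on markup starting with `<`, and does not distinguish XML dialects.

```python
import codecs


def sniff_type(docbytes):
    """Crude file-type guess from the first bytes (hypothetical helper)."""
    head = docbytes[:1024]
    # Drop a UTF-8 byte order mark and leading whitespace before looking.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    head = head.lstrip()
    if head.startswith(b"%PDF-"):
        return "pdf"
    lowered = head.lower()
    if lowered.startswith(b"<!doctype html") or lowered.startswith(b"<html"):
        return "html"
    if head.startswith(b"<"):
        return "xml"
    return None


class PdfOnly:
    """Hypothetical class whose accepts() only says yes to PDF input."""

    def __init__(self, docbytes, debug=False):
        self.docbytes = docbytes
        self.debug = debug

    def accepts(self):
        # No parsing here: just "is this the right file type?"
        return sniff_type(self.docbytes) == "pdf"
```

Keeping accepts() this cheap matters because it is called on every candidate class, most of which will say no.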
fragments()
Overridden in: wetsuite.helpers.split.Fragments_HTML_CVDR, wetsuite.helpers.split.Fragments_HTML_Fallback, wetsuite.helpers.split.Fragments_HTML_OP_Bgr, wetsuite.helpers.split.Fragments_HTML_OP_Gmb, wetsuite.helpers.split.Fragments_HTML_OP_Kamer, wetsuite.helpers.split.Fragments_HTML_OP_Prb, wetsuite.helpers.split.Fragments_HTML_OP_Stb, wetsuite.helpers.split.Fragments_HTML_OP_Stcrt, wetsuite.helpers.split.Fragments_HTML_OP_Trb, wetsuite.helpers.split.Fragments_HTML_OP_Wsb, wetsuite.helpers.split.Fragments_PDF_Fallback, wetsuite.helpers.split.Fragments_XML_BWB, wetsuite.helpers.split.Fragments_XML_CVDR, wetsuite.helpers.split.Fragments_XML_Fallback, wetsuite.helpers.split.Fragments_XML_OP_Bgr, wetsuite.helpers.split.Fragments_XML_OP_Gmb, wetsuite.helpers.split.Fragments_XML_OP_Handelingen, wetsuite.helpers.split.Fragments_XML_OP_Kamer, wetsuite.helpers.split.Fragments_XML_OP_Prb, wetsuite.helpers.split.Fragments_XML_OP_Stb, wetsuite.helpers.split.Fragments_XML_OP_Stcrt, wetsuite.helpers.split.Fragments_XML_OP_Trb, wetsuite.helpers.split.Fragments_XML_OP_Wsb, wetsuite.helpers.split.Fragments_XML_Rechtspraak

Yields a tuple for each fragment.
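The docstring only says that one tuple is yielded per fragment; it does not say what the tuple holds. The sketch below assumes a hypothetical `(metadata, text)` layout and splits a toy XML document on `<p>` elements with the standard library's ElementTree. Neither the layout nor the `XmlParagraphFragments` class is taken from wetsuite.

```python
import xml.etree.ElementTree as ET


class XmlParagraphFragments:
    """Hypothetical fragments() sketch; the (metadata, text) tuple layout is an assumption."""

    def __init__(self, docbytes, debug=False):
        self.docbytes = docbytes
        self.debug = debug

    def fragments(self):
        root = ET.fromstring(self.docbytes)
        position = 0
        for para in root.iter("p"):
            # itertext() also collects text inside nested inline elements such as <b>.
            text = "".join(para.itertext()).strip()
            if not text:
                continue  # skip empty paragraphs
            yield ({"position": position}, text)
            position += 1
```

Making fragments() a generator lets a caller stream fragments or stop early without materializing the whole split document.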
suitableness()
Overridden in: wetsuite.helpers.split.Fragments_HTML_CVDR, wetsuite.helpers.split.Fragments_HTML_Fallback, wetsuite.helpers.split.Fragments_HTML_Geschillencommissie, wetsuite.helpers.split.Fragments_HTML_OP_Bgr, wetsuite.helpers.split.Fragments_HTML_OP_Gmb, wetsuite.helpers.split.Fragments_HTML_OP_Kamer, wetsuite.helpers.split.Fragments_HTML_OP_Prb, wetsuite.helpers.split.Fragments_HTML_OP_Stb, wetsuite.helpers.split.Fragments_HTML_OP_Stcrt, wetsuite.helpers.split.Fragments_HTML_OP_Trb, wetsuite.helpers.split.Fragments_HTML_OP_Wsb, wetsuite.helpers.split.Fragments_HTML_Tuchtrecht, wetsuite.helpers.split.Fragments_PDF_Fallback, wetsuite.helpers.split.Fragments_XML_BWB, wetsuite.helpers.split.Fragments_XML_CVDR, wetsuite.helpers.split.Fragments_XML_Fallback, wetsuite.helpers.split.Fragments_XML_OP_Bgr, wetsuite.helpers.split.Fragments_XML_OP_Gmb, wetsuite.helpers.split.Fragments_XML_OP_Handelingen, wetsuite.helpers.split.Fragments_XML_OP_Kamer, wetsuite.helpers.split.Fragments_XML_OP_Prb, wetsuite.helpers.split.Fragments_XML_OP_Stb, wetsuite.helpers.split.Fragments_XML_OP_Stcrt, wetsuite.helpers.split.Fragments_XML_OP_Trb, wetsuite.helpers.split.Fragments_XML_OP_Wsb, wetsuite.helpers.split.Fragments_XML_Rechtspraak

Returns a number that says how suitable this class is for the document; lower means more specific. For example:
- 5: I recognize that's PDF, from OP, and specifically Stcrt, so I probably know how to fetch out the text fairly well
- 50: I recognize that's PDF, from OP, so I may do better than entirely generic
- 500: I recognize that's PDF, and I will do something generic (because I am a fallback for PDFs)
- 5000: I recognize that's PDF, but I'm specific, and it's probably a bad idea if I do something generic

The idea is that with multiple of these, we can find the one that (says it) is most specific to this document.
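The selection idea on this scale can be sketched as: instantiate every candidate, keep those whose accepts() says yes, and pick the lowest suitableness() value. The classes below and the `pick_most_specific` function are hypothetical; in particular, the `Staatscourant` substring test is a naive stand-in for real Stcrt detection, and this is not a reproduction of decide().

```python
class PdfBase:
    """Hypothetical shared base: accepts anything that starts with the PDF signature."""

    def __init__(self, docbytes, debug=False):
        self.docbytes = docbytes
        self.debug = debug

    def accepts(self):
        return self.docbytes.startswith(b"%PDF-")


class GenericPdf(PdfBase):
    def suitableness(self):
        return 500  # a generic fallback for PDFs


class StcrtPdf(PdfBase):
    def suitableness(self):
        # Stand-in for real detection: a naive substring test, purely for illustration.
        if b"Staatscourant" in self.docbytes:
            return 5  # specific, and we know this layout
        return 5000  # specific parser, but not this document: avoid


def pick_most_specific(docbytes, candidates):
    """Return the accepting instance with the lowest suitableness, or None."""
    accepting = [frag for frag in (cls(docbytes) for cls in candidates) if frag.accepts()]
    if not accepting:
        return None
    # min() keeps the first of equally scored candidates, so list order breaks ties.
    return min(accepting, key=lambda frag: frag.suitableness())
```

With both classes as candidates, a PDF mentioning `Staatscourant` goes to `StcrtPdf` (5 beats 500), while any other PDF goes to `GenericPdf` (500 beats 5000), which is exactly the behaviour the 5000 tier is meant to produce.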
Overridden in: wetsuite.helpers.split.Fragments_HTML_Fallback, wetsuite.helpers.split.Fragments_HTML_OP_Bgr, wetsuite.helpers.split.Fragments_HTML_OP_Gmb, wetsuite.helpers.split.Fragments_HTML_OP_Kamer, wetsuite.helpers.split.Fragments_HTML_OP_Prb, wetsuite.helpers.split.Fragments_HTML_OP_Stb, wetsuite.helpers.split.Fragments_HTML_OP_Stcrt, wetsuite.helpers.split.Fragments_HTML_OP_Trb, wetsuite.helpers.split.Fragments_HTML_OP_Wsb, wetsuite.helpers.split.Fragments_XML_OP_Handelingen, wetsuite.helpers.split.Fragments_XML_OP_Kamer

Undocumented