class documentation
class Fragments_XML_OP_Gmb(Fragments):
Constructor: Fragments_XML_OP_Gmb(docbytes, debug)
Turns a gemeenteblad (Dutch municipal gazette) document in XML form (from KOOP's BUS) into fragments.
| Method | __init__ | Hand the document bytestring into this. Nothing happens yet; you call accepts(), then suitableness(), then possibly fragments() (see the example use in decide()). |
| Method | accepts | Whether we would consider parsing this at all. Often, "is this the right file type?" |
| Method | fragments | Yields a tuple for each fragment. |
| Method | suitableness | How well suited this parser is to the document, as a number; lower values mean better suited. |
| Instance Variable | startpaths | Undocumented |
| Instance Variable | tree | Undocumented |
Inherited from Fragments:
| Instance Variable | debug | Undocumented |
| Instance Variable | docbytes | Undocumented |
Hand the document bytestring into this. Nothing happens yet; you call accepts(), then suitableness(), then possibly fragments() (see the example use in decide()).
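A minimal sketch of that call order follows. It does not use the real wetsuite classes: `ToyXmlFragments`, the `<gemeenteblad>`/`<titel>`/`<al>` element names, the fixed suitableness value, and the `(tag, text)` fragment tuples are all hypothetical, chosen only to show that construction merely stores the bytes and that the work happens in the later calls.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


class ToyXmlFragments:
    """Hypothetical stand-in for a Fragments subclass; NOT the wetsuite API."""

    def __init__(self, docbytes: bytes, debug: bool = False):
        # Only store the input; no parsing happens in the constructor.
        self.docbytes = docbytes
        self.debug = debug
        self.tree: Optional[ET.Element] = None

    def accepts(self) -> bool:
        # "Is this the right file type?": here, simply "does it parse as XML".
        try:
            self.tree = ET.fromstring(self.docbytes)
        except ET.ParseError:
            return False
        return True

    def suitableness(self) -> int:
        # Hypothetical fixed value; lower means better suited.
        return 50

    def fragments(self) -> Iterator[Tuple[str, str]]:
        # Requires accepts() to have succeeded first, so that self.tree is set.
        # Hypothetical tuple shape: (tag, stripped text) per top-level child.
        for child in self.tree:
            yield (child.tag, (child.text or "").strip())


doc = b"<gemeenteblad><titel>Voorbeeld</titel><al>Eerste alinea</al></gemeenteblad>"
parser = ToyXmlFragments(doc)
if parser.accepts():
    print("suitableness:", parser.suitableness())
    for fragment in parser.fragments():
        print(fragment)
```

Deferring the parse to accepts() keeps construction cheap, which matters when many candidate parsers are instantiated for the same document and most of them will decline it.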
overrides wetsuite.helpers.split.Fragments.accepts
Whether we would consider parsing this at all. Often, "is this the right file type?"
How well suited this parser is to the document, as a number where lower values mean better suited. For example:
- 5: I recognize that this is a PDF, from OP, and specifically Stcrt, so I probably know how to extract the text fairly well
- 50: I recognize that this is a PDF, from OP, so I may do better than something entirely generic
- 500: I recognize that this is a PDF; I will do something generic (because I am a fallback for PDFs)
- 5000: I recognize that this is a PDF, but I'm specific, and it's probably a bad idea if I do something generic

The idea is that, given multiple of these parsers, we can find the one that claims to be most specific to this document.
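The selection idea can be sketched as "among the parsers that accept the document, take the one with the lowest suitableness". This is a hedged illustration, not wetsuite's decide(): `Candidate` and `pick_most_specific` are hypothetical, and treating 5000 and above as "do not use" is an assumption read off the scale above.

```python
from typing import List, Optional


class Candidate:
    """Hypothetical parser stub with fixed accepts()/suitableness() answers."""

    def __init__(self, name: str, accepts: bool, score: int):
        self.name = name
        self._accepts = accepts
        self._score = score

    def accepts(self) -> bool:
        return self._accepts

    def suitableness(self) -> int:
        return self._score


def pick_most_specific(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the accepting candidate with the lowest suitableness, or None."""
    accepting = [c for c in candidates if c.accepts()]
    # Assumption: 5000+ means "recognized, but doing something generic is a bad idea".
    usable = [c for c in accepting if c.suitableness() < 5000]
    if not usable:
        return None
    return min(usable, key=lambda c: c.suitableness())


candidates = [
    Candidate("generic PDF fallback", accepts=True, score=500),
    Candidate("OP PDF", accepts=True, score=50),
    Candidate("OP Gmb XML", accepts=False, score=5),
    Candidate("specific but wrong", accepts=True, score=5000),
]
best = pick_most_specific(candidates)
print(best.name)  # → OP PDF
```

Note that the candidate claiming 5 is skipped because it does not accept the document at all; a low score only counts once accepts() has said yes.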