class SRUBase:
Known subclasses: wetsuite.datacollect.koop_sru.BWB,
wetsuite.datacollect.koop_sru.CVDR,
wetsuite.datacollect.koop_sru.EuropeseRichtlijnen,
wetsuite.datacollect.koop_sru.LokaleBekendmakingen,
wetsuite.datacollect.koop_sru.OfficielePublicaties,
wetsuite.datacollect.koop_sru.PLOOI,
wetsuite.datacollect.koop_sru.PUCOpenData,
wetsuite.datacollect.koop_sru.SamenwerkendeCatalogi,
wetsuite.datacollect.koop_sru.StatenGeneraalDigitaal,
wetsuite.datacollect.koop_sru.TuchtRecht,
wetsuite.datacollect.koop_sru.WetgevingsKalender
Very minimal SRU implementation - just enough to access the KOOP repositories.
Method | __init__ |
No summary |
Method | explain |
Does an explain operation and returns the XML. |
Method | explain_parsed |
Does an explain operation and returns a dict with some of the more interesting details. |
Method | num |
After you do a search_retrieve, this should be set to a number. |
Method | search_retrieve |
Fetches a range of results for a particular query. Returns each result record as a separate ElementTree object. |
Method | search_retrieve_many |
This function builds on search_retrieve() to "fetch _many_ results in chunks", by calling search_retrieve() repeatedly. |
Instance Variable | base |
The base URL that other things add to; added from instantiation. |
Instance Variable | extra |
An extra piece of query to add to the query you do later. This lets us represent subsets of larger repositories. |
Instance Variable | number |
the number of results reported in the last query we did. None before you do a query. CONSIDER: changing that. |
Instance Variable | sru |
hardcoded to "1.2" |
Instance Variable | verbose |
whether to print out things while we do them. |
Instance Variable | x |
The x_connection attribute that some of these need; added from instantiation. |
Method | _url |
Combines the basic URL parts given to the constructor, and ensures there's a ? (so you know you can add &k=v). This can probably go into the constructor, when I know how much is constant across SRU URLs... |
Parameters | |
basestr | The base URL that other things add to. Basically everything up to the '?' |
xstr | an attribute that some of these need in the URL. Seems to be non-standard and required for these repos. |
extrastr | used to AND something into the query, intended to restrict it to a subset of documents. This lets us represent subsets of larger repositories (somewhat related to x_connection).
verbose | whether to print out things while we do them. |
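To illustrate how these constructor arguments might end up in a request URL, here is a minimal sketch. It is not the library's actual code: the helper name `build_search_url`, the example base URL, and the example `extra_query` value are made up for illustration. The parameter names `version`, `operation`, `query`, `startRecord` and `maximumRecords` are standard SRU 1.2 request parameters; `x-connection` is assumed to be the spelling of the non-standard attribute described above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base_url, query, x_connection=None, extra_query=None,
                     start_record=1, maximum_records=10):
    """Hypothetical helper: combine a base URL, a CQL query and extras."""
    if extra_query:
        # AND the subset restriction into the user's CQL query
        query = f"({query}) and ({extra_query})"
    params = {
        "version": "1.2",
        "operation": "searchRetrieve",
        "query": query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    if x_connection is not None:
        params["x-connection"] = x_connection  # assumed parameter spelling
    # make sure there is exactly one '?' before the parameters
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


url = build_search_url(
    "https://example.org/sru/Search",        # placeholder, not a real endpoint
    "dcterms.modified>=2000-01-01",
    x_connection="demo",                     # made-up value
    extra_query="dcterms.type==example",     # made-up subset restriction
)
parsed = parse_qs(urlsplit(url).query)
print(parsed["query"][0])
```

Using `urlencode` keeps characters such as `>`, `=` and spaces in the CQL query safely escaped.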
Does an explain operation and returns the XML.
- if readable==False, it returns the XML as-is
- if readable==True (default), it eases human readability by:
  - stripping namespaces
  - reindenting
The XML is a unicode string (for consistency with other parts of this codebase).
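The readability step described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code this class uses: the helper names `strip_namespaces` and `readable_xml` and the tiny sample document are hypothetical, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove '{uri}' prefixes from tag and attribute names, in place."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
        for key in list(elem.attrib):
            if "}" in key:
                elem.attrib[key.split("}", 1)[1]] = elem.attrib.pop(key)
    return root


def readable_xml(xml_text):
    """Return a namespace-free, reindented unicode string."""
    root = ET.fromstring(xml_text)
    strip_namespaces(root)
    ET.indent(root)  # Python 3.9+; default indent is two spaces
    return ET.tostring(root, encoding="unicode")


# Made-up, heavily shortened stand-in for an explain response
sample = (
    '<zs:explainResponse xmlns:zs="http://www.loc.gov/zing/srw/">'
    "<zs:version>1.2</zs:version>"
    "</zs:explainResponse>"
)
pretty = readable_xml(sample)
print(pretty)
```

Stripping namespaces loses information, so this kind of output is meant for reading by people rather than for feeding back into a strict XML consumer.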
Does an explain operation and returns a dict with some of the more interesting details.
TODO: actually read the standard instead of assuming things.
Fetches a range of results for a particular query. Returns each result record as a separate ElementTree object.
Exactly what each record contains will vary per repository, sometimes even per presumably-sensible-subset of records, but you may well _want_ access to this detail in raw form because in some cases, it can contain metadata not as easily fetched from the result documents themselves.
You may want to fish out the number of results (TODO: make that easier).
Notes:
- strips namespaces from the results - makes writing code more convenient
CONSIDER:
- an option to return the URL instead of searching
Parameters | |
query:str | the query string, in CQL form (see the Library of Congress spec). The list of indices you can search in (e.g. 'dcterms.modified>=2000-01-01') varies with each repo; take a look at explain_parsed() (a parsed summary) or explain() (the actual explain XML).
start | what record offset to start fetching at. Note: one-based counting |
maximum | how many records to fetch (from start_offset). Note that repositories may not like high values here, so if you care about _all_ results of a possibly large set, you probably want to use search_retrieve_many() instead.
callback | if not None, search_retrieve() calls it for each record node. You can instead wait for the entire range of fetches to conclude and use the complete list of result records that is returned.
verbose | whether to be even more verbose during this query |
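As a rough sketch of what handling one search response involves (stripping namespaces, fishing out the number of results, and calling a callback per record), consider the following. It does not contact any server and is not the library's implementation: `parse_search_response` and the sample XML are hypothetical, though `numberOfRecords`, `records`, `record`, `recordPosition` and `recordData` are element names from the SRU 1.2 response format.

```python
import xml.etree.ElementTree as ET

# Made-up response with two records; real record contents vary per repository
SAMPLE_RESPONSE = """\
<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.2</version>
  <numberOfRecords>2</numberOfRecords>
  <records>
    <record>
      <recordPosition>1</recordPosition>
      <recordData><title>first</title></recordData>
    </record>
    <record>
      <recordPosition>2</recordPosition>
      <recordData><title>second</title></recordData>
    </record>
  </records>
</searchRetrieveResponse>
"""


def parse_search_response(xml_text, callback=None):
    """Hypothetical helper: return (number_of_records, list of record elements)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if "}" in elem.tag:  # strip '{namespace}' so paths stay short
            elem.tag = elem.tag.split("}", 1)[1]
    number_of_records = int(root.findtext("numberOfRecords"))
    records = root.findall("records/record")
    if callback is not None:
        for record in records:
            callback(record)
    return number_of_records, records


seen_titles = []
total, records = parse_search_response(
    SAMPLE_RESPONSE,
    callback=lambda record: seen_titles.append(record.findtext("recordData/title")),
)
print(total, seen_titles)
```

The callback sees each record as an ElementTree element, so any repository-specific metadata inside `recordData` stays available in raw form.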
This function builds on search_retrieve() to "fetch _many_ results in chunks", by calling search_retrieve() repeatedly.
(search_retrieve() has a limit on how many records it fetches at once, though it is still useful, e.g. to see whether there are any results at all.)
Like search_retrieve(), it (eventually) returns each result record as an ElementTree object (this can be more convenient if you want to handle the results as a whole),
...and if callback is not None, it will be called on each result _during_ the fetching process (this can be a more convenient way of dealing with many results while they come in).
Parameters | |
query:str | like in search_retrieve() |
at_a_time:int | how many records to fetch in a single request
start_offset:int | like in search_retrieve()
up_to:int | the last record to fetch, as an absolute offset, so e.g. start_offset=200, up_to=250 gives you records 200..250, not 200..450.
callback | like in search_retrieve() |
waitfloat | a backoff sleep between search requests, to avoid hammering a server too much. You can lower this where you know it is overly cautious. Note that we skip this sleep if one fetch was enough.
verbose:bool | whether to be even more verbose during this query
Since we fetch in chunks, we may overshoot in the last fetch by up to at_a_time entries. The code should avoid returning those.
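The chunking behaviour described by these parameters (one-based offsets, an inclusive last record, a sleep only between requests, and trimming any overshoot) can be sketched as below. This is an assumption-laden illustration rather than the actual method: `fetch_many`, `fetch_page`, `wait_sec` and the fake repository are hypothetical stand-ins, with `fetch_page` playing the role of a single search request.

```python
import time


def fetch_many(fetch_page, start_offset=1, up_to=250, at_a_time=10,
               wait_sec=0.5, callback=None):
    """Hypothetical chunked fetch.

    fetch_page(start, count) must return (total_available, list_of_records).
    """
    results = []
    offset = start_offset
    first_request = True
    while offset <= up_to:
        if not first_request:
            time.sleep(wait_sec)  # back off between requests, not before the first
        first_request = False
        total, page = fetch_page(offset, at_a_time)
        last_wanted = min(up_to, total)
        # trim anything the last chunk fetched beyond the requested range
        keep = page[: max(0, last_wanted - offset + 1)]
        for record in keep:
            if callback is not None:
                callback(record)
            results.append(record)
        offset += at_a_time
        if not page or offset > total:
            break
    return results


def fake_fetch_page(start, count, total=37):
    """Stand-in repository whose records are simply the numbers 1..total."""
    return total, list(range(start, min(start + count, total + 1)))


during = []
fetched = fetch_many(fake_fetch_page, start_offset=20, up_to=35,
                     at_a_time=10, wait_sec=0, callback=during.append)
print(fetched)
```

In this run the second chunk returns records 30..37, and the trimming step keeps only 30..35, which is the overshoot handling the note above refers to.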