class documentation

Very minimal SRU implementation - just enough to access the KOOP repositories.

Method __init__ No summary
Method explain Does an explain operation; returns the XML
Method explain_parsed Does an explain operation; returns a dict with some of the more interesting details.
Method num_records After you do a search_retrieve, this should be set to a number.
Method search_retrieve Fetches a range of results for a particular query. Returns each result record as a separate ElementTree object.
Method search_retrieve_many This function builds on search_retrieve() to fetch _many_ results in chunks, by calling search_retrieve() repeatedly.
Instance Variable base_url The base URL that other things add to; added from instantiation.
Instance Variable extra_query extra piece of query to add to the query you do later. This lets us represent subsets of larger repositories.
Instance Variable number_of_records the number of results reported in the last query we did. None before you do a query. CONSIDER: changing that.
Instance Variable sru_version hardcoded to "1.2"
Instance Variable verbose whether to print out things while we do them.
Instance Variable x_connection The x_connection attribute that some of these need; added from instantiation.
Method _url Combines the basic URL parts given to the constructor, and ensures there's a ? (so you know you can add &k=v). This can probably go into the constructor, once we know how much is constant across SRU URLs...
def __init__(self, base_url, x_connection=None, extra_query=None, verbose=False):
Parameters
base_url: str - The base URL that other things add to; basically everything up to the '?'
x_connection: str - an attribute that some of these need in the URL. Seems to be non-standard, and required for these repos.
extra_query: str - used to let us AND something into the query, intended to restrict to a subset of documents. This lets us represent subsets of larger repositories (somewhat related to x_connection).
verbose - whether to print out things while we do them.
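For illustration, the way extra_query gets ANDed into your query could be sketched like this (an assumption about the composition; the helper name and the example index values are hypothetical):

```python
def combine_query(query, extra_query=None):
    """Sketch: AND an extra_query clause into a CQL query, to restrict
    results to a subset of a larger repository. Hypothetical helper."""
    if extra_query is None:
        return query
    # Parenthesize both sides so CQL operator precedence stays predictable.
    return "(%s) and (%s)" % (query, extra_query)

combined = combine_query("dcterms.modified>=2000-01-01", "dc.type=report")
```

Here `dc.type=report` is a made-up restriction; which indices actually exist varies per repository (see explain_parsed()).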
def explain(self, readable=True, strip_namespaces=True, timeout=10):

Does an explain operation; returns the XML

  • if readable==False, it returns it as-is
  • if readable==True (default), it will ease human readability:
    • strips namespaces
    • reindent

The XML is returned as a unicode string (for consistency with other parts of this codebase)
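The "strips namespaces" and "reindent" steps can be sketched with the standard library (a simplification, not necessarily the real implementation; the explain XML shown is a trimmed stand-in):

```python
import xml.etree.ElementTree as ET

def strip_namespaces(tree):
    """Remove '{http://...}' namespace prefixes from all element tags,
    which makes the XML easier to read and to query with plain paths."""
    for el in tree.iter():
        if isinstance(el.tag, str) and el.tag.startswith('{'):
            el.tag = el.tag.split('}', 1)[1]
    return tree

xml_text = ('<e:explain xmlns:e="http://explain.z3950.org/dtd/2.0/">'
            '<e:serverInfo/></e:explain>')
root = strip_namespaces(ET.fromstring(xml_text))
ET.indent(root)  # reindent for readability (Python 3.9+)
readable = ET.tostring(root, encoding='unicode')
```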

def explain_parsed(self, timeout=10):

Does an explain operation; returns a dict with some of the more interesting details.

TODO: actually read the standard instead of assuming things.

def num_records(self):

After you do a search_retrieve, this should be set to a number.

This function may change.

def search_retrieve(self, query, start_record=None, maximum_records=None, callback=None, verbose=False):

Fetches a range of results for a particular query. Returns each result record as a separate ElementTree object.

Exactly what each record contains will vary per repository, sometimes even per presumably-sensible-subset of records, but you may well _want_ access to this detail in raw form because in some cases, it can contain metadata not as easily fetched from the result documents themselves.

You may want to fish out the number of results (TODO: make that easier)

Notes:

  • strips namespaces from the results - makes writing code more convenient

CONSIDER:

  • option to return the URL instead of searching
Parameters
query: str - the query string, in CQL form (see the Library of Congress spec). The list of indices you can search in (e.g. 'dcterms.modified>=2000-01-01') varies with each repo; take a look at explain_parsed() (a parsed summary) or explain() (the actual explain XML)
start_record - what record offset to start fetching at. Note: one-based counting
maximum_records - how many records to fetch (from start_record). Note that repositories may not like high values here, so if you care about _all_ results of a possibly-large set, you probably want to use search_retrieve_many() instead.
callback - if not None, this function is called for each record node. If None, you instead wait for the entire range of fetches to conclude and get the complete list of result records.
verbose - whether to be even more verbose during this query
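Fishing the number of results out of a raw response might look like this. This is a sketch against a minimal, hand-written SRU 1.2 searchRetrieveResponse (the `http://www.loc.gov/zing/srw/` namespace is the SRU/SRW one; real responses carry much more):

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for an SRU 1.2 searchRetrieveResponse, trimmed to
# just the parts this sketch needs.
response = '''<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
  <version>1.2</version>
  <numberOfRecords>5302</numberOfRecords>
</searchRetrieveResponse>'''

root = ET.fromstring(response)
ns = {'srw': 'http://www.loc.gov/zing/srw/'}
number_of_records = int(root.find('srw:numberOfRecords', ns).text)
```

Note that since this class strips namespaces from result records, records you get back can be queried without the `srw:` prefix.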
def search_retrieve_many(self, query, at_a_time=10, start_record=1, up_to=250, callback=None, wait_between_sec=0.5, verbose=False):

This function builds on search_retrieve() to fetch _many_ results in chunks, by calling search_retrieve() repeatedly.

(search_retrieve() will have a limit on how many to fetch at once, though it is still useful to see e.g. whether there are results at all)

Like search_retrieve(), it (eventually) returns each result record as an ElementTree object (this can be more convenient if you want to handle the results as a whole)

...and if callback is not None, it will be called on each result _during_ the fetching process (this can be a more convenient way of dealing with many results as they come in)

Parameters
query: str - like in search_retrieve()
at_a_time: int - how many records to fetch in a single request
start_record: int - like in search_retrieve()
up_to: int - the last record to fetch, as an absolute offset, so e.g. start_record=200, up_to=250 gives you records 200..250, not 200..450.
callback - like in search_retrieve()
wait_between_sec: float - a backoff sleep between each search request, to avoid hammering a server too much. You can lower this where you know that is overly cautious. Note that we skip this sleep if one fetch was enough.
verbose: bool - whether to be even more verbose during this query

Since we fetch in chunks, we may overshoot in the last fetch, by up to at_a_time entries. The code should avoid returning those.

CONSIDER:

  • maybe yield something including numberOfRecords before yielding results?
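The chunking arithmetic, including the clamp that avoids overshooting up_to, can be sketched as follows (an illustration of the offsets involved, not the actual implementation; the helper name is made up):

```python
def chunk_offsets(start_record, up_to, at_a_time):
    """Sketch of search_retrieve_many()'s chunking: yield (start, count)
    pairs for repeated search_retrieve() calls, clamped so that up_to
    (an inclusive, one-based absolute offset) is never exceeded."""
    start = start_record
    while start <= up_to:
        # Last chunk may be smaller than at_a_time, to avoid overshooting.
        count = min(at_a_time, up_to - start + 1)
        yield (start, count)
        start += count

chunks = list(chunk_offsets(start_record=200, up_to=250, at_a_time=10))
```

With these values, records 200..250 (51 records) come back in six requests, the last one fetching a single record.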
base_url =

The base URL that other things add to; added from instantiation.

extra_query =

extra piece of query to add to the query you do later. This lets us represent subsets of larger repositories.

number_of_records =

the number of results reported in the last query we did. None before you do a query. CONSIDER: changing that.

sru_version: str =

hardcoded to "1.2"

verbose =

whether to print out things while we do them.

x_connection =

The x_connection attribute that some of these need; added from instantiation.

def _url(self):

Combines the basic URL parts given to the constructor, and ensures there's a ? (so you know you can add &k=v). This can probably go into the constructor, once we know how much is constant across SRU URLs...
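The "ensure there's a ?" part amounts to something like this (a sketch under the assumption that only the presence of '?' matters; the example URL is hypothetical):

```python
def ensure_query_start(base_url):
    """Ensure the URL contains a '?', so callers can always append '&k=v'.
    Most servers tolerate the '?&k=v' form this allows."""
    return base_url if '?' in base_url else base_url + '?'

url = ensure_query_start('https://repository.example/sru')  # hypothetical URL
```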