module documentation
Code to help fetch things from https://www.rijksoverheid.nl/documenten
Function | scrape | Goes through the pagination for a specific document type and calls a callback for each item's detail page URL. |
Variable | doctypes | Undocumented |
Variable | ministeries | Undocumented |
Goes through the pagination for a specific document type and calls a callback for each item's detail page URL.
What to do with each result is up to you: you implement a detail_page_callback, which receives the URL. There is a notebook with some examples.
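As a minimal sketch of such a callback: the two arguments it receives are not spelled out here, so this version assumes the detail page URL comes first and ignores anything after it. The name make_url_collector and the example URLs are placeholders for illustration, not part of this module.

```python
def make_url_collector():
    """Return (callback, collected); the callback stores each URL it is given."""
    collected = []

    def detail_page_callback(url, *extra):
        # Assumption: the first argument is the detail page URL.
        # Further arguments are accepted but ignored, because their
        # meaning is not documented here.
        collected.append(url)

    return detail_page_callback, collected


callback, urls = make_url_collector()
# Simulate one call per item, as scrape is described as doing.
# These URLs are made-up placeholders.
callback("https://www.rijksoverheid.nl/documenten/example-1", None)
callback("https://www.rijksoverheid.nl/documenten/example-2", None)
```

A collector like this keeps the scraping pass cheap: the URLs can be fetched, parsed, or stored in a separate step afterwards.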
As of this writing, we work around a flaw that has probably been corrected since; TODO: describe, check, remove?
Expect this to take on the order of dozens of minutes per thousand items, mostly because of the backoff that keeps us polite to the server.
The delays are hardcoded in this function. We could make that async.
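The general pattern of pausing between page fetches might look roughly like the sketch below. It is not this module's implementation: walk_pages, fetch_page, and the delay value are hypothetical, and the sleep function is injectable only so that the demonstration runs offline without actually waiting.

```python
import time


def walk_pages(fetch_page, on_item, delay_seconds=1.0, sleep=time.sleep):
    """Call on_item for every item of every page, pausing after each page.

    fetch_page(page_number) is a hypothetical function that returns a list
    of items, or an empty list once there are no more pages.
    """
    page_number = 1
    total = 0
    while True:
        items = fetch_page(page_number)
        if not items:
            break
        for item in items:
            on_item(item)
            total += 1
        sleep(delay_seconds)  # fixed pause so we do not hammer the server
        page_number += 1
    return total


# Offline demonstration with fake pages and a sleep that only records pauses.
fake_pages = {1: ["a", "b"], 2: ["c"]}
seen, pauses = [], []
count = walk_pages(lambda n: fake_pages.get(n, []), seen.append,
                   delay_seconds=2.0, sleep=pauses.append)
```

With a fixed pause per request, total run time grows linearly with the number of pages and items, which is why a large date range takes a while.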
Parameters | |
doctype | Undocumented |
detail_page_callback | Called for each item. It should accept two arguments. |
from_date | Start of the date range to fetch. When from_date and to_date are not given, the range defaults to the last four weeks, counted from call time. If given, both should be a date or datetime. This is in part because, if you want to fetch _everything_ from the servers, we make you be explicit about it. |
to_date | End of the date range to fetch (if from_date is also given). |
debug | Undocumented |
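The date-range defaulting described for from_date and to_date could be sketched as follows. The helper resolve_date_range and its now parameter are hypothetical, and raising TypeError when only one bound is given is an assumption about how "both should be a date or datetime" might be enforced.

```python
import datetime


def resolve_date_range(from_date=None, to_date=None, now=None):
    """Return (start, end), defaulting to the four weeks before call time."""
    if now is None:
        now = datetime.datetime.now()
    if from_date is None and to_date is None:
        # Neither bound given: fall back to the last four weeks.
        return now - datetime.timedelta(weeks=4), now
    for value in (from_date, to_date):
        # datetime.datetime is a subclass of datetime.date,
        # so this check accepts both types.
        if not isinstance(value, datetime.date):
            raise TypeError("from_date and to_date should both be a date or datetime")
    return from_date, to_date


# Fixed "now" so the demonstration is reproducible.
start, end = resolve_date_range(now=datetime.datetime(2024, 1, 29, 12, 0))
```

Fetching everything would then mean passing an explicit, deliberately wide range rather than relying on the default.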