module documentation

Fetches data from rechtspraak.nl's API

Note that the data.rechtspraak.nl/uitspraken/zoeken API is primarily for ranges - they do _not_ seem to allow text searches like the web interface does.

https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx

To save time, and spare their servers the load, you would probably start by fetching OpenDataUitspraken.zip via https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx and inserting its contents, so you can avoid 3+ million fetches.

There is an API at https://uitspraken.rechtspraak.nl/api/zoek that backs the website search. I'm not sure whether we're supposed to use it like this, but it's one of the better APIs I've seen in this context :)

Function parse_content Parses the type of XML you get when you stick an ECLI onto https://data.rechtspraak.nl/uitspraken/content?id= and tries to give you metadata and text. CONSIDER: separating those
Function parse_instanties Parse the 'instanties' value list
Function parse_instanties_buitenlands Parse the 'buitenlandse instanties' value list
Function parse_nietnederlandseuitspraken Parse the 'niet-nederlandse uitspraken' value list
Function parse_proceduresoorten Parse the 'proceduresoorten' value list (assumed to be fixed).
Function parse_rechtsgebieden Parse the 'rechtsgebieden' value list (assumed to be fixed), the data of which seems to be a depth-2 tree.
Function parse_search_results Takes search result etree (as given by search()), and returns a list of dicts like:
Function search Post a search to the public API on data.rechtspraak.nl, based on a dict of parameters.
Constant BASE_URL base URL for search as well as value lists
Function _para_text Given the open-rechtspraak XML, specifically the uitspraak or conclusie node under the root,
Constant _FORMELE_RELATIES_URL Undocumented
Constant _INSTANTIES_BUITENLANDS_URL Undocumented
Constant _INSTANTIES_URL Undocumented
Constant _NIET_NEDERLANDSE_UITSPRAKEN_URL Undocumented
Constant _PROCEDURESOORTEN_URL Undocumented
Constant _RECHTSGEBIEDEN_URL Undocumented
def parse_content(tree):

Parses the type of XML you get when you stick an ECLI onto https://data.rechtspraak.nl/uitspraken/content?id= and tries to give you metadata and text. CONSIDER: separating those

Returns

a dict with TODO

TODO: actually read the schema - see https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx

def parse_instanties():

Parse the 'instanties' value list

Returns

a list of flat dicts, with keys Naam, Afkorting, Type, BeginDate, Identifier, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/AG DH',
           'Naam': "Ambtenarengerecht 's-Gravenhage",
      'Afkorting': 'AGSGR',
           'Type': 'AndereGerechtelijkeInstantie',
      'BeginDate': '1913-01-01'},
def parse_instanties_buitenlands():

Parse the 'buitenlandse instanties' value list

Returns

a list of flat dicts, with keys Naam, Identifier, Afkorting, Type, BeginDate, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/instantie/ES/#AudienciaNacionalNationaalHof',
           'Naam': 'Audiencia Nacional (Nationaal Hof)',
      'Afkorting': 'XX',
           'Type': 'BuitenlandseInstantie',
      'BeginDate': '1950-01-01'}
def parse_nietnederlandseuitspraken():

Parse the 'niet-nederlandse uitspraken' value list

Returns

a list of items like:

    {'id': 'ECLI:CE:ECHR:2000:0921JUD003224096', 'ljn': ['AD4213']},
    {'id': 'ECLI:EU:C:2000:679',                 'ljn': ['AD4227']},
    {'id': 'ECLI:EU:C:2000:689',                 'ljn': ['AD4228']},
    {'id': 'ECLI:EU:C:2001:112',                 'ljn': ['AD4244', 'AL3652']},
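
Since each item maps one ECLI to one or more LJNs, a reverse lookup is a natural use of this list. A minimal sketch, assuming items shaped like the examples above; `ljn_to_ecli` is a hypothetical helper and not part of this module:

```python
# Sample items copied from the documented return value.
items = [
    {'id': 'ECLI:CE:ECHR:2000:0921JUD003224096', 'ljn': ['AD4213']},
    {'id': 'ECLI:EU:C:2000:679', 'ljn': ['AD4227']},
    {'id': 'ECLI:EU:C:2000:689', 'ljn': ['AD4228']},
    {'id': 'ECLI:EU:C:2001:112', 'ljn': ['AD4244', 'AL3652']},
]


def ljn_to_ecli(items):
    """Hypothetical helper: map each LJN to the ECLI it belongs to."""
    index = {}
    for item in items:
        # Assumption: 'ljn' may be absent, so fall back to an empty list.
        for ljn in item.get('ljn', []):
            index[ljn] = item['id']
    return index


index = ljn_to_ecli(items)
print(index['AL3652'])
```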
def parse_proceduresoorten():

Parse the 'proceduresoorten' value list (assumed to be fixed).

Returns

A list of flat dicts, with keys Naam, Identifier, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/procedure#artikel81ROzaken', 'Naam': 'Artikel 81 RO-zaken'}
def parse_rechtsgebieden():

Parse the 'rechtsgebieden' value list (assumed to be fixed), the data of which seems to be a depth-2 tree.

Returns

a dict with items like:

    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht': ['Bestuursrecht'],

and:

    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht': ['Ambtenarenrecht', 'Bestuursrecht'],

Where

  • Bestuursrecht is a grouping of this and more,
  • Ambtenarenrecht is one of several specific parts of it
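
To recover the depth-2 tree from this flat dict, you could group the specific names under their grouping. This is only a sketch: it assumes, based on the two example items above, that the last name in each list is the top-level grouping and the first is the item's own name. `group_by_top_level` is a hypothetical helper.

```python
# The two example items from the documented return value.
rechtsgebieden = {
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht':
        ['Bestuursrecht'],
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht':
        ['Ambtenarenrecht', 'Bestuursrecht'],
}


def group_by_top_level(rechtsgebieden):
    """Hypothetical helper: {grouping name: [names of its specific parts]}."""
    groups = {}
    for names in rechtsgebieden.values():
        top = names[-1]                   # assumed: last entry is the grouping
        children = groups.setdefault(top, [])
        if len(names) > 1:                # a specific part, not the grouping itself
            children.append(names[0])
    return groups


tree = group_by_top_level(rechtsgebieden)
print(tree)
```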
def parse_search_results(tree):

Takes search result etree (as given by search()), and returns a list of dicts like:

    {      'ecli': 'ECLI:NL:GHARL:2022:7129',
          'title': 'ECLI:NL:GHARL:2022:7129, Gerechtshof Arnhem-Leeuwarden, 16-08-2022, 200.272.381/01',
        'summary': 'some text made shorter for this docstring example',
        'updated': '2023-01-01T13:29:23Z',
           'link': 'https://uitspraken.rechtspraak.nl/InzienDocument?id=ECLI:NL:GHARL:2022:7129',
            'xml': 'https://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:GHARL:2022:7129',
    }

Notes:

  • 'xml' is augmented based on the ecli and does not come from the search results
  • keys may be missing (in practice probably just summary?)
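
The two notes above can be illustrated as follows. This is a sketch, not the module's own code: `content_url` is a hypothetical helper that rebuilds the 'xml' value from an ECLI in the way the example suggests, and `.get()` is one way to cope with missing keys.

```python
from urllib.parse import urljoin

BASE_URL = "https://data.rechtspraak.nl/"  # documented value of BASE_URL


def content_url(ecli):
    """Hypothetical helper: build the content URL for one ECLI."""
    return urljoin(BASE_URL, "/uitspraken/content") + "?id=" + ecli


# One result dict in the documented shape, here without a 'summary' key.
result = {
    "ecli": "ECLI:NL:GHARL:2022:7129",
    "updated": "2023-01-01T13:29:23Z",
}

xml_url = content_url(result["ecli"])
summary = result.get("summary", "")  # keys may be missing, so avoid result["summary"]
print(xml_url)
```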
def search(params):

Post a search to the public API on data.rechtspraak.nl, based on a dict of parameters.

See also:

  • https://www.rechtspraak.nl/SiteCollectionDocuments/Technische-documentatie-Open-Data-van-de-Rechtspraak.pdf

Note that when you give it nonsensical parameters, like date=2022-02-30, the service won't return valid XML, so the XML parse raises an exception.
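
If you would rather handle that case than let it propagate, you can catch the parser's exception around the parse step. The sketch below uses the standard library's xml.etree.ElementTree as an assumption; this module may use a different etree implementation, which would raise a different exception type. `parse_response` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_response(body):
    """Hypothetical helper: return the root element, or None if body is not well-formed XML."""
    try:
        return ET.fromstring(body)
    except ET.ParseError:
        # Assumption: an invalid query yields a body that is not XML at all.
        return None


good = parse_response(b"<feed><entry/></feed>")
bad = parse_response(b"this is not XML")
print(good.tag, bad)
```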

Parameters
params

parameters like:

  • max (default is 1000)
  • from zero-based, default is 0
  • sort by modification date, ASC (default, oldest first) or DESC
  • type 'Uitspraak' or 'Conclusie'
  • date yyyy-mm-dd (once for 'on this date', twice for a range)
  • modified yyyy-mm-ddThh:mm:ss (once for a 'since then to now', twice for a range)
  • return DOC for things where there are documents; if omitted it also fetches things for which there is only metadata
  • replaces fetch ECLI for an LJN
  • subject URI of a rechtsgebied
  • creator

These are handed to urlencode, so could be either a list of tuples, or a dict, but because you are likely to repeat variables to specify ranges, 'list of tuples' should be your habit, e.g.:

    [ ("modified", "2023-01-01"), ("modified", "2023-01-05") ]
Returns
etree object for the search (or raises an exception) CONSIDER: returning only the urls
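
To see why a list of tuples is the safer habit for params, compare how urlencode treats it with what a dict can hold. This is a sketch only: it builds a query string and makes no request, and the uitspraken/zoeken path is an assumption taken from the module notes rather than from this function's code.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://data.rechtspraak.nl/"  # documented value of BASE_URL

# Assumption: the search endpoint is the uitspraken/zoeken path named in the module notes.
SEARCH_URL = urljoin(BASE_URL, "/uitspraken/zoeken")

# A list of tuples can repeat 'modified' to express a range.
params = [
    ("modified", "2023-01-01"),
    ("modified", "2023-01-05"),
    ("type", "Uitspraak"),
    ("max", 100),
]
query = urlencode(params)
url = SEARCH_URL + "?" + query

# A dict keeps only the last value per key, so the start of the range is lost.
as_dict = dict(params)

print(url)
print(as_dict["modified"])
```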
BASE_URL: str =

base URL for search as well as value lists

Value
'https://data.rechtspraak.nl/'
def _para_text(treenode):

Given the open-rechtspraak XML, specifically the uitspraak or conclusie node under the root,

_FORMELE_RELATIES_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/FormeleRelaties')
_INSTANTIES_BUITENLANDS_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/InstantiesBuitenlands')
_INSTANTIES_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/Instanties')
_NIET_NEDERLANDSE_UITSPRAKEN_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/NietNederlandseUitspraken')
_PROCEDURESOORTEN_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/Proceduresoorten')
_RECHTSGEBIEDEN_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/Rechtsgebieden')
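
The value-list constants above are all built with urllib.parse.urljoin and a path that starts with a slash. As a small illustration of what that resolves to (the example.org URLs are made up for the comparison):

```python
import urllib.parse

BASE_URL = 'https://data.rechtspraak.nl/'  # documented value of BASE_URL

rechtsgebieden_url = urllib.parse.urljoin(BASE_URL, '/Waardelijst/Rechtsgebieden')

# A leading slash makes the path absolute, so it replaces any path on the base...
absolute = urllib.parse.urljoin('https://example.org/a/b/', '/Waardelijst/Instanties')
# ...whereas a path without it is resolved relative to the base's path.
relative = urllib.parse.urljoin('https://example.org/a/b/', 'Waardelijst/Instanties')

print(rechtsgebieden_url)
print(absolute)
print(relative)
```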