module documentation

Fetches data from rechtspraak.nl's API at https://data.rechtspraak.nl/

Note that the data.rechtspraak.nl/uitspraken/zoeken API is primarily for ranges (see the description in search), as it does _not_ allow text searches like the web interface does.

(There is an API behind https://uitspraken.rechtspraak.nl/api/zoek that is actually much better, yet it is probably not intended to be used like this, and there is no reason to assume it will not change over time.)

Note that many of the parse_* functions parse fixed or mostly-fixed value lists that might be useful in supporting use, but the more interesting thing here is search.

Function parse_content Parse uitspraak content XMLs - the type you get when you stick an ECLI onto https://data.rechtspraak.nl/uitspraken/content?id=
Function parse_instanties Parse the 'instanties' value list (which is probably mostly static)
Function parse_instanties_buitenlands Parse the 'buitenlandse instanties' value list (which is probably mostly static)
Function parse_nietnederlandseuitspraken Parse the 'niet-nederlandse uitspraken' value list
Function parse_proceduresoorten Parse the 'proceduresoorten' value list (which is probably mostly static)
Function parse_rechtsgebieden Parse the 'rechtsgebieden' value list (which is probably mostly static), the data of which seems to be a depth-2 tree.
Function parse_search_results Takes search result etree (as given by search()), and returns a list of dicts like:
Function search Post a search to the public API on data.rechtspraak.nl, based on a dict of parameters.
Constant BASE_URL base URL for search as well as value lists
Function _para_text Given the open-rechtspraak XML, specifically the uitspraak or conclusie node under the root, tries to give text in paragraph-sized chunks at a time (actually determined by the document structure).
Constant _FORMELE_RELATIES_URL Undocumented
Constant _INSTANTIES_BUITENLANDS_URL Undocumented
Constant _INSTANTIES_URL Undocumented
Constant _NIET_NEDERLANDSE_UITSPRAKEN_URL Undocumented
Constant _PROCEDURESOORTEN_URL Undocumented
Constant _RECHTSGEBIEDEN_URL Undocumented
def parse_content(tree):

Parse uitspraak content XMLs - the type you get when you stick an ECLI onto https://data.rechtspraak.nl/uitspraken/content?id=

Tries to give you metadata and text (CONSIDER: separating those).

There is an example use in the notebook repo (e.g. dataset_intro_by_doing__rechtspraaknl_raw).

TODO: actually read the schema - see https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx

Returns
a dict like
def parse_instanties():

Parse the 'instanties' value list (which is probably mostly static)

Returns

a list of flat dicts, with keys Naam, Afkorting, Type, BeginDate, Identifier, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/AG DH',
           'Naam': "Ambtenarengerecht 's-Gravenhage",
      'Afkorting': 'AGSGR',
           'Type': 'AndereGerechtelijkeInstantie',
      'BeginDate': '1913-01-01'},
def parse_instanties_buitenlands():

Parse the 'buitenlandse instanties' value list (which is probably mostly static)

Returns

a list of flat dicts, with keys Naam, Identifier, Afkorting, Type, BeginDate, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/instantie/ES/#AudienciaNacionalNationaalHof',
           'Naam': 'Audiencia Nacional (Nationaal Hof)',
      'Afkorting': 'XX',
           'Type': 'BuitenlandseInstantie',
      'BeginDate': '1950-01-01'}
def parse_nietnederlandseuitspraken():

Parse the 'niet-nederlandse uitspraken' value list

Returns

a list of items like:

    {'id': 'ECLI:CE:ECHR:2000:0921JUD003224096', 'ljn': ['AD4213']},
    {'id': 'ECLI:EU:C:2000:679',                 'ljn': ['AD4227']},
    {'id': 'ECLI:EU:C:2000:689',                 'ljn': ['AD4228']},
    {'id': 'ECLI:EU:C:2001:112',                 'ljn': ['AD4244', 'AL3652']},
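
Since one ECLI can carry several LJNs, a reverse lookup from LJN to ECLI can be handy. The following is a minimal sketch that assumes only the item shape shown above; `ljn_to_ecli` is a hypothetical helper, not part of this module.

```python
# Sample items in the shape shown in the example above.
items = [
    {'id': 'ECLI:CE:ECHR:2000:0921JUD003224096', 'ljn': ['AD4213']},
    {'id': 'ECLI:EU:C:2001:112', 'ljn': ['AD4244', 'AL3652']},
]


def ljn_to_ecli(items):
    """Hypothetical helper: map each LJN to the ECLI it is listed under."""
    lookup = {}
    for item in items:
        # Treat a missing 'ljn' key as 'no LJNs' (an assumption, to be safe).
        for ljn in item.get('ljn', []):
            lookup[ljn] = item['id']
    return lookup


lookup = ljn_to_ecli(items)
print(lookup['AL3652'])
```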
def parse_proceduresoorten():

Parse the 'proceduresoorten' value list (which is probably mostly static)

Returns

A list of flat dicts, with keys Naam, Identifier, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/procedure#artikel81ROzaken', 'Naam': 'Artikel 81 RO-zaken'}
def parse_rechtsgebieden():

Parse the 'rechtsgebieden' value list (which is probably mostly static), the data of which seems to be a depth-2 tree.

Returns

a dict with items like:

    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht': ['Bestuursrecht'],

and:

    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht': ['Ambtenarenrecht', 'Bestuursrecht'],

Where

  • Bestuursrecht is a grouping of this and more,
  • Ambtenarenrecht is one of several specific parts of it
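
To get back from this flat dict to the depth-2 tree, the entries can be regrouped by their top-level name. This is a sketch that assumes, as in the examples above, that each value lists the most specific name first and the top-level grouping last; `group_by_parent` is a hypothetical helper.

```python
# Sample data in the shape shown in the examples above.
rechtsgebieden = {
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht':
        ['Bestuursrecht'],
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht':
        ['Ambtenarenrecht', 'Bestuursrecht'],
}


def group_by_parent(rechtsgebieden):
    """Hypothetical helper: map each top-level name to its specific parts."""
    groups = {}
    for names in rechtsgebieden.values():
        # The last name is assumed to be the top-level grouping.
        children = groups.setdefault(names[-1], [])
        if len(names) > 1:
            children.append(names[0])
    return groups


tree = group_by_parent(rechtsgebieden)
print(tree)
```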
def parse_search_results(tree):

Takes search result etree (as given by search()), and returns a list of dicts like:

    {      'ecli': 'ECLI:NL:GHARL:2022:7129',
          'title': 'ECLI:NL:GHARL:2022:7129, Gerechtshof Arnhem-Leeuwarden, 16-08-2022, 200.272.381/01',
        'summary': 'some text made shorter for this docstring example',
        'updated': '2023-01-01T13:29:23Z',
           'link': 'https://uitspraken.rechtspraak.nl/InzienDocument?id=ECLI:NL:GHARL:2022:7129',
            'xml': 'https://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:GHARL:2022:7129',
    }

Notes:

  • 'xml' is augmented based on the ecli and does not come from the search results
  • keys may be missing (in practice probably just summary?)
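
As an illustration of the two notes, the sketch below rebuilds the 'xml' value from an ECLI and reads a possibly missing key defensively. It follows the URL pattern visible in the example above; `content_url` is a hypothetical helper, and the real function may construct the link differently.

```python
BASE_URL = 'https://data.rechtspraak.nl/'


def content_url(ecli):
    """Hypothetical helper: content XML URL for an ECLI, as in the example."""
    # Plain concatenation keeps the colons unescaped, matching the example.
    return BASE_URL + 'uitspraken/content?id=' + ecli


# A result without a 'summary' key, to show defensive access.
result = {'ecli': 'ECLI:NL:GHARL:2022:7129'}
result['xml'] = content_url(result['ecli'])
summary = result.get('summary', '')  # empty string when the key is absent

print(result['xml'])
```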
def search(params):

Post a search to the public API on data.rechtspraak.nl, based on a dict of parameters.

See also:

  • https://www.rechtspraak.nl/SiteCollectionDocuments/Technische-documentatie-Open-Data-van-de-Rechtspraak.pdf

Note that when you give it nonsensical parameters, like date=2022-02-30, the service won't return valid XML, so the XML parse raises an exception.
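
One way to avoid that failure mode is to check date values locally before sending them. This is a sketch using the standard library only; `is_valid_date` is a hypothetical helper and not something this module provides.

```python
import datetime


def is_valid_date(text):
    """Hypothetical helper: True if text is a real yyyy-mm-dd calendar date."""
    try:
        datetime.date.fromisoformat(text)
    except ValueError:
        return False
    return True


print(is_valid_date('2022-02-28'))
print(is_valid_date('2022-02-30'))
```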

Parameters
params

parameters like:

  • max: maximum number of results (default is 1000)
  • from: zero-based offset, default is 0
  • sort: by modification date, ASC (default, oldest first) or DESC
  • type: 'Uitspraak' or 'Conclusie'
  • date: yyyy-mm-dd (once for 'on this date', twice for a range)
  • modified: yyyy-mm-ddThh:mm:ss (once for 'since then to now', twice for a range)
  • return: DOC for things where there are documents; if omitted it also fetches things for which there is only metadata
  • replaces: fetch ECLI for an LJN
  • subject: URI of a rechtsgebied
  • creator

These are handed to urlencode, so could be either a list of tuples, or a dict, but because you are likely to repeat variables to specify ranges, 'list of tuples' should be your habit, e.g.:

    [ ("modified", "2023-01-01"), ("modified", "2023-01-05") ]
Returns
etree object for the search results (or raises an exception). CONSIDER: returning only the URLs
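
The reason to prefer a list of tuples can be seen by encoding the same parameters both ways. This sketch only builds the query string with urllib.parse.urlencode and sends no request; how search() attaches it to the request is not shown here.

```python
from urllib.parse import urlencode

# A range needs the same key twice, which a list of tuples preserves.
params = [
    ('modified', '2023-01-01'),
    ('modified', '2023-01-05'),
    ('type', 'Uitspraak'),
    ('max', '100'),
]
query = urlencode(params)

# A dict can hold each key only once, so the start of the range is lost.
query_from_dict = urlencode(dict(params))

print(query)
print(query_from_dict)
```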
BASE_URL: str =

base URL for search as well as value lists

Value
'https://data.rechtspraak.nl/'
def _para_text(treenode):

Given the open-rechtspraak XML, specifically the uitspraak or conclusie node under the root, tries to give text in paragraph-sized chunks at a time (actually determined by the document structure).

Mainly used by parse_content()

_FORMELE_RELATIES_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/FormeleRelaties')
_INSTANTIES_BUITENLANDS_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/InstantiesBuitenlands')
_INSTANTIES_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/Instanties')
_NIET_NEDERLANDSE_UITSPRAKEN_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/NietNederlandseUitspraken')
_PROCEDURESOORTEN_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/Proceduresoorten')
_RECHTSGEBIEDEN_URL =

Undocumented

Value
urllib.parse.urljoin(BASE_URL, '/Waardelijst/Rechtsgebieden')
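
For reference, this is what such a urljoin call resolves to. The second half of the sketch uses a made-up base URL (example.org, purely illustrative) to show that the leading slash makes the path replace any path present in the base.

```python
import urllib.parse

BASE_URL = 'https://data.rechtspraak.nl/'

# Same call as in the value above.
url = urllib.parse.urljoin(BASE_URL, '/Waardelijst/Rechtsgebieden')

# Illustrative base with a path: the leading slash discards '/api/'.
other = urllib.parse.urljoin('https://example.org/api/', '/Waardelijst/Rechtsgebieden')

print(url)
print(other)
```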