wetsuite.datacollect.rechtspraaknl

module documentation

Fetches data from rechtspraak.nl's API

Note that the data.rechtspraak.nl/uitspraken/zoeken API is primarily for ranges - they do _not_ seem to allow text searches like the web interface does.

https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx

To save time, and to spare their servers the load, you would probably start by fetching OpenDataUitspraken.zip via https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx and inserting its contents, so that you avoid 3+ million individual fetches.

There is an API at https://uitspraken.rechtspraak.nl/api/zoek that backs the website search. I'm not sure whether we're supposed to use it like this, but it's one of the better APIs I've seen in this context :)

Function	`parse_content`	Parses the type of XML you get when you stick an ECLI onto https://data.rechtspraak.nl/uitspraken/content?id= and tries to give you metadata and text. CONSIDER: separating those
Function	`parse_instanties`	Parse the 'instanties' value list
Function	`parse_instanties_buitenlands`	Parse the 'buitenlandse instanties' value list
Function	`parse_nietnederlandseuitspraken`	Parse the 'niet-nederlandse uitspraken' value list
Function	`parse_proceduresoorten`	Parse the 'proceduresoorten' value list (assumed to be fixed).
Function	`parse_rechtsgebieden`	Parse the 'rechtsgebieden' value list (assumed to be fixed), the data of which seems to be a depth-2 tree.
Function	`parse_search_results`	Takes search result etree (as given by search()), and returns a list of dicts like:
Function	`search`	Post a search to the public API on data.rechtspraak.nl, based on a dict of parameters.
Constant	`BASE_URL`	base URL for search as well as value lists
Function	`_para_text`	Given the open-rechtspraak XML, specifically the uitspraak or conclusie node under the root,
Constant	`_FORMELE_RELATIES_URL`	Undocumented
Constant	`_INSTANTIES_BUITENLANDS_URL`	Undocumented
Constant	`_INSTANTIES_URL`	Undocumented
Constant	`_NIET_NEDERLANDSE_UITSPRAKEN_URL`	Undocumented
Constant	`_PROCEDURESOORTEN_URL`	Undocumented
Constant	`_RECHTSGEBIEDEN_URL`	Undocumented

def parse_content(tree):

Parses the type of XML you get when you stick an ECLI onto https://data.rechtspraak.nl/uitspraken/content?id= and tries to give you metadata and text. CONSIDER: separating those

Returns

a dict with TODO

TODO: actually read the schema - see https://www.rechtspraak.nl/Uitspraken/paginas/open-data.aspx
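
As a rough sketch of the "stick an ECLI onto the URL" step described above: the helper name `content_url` is hypothetical and not part of this module, and the base URL is simply copied from `BASE_URL`. It only builds the URL; fetching and parsing are left out.

```python
import urllib.parse

# Same value as this module's BASE_URL, repeated here so the sketch is self-contained.
BASE_URL = 'https://data.rechtspraak.nl/'


def content_url(ecli):
    """Hypothetical helper: build the content URL for one ECLI.

    The ECLI is appended as-is, matching the 'xml' URLs shown
    in the parse_search_results example.
    """
    return urllib.parse.urljoin(BASE_URL, '/uitspraken/content') + '?id=' + ecli


url = content_url('ECLI:NL:GHARL:2022:7129')
print(url)
```

The bytes fetched from that URL would then be parsed into an etree and handed to parse_content(); which etree implementation it expects is not stated here, so check the source before relying on this.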

def parse_instanties():

Parse the 'instanties' value list

Returns

a list of flat dicts, with keys Naam, Afkorting, Type, BeginDate, Identifier, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/AG DH',
           'Naam': "Ambtenarengerecht 's-Gravenhage",
      'Afkorting': 'AGSGR',
           'Type': 'AndereGerechtelijkeInstantie',
      'BeginDate': '1913-01-01'},
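
One possible way to work with the returned list is to group court names by their Type. This is only a sketch: `names_by_type` is a hypothetical helper, and the sample list below is the single example entry from above rather than real output.

```python
from collections import defaultdict

# Sample entry copied from the example above; a real call returns many more.
instanties = [
    {'Identifier': 'http://psi.rechtspraak.nl/AG DH',
     'Naam': "Ambtenarengerecht 's-Gravenhage",
     'Afkorting': 'AGSGR',
     'Type': 'AndereGerechtelijkeInstantie',
     'BeginDate': '1913-01-01'},
]


def names_by_type(entries):
    """Hypothetical helper: map each Type to the list of Naam values having it."""
    grouped = defaultdict(list)
    for entry in entries:
        # .get() is used defensively, in case an entry lacks a key.
        grouped[entry.get('Type', 'unknown')].append(entry.get('Naam'))
    return dict(grouped)


grouped = names_by_type(instanties)
print(grouped)
```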

def parse_instanties_buitenlands():

Parse the 'buitenlandse instanties' value list

Returns

a list of flat dicts, with keys Naam, Identifier, Afkorting, Type, BeginDate, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/instantie/ES/#AudienciaNacionalNationaalHof',
           'Naam': 'Audiencia Nacional (Nationaal Hof)',
      'Afkorting': 'XX',
           'Type': 'BuitenlandseInstantie',
      'BeginDate': '1950-01-01'}

def parse_nietnederlandseuitspraken():

Parse the 'niet-nederlandse uitspraken' value list

Returns

a list of items like:

    {'id': 'ECLI:CE:ECHR:2000:0921JUD003224096', 'ljn': ['AD4213']},
    {'id': 'ECLI:EU:C:2000:679',                 'ljn': ['AD4227']},
    {'id': 'ECLI:EU:C:2000:689',                 'ljn': ['AD4228']},
    {'id': 'ECLI:EU:C:2001:112',                 'ljn': ['AD4244', 'AL3652']},
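
Since each item maps one ECLI to one or more LJNs, a reverse lookup (LJN to ECLI) may be handy. A minimal sketch, using the example items above as sample data; `ljn_to_ecli` is a hypothetical helper, not something this module provides.

```python
# Sample items copied from the example above.
items = [
    {'id': 'ECLI:CE:ECHR:2000:0921JUD003224096', 'ljn': ['AD4213']},
    {'id': 'ECLI:EU:C:2000:679',                 'ljn': ['AD4227']},
    {'id': 'ECLI:EU:C:2000:689',                 'ljn': ['AD4228']},
    {'id': 'ECLI:EU:C:2001:112',                 'ljn': ['AD4244', 'AL3652']},
]


def ljn_to_ecli(entries):
    """Hypothetical helper: build a dict from each LJN to the ECLI it belongs to."""
    mapping = {}
    for entry in entries:
        for ljn in entry.get('ljn', []):
            mapping[ljn] = entry['id']
    return mapping


lookup = ljn_to_ecli(items)
print(lookup['AL3652'])
```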

def parse_proceduresoorten():

Parse the 'proceduresoorten' value list (assumed to be fixed).

Returns

A list of flat dicts, with keys Naam, Identifier, for example:

    {'Identifier': 'http://psi.rechtspraak.nl/procedure#artikel81ROzaken', 'Naam': 'Artikel 81 RO-zaken'}

def parse_rechtsgebieden():

Parse the 'rechtsgebieden' value list (assumed to be fixed), the data of which seems to be a depth-2 tree.

Returns

a dict with items like:

    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht': ['Bestuursrecht'],

and:

    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht': ['Ambtenarenrecht', 'Bestuursrecht'],

Where

Bestuursrecht is a grouping of this and more,
Ambtenarenrecht is one of several specific parts of it (as is, for example, Mededingingsrecht)
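
To make the depth-2 structure concrete, the flat dict can be regrouped into parent and children. This sketch assumes, based only on the two example items above, that a one-element list marks a grouping and that a two-element list is ordered specific name first, grouping last; `group_by_parent` is a hypothetical helper.

```python
# Sample items copied from the examples above.
rechtsgebieden = {
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht':
        ['Bestuursrecht'],
    'http://psi.rechtspraak.nl/rechtsgebied#bestuursrecht_ambtenarenrecht':
        ['Ambtenarenrecht', 'Bestuursrecht'],
}


def group_by_parent(flat):
    """Hypothetical helper: map each grouping name to its specific parts."""
    tree = {}
    for names in flat.values():
        if len(names) == 1:
            # A grouping on its own: make sure it appears even without children.
            tree.setdefault(names[0], [])
        else:
            # Assumed order: [specific, grouping].
            specific, parent = names[0], names[-1]
            tree.setdefault(parent, []).append(specific)
    return tree


tree = group_by_parent(rechtsgebieden)
print(tree)
```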

def parse_search_results(tree):

Takes search result etree (as given by search()), and returns a list of dicts like:

    {      'ecli': 'ECLI:NL:GHARL:2022:7129',
          'title': 'ECLI:NL:GHARL:2022:7129, Gerechtshof Arnhem-Leeuwarden, 16-08-2022, 200.272.381/01',
        'summary': 'some text made shorter for this docstring example',
        'updated': '2023-01-01T13:29:23Z',
           'link': 'https://uitspraken.rechtspraak.nl/InzienDocument?id=ECLI:NL:GHARL:2022:7129',
            'xml': 'https://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:GHARL:2022:7129',
    }

Notes:

'xml' is augmented based on the ecli and does not come from the search results
keys may be missing (in practice probably just summary?)
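
Because keys may be missing, code that consumes these dicts is probably best written with .get() rather than direct indexing. A small sketch: the first dict is the example above, the second is the same dict with 'summary' removed to imitate a missing key, and `summary_lines` is a hypothetical helper.

```python
full = {
    'ecli': 'ECLI:NL:GHARL:2022:7129',
    'title': 'ECLI:NL:GHARL:2022:7129, Gerechtshof Arnhem-Leeuwarden, 16-08-2022, 200.272.381/01',
    'summary': 'some text made shorter for this docstring example',
    'updated': '2023-01-01T13:29:23Z',
    'link': 'https://uitspraken.rechtspraak.nl/InzienDocument?id=ECLI:NL:GHARL:2022:7129',
    'xml': 'https://data.rechtspraak.nl/uitspraken/content?id=ECLI:NL:GHARL:2022:7129',
}
# Same entry without a summary, to imitate a result where that key is absent.
no_summary = {key: value for key, value in full.items() if key != 'summary'}


def summary_lines(results):
    """Hypothetical helper: one 'ecli: summary' line per result, tolerating missing keys."""
    return [
        '%s: %s' % (result.get('ecli', '?'), result.get('summary', '(no summary)'))
        for result in results
    ]


lines = summary_lines([full, no_summary])
for line in lines:
    print(line)
```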

def search(params):

Post a search to the public API on data.rechtspraak.nl, based on a dict of parameters.

See also:

https://www.rechtspraak.nl/SiteCollectionDocuments/Technische-documentatie-Open-Data-van-de-Rechtspraak.pdf

Note that when you give it nonsensical parameters, like date=2022-02-30, the service won't return valid XML, so the XML parse raises an exception.

Parameters
params	parameters like:
    max: default is 1000
    from: zero-based, default is 0
    sort: by modification date, ASC (default, oldest first) or DESC
    type: 'Uitspraak' or 'Conclusie'
    date: yyyy-mm-dd (once for 'on this date', twice for a range)
    modified: yyyy-mm-ddThh:mm:ss (once for 'since then to now', twice for a range)
    return: DOC for things where there are documents; if omitted it also fetches things for which there is only metadata
    replaces: fetch ECLI for an LJN
    subject: URI of a rechtsgebied
    creator
These are handed to urlencode, so could be either a list of tuples or a dict, but because you are likely to repeat variables to specify ranges, 'list of tuples' should be your habit, e.g.: [ ("modified", "2023-01-01"), ("modified", "2023-01-05") ]
Returns
etree object for the search (or raises an exception) CONSIDER: returning only the urls
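
The following sketch shows why a list of tuples is the safer habit, and one way to catch a nonsensical date before it reaches the service. It only builds the query string with the standard library; `checked_date` is a hypothetical helper, and the parameter values are arbitrary examples.

```python
import datetime
import urllib.parse


def checked_date(text):
    """Hypothetical helper: return text unchanged if it is a real yyyy-mm-dd date.

    Raises ValueError for things like '2022-02-30'.
    """
    datetime.date.fromisoformat(text)
    return text


# A range is expressed by repeating the same key, which a list of tuples allows.
params = [
    ('max', '100'),
    ('type', 'Uitspraak'),
    ('modified', checked_date('2023-01-01')),
    ('modified', checked_date('2023-01-05')),
]

query = urllib.parse.urlencode(params)
print(query)

# Converting to a dict silently drops the first 'modified', so the range is lost.
as_dict = dict(params)
print(urllib.parse.urlencode(as_dict))
```

With that in place, search(params) would be called with the list itself; the query string here is only printed to show what urlencode makes of repeated keys.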

BASE_URL: str =

base URL for search as well as value lists

Value

'https://data.rechtspraak.nl/'

def _para_text(treenode):

Given the open-rechtspraak XML, specifically the uitspraak or conclusie node under the root,

_FORMELE_RELATIES_URL =

Undocumented

Value

urllib.parse.urljoin(BASE_URL, '/Waardelijst/FormeleRelaties')

_INSTANTIES_BUITENLANDS_URL =

Undocumented

Value

urllib.parse.urljoin(BASE_URL, '/Waardelijst/InstantiesBuitenlands')

_INSTANTIES_URL =

Undocumented

Value

urllib.parse.urljoin(BASE_URL, '/Waardelijst/Instanties')

_NIET_NEDERLANDSE_UITSPRAKEN_URL =

Undocumented

Value

urllib.parse.urljoin(BASE_URL, '/Waardelijst/NietNederlandseUitspraken')

_PROCEDURESOORTEN_URL =

Undocumented

Value

urllib.parse.urljoin(BASE_URL, '/Waardelijst/Proceduresoorten')

_RECHTSGEBIEDEN_URL =

Undocumented

Value

urllib.parse.urljoin(BASE_URL, '/Waardelijst/Rechtsgebieden')
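
For reference, this is what such a urljoin() call evaluates to. Note that the second argument starts with a slash, so it replaces whatever path the base has; the example.org base below is made up purely to show that effect.

```python
import urllib.parse

# Same value as this module's BASE_URL.
BASE_URL = 'https://data.rechtspraak.nl/'

rechtsgebieden_url = urllib.parse.urljoin(BASE_URL, '/Waardelijst/Rechtsgebieden')
print(rechtsgebieden_url)

# With a base that has a path of its own, a leading slash discards that path.
other = urllib.parse.urljoin('https://example.org/some/path/', '/Waardelijst/Rechtsgebieden')
print(other)
```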