wetsuite API Documentation

Module Index

  • wetsuite - wetsuite is a set of tools that helps researchers and others deal with legal and other government documents
    • datacollect - A collection of helper code that assists collection of data from varied sources.
      • eurlex - Helps interact with the EUR-Lex website and APIs.
      • koop_frbr - Code that accesses the https://repository.overheid.nl/frbr/ data in a bulk way. ...pending removal.
      • koop_sru - An interface to the SRU repositories managed by KOOP
      • rechtspraaknl - Fetches data from rechtspraak.nl's API
      • rijksoverheid_nl_documenten - Code to help fetch things from https://www.rijksoverheid.nl/documenten
      • sru - Talks to SRU repositories, mainly underlies koop_sru
      • tweedekamer_nl - Fetches from the APIs provided by opendata.tweedekamer.nl
    • datasets - Fetch and load already-created datasets that we provide. (to see a list of actual datasets, look for the wetsuite_datasets.ipynb notebook)
    • extras - Code that is not considered core functionality and is less well supported, but which you may find useful nonetheless
      • gerechtcodes - Some information about the gerechtcodes used in ECLIs
      • lawref - This code attempts to make it easier to deal with human variation in references to laws.
      • ocr - Extract text from images, mainly aimed at PDFs that contain pictures of documents
      • pdf - Query PDFs about the text objects that they contain (which is not always clean, structured, correct, or present at all)
      • word_cloud - Create wordcloud images; mostly a thin wrapper module around an existing wordcloud module.
    • helpers - A collection of small singular tools, useful when composing more complex tasks
      • akn - Lookup of AKN
      • collocation - Quick and dirty version of some collocation code.
      • date - Try to deal with varied forms of dates and times, and ease things like "I would like to specify a range of days in a particular format" (e.g. for bulk fetching), and such.
      • escape - Make it easier to safely insert text into URLs, and HTML and XML data.
      • etree - Helpers to deal with XML data, largely a wrapper around lxml and its ElementTree interface.
      • format - Formatting varied types of values into text (and sometimes parsing the same), mostly for readability
      • koop_parse - Data and metadata parsing that is probably specific to KOOP's SRU repositories.
      • lazy - Various functions that allow you to be (a little too) lazy - less typing and/or less thinking.
      • localdata - This is intended to store collections of data on disk, relatively unobtrusive to use (better than e.g. lots of files), and with quick random access (better than e.g. JSONL).
      • meta - Things that parse metadata.
      • net - network-related helper functions, such as fetching from URLs
      • notebook - Tools for jupyter/ipython-style notebooks, and detection that you are, or are _not_, using one right now.
      • patterns - Extracting specific patterns of text.
      • shellcolor - (should arguably be in extras) Eases production of colors in the terminal, mostly for a few command line debug tools.
      • spacy - helper functions related to spaCy natural language parsing.
      • split - This module tries to wrangle distinct types of documents for you, from HTML to PDF, from varied specific sources, into plain text, so that you can consume it more easily.
      • strings - mostly-basic string helper functions
      • util - General utility functions, like "give me a path to where wetsuite can store data", and debug tools for inspecting data.
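The sru and koop_sru modules talk to SRU (Search/Retrieve via URL) servers. As a rough illustration of the protocol only, and not of wetsuite's own API, the sketch below builds a searchRetrieve request URL from the standard SRU parameters. The helper name sru_search_url, the base URL, and the CQL query are placeholders, and which SRU version a given server expects is an assumption to check against that server.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, query, start_record=1, maximum_records=10, version="1.2"):
    """Build an SRU searchRetrieve URL (hypothetical helper, not part of wetsuite)."""
    params = {
        "operation": "searchRetrieve",
        "version": version,                 # assumption: check which version the server supports
        "query": query,                     # a CQL query string
        "startRecord": start_record,        # SRU record positions start at 1
        "maximumRecords": maximum_records,  # page size; servers may cap this
    }
    # urlencode percent-encodes each value, so quotes and spaces in the CQL are safe
    return base_url + "?" + urlencode(params)


url = sru_search_url("https://example.org/sru", 'dcterms.title any "belasting"')
print(url)
```

Paging through a large result set then amounts to repeating the request with startRecord advanced by maximumRecords each time.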
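The date module mentions specifying "a range of days in a particular format", for example for bulk fetching. A minimal stdlib sketch of that idea follows; day_range is a hypothetical name for illustration, not wetsuite's date API.

```python
from datetime import date, timedelta


def day_range(start, end, fmt="%Y-%m-%d"):
    """Yield every day from start to end (inclusive), formatted with strftime.

    Hypothetical helper for illustration; not wetsuite's date API.
    """
    current = start
    while current <= end:
        yield current.strftime(fmt)
        current += timedelta(days=1)


# 2024 is a leap year, so February 29 is included
for day in day_range(date(2024, 2, 27), date(2024, 3, 2)):
    print(day)
```

Using date arithmetic rather than string manipulation means month ends and leap days are handled for free.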
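The escape module is about safely inserting text into URLs, HTML, and XML. Each of those contexts needs different escaping; the sketch below shows the differences using only the Python standard library (it does not use or describe wetsuite's own functions).

```python
from html import escape as html_escape
from urllib.parse import quote
from xml.sax.saxutils import escape as xml_escape

text = 'Wet & "regeling" <2024>'

in_url = quote(text, safe="")   # safe="" also encodes "/", as needed for a single path segment
in_html = html_escape(text)     # also escapes quotes, so the result is safe inside attribute values
in_xml = xml_escape(text)       # escapes &, < and > only; attribute values need extra care

print(in_url)
print(in_html)
print(in_xml)
```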
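A common chore when dealing with XML, as the etree module does, is that namespaced tags make simple lookups verbose. The sketch below strips namespaces from element tags. It uses the stdlib xml.etree.ElementTree rather than lxml, and strip_namespaces is a hypothetical name, not necessarily what wetsuite provides. The isinstance check matters under lxml, where comments and processing instructions have non-string tags.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Remove '{namespace}' prefixes from element tags, in place.

    Illustration only; attribute names are left untouched.
    """
    for element in root.iter():
        # ElementTree stores namespaced tags as '{uri}localname'
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


doc = ET.fromstring('<meta xmlns="urn:example"><title>Wet X</title></meta>')
print(doc.tag)  # {urn:example}meta
strip_namespaces(doc)
print(doc.find("title").text)  # Wet X
```

This trades away the ability to tell apart same-named elements from different namespaces, which is usually acceptable for read-only extraction.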
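The localdata description contrasts a single on-disk store with "lots of files" and with JSONL, which must be scanned to reach a given record. One common way to get that combination is a key-value table in SQLite. The sketch below illustrates the idea only; TinyStore and its methods are hypothetical and are not wetsuite's localdata API.

```python
import sqlite3


class TinyStore:
    """A minimal on-disk key-value store built on sqlite3.

    Illustration of the idea only; not wetsuite's localdata API.
    """

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        # The primary key gives indexed, random access by key
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

    def put(self, key, value):
        with self.conn:  # commits the transaction on success
            self.conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))

    def get(self, key, default=None):
        row = self.conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return default if row is None else row[0]

    def close(self):
        self.conn.close()


store = TinyStore(":memory:")  # use a file path such as "cache.db" to persist on disk
store.put("https://example.org/doc/1", b"<xml>...</xml>")
print(store.get("https://example.org/doc/1"))
```

Keying on something like a source URL makes this usable as a fetch cache: check get() first, and only fetch and put() on a miss.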
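The gerechtcodes and patterns modules both touch on ECLIs (European Case Law Identifiers), which have five colon-separated parts: "ECLI", a country code, a court code (the gerechtcode, for Dutch courts), a year, and an ordinal number. The sketch below finds ECLIs in text with a simplified regular expression and splits them into those parts. The pattern, the find_eclis name, and the two-entry court table are assumptions for illustration; real-world text contains variations (lowercase, extra spaces) that this will not catch.

```python
import re

# "ECLI", country code, court code, year, ordinal (dot-separated parts, no trailing dot)
ECLI_RE = re.compile(r"\bECLI:([A-Z]{2}):([A-Z0-9]{1,7}):(\d{4}):([A-Z0-9]+(?:\.[A-Z0-9]+)*)")

# Tiny hand-picked subset of Dutch court codes, for illustration only
COURT_NAMES = {"HR": "Hoge Raad", "RVS": "Raad van State"}


def find_eclis(text):
    """Return the ECLIs found in text, split into their parts (hypothetical helper)."""
    found = []
    for match in ECLI_RE.finditer(text):
        country, court, year, number = match.groups()
        found.append({
            "ecli": match.group(0),
            "country": country,
            "court": court,
            "court_name": COURT_NAMES.get(court),  # None for codes not in the subset
            "year": int(year),
            "number": number,
        })
    return found


for item in find_eclis("Zie ECLI:NL:HR:2020:1234 en ECLI:NL:RVS:2019:567."):
    print(item["ecli"], item["court_name"])
```

Not allowing a trailing dot in the ordinal keeps sentence-final punctuation out of the match.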
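For a sense of what "quick and dirty" collocation detection can look like, the sketch below scores adjacent word pairs by pointwise mutual information (PMI), one standard association measure. This is an assumed, swapped-in technique for illustration; the collocation module may use a different measure, and bigram_pmi is a hypothetical name.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Illustration only; not wetsuite's collocation code.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:  # PMI is unreliable for rare pairs
            continue
        p_ab = count / (n - 1)  # there are n - 1 adjacent pairs
        p_a = unigrams[a] / n
        p_b = unigrams[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    # Highest-scoring (most strongly associated) pairs first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = "de hoge raad oordeelt dat de hoge raad bevoegd is".split()
for pair, score in bigram_pmi(tokens):
    print(pair, round(score, 2))
```

The min_count cutoff matters because PMI strongly favours pairs of rare words that happen to occur together once.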

API Documentation for wetsuite, generated by pydoctor 22.9.1 at 2025-04-16 13:53:01.