wetsuite
- wetsuite is a set of tools that helps researchers and others deal with legal and other government documents
datacollect
- A collection of helper code that assists collection of data from varied sources.
eurlex
- Helps interact with the EUR-Lex website and APIs.
koop_frbr
- Code that accesses the https://repository.overheid.nl/frbr/ data in a bulk way. ...pending removal.
koop_sru
- An interface to the SRU repositories managed by KOOP
rechtspraaknl
- Fetches data from rechtspraak.nl's API
rijksoverheid_nl_documenten
- Code to help fetch things from https://www.rijksoverheid.nl/documenten
sru
- Talks to SRU repositories; mainly the layer underlying koop_sru
tweedekamer_nl
- Fetches from the APIs provided by opendata.tweedekamer.nl
datasets
- Fetch and load already-created datasets that we provide. (To see a list of actual datasets, look for the wetsuite_datasets.ipynb notebook.)
extras
- Code that is not considered core functionality and is less well supported, but which you may find useful nonetheless
gerechtcodes
- Some information about the gerechtcodes used in ECLIs
lawref
- This code attempts to make it easier to deal with human variation in references to laws.
ocr
- Extract text from images, mainly aimed at PDFs that contain pictures of documents
pdf
- Query PDFs about the text objects that they contain (which is not always clean, structured, correct, or present at all)
word_cloud
- Create wordcloud images; mostly a thin wrapper module around an existing wordcloud module.
helpers
- A collection of small singular tools, useful when composing more complex tasks
akn
- Lookup of AKN
collocation
- Quick and dirty version of some collocation code.
date
- Try to deal with varied forms of dates and times, and ease things like "I would like to specify a range of days in a particular format" (e.g. for bulk fetching), and such.
escape
- Make it easier to safely insert text into URLs, and HTML and XML data.
etree
- Helpers to deal with XML data, largely a wrapper around lxml and its ElementTree interface.
format
- Formatting varied types of values into text (and sometimes parsing the same), mostly for readability
koop_parse
- Data and metadata parsing that is probably specific to KOOP's SRU repositories.
lazy
- Various functions that allow you to be (a little too) lazy - less typing and/or less thinking.
localdata
- This is intended to store collections of data on disk, relatively unobtrusive to use (better than e.g. lots of files), and with quick random access (better than e.g. JSONL).
meta
- Things that parse metadata.
net
- network related helper functions, such as fetching from URLs
notebook
- Tools for jupyter/ipython-style notebooks, and detection that you are, or are _not_, using one right now.
patterns
- Extracting specific patterns of text.
shellcolor
- (should arguably be in extras) Eases production of colors in the terminal, mostly for a few command line debug tools.
spacy
- helper functions related to spacy natural language parsing.
split
- This module tries to wrangle distinct types of documents for you, from HTML to PDF, from varied specific sources, into plain text, so that you can consume it more easily.
strings
- mostly-basic string helper functions
util
- General utility functions, like "give me a path to where wetsuite can store data", and debug tools for inspecting data.
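
To make the date and escape entries above more concrete, here is a minimal sketch of the kind of task they describe: producing a range of days in a chosen format, and safely inserting text into a URL or into HTML. It uses only the Python standard library and does not show wetsuite's own API; the function names `days_in_range` and `search_url` and the example.org address are assumptions made up for illustration.

```python
from datetime import date, timedelta
from urllib.parse import quote
import html


def days_in_range(start, end, fmt="%Y-%m-%d"):
    """Return every day from start to end (inclusive) as formatted strings.

    Hypothetical helper, not part of wetsuite.
    """
    count = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime(fmt) for i in range(count)]


def search_url(base, term):
    """Append a percent-encoded search term to a base URL.

    Hypothetical helper, not part of wetsuite; the "q" parameter is made up.
    """
    return base + "?q=" + quote(term, safe="")


# A short range that crosses a month boundary, e.g. for day-by-day bulk fetching.
days = days_in_range(date(2024, 2, 27), date(2024, 3, 1))

# Spaces and "&" are percent-encoded so they cannot break the query string.
url = search_url("https://example.org/search", "wet & recht")

# "<" and "&" are replaced by entities before the text is placed in markup.
snippet = "<p>" + html.escape("a < b & c") + "</p>"
```

Escaping at the point where text enters a URL or a piece of markup, rather than earlier, keeps the original string intact for other uses.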