Fetch and load already-created datasets that we provide.
You may also be interested in the example notebooks that introduce most datasets.
As this is often structured data, each dataset may work a little differently, so there is a describe() to get you started; each dataset should fill that out.
Note that these datasets are separate from the code, so availability may change.
From __init__.py:
Class |
|
If you're looking for details about the specific dataset, look at the .description |
Function | description |
Fetch the description field for a specifically named dataset. Simple, but less typing than picking it out yourself. |
Function | fetch |
Index is expected to be a list of dicts, each with keys including |
Function | generated |
Used when generating datasets |
Function | list |
Fetch index, report (only) dataset names. |
Function | load |
Takes a dataset name (that you learned of from the index) and downloads it if necessary; after the first time it is cached in your home directory. |
Function | print |
Print a short summary per dataset on stdout. A little more to go on than just the names from list_datasets(), and a little less work than sifting through the dicts for each yourself, but only useful in notebooks or from the console. |
Function | _data |
Given a path to a data file, return the data in Python-object form, along with a description (based on contents). This wraps opening the file and dealing with its type, and separates that from the download phase. |
Function | _load |
Takes a dataset name (that you learned of from the index) and downloads it if necessary; after the first time it is cached in your home directory. |
Constant | _INDEX |
Undocumented |
Variable | _index |
Undocumented |
Variable | _index |
Undocumented |
Variable | _index |
Undocumented |
Fetch the description field for a specifically named dataset. Simple, but less typing than picking it out yourself.
Index is expected to be a list of dicts, each with keys including:
- url
- version (should probably become semver)
- description_short: one-line summary of what this is
- description: longer description, perhaps with some example data
- download_size: how much transfer you'll need
- real_size: disk storage we expect to need once decompressed
- download_size_human, real_size_human: more readable versions, e.g. where a real size might be the integer 397740, the human size would be 388KiB
- type: content type of dataset
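To make the keys above concrete, here is a minimal sketch of one index entry and of how a human-readable size could be derived from an integer size. Only the 397740 to 388KiB pairing comes from the text above; every other value in the entry is an invented placeholder, and `human_size` is a hypothetical helper that is not claimed to exist in this module.

```python
def human_size(num_bytes):
    """Format a byte count with binary units (hypothetical helper, truncates fractions)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value drops below 1024, or at the largest unit.
        if size < 1024 or unit == units[-1]:
            return f"{int(size)}{unit}"
        size /= 1024


# Placeholder entry: all values except real_size are made up for illustration.
example_entry = {
    "url": "https://example.org/datasets/example.json.gz",
    "version": "1.0.0",
    "description_short": "One-line summary of the dataset",
    "description": "Longer description, perhaps with some example data",
    "download_size": 101234,
    "real_size": 397740,
    "type": "application/json",
}
example_entry["download_size_human"] = human_size(example_entry["download_size"])
example_entry["real_size_human"] = human_size(example_entry["real_size"])
```

Storing both the integer and the readable form keeps the integer available for arithmetic (for instance checking free disk space) while the readable form is what a summary would print.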
Fetch index, report (only) dataset names.
If you care about the details in data form, use fetch_index.
If you care about the details in a console or notebook, see print_dataset_summary.
Returns | |
a list of strings, e.g. ['bwb-mostrecent-xml','woo_besluiten_docs_text'] |
Takes a dataset name (that you learned of from the index) and downloads it if necessary; after the first time it is cached in your home directory.
Wraps _load_bare, which does most of the heavy lifting.
This primarily adds what is necessary to load the downloaded file and give it to you as a usable Dataset object.
Parameters | |
datasetstr | Undocumented |
verbose | tells you more about the download (on stderr). Can be given True or False. By default (None), we try to detect whether we are in an interactive context, and print only if we are. |
force | whether to remove the current contents before fetching. Dataset naming should prevent the need for this (except if you're the wetsuite programmer). |
check | Undocumented |
Returns | |
a Dataset object - which is a container object with little more than
|
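The three-state verbose parameter described above (True, False, or None meaning "detect") can be sketched as follows. This is an illustration under assumptions, not this module's implementation: `resolve_verbose` and `report` are hypothetical names, and treating "a REPL is active or stderr is a terminal" as the interactive test is one possible heuristic chosen for the example.

```python
import sys


def resolve_verbose(verbose=None):
    """Return True or False; None means 'guess from the context'."""
    if verbose is None:
        # Assumption: a REPL (sys.ps1 is set) or a terminal on stderr counts as interactive.
        stderr_is_tty = hasattr(sys.stderr, "isatty") and sys.stderr.isatty()
        return hasattr(sys, "ps1") or stderr_is_tty
    return bool(verbose)


def report(message, verbose=None):
    """Write a progress message to stderr only when verbosity resolves to True."""
    if resolve_verbose(verbose):
        print(message, file=sys.stderr)
```

Writing progress to stderr rather than stdout keeps it out of any output a script might be piping elsewhere, which matches the "(on stderr)" note in the parameter description.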
Print a short summary per dataset on stdout. A little more to go on than just the names from list_datasets(), and a little less work than sifting through the dicts for each yourself, but only useful in notebooks or from the console.
Given a path to a data file, return the data in Python-object form, along with a description (based on contents). This wraps opening the file and dealing with its type, and separates that from the download phase.
Takes a dataset name (that you learned of from the index) and downloads it if necessary; after the first time it is cached in your home directory.
If the download is compressed, it is uncompressed. This step does not consider the type of data.
Note: You normally would use load(), which takes the same name but gives you a usable object, instead of just a filename.
Returns | |
the filename we fetched to |
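The behaviour described above (fetch only when not yet cached, uncompress if needed, return a filename) can be sketched as below. This is a simplified stand-in under stated assumptions, not the module's own code: `fetch_cached`, its parameters, and the gzip-only handling are hypothetical choices for illustration, and the cache directory is passed in explicitly rather than derived from the home directory.

```python
import gzip
import os
import shutil
import urllib.request


def fetch_cached(name, url, cache_dir, force=False, download=urllib.request.urlretrieve):
    """Return the local filename for `name`, fetching `url` only when needed."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, name)
    if force and os.path.exists(target):
        os.remove(target)  # drop the cached copy so it is fetched again
    if not os.path.exists(target):
        temp = target + ".download"
        download(url, temp)  # same call shape as urllib.request.urlretrieve(url, filename)
        with open(temp, "rb") as handle:
            is_gzip = handle.read(2) == b"\x1f\x8b"  # gzip magic bytes
        if is_gzip:
            # Uncompress into the final location, then discard the compressed download.
            with gzip.open(temp, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(temp)
        else:
            os.replace(temp, target)
    return target
```

Downloading to a temporary name and only then producing the final file means an interrupted transfer is not mistaken for a cached dataset on the next call. A caller wanting the home-directory behaviour could pass something like `os.path.expanduser("~/some-cache-dir")` as `cache_dir`; that path is a placeholder, not the location this module uses.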