Helps interact with the EUR-Lex website and APIs.
Function | extract |
Extract data from the formatted HTML of the website itself. |
Function | fetch |
Intends to query the SPARQL endpoint for most CELEX numbers of a specific type (defaulting to court judgments, for no particular reason). |
Extract data from the formatted HTML of the website itself.
Written for JUDG pages; it probably needs work for other page types.
The code also makes plenty of assumptions that are unlikely to hold over time, so for serious projects you should probably use a data API instead.
TODO: see how language-sensitive this is.
CONSIDER: extract more link hrefs (would probably need to hand in the page URL too)
Parameters | |
htmlbytes | the page, as a bytes object |
Returns | |
a nested structure |
Intends to query the SPARQL endpoint for most CELEX numbers of a specific type (defaulting to court judgments, for no particular reason).
TODO: fetch values e.g. at https://github.com/SEMICeu/Excel-to-CPSVAP-RDF-transformation/blob/master/page-objects/utils/CPSVtemplateWithCodelists.json in handier form
Asks the endpoint to return its results as JSON data, which we parse and return as a Python structure.
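As a rough illustration of asking a SPARQL endpoint for JSON results, here is a minimal sketch that builds (but does not send) such a request. It is not the code of this function: the endpoint URL, the `build_request` helper, its `limit` parameter, and the exact query text (including the `cdm:` predicates) are all assumptions made for the example.

```python
import urllib.parse
import urllib.request

# Assumption: the public SPARQL endpoint of the EU Publications Office.
ENDPOINT = "http://publications.europa.eu/webapi/rdf/sparql"


def build_request(typ="JUDG", limit=10):
    """Hypothetical helper: build a SPARQL-protocol GET request that asks for JSON.

    The query text is an illustrative guess at how works of one resource type
    and their CELEX numbers might be selected; adjust it to the real data model.
    Note that `typ` is pasted into the query unescaped, which is fine for a
    sketch but not for untrusted input.
    """
    query = f"""
    PREFIX cdm: <http://publications.europa.eu/ontology/cdm#>
    SELECT ?work ?type ?celex WHERE {{
      ?work cdm:work_has_resource-type ?type .
      FILTER(?type = <http://publications.europa.eu/resource/authority/resource-type/{typ}>)
      ?work cdm:resource_legal_id_celex ?celex .
    }} LIMIT {int(limit)}
    """
    # The SPARQL 1.1 protocol passes the query as a 'query' URL parameter.
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    # The Accept header requests the SPARQL JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
```

Sending the request and decoding the response body with `json.loads` would then yield a Python structure of the kind described under Returns.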
Parameters | |
typ | the type to fetch, e.g.
|
Returns | |
a (possibly-many-item'd) nested structure (python structure, loaded from JSON)
The structure you get back looks like (see also https://www.w3.org/TR/2013/REC-sparql11-results-json-20130321/ ):
    {
      'head': { 'link': [], 'vars': ['work', 'type', 'celex', 'date', 'force'] },
      'results': {
        'distinct': False,
        'ordered': True,
        'bindings': [
          {
            'work':  { 'type':'uri', 'value':'http://publications.europa.eu/resource/cellar/1e3100ce-8a71-433a-8135-15f5cc0e927c' },
            'type':  { 'type':'uri', 'value':'http://publications.europa.eu/resource/authority/resource-type/JUDG' },
            'celex': { 'type':'typed-literal', 'value':'61996CJ0080', 'datatype': 'http://www.w3.org/2001/XMLSchema#string' },
            'date':  { 'type': 'typed-literal', 'value':'1998-01-15', 'datatype': 'http://www.w3.org/2001/XMLSchema#date' }
          },
          # ...one of these for each result
        ]
      }
    }
|
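The returned structure is fairly deeply nested, so callers will often want to flatten it. The following is a minimal sketch of one way to do that, assuming the result has the shape shown above; `bindings_to_rows` is a hypothetical helper, not part of this module.

```python
def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable listed in 'vars' may be unbound in a given result
        # (as 'force' is in the sample above), so fall back to None.
        rows.append(
            {var: binding[var]["value"] if var in binding else None
             for var in variables}
        )
    return rows


# The sample result from the description above.
sample = {
    "head": {"link": [], "vars": ["work", "type", "celex", "date", "force"]},
    "results": {
        "distinct": False,
        "ordered": True,
        "bindings": [
            {
                "work": {"type": "uri", "value": "http://publications.europa.eu/resource/cellar/1e3100ce-8a71-433a-8135-15f5cc0e927c"},
                "type": {"type": "uri", "value": "http://publications.europa.eu/resource/authority/resource-type/JUDG"},
                "celex": {"type": "typed-literal", "value": "61996CJ0080", "datatype": "http://www.w3.org/2001/XMLSchema#string"},
                "date": {"type": "typed-literal", "value": "1998-01-15", "datatype": "http://www.w3.org/2001/XMLSchema#date"},
            },
        ],
    },
}

rows = bindings_to_rows(sample)
print(rows[0]["celex"])  # 61996CJ0080
print(rows[0]["force"])  # None
```

Note that this keeps only each binding's `value` and drops its `type` and `datatype`, so dates come back as plain strings.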