Make it easier to safely insert text into URLs, and HTML and XML data.
Should make code more readable than combinations of cgi.escape(), urllib.quote(), ''.encode() and such.
Note that in HTML, & should always be encoded (in node text, attributes and elsewhere), so it is a good idea to consistently use nodetext() and/or attr(), or a templating library that does this for you.
uri() and uri_component() are roughly analogous to JavaScript's encodeURI and encodeURIComponent, respectively.
Function | attr | Escapes for use in HTML(/XML) node attributes: replaces <, >, &, ', " with entities |
Function | nodetext | Escapes for HTML/XML text nodes: replaces <, >, and & with entities |
Function | uri | Escapes for URI use: %-escapes everything except ', /, ;, and ? so that the result is still usable as a URL |
Function | uri_component | Escapes for URI use: %-escapes everything (including /) so that you can shove anything, including URIs, into URI query parameters. |
Function | uri | Returns a query fragment based on a dict. |
Escapes for use in HTML(/XML) node attributes: Replaces <, >, &, ', " with entities
Much like html.escape, but:
- ' and " are encoded as numeric entities; " is not encoded as &quot;, because that's not quite universal.
- Escapes ' (which cgi.escape doesn't). You often don't need that, but you do if you wrap attributes in ', which is valid in XML and in HTML. Doesn't use &apos; because it's not defined in HTML4.
Note that to put URIs containing Unicode into attributes, what you want is often something roughly like:
'<a href="?q=%s">' % attr(uri_component(q))
...because uri_component() handles the UTF-8 percent-escaping of the Unicode, and attr() handles the attribute escaping (technically you can get away without attr(), because uri_component() escapes a _lot_).
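To make the composition above concrete, here is a small self-contained sketch. The uri_component() and attr() defined here are stand-ins built on the standard library (urllib.parse.quote and html.escape), not this module's code; in particular the real attr() uses numeric references for quotes, which html.escape does not. It also shows why attr() is technically redundant here: the percent-encoded output contains no characters that need attribute escaping.

```python
from html import escape
from urllib.parse import quote


def uri_component(text):
    # Stand-in: percent-encode everything except letters, digits and _.-~
    # (str input is encoded as UTF-8 first, which is quote()'s default).
    return quote(text, safe="")


def attr(text):
    # Stand-in: html.escape with quote=True escapes &, <, >, " and '.
    return escape(text, quote=True)


q = 'fish & "chips" / café'
link = '<a href="?q=%s">' % attr(uri_component(q))
print(link)  # <a href="?q=fish%20%26%20%22chips%22%20%2F%20caf%C3%A9">
```

Keeping attr() in the chain costs little and stays correct even if the URI-escaping step is later swapped for something less aggressive.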
Passes non-ASCII through: you are expected to apply a character encoding to the document as a whole, or when writing/appending to it.
TODO: review how I want to deal with bytes / unicode in py3 now
Parameters | |
text | text to escape (as str or bytes) |
Returns | |
as bytes if it was given bytes, as str if given str |
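A minimal sketch of the behavior described for attr(), assuming &lt;, &gt; and &amp; for <, > and &, and decimal numeric references (&#39; and &#34;) for the quotes. The exact numeric form the module emits is not stated above, so that choice is an assumption. Working on bytes directly via bytes.replace keeps the "same type in, same type out" contract without having to guess the input's encoding (any ASCII-compatible encoding works).

```python
# Order matters: & must be handled first, or the & inside the other
# entities would be escaped a second time.
_ATTR_REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&#39;"),  # assumed numeric form; avoids &apos; (not in HTML4)
    ('"', "&#34;"),  # assumed numeric form, instead of &quot;
]


def attr(text):
    """Escape text for an HTML/XML attribute value; returns the input's type."""
    for char, entity in _ATTR_REPLACEMENTS:
        if isinstance(text, bytes):
            text = text.replace(char.encode("ascii"), entity.encode("ascii"))
        else:
            text = text.replace(char, entity)
    # Non-ASCII is passed through untouched.
    return text


print(attr('say "hi" & <bye>'))  # say &#34;hi&#34; &amp; &lt;bye&gt;
print(attr(b"it's"))             # b'it&#39;s'
```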
Escapes for HTML/XML text nodes: Replaces <, >, and & with entities
(equivalent in effect to html.escape(text, quote=False), or to the older cgi.escape(text))
Parameters | |
text | text to escape (as str or bytes) |
if | a value to return if text is None (meant to simplify certain calling logic) |
Returns | |
always returns a str, even if given bytes (passes Unicode through) |
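A minimal sketch of nodetext() on top of html.escape. Two assumptions: bytes input is decoded as UTF-8 (the docs above only say a str is always returned), and the "value to return if text is None" parameter is left out because its real name is not recoverable from this page.

```python
from html import escape


def nodetext(text):
    """Escape <, > and & for an HTML/XML text node; always returns str."""
    if isinstance(text, bytes):
        # Assumption: bytes are UTF-8 encoded.
        text = text.decode("utf-8")
    # quote=False leaves ' and " alone, which is fine inside text nodes.
    return escape(text, quote=False)


print(nodetext("1 < 2 & 3 > 2"))  # 1 &lt; 2 &amp; 3 &gt; 2
print(nodetext(b"a<b"))           # a&lt;b
```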
Escapes for URI use:
%-escapes everything except ', /, ;, and ? so that the result is still formatted/usable as a URL
Handles Unicode by converting it into URL-encoded UTF-8 bytes (quote() encodes to UTF-8 by default).
Parameters | |
text | URI, as string or bytes object |
same | if you handed in bytes, we will return bytes (containing UTF-8 if necessary) |
Returns | |
bytes if it was given bytes, str if given str |
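A sketch of uri() using urllib.parse.quote, taking the safe set literally from the description above (', /, ; and ?, in addition to letters, digits and _.-~ which quote() never escapes). That safe set is an assumption; note that with it, characters such as : and = would still be escaped. quote() accepts bytes directly and always returns a str, so the result is re-encoded when bytes were passed in.

```python
from urllib.parse import quote

# Assumed safe set, copied from the description of uri() above.
_URI_SAFE = "'/;?"


def uri(text):
    """Percent-escape a URI, leaving the assumed safe characters alone."""
    escaped = quote(text, safe=_URI_SAFE)  # str input is UTF-8 encoded first
    if isinstance(text, bytes):
        return escaped.encode("ascii")  # percent-escaped output is pure ASCII
    return escaped


print(uri("/docs/naïve guide?"))  # /docs/na%C3%AFve%20guide?
print(uri(b"/a b"))               # b'/a%20b'
```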
Escapes for URI use: %-escapes everything (including /) so that you can shove anything, including URIs, into URI query parameters.
Parameters | |
text | URI, as string or bytes object (unicode in an input str is converted into url-encoded UTF8 bytes first (quote() defaults to encoding to UTF8)) |
same | if you handed in bytes, we will return bytes (containing UTF-8 if necessary) |
Returns | |
bytes if it was given bytes, str if given str. If same_type is False, it always returns a str. |
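A sketch of uri_component() as described: quote() with an empty safe set, so even / is escaped and a whole URL can be embedded in a query parameter. The same_type name comes from the return description above; its default of True is an assumption.

```python
from urllib.parse import quote


def uri_component(text, same_type=True):
    """Percent-escape everything except letters, digits and _.-~."""
    escaped = quote(text, safe="")  # str input is UTF-8 encoded first
    if isinstance(text, bytes) and same_type:
        return escaped.encode("ascii")
    return escaped  # str input, or same_type=False


print(uri_component("http://x/?a=1&b=2"))  # http%3A%2F%2Fx%2F%3Fa%3D1%26b%3D2
print(uri_component(b"a/b"))                # b'a%2Fb'
print(uri_component(b"a/b", same_type=False))  # a%2Fb
```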
Returns a query fragment based on a dict.
Handles Unicode input strings by converting them into URL-encoded UTF-8 bytes.
The return type is explicitly requested by you (str or bytes) rather than based on the argument, as type variation within the dict could make that too magical.
join is there so that you can use ; as the W3C suggests, but it defaults to &. Internally works in str.
(You could also abuse it to avoid an attr()/nodetext() by handing it &amp;, but that gets confusing.)
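A sketch of the dict-to-query behavior described above. The function's real name is garbled in the summary table, so query_fragment and the rtype parameter are hypothetical names; join comes from the description. Keys and values are assumed to be str or bytes (quote() rejects other types), and dict insertion order is preserved.

```python
from urllib.parse import quote


def query_fragment(params, rtype=str, join="&"):
    """Hypothetical sketch: build key=value pairs from a dict.

    rtype (str or bytes) picks the return type explicitly, rather than
    inferring it from the dict's contents.
    """
    parts = []
    for key, value in params.items():
        # str keys/values are UTF-8 encoded by quote(); safe="" escapes / too.
        parts.append("%s=%s" % (quote(key, safe=""), quote(value, safe="")))
    result = join.join(parts)  # internally works in str
    return result.encode("utf-8") if rtype is bytes else result


print(query_fragment({"q": "café", "page": "2"}))                 # q=caf%C3%A9&page=2
print(query_fragment({"a": "1", "b": "x y"}, rtype=bytes, join=";"))  # b'a=1;b=x%20y'
```

Passing join="&amp;" would produce attribute-ready output directly, which is the "abuse" mentioned above.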