module documentation

Make it easier to safely insert text into URLs, and HTML and XML data.

Should make code more readable than combinations of cgi.escape(), urllib.quote(), ''.encode() and such.

Note that in HTML, & should always be encoded (in node text, attributes and elsewhere), so it is a good idea to use nodetext() and/or attr() consistently, or to use a templating library that does this for you.

uri() and uri_component() are like JavaScript's encodeURI and encodeURIComponent.

Function  attr           Escapes for use in HTML(/XML) node attributes: replaces <, >, &, ', " with entities
Function  nodetext       Escapes for HTML/XML text nodes: replaces <, >, and & with entities
Function  uri            Escapes for URI use: %-escapes everything except ', /, ;, and ? so that the result is still usable as a URL
Function  uri_component  Escapes for URI use: %-escapes everything (including /) so that you can shove anything, including URIs, into URI query parameters
Function  uri_dict       Returns a query fragment based on a dict

def attr(text):

Escapes for use in HTML(/XML) node attributes: Replaces <, >, &, ', " with entities

Much like html.escape, but...

  • ' and " are encoded as numeric entities (&#x27; and &#x22; respectively); &quot; is not used for " because it is not quite universal.
  • Escapes ' (which cgi.escape did not, and which html.escape only does when quote=True). You often don't need this, but you do if you wrap attributes in ', which is valid in XML and in various HTML versions. Doesn't use &apos; because it's not defined in HTML4.
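
The replacements listed above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: attr_sketch is a hypothetical name, and it only handles str input (the real attr() is documented as also accepting bytes).

```python
def attr_sketch(text):
    """Hypothetical stand-in for attr(); handles str input only."""
    # & must be replaced first, otherwise the & in the entities
    # produced by the later replacements would be escaped again.
    replacements = (
        ("&", "&amp;"),
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&#x22;"),  # numeric entity instead of &quot;
        ("'", "&#x27;"),  # numeric entity instead of &apos;
    )
    for char, entity in replacements:
        text = text.replace(char, entity)
    return text


print(attr_sketch("""Tom & "Jerry's" <show>"""))
# Tom &amp; &#x22;Jerry&#x27;s&#x22; &lt;show&gt;
```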

Note that to put URIs with unicode in attributes, what you want is often something roughly like:

    '<a href="?q=%s">'%attr( uri_component(q)  )

...because uri_component() handles the UTF-8 percent-escaping of the unicode, and attr() the attribute escaping (technically you can get away without attr() because uri_component() escapes a _lot_).
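
The same two-step pattern can be tried with the standard library alone. In this sketch urllib.parse.quote(..., safe="") stands in for uri_component() and html.escape() stands in for attr(); both substitutions are assumptions made for illustration, and href_for_query is a hypothetical helper name.

```python
from html import escape
from urllib.parse import quote


def href_for_query(q):
    """Hypothetical helper: build an <a> tag whose query value is q."""
    # Step 1: percent-escape the value (non-ASCII goes through UTF-8).
    component = quote(q, safe="")
    # Step 2: attribute-escape the result before putting it in quotes.
    return '<a href="?q=%s">' % escape(component, quote=True)


print(href_for_query('caf\u00e9 & "tea"'))
# <a href="?q=caf%C3%A9%20%26%20%22tea%22">
```

After step 1 no &, <, >, ' or " is left in the value, which is why the second step changes nothing here.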

Passes non-ASCII through. It is expected that you apply a character encoding to the document as a whole, or when writing/appending to the document.

TODO: review how I want to deal with bytes / unicode in py3 now

Parameters
  text: text to escape (as str or bytes)
Returns
  bytes if it was given bytes, str if given str

def nodetext(text, if_none=None):

Escapes for HTML/XML text nodes: Replaces <, >, and & with entities

(behaves like html.escape(text, quote=False), which matches the default behaviour of the former cgi.escape)

Parameters
  text: text to escape (as str or bytes)
  if_none: a value to return if text is None (meant to simplify certain calling logic)
Returns
  always a str, even if given bytes (passes unicode through)
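
A rough sketch of the documented behaviour, built on html.escape. The name nodetext_sketch is hypothetical, and decoding bytes as UTF-8 is an assumption; the documentation above does not say which encoding the real function uses.

```python
from html import escape


def nodetext_sketch(text, if_none=None):
    """Hypothetical stand-in for nodetext()."""
    if text is None:
        # Lets callers write nodetext_sketch(value, if_none="")
        # without a separate None check.
        return if_none
    if isinstance(text, bytes):
        text = text.decode("utf-8")  # assumption: bytes are UTF-8
    # quote=False leaves ' and " alone, so only <, > and & are replaced.
    return escape(text, quote=False)


print(nodetext_sketch('1 < 2 & "x"'))
# 1 &lt; 2 &amp; "x"
```
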
def uri(text, same_type=True):

Escapes for URI use:

%-escapes everything except ', /, ;, and ? so that the result is still formatted/usable as a URL

Handles Unicode by converting it into url-encoded UTF-8 bytes (quote() defaults to encoding to UTF-8)

Parameters
  text: URI, as string or bytes object
  same_type: if you handed in bytes, we will return bytes (containing UTF-8 if necessary)
Returns
  bytes if it was given bytes, str if given str
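
The description above maps fairly directly onto the safe parameter of urllib.parse.quote. The sketch below assumes that mapping and takes the list of unescaped characters from the text above; uri_sketch is a hypothetical name, and the real uri() may differ in detail.

```python
from urllib.parse import quote


def uri_sketch(text, same_type=True):
    """Hypothetical stand-in for uri()."""
    # quote() accepts str or bytes and always returns an ASCII-only str;
    # the characters in safe are left unescaped.
    escaped = quote(text, safe="'/;?")
    if same_type and isinstance(text, bytes):
        return escaped.encode("ascii")
    return escaped


print(uri_sketch("/docs/na\u00efve file.txt"))
# /docs/na%C3%AFve%20file.txt
```
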
def uri_component(text, same_type=True):

Escapes for URI use: %-escapes everything (including /) so that you can shove anything, including URIs, into URI query parameters.

Parameters
  text: URI, as string or bytes object (unicode in an input str is converted into url-encoded UTF-8 bytes first; quote() defaults to encoding to UTF-8)
  same_type: if you handed in bytes, we will return bytes (containing UTF-8 if necessary)
Returns
  bytes if it was given bytes, str if given str. If same_type=False, always a str.
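
To illustrate why escaping / and friends matters, here is a sketch that nests a complete URL inside a query parameter. It assumes uri_component() behaves like urllib.parse.quote with an empty safe set; uri_component_sketch and the /login path are hypothetical.

```python
from urllib.parse import quote


def uri_component_sketch(text, same_type=True):
    """Hypothetical stand-in for uri_component()."""
    # safe="" means even / is percent-escaped.
    escaped = quote(text, safe="")
    if same_type and isinstance(text, bytes):
        return escaped.encode("ascii")
    return escaped


target = "http://example.com/?a=1&b=2"
redirect = "/login?next=" + uri_component_sketch(target)
print(redirect)
# /login?next=http%3A%2F%2Fexample.com%2F%3Fa%3D1%26b%3D2
```

Without the escaping, the & in the nested URL would end the next parameter early and b=2 would be read as a parameter of the outer URL.
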
def uri_dict(d, join='&', astype=str):

Returns a query fragment based on a dict.

Handles Unicode input strings by converting them into url-encoded UTF-8 bytes.

The return type is explicitly requested by you (use str or bytes) rather than based on the arguments, as type variation within the dict could make that too magical.

join is there so that you could use ; as w3 suggests, but it defaults to &. Internally works in str.

(you could also abuse it to avoid an attr()/nodetext() by handing it &amp; but that gets confusing)
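
A minimal sketch of how such a function could work, assuming each key and value is escaped as a URI component and the pairs are then joined. uri_dict_sketch and its helper are hypothetical names; converting non-string values with str() is an assumption that the documentation above does not confirm.

```python
from urllib.parse import quote


def _component(value):
    """Percent-escape one key or value (hypothetical helper)."""
    if not isinstance(value, (str, bytes)):
        value = str(value)  # assumption: other types are stringified
    return quote(value, safe="")


def uri_dict_sketch(d, join="&", astype=str):
    """Hypothetical stand-in for uri_dict()."""
    # Work in str internally; convert only at the very end.
    pairs = ["%s=%s" % (_component(k), _component(v)) for k, v in d.items()]
    result = join.join(pairs)
    if astype is bytes:
        return result.encode("ascii")  # percent-escaped output is ASCII
    return result


print(uri_dict_sketch({"q": "a b", "n": 1}))
# q=a%20b&n=1
print(uri_dict_sketch({"q": "a b", "n": 1}, join=";"))
# q=a%20b;n=1
```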