sanitize alternatives and similar packages
Based on the "Web Content Extracting" category.
TWINT
An advanced Twitter scraping & OSINT tool written in Python that doesn't use Twitter's API, allowing you to scrape a user's followers, following, Tweets and more while evading most API limitations.
newspaper
newspaper3k is a library for news, full-text, and article metadata extraction in Python 3.
python-readability
Fast Python port of arc90's readability tool, updated to match the latest readability.js.
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments.
inscriptis
A Python-based HTML to text conversion library, command line client and Web service.
README
Sanitize
sanitize is a Python module for making sure various things (e.g. HTML) are safe to use.
It was originally written by Mark Pilgrim and is distributed under the BSD license.
Usage
>>> from sanitize import HTML
>>> HTML('<b>hello')
'<b>hello</b>'
>>> HTML('<img>')
'<img />'
>>> HTML('<b><b><b>hello')
'<b><b><b>hello</b></b></b>'
>>> HTML('<img src="foo"/')
''
>>> HTML('<input type="checkbox" checked>')
'<input type="checkbox" checked="checked" />'
>>> # dangerous tags (a small sample)
>>> HTML('safe<applet code="foo.class" codebase="http://example.com/"></applet> <b>description</b>')
'safe <b>description</b>'
>>> HTML('safe<frameset rows="*"><frame src="http://example.com/"></frameset> <b>description</b>')
'safe <b>description</b>'
>>> # bad protocols (a small sample)
>>> HTML('<a href="java' + chr(1) + 'script:foo">bar</a>')
'<a href="#foo">bar</a>'
>>> HTML('<a href="vbscript:foo">bar</a>')
'<a href="#foo">bar</a>'
For more usage examples, see tests/test_sanitize_html.py.
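The "bad protocols" examples above replace a disallowed URL scheme with a harmless fragment, even when a control character is hidden inside the scheme name. The sketch below shows one way such a check could work using only the standard library. It is not sanitize's actual implementation: the function name neutralize_href, the ALLOWED_SCHEMES allowlist, and the set of stripped characters are all assumptions made for illustration, and a real sanitizer has to handle more cases (HTML entities inside attribute values, for instance).

```python
import re

# Assumption: an illustrative allowlist; sanitize's real list may differ.
ALLOWED_SCHEMES = {"http", "https", "ftp", "mailto"}

# ASCII control characters and whitespace, which can be used to disguise a scheme.
_CONTROL_CHARS = re.compile(r"[\x00-\x20\x7f]")

# A URL scheme: a letter followed by letters, digits, "+", "." or "-", then ":".
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(.*)$", re.DOTALL)


def neutralize_href(href):
    """Return href unchanged if it is relative or uses an allowed scheme.

    Otherwise return "#" followed by whatever came after the scheme,
    mirroring the '#foo' results in the usage examples.
    """
    # Strip disguising characters before looking for a scheme.
    cleaned = _CONTROL_CHARS.sub("", href)
    match = _SCHEME.match(cleaned)
    if match is None:
        return href  # relative URL or bare fragment: nothing to neutralize
    scheme, rest = match.group(1).lower(), match.group(2)
    if scheme in ALLOWED_SCHEMES:
        return href
    return "#" + rest


print(neutralize_href("java" + chr(1) + "script:foo"))
print(neutralize_href("vbscript:foo"))
print(neutralize_href("http://example.com/"))
```

Using an allowlist rather than a blocklist means an unknown scheme is rejected by default, which is generally the safer choice for this kind of filter.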
Installation
python-sanitize is available on PyPI:
http://pypi.python.org/pypi/sanitize
Install it with pip:
$ pip install sanitize
Or with easy_install:
$ easy_install sanitize
Alternatively, clone the python-sanitize git repository:
$ git clone git://github.com/Alir3z4/python-sanitize.git
Then install it by running:
$ python setup.py install
Tests
To run unit tests:
$ python setup.py test
License
Sanitize is distributed under the BSD license.