Changelog History
-
v1.4.0 Changes
October 04, 2017
Much update!
- Confirmed and added support for OS X and Linux thanks to michellemorales and j-setiawan.
- Updated documentation to the current state of things. Still work to be done there.
- Removed 'bad file' functionality as it wasn't working as intended and wasn't important anyway. That's what error logs are for.
- Resolving <base> tags to grab links that wouldn't have been recognized before. Thanks lxml!
- Added an optional (on by default) check for file size. Won't download any files larger than 500 MB, assuming the site returns a Content-Length header.
- Added Firefox (on Ubuntu) as an option for browser spoofing.
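Two of the changes above, <base> resolution and the Content-Length size check, can be sketched as follows. This is only an illustration, not spidy's actual code: the release uses lxml, while this sketch swaps in the standard library's html.parser to stay dependency-free. The names LinkCollector, resolve_links, small_enough and MAX_BYTES are hypothetical, and reading "500 MB" as 500 * 1024 * 1024 bytes is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

MAX_BYTES = 500 * 1024 * 1024  # assumption: "500 MB" read as 500 MiB


class LinkCollector(HTMLParser):
    """Collect <a href> values and remember the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base_href is None:
            self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def resolve_links(page_url, html):
    """Return absolute link URLs, honouring a <base> tag if present."""
    collector = LinkCollector()
    collector.feed(html)
    # A relative <base href> is itself resolved against the page URL.
    base = urljoin(page_url, collector.base_href or "")
    return [urljoin(base, href) for href in collector.hrefs]


def small_enough(headers, limit=MAX_BYTES):
    """True unless Content-Length is present and exceeds the limit.

    `headers` is a plain dict here; real HTTP header lookups are
    case-insensitive, which this sketch does not handle.
    """
    length = headers.get("Content-Length")
    if length is None:
        return True  # no header, so the check cannot apply
    try:
        return int(length) <= limit
    except ValueError:
        return True  # malformed header treated like a missing one (assumption)
```

Without the <base> step, a relative link such as page.html would be joined against the page's own URL and could point at the wrong location.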
spidy.zip contains just crawler.py and config/, while the source code archives contain all files.
-
v1.3 Changes
September 14, 2017
Final 1.3.0 release. Added error handling back in - no changes needed.
Optimized all file creation and loading. Everything is now saved with UTF-8 encoding, allowing for foreign characters and emoji in pages.
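As a rough illustration of what saving with UTF-8 encoding means in Python, the sketch below writes and reads a list of lines with an explicit encoding. The helper names save_lines and load_lines are hypothetical and are not taken from spidy's source.

```python
def save_lines(path, lines):
    """Write one item per line, always as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def load_lines(path):
    """Read the file back as UTF-8, dropping the trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]
```

Passing the encoding explicitly avoids relying on the platform default, which may not be able to represent every character found in a crawled page.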
-
v1.3-alpha Changes
September 14, 2017
Optimized all file creation and loading. Everything is now saved with UTF-8 encoding, allowing for foreign characters and emoji in pages.
In alpha, as the error-handling system is being slightly redesigned. Still functional, however!
-
v1.2 Changes
September 07, 2017
Added domain restrictions. Crawling can now be limited to a certain domain, such as wsj.com, https://www.wsj.com, or https://www.wsj.com/article. Can be set when entering configuration settings or in the config files.
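One way such a restriction could be checked is sketched below. This is an assumption about the behaviour, not spidy's implementation: the function name in_scope and the matching rules (host match for a bare domain, prefix match for a full URL) are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(url, restriction):
    """Return True if `url` falls under `restriction`."""
    if "://" in restriction:
        # Full URL form, e.g. https://www.wsj.com/article: plain prefix match.
        return url.startswith(restriction)
    # Bare domain form, e.g. wsj.com: match the host or any of its subdomains.
    host = urlsplit(url).hostname or ""
    restriction = restriction.lower()
    return host == restriction or host.endswith("." + restriction)
```

For example, in_scope("https://www.wsj.com/article/x", "wsj.com") accepts the link, while a URL on notwsj.com is rejected because the host is compared rather than the whole string.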
Also more bugfixes and MIME types because those are cool.
-
v1.0 Changes
August 24, 2017
The first official release of spidy!
A GUI is in the works, as well as many more awesome features.
spidy.zip contains only the files necessary to run the crawler, while the source code downloads contain all the things.