newspaper v0.1.1 Release Notes
Release Date: 2014-12-27
Closed issues:
- UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc #99
- TypeError: Can't convert 'bytes' object to str implicitly #98
- [Parse lxml ERR] Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. #78
- UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128) #77
- article.text and keywords error #47
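Several of the issues above (#77, #98, #99) come from mixing bytes and str between Python 2 and Python 3 code paths. The usual remedy is to decode bytes to str once, at the boundary where the data enters the program. Below is a minimal sketch of that pattern. It assumes UTF-8 input with a Latin-1 fallback. The helper name `to_text` and its `encodings` parameter are hypothetical and are not part of newspaper's API.

```python
def to_text(data, encodings=("utf-8", "latin-1")):
    """Return `data` as str, decoding bytes with the first encoding that works.

    Hypothetical helper for illustration; not newspaper's actual code.
    """
    if isinstance(data, str):
        return data  # already text, nothing to do
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Reached only when no listed encoding succeeds (Latin-1 never fails,
    # so this needs a list without it): substitute U+FFFD for bad bytes.
    return data.decode("utf-8", errors="replace")
```

For example, the byte `0xcc` from #99 is not valid UTF-8 when it is followed by a space. With the default fallback, `to_text(b"\xcc abc")` therefore decodes it as Latin-1 and produces `"Ì abc"`.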
Merged pull requests:
- Huge bugfix to aid lxml DOM parsing + remove unhelpful and excess exception messages and added tracebacks to exception logging #102 (codelucas)
- Decode bytestring returned from lxml's toString early on before sending it out to outer code #101 (codelucas)
- Fixed #78: Remove encoding tag because lxml won't accept it for unicode #97 (mhall1)