All Versions: 17
Avg Release Cycle: 48 days
Latest Release: 3006 days ago
Changelog History
Page 1
-
v0.1.7 Changes
January 30, 2016
Closed issues:
- ImportError: cannot import name 'Image' #183
- Won't let me import #182
- Install on Mac - El Capitan Failed - "Operation not permitted" #181
- Downgrades to old versions of required packages upon installation #174
- Handling 404, 500, and other non-200 http response codes to prevent scraping error pages #142
- Libray downgrading in installation #138
Merged pull requests:
- Don't scrape error pages #190 (yprez)
- Added Hebrew stop words for language support #188 (alon7)
- Fix installation and build #187 (yprez)
- Fix installation docs #184 (yprez)
- Travis CI integration #180 (yprez)
- requirements.txt - Use minimal instead of exact versions #179 (yprez)
- Handle lxml raising ValueError on node.itertext() - Python 3 #178 (yprez)
- Handle lxml raising ValueError on node.itertext() #144 (yprez)
- Parse byline fix #132 (davecrumbacher)
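Issue #142 and pull request #190 above concern skipping pages that come back with a non-200 status. The idea can be sketched roughly as follows; `html_if_ok` is a hypothetical helper written for illustration, not the library's actual code or API.

```python
from typing import Optional


def html_if_ok(status_code: int, body: str) -> Optional[str]:
    """Return the body only for a 2xx response (hypothetical helper).

    Returning None for 404, 500, and similar codes lets the caller skip
    parsing, so an error page is never treated as article content.
    """
    if 200 <= status_code < 300:
        return body
    return None


# Example: only the successful response is passed on for parsing.
pages = [(200, "<p>Article</p>"), (404, "Not Found"), (500, "Server Error")]
parsable = [html for code, body in pages
            if (html := html_if_ok(code, body)) is not None]
```

When fetching with the requests package, calling `response.raise_for_status()` before parsing is a common way to get similar behavior for 4xx and 5xx responses.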
-
v0.1.6 Changes
January 10, 2016
Closed issues:
- Critical leak in newspaper.mthreading.Worker #177
- HTMLParseError #165
- Take local paths to .html files #153
- Wall Street Journal Full Text is not Correctly Scraped #150
- Article HTML Returning Null #131
- No articles #130
- Loading Pages that use heavy javascript #127
- Login handling for premium websites #126
- Installation of nltk is failing #121
Merged pull requests:
- Support urls with dots #176 (alexanderlukanin13)
- upgrade beautifulsoup4 to 4.4.1 for python 3.5 #171 (AlJohri)
- Updated requests version #170 (adrienthiery)
- Turkish Language added #169 (muratcorlu)
- Add macedonian stopwords #166 (dimitrovskif)
- Issue#95 added graceful string concatenation #157 (surajssd)
- fix for "jpeg error with PIL, Can't convert 'NoneType' object to str implicitly" #154 (hnykda)
- bugfix in article.py, is_valid_body #149 (ms8r)
- Fixed typo #139 (Eleonore9)
- Correct link for the Python 3 branch #136 (jtpio)
- Add python3-pip install step for Ubuntu #135 (irnc)
-
v0.1.5 Changes
March 04, 2015
-
v0.1.4 Changes
February 04, 2015
-
v0.1.3 Changes
January 15, 2015
-
v0.1.2 Changes
January 01, 2015
-
v0.1.1 Changes
December 27, 2014
Closed issues:
- UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc #99
- TypeError: Can't convert 'bytes' object to str implicitly #98
- [Parse lxml ERR] Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration. #78
- UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 11: ordinal not in range(128) #77
- article.text and keywords error #47
Merged pull requests:
- Huge bugfix to aid lxml DOM parsing + remove unhelpful and excess exception messages and added tracebacks to exception logging #102 (codelucas)
- Decode bytestring returned from lxml's toString early on before sending it out to outer code #101 (codelucas)
- Fixed #78: Remove encoding tag because lxml won't accept it for unicode #97 (mhall1)
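The entries for #78, #97, and #101 above describe two related text-handling steps: decoding a bytestring to text early, and removing the encoding declaration before handing a unicode string to lxml. A minimal sketch of both steps, using only the standard library, might look like this. The helper names `to_text` and `strip_xml_declaration` are assumptions made for illustration and are not taken from the project's code.

```python
import re

# Matches a leading XML declaration such as
# <?xml version="1.0" encoding="utf-8"?>
_XML_DECLARATION = re.compile(r"^\s*<\?xml\s[^>]*\?>\s*")


def to_text(markup, encoding="utf-8"):
    """Decode bytes to str as early as possible (hypothetical helper).

    Strings pass through unchanged; undecodable bytes are replaced
    instead of raising UnicodeDecodeError.
    """
    if isinstance(markup, bytes):
        return markup.decode(encoding, errors="replace")
    return markup


def strip_xml_declaration(markup):
    """Drop a leading XML declaration from a str (hypothetical helper).

    lxml rejects str input that still carries an encoding declaration,
    so the declaration is removed before parsing.
    """
    return _XML_DECLARATION.sub("", markup, count=1)


raw = b'<?xml version="1.0" encoding="utf-8"?>\n<html><body>caf\xc3\xa9</body></html>'
clean = strip_xml_declaration(to_text(raw))
```

After these two steps, `clean` is a plain `str` without a declaration, which is the kind of input the error message in #78 asks for.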
-
v0.1.0 Changes
December 17, 2014
-
v0.0.9 Changes
December 17, 2014
Closed issues:
- object has no attribute clean Error when using parse method #90
- Questions #85
- [nltk_data] Error loading brown: <urlopen error [Errno -2] Name or [nltk_data] service not known> #84
- newspaper unable to find embeded youtube video #82
- Bound for memory usage #81
- Hosted demo #80
- Having issues installing due to lxml #79
- Add a BeautifulSoup4 parser. #44
- python 3 support request #36
Merged pull requests:
- update jieba to 0.35 #94 (WingGao)
- Parse was breaking in the method clean_article_html when keep_article_ht... #88 (phoenixwizard)
- split title with _ #87 (deweydu)
- Update to support python3 #86 (log0ymxm)
- Added link to basic demo #83 (iwasrobbed)
- Add splitting of slash-separated titles #75 (igor-shevchenko)
-
v0.0.8 Changes
October 13, 2014
Closed issues:
- Parsing Raw HTML #74
- Can't install newspaper #72
- Refactor codebase so newspaper is actually pythonic #70
- Article.top_node == Article.clean_top_node #65
- article.movies missing 'http:' #64
- KeyError when calling newspaper.languages() #62
- Memoize Articles - Not Printing #61
- Add URL headers while building a "paper" #60
- AttributeError: 'module' object has no attribute 'build' #59
- Typo in newspaper.build argument "memoize_articles" #58
- issue with stopwords-tr.txt #51
- Other language support. #34
- Character encoding detection #2
Merged pull requests:
- Huge refactor: entire codebase in PEP8, imports alphabetized, bugfixes, core changes #71 (codelucas)
- Meta tag extraction fixes #69 (karls)
- Test suite improvements #68 (karls)
- Test suite fixes #67 (karls)
- Revert "Added published date to the extractor+article" #66 (codelucas)
- Added published date to the extractor+article #63 (parhammmm)