Removed one of the two Destination classes (the one in pdf.py). Both classes had the same name but slightly different contents, which caused errors. (from Janne Vanhala)
Corrected and expanded the README file to demonstrate PdfFileMerger
Added filter for LZW encoded streams (from Michal Horejsek)
PyPDF2 issue tracker enabled on GitHub to allow community discussion and collaboration
Note: This ChangeLog has not been kept up-to-date for a while. Hopefully we can keep better track of it from now on. Some of the changes listed here come from previous versions 1.14 and 1.15; they were only vaguely defined. With the new _version.py file we should have more structured and better documented versioning from now on.
Fixed encrypt() method (from Martijn The)
Improved error handling on PDFs with truncated streams (from cecilkorik)
Python 3 support (from kushal-kumaran)
Fixed example code in README (from Jeremy Bethmont)
Fixed a bug caused by a DecimalError exception (from Adam Morris)
Many other bug fixes and features by:
jeansch, Anton Vlasenko, Joseph Walton, Jan Oliver Oelerich, Fabian Henze, and any others I missed. Thanks for contributing!
Fixed a typo in code for reading a "\b" escape character in strings.
Improved repr in FloatObject.
Fixed a bug in reading octal escape sequences in strings.
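For readers unfamiliar with the string escapes mentioned in the two fixes above, here is a minimal sketch of how backslash escapes in a PDF literal string can be decoded, including "\b" and one- to three-digit octal sequences. The function name unescape_literal is hypothetical and this is not pyPdf's actual reader code; it only illustrates the rules.

```python
# Illustrative only: a hypothetical helper, not pyPdf's implementation.
_SIMPLE_ESCAPES = {
    b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
    b"(": b"(", b")": b")", b"\\": b"\\",
}
_OCTAL_DIGITS = b"01234567"


def unescape_literal(body: bytes) -> bytes:
    """Decode backslash escapes in the body of a PDF literal string."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i:i + 1]
        if ch != b"\\":
            out += ch
            i += 1
            continue
        i += 1  # skip the backslash
        nxt = body[i:i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out += _SIMPLE_ESCAPES[nxt]
            i += 1
        elif nxt and nxt in _OCTAL_DIGITS:
            # An octal escape has one to three digits.
            j = i
            while j < len(body) and j - i < 3 and body[j:j + 1] in _OCTAL_DIGITS:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)
            i = j
        elif nxt in (b"\r", b"\n"):
            # Backslash followed by end-of-line is a line continuation.
            i += 1
            if nxt == b"\r" and body[i:i + 1] == b"\n":
                i += 1
        else:
            # Unknown escape: drop the backslash, keep the character.
            out += nxt
            i += 1
    return bytes(out)
```

For example, the octal escape \101 decodes to the letter "A" and \53 to "+".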
Added getWidth and getHeight methods to the RectangleObject class.
Fixed compatibility warnings with Python 2.4 and 2.5.
Added addBlankPage and insertBlankPage methods on PdfFileWriter class.
Fixed a bug with circular references in page's object trees (typically annotations) that prevented correctly writing out a copy of those pages.
New merge page functions allow application of a transformation matrix.
To all patch contributors: I did a poor job of keeping this ChangeLog up-to-date for this release, so I am missing attributions here for any changes you submitted. Sorry! I'll do better in the future.
Added support for XMP metadata.
Fix reading files with xref streams with multiple /Index values.
Fix extracting content streams that use graphics operators longer than 2 characters. Affects merging PDF files.
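As background for the /Index fix above: in a cross-reference stream, /Index is a flat array of (first object number, count) pairs, and the entries for all pairs are stored back to back. The sketch below shows one way to walk several pairs. The function name parse_xref_stream and its return shape are assumptions made for illustration, not pyPdf's API.

```python
# Illustrative only: a hypothetical parser, not pyPdf's implementation.
def parse_xref_stream(data, widths, index):
    """Map object numbers to (type, field2, field3) tuples.

    data   -- decoded bytes of the xref stream
    widths -- the /W array: byte widths of the three fields
    index  -- the /Index array: [first1, count1, first2, count2, ...]
    """
    w1, w2, w3 = widths
    entry_size = w1 + w2 + w3
    entries = {}
    pos = 0
    # Each (first, count) pair describes one subsection; the entries of
    # all subsections follow one another in the stream data.
    for first, count in zip(index[0::2], index[1::2]):
        for obj_num in range(first, first + count):
            raw = data[pos:pos + entry_size]
            if len(raw) < entry_size:
                raise ValueError("xref stream is shorter than /Index implies")
            pos += entry_size
            # A zero-width type field means the type defaults to 1.
            f1 = int.from_bytes(raw[:w1], "big") if w1 else 1
            f2 = int.from_bytes(raw[w1:w1 + w2], "big")
            f3 = int.from_bytes(raw[w1 + w2:], "big")
            entries[obj_num] = (f1, f2, f3)
    return entries
```

With /Index [0 1 5 2], the first entry belongs to object 0 and the next two to objects 5 and 6.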
Patch from Hartmut Goebel to permit RectangleObjects to accept NumberObject or FloatObject values.
PDF compatibility fixes.
Fix to read object xref stream in correct order.
Fix for comments inside content streams.
Text strings from PDF files are returned as Unicode string objects when pyPdf determines that they can be decoded (as UTF-16 strings, or as PDFDocEncoding strings). Unicode objects are also written out when necessary. This means that string objects in pyPdf can be either generic.ByteStringObject instances, or generic.TextStringObject instances.
The extractText method now returns a unicode string object.
All document information properties now return unicode string objects. In the event that a document provides docinfo properties that are not decoded by pyPdf, the raw byte strings can be accessed with a "_raw" property (i.e. title_raw rather than title).
generic.DictionaryObject instances have been enhanced to be easier to use. Values coming out of dictionary objects will automatically be de-referenced (.getObject will be called on them), unless accessed by the new "raw_get" method. DictionaryObjects can now only contain PdfObject instances (as keys and values), making it easier to debug where non-PdfObject values (which cannot be written out) are entering dictionaries.
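The dictionary behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the library's source: IndirectRef and the object table are hypothetical stand-ins, and only the names DictionaryObject, PdfObject, getObject and raw_get come from the entry itself.

```python
# Illustrative only: simplified stand-ins, not pyPdf's generic module.
class PdfObject:
    def getObject(self):
        # Direct objects resolve to themselves.
        return self


class NameObject(str, PdfObject):
    pass


class NumberObject(int, PdfObject):
    pass


class IndirectRef(PdfObject):
    """Hypothetical reference to an object stored in a lookup table."""

    def __init__(self, table, idnum):
        self.table = table
        self.idnum = idnum

    def getObject(self):
        return self.table[self.idnum].getObject()


class DictionaryObject(dict, PdfObject):
    def __setitem__(self, key, value):
        # Reject non-PdfObject keys and values at insertion time, so the
        # offending caller shows up in the traceback.
        if not isinstance(key, PdfObject):
            raise ValueError("key must be a PdfObject")
        if not isinstance(value, PdfObject):
            raise ValueError("value must be a PdfObject")
        dict.__setitem__(self, key, value)

    def __getitem__(self, key):
        # Normal access de-references the stored value.
        return dict.__getitem__(self, key).getObject()

    def raw_get(self, key):
        # Raw access returns the stored value as-is.
        return dict.__getitem__(self, key)
```

In this sketch, d[key] on a stored reference yields the referenced object, while d.raw_get(key) yields the reference itself.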
Support for reading named destinations and outlines in PDF files. Original patch by Ashish Kulkarni.
Stream compatibility reading enhancements for malformed PDF files.
Cross reference table reading enhancements for malformed PDF files.
Replace some "assert" statements with error raising.
Minor optimizations to FlateDecode algorithm increase speed when using PNG predictors.
Fix several serious bugs introduced in version 1.8, caused by a failure to run through our PDF test suite before releasing that version.
Fix bug in NullObject reading and writing.
Add support for decryption with the standard PDF security handler. This allows for decrypting PDF files given the proper user or owner password.
Add support for encryption with the standard PDF security handler.
Add new pythondoc documentation.
Fix bug in ASCII85 decode that occurs when whitespace exists inside the two terminating characters of the stream.
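To illustrate the ASCII85 case above, where the "~>" end marker is split by whitespace: one way to be tolerant is to drop all whitespace before looking for the marker. The sketch below relies on the standard library's base64.a85decode instead of pyPdf's own decoder, and the function name ascii85_decode is hypothetical.

```python
import base64

# Whitespace characters as PDF defines them: NUL, tab, LF, FF, CR, space.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def ascii85_decode(data):
    """Decode ASCII85 data, tolerating whitespace anywhere in the input."""
    # Remove whitespace first, so a marker written as "~ >" or "~\n>"
    # collapses to "~>" before it is searched for.
    compact = data.translate(None, PDF_WHITESPACE)
    end = compact.find(b"~>")
    if end == -1:
        raise ValueError("missing ~> end-of-data marker")
    return base64.a85decode(compact[:end])
```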
Fix a bug when using a single page object in two PdfFileWriter objects.
Adjust pyPdf to tolerate whitespace characters that do not belong inside a stream object.
Add documentInfo property to PdfFileReader.
Add numPages property to PdfFileReader.
Add pages property to PdfFileReader.
Add extractText function to PdfFileReader.
Add basic support for comments in PDF files. This allows us to read some ReportLab PDFs that could not be read before.
Add "auto-repair" for finding xref table at slightly bad locations.
New StreamObject backend, cleaner and more powerful. Allows the use of stream filters more easily, including compressed streams.
Add a graphics state push/pop around page merges. Improves quality of page merges when one page's content stream leaves the graphics in an abnormal state.
Add PageObject.compressContentStreams function, which filters all content streams and compresses them. This will reduce the size of PDF pages, especially if their streams were decompressed during a mergePage operation.
Support inline images in PDF content streams.
Add support for using .NET framework compression when zlib is not available. This does not make pyPdf compatible with IronPython, but it is a first step.
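The graphics state push/pop entry above refers to the PDF operators q (save graphics state) and Q (restore it). A minimal sketch of the idea, assuming content streams are handled as raw bytes: wrap each page's content in its own q/Q pair before concatenating, so that colours, transforms, or clipping set by one page do not leak into the other. The function names here are hypothetical, not pyPdf's.

```python
# Illustrative only: hypothetical helpers, not pyPdf's implementation.
def isolate(content):
    """Wrap a content stream in a save/restore graphics state pair."""
    return b"q\n" + content + b"\nQ\n"


def merge_contents(base, overlay):
    """Concatenate two content streams, each in its own graphics state."""
    return isolate(base) + isolate(overlay)
```

Here a stroke colour set by the first stream is discarded by its closing Q, so the second stream starts from the default state.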
Add support for reading the document information dictionary, and extracting title, author, subject, producer and creator tags.
Add patch to support NullObject and multiple xref streams, from Bradley Lawrence.