PyPDF2 v1.10 Release Notes

Release Date: 2007-10-04
    • Text strings from PDF files are returned as Unicode string objects when pyPdf determines that they can be decoded (as UTF-16 strings, or as PDFDocEncoding strings). Unicode objects are also written out when necessary. This means that string objects in pyPdf can be either generic.ByteStringObject instances, or generic.TextStringObject instances.

    • The extractText method now returns a unicode string object.

    • All document information properties now return unicode string objects. If a document provides docinfo properties that pyPdf cannot decode, the raw byte strings can be accessed through the corresponding "_raw" property (i.e. title_raw rather than title).

    • generic.DictionaryObject instances have been enhanced to be easier to use. Values coming out of dictionary objects will automatically be de-referenced (.getObject will be called on them), unless accessed by the new "raw_get" method. DictionaryObjects can now only contain PdfObject instances (as keys and values), making it easier to debug where non-PdfObject values (which cannot be written out) are entering dictionaries.

    • Added support for reading named destinations and outlines in PDF files. Original patch by Ashish Kulkarni.

    • Improved stream reading compatibility with malformed PDF files.

    • Improved cross-reference table reading for malformed PDF files.

    • Added documentation for encryption.

    • Replaced some "assert" statements with raised errors.

    • Minor optimizations to the FlateDecode algorithm increase speed when PNG predictors are used.
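The string-handling change in the first item can be illustrated with a minimal sketch. This is not pyPdf's code: the class and function names are hypothetical, and plain ASCII stands in for the PDFDocEncoding table, which the Python standard library does not provide. It only shows the decision the notes describe: decode as UTF-16 or as a single-byte encoding when possible, otherwise keep the raw bytes.

```python
import codecs


class SketchTextString(str):
    """Hypothetical stand-in for generic.TextStringObject (decoded text)."""


class SketchByteString(bytes):
    """Hypothetical stand-in for generic.ByteStringObject (undecoded bytes)."""


def sketch_decode_single_byte(raw: bytes) -> str:
    # Assumption: ASCII is used here in place of the real PDFDocEncoding table.
    return raw.decode("ascii")


def sketch_create_string(raw: bytes):
    """Return a text object when the bytes can be decoded, else a byte object."""
    bom = codecs.BOM_UTF16_BE
    if raw.startswith(bom):
        # A leading big-endian byte order mark signals a UTF-16 text string.
        try:
            return SketchTextString(raw[len(bom):].decode("utf-16-be"))
        except UnicodeDecodeError:
            return SketchByteString(raw)
    try:
        return SketchTextString(sketch_decode_single_byte(raw))
    except UnicodeDecodeError:
        # Undecodable input is kept as raw bytes rather than raising.
        return SketchByteString(raw)
```

Callers of such a helper would check the returned type (text or bytes) before treating a value as text, which mirrors the two kinds of string object the notes mention.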
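The dictionary behaviour described above (automatic de-referencing on lookup, a raw accessor, and PdfObject-only contents) can be sketched as follows. All class names here are hypothetical and the method names are simplified; this is an illustration of the idea under those assumptions, not the actual generic.DictionaryObject implementation.

```python
class SketchPdfObject:
    """Hypothetical base class standing in for a PdfObject."""

    def get_object(self):
        # Direct objects resolve to themselves.
        return self


class SketchName(SketchPdfObject):
    """Hypothetical name object, hashable so it can serve as a key."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, SketchName) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


class SketchNumber(SketchPdfObject):
    def __init__(self, value):
        self.value = value


class SketchIndirectRef(SketchPdfObject):
    """Hypothetical indirect reference resolved through an object table."""

    def __init__(self, table, idnum):
        self.table = table
        self.idnum = idnum

    def get_object(self):
        return self.table[self.idnum].get_object()


class SketchDictionary(dict, SketchPdfObject):
    def __setitem__(self, key, value):
        # Rejecting foreign values at insertion time makes it easier to find
        # where an unwritable value entered the dictionary.
        # Note: dict.update and setdefault bypass this check in this sketch.
        if not isinstance(key, SketchPdfObject):
            raise TypeError("key must be a SketchPdfObject")
        if not isinstance(value, SketchPdfObject):
            raise TypeError("value must be a SketchPdfObject")
        super().__setitem__(key, value)

    def __getitem__(self, key):
        # Normal lookups follow indirect references automatically.
        return super().__getitem__(key).get_object()

    def raw_get(self, key):
        # The raw accessor returns the stored object without resolving it.
        return super().__getitem__(key)
```

With this shape, ordinary subscript access yields the resolved object, while raw_get is the way to inspect the reference itself.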