csvkit v1.0.0 Release Notes

Release Date: 2016-12-27
  • This is the first major release of csvkit in a very long time. The entire backend has been rewritten to leverage the agate (http://agate.rtfd.io) data analysis library, which was itself inspired by csvkit. The new backend provides better type detection accuracy, as well as some new features.

    Because of the long and complex cycle behind this release, the list of changes should not be considered exhaustive. In particular, the output format of some tools may have changed in small ways. Any existing data pipelines using csvkit should be tested as part of the upgrade.

    Much of the credit for this release goes to James McKinney (https://github.com/jpmckinney), who has almost single-handedly kept the csvkit fire burning for a year. Thanks, James!

    Backwards-incompatible changes:

    • csvjoin now renames duplicate columns with integer suffixes to prevent collisions in output.
    • csvsql now generates DateTime columns instead of Time columns.
    • csvsql now generates Decimal columns instead of Integer, BigInteger, and Float columns.
    • csvsql no longer generates max-length constraints for text columns.
    • The --doublequote long flag is gone, and the -b short flag is now an alias for --no-doublequote.
    • When using the --columns or --not-columns options, you must not have spaces around the comma-separated values, unless the column names contain spaces.
    • When sorting, null values are now greater than other values instead of less than.
    • CSVKitReader, CSVKitWriter, CSVKitDictReader, and CSVKitDictWriter have been removed. Use agate.csv.reader, agate.csv.writer, agate.csv.DictReader and agate.csv.DictWriter.
    • Drop Python 2.6 support (end-of-life was October 29, 2013).
    • Drop support for older versions of PyPy.
    • If --no-header-row is set, the output will have column names a, b, c, etc. instead of column1, column2, column3, etc.
    • csvlook renders a simpler, markdown-compatible table.
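
The sorting change listed above can affect any pipeline that relies on where empty cells land. As a rough illustration only (this is not csvkit's or agate's code, and `sort_nulls_last` and `sort_nulls_first` are hypothetical helpers), the old and new orderings behave like this for a column of numbers:

```python
def sort_nulls_last(values):
    """Sort ascending, placing None (null) after every other value (new behavior)."""
    # Key tuple: non-null values get False and sort first; nulls get True.
    # Nulls use a placeholder second element so None is never compared to a number.
    return sorted(values, key=lambda v: (v is None, 0 if v is None else v))


def sort_nulls_first(values):
    """Sort ascending, placing None before every other value (old behavior)."""
    return sorted(values, key=lambda v: (v is not None, 0 if v is None else v))


column = [3, None, 1, 2]
print(sort_nulls_last(column))   # [1, 2, 3, None]
print(sort_nulls_first(column))  # [None, 1, 2, 3]
```

A pipeline that takes the first or last rows of sorted output is the kind most likely to notice the difference.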

    Improvements:

    • csvkit is now tested against Python 3.6. (#702)
    • import csvkit as csv will now defer to agate readers/writers.
    • csvgrep supports --no-header-row.
    • csvjoin supports --no-header-row.
    • csvjson streams input and output if the --stream and --no-inference flags are set.
    • csvjson supports --snifflimit and --no-inference.
    • csvlook adds --max-rows, --max-columns and --max-column-width options.
    • csvlook supports --snifflimit and --no-inference.
    • csvpy supports --agate to read a CSV file into an agate table.
    • csvsql supports custom SQLAlchemy dialects (http://docs.sqlalchemy.org/en/latest/dialects/).
    • csvstat supports --names.
    • in2csv CSV-to-CSV conversion streams input and output if the --no-inference flag is set.
    • in2csv CSV-to-CSV conversion uses agate.Table.
    • in2csv GeoJSON conversion adds columns for geometry type, longitude and latitude.
    • Documentation: Update tool usage, remove shell prompts, document connection string, correct typos.
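
The streaming items above are tied to skipping type inference, plausibly because inferring a column's type requires seeing the whole column, whereas a row can be written out as soon as it is read if every value stays a string. The sketch below illustrates that idea with the standard library only. It is not csvkit's implementation: `stream_csv_to_json_lines` is a hypothetical name, and emitting one JSON object per line is an assumption about the output format.

```python
import csv
import io
import json


def stream_csv_to_json_lines(infile, outfile):
    """Convert CSV to one JSON object per line without loading the whole file."""
    reader = csv.DictReader(infile)
    count = 0
    for row in reader:
        # No inference: values stay strings, so each row can be emitted immediately.
        outfile.write(json.dumps(row) + "\n")
        count += 1
    return count


source = io.StringIO("id,name\n1,alice\n2,bob\n")
target = io.StringIO()
stream_csv_to_json_lines(source, target)
print(target.getvalue(), end="")
```

Because nothing is buffered beyond the current row, memory use stays flat regardless of input size, which is the practical benefit of combining the two flags.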

    Fixes:

    • Fixed numerous instances of open files not being closed before utilities exit.
    • Change -b, --doublequote to --no-doublequote, as doublequote is True by default.
    • in2csv DBF conversion works with Python 3.
    • in2csv correctly guesses format when file has an uppercase extension.
    • in2csv correctly interprets --no-inference.
    • in2csv again supports nested JSON objects (fixes regression).
    • in2csv with --format geojson will print a JSON object instead of OrderedDict([(...)]).
    • csvclean with standard input works on Windows.
    • csvgrep returns the input file's line numbers if the --linenumbers flag is set.
    • csvgrep can match multiline values.
    • csvgrep correctly operates on ragged rows.
    • csvsql correctly escapes % characters in SQL queries.
    • csvsql adds standard input only if explicitly requested.
    • csvstack supports stacking a single file.
    • csvstat always reports frequencies.
    • The any_match argument of FilteringCSVReader now works correctly.
    • All tools handle empty files without error.
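
The GeoJSON fix listed above comes down to the difference between a Python object's string form and its JSON serialization. A minimal sketch of that difference, using only the standard library (the sample geometry is made up, and this is not in2csv's code):

```python
import json
from collections import OrderedDict

# A made-up GeoJSON geometry, held in an OrderedDict to keep key order.
geometry = OrderedDict([("type", "Point"), ("coordinates", [-77.03, 38.9])])

# str() yields Python's repr, which starts with "OrderedDict(" and is not valid JSON.
as_repr = str(geometry)

# json.dumps() yields a JSON object that downstream tools can parse.
as_json = json.dumps(geometry)

print(as_json)  # {"type": "Point", "coordinates": [-77.03, 38.9]}
```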