SciKit-Learn Laboratory v1.5 Release Notes

Release Date: 2017-12-14
  • This is a major new release of SKLL.

    What's new

    • Several new scikit-learn learners are included, along with reasonable default parameter grids for tuning where appropriate (issues #256 & #375, PR #377).
      • BayesianRidge
      • DummyRegressor
      • HuberRegressor
      • Lars
      • MLPRegressor
      • RANSACRegressor
      • TheilSenRegressor
      • DummyClassifier
      • MLPClassifier
      • RidgeClassifier
    • Allow computing any number of evaluation metrics in addition to the tuning objective (issue #350, PR #384).
    • Rename cv_folds_file configuration option to folds_file. The former is still supported with a deprecation warning but will be removed in the next release (PR #367).
    • Add a new configuration option use_folds_file_for_grid_search which controls whether the inner-loop grid-search in a cross-validation experiment with a custom folds file also uses the folds from the file. It's set to True by default. Setting it to False means that the inner loop uses regular 3-fold cross-validation and ignores the file (PR #367).
    • Also add a keyword argument called use_custom_folds_for_grid_search to the Learner.cross_validate() method (PR #367).
    • Learning curves can now be plotted from existing summary files using the new plot_learning_curves command line utility (issue #346, PR #396).
    • Overhaul logging in SKLL. All messages are now logged both to the console (if running interactively) and to log files. Read more about the SKLL log files in the Output Files section of the documentation (issue #369, PR #380).
    • neg_log_loss is now available as an objective function for classification (issue #327, PR #392).
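To make the new neg_log_loss objective concrete, here is a minimal pure-Python sketch of the quantity it names: the mean log-probability assigned to the true class, that is, log loss with the sign flipped so that higher is better. The function name, the clipping constant eps, and the sample data are illustrative assumptions, not SKLL or scikit-learn code.

```python
import math


def neg_log_loss(y_true, probabilities, eps=1e-15):
    """Negated mean cross-entropy; values closer to 0 are better."""
    total = 0.0
    for label, row in zip(y_true, probabilities):
        # Clip so that a probability of exactly 0 or 1 cannot produce log(0).
        p = min(max(row[label], eps), 1 - eps)
        total += math.log(p)
    return total / len(y_true)


# Two examples, two classes; each row holds per-class probabilities.
y_true = [0, 1]
probabilities = [[0.9, 0.1], [0.2, 0.8]]
score = neg_log_loss(y_true, probabilities)
print(round(score, 4))
```

Because the value is negated, a tuning procedure that maximizes its objective can use it directly, which is presumably why the objective carries the neg_ prefix.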

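The folds_file rename and the use_folds_file_for_grid_search option described in the list above can be illustrated with a small sketch that uses only the standard-library configparser. It is not SKLL's own parsing code: the placement of the options in [Input] and [Tuning] sections, the file name folds.csv, and the helper read_folds_settings are assumptions made for illustration.

```python
import configparser
import warnings

# Hypothetical experiment configuration; section names are assumptions.
CONFIG_TEXT = """
[Input]
cv_folds_file = folds.csv

[Tuning]
grid_search = true
use_folds_file_for_grid_search = false
"""


def read_folds_settings(text):
    """Return (folds_file, use_folds_file_for_grid_search) from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    folds_file = parser.get("Input", "folds_file", fallback=None)
    old_value = parser.get("Input", "cv_folds_file", fallback=None)
    if folds_file is None and old_value is not None:
        # The old option name still works but triggers a deprecation warning.
        warnings.warn(
            "cv_folds_file is deprecated; use folds_file instead",
            DeprecationWarning,
        )
        folds_file = old_value

    # Defaults to True, as stated in the release notes.
    use_for_grid = parser.getboolean(
        "Tuning", "use_folds_file_for_grid_search", fallback=True
    )
    return folds_file, use_for_grid


folds_file, use_for_grid = read_folds_settings(CONFIG_TEXT)
print(folds_file, use_for_grid)
```

With use_folds_file_for_grid_search set to false as in this sample, the notes say the inner grid-search loop falls back to regular 3-fold cross-validation and ignores the folds file.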
    Changes

    • SKLL now supports Python 3.6. Although Python 3.4 and 3.5 will still work, 3.6 is now the officially supported Python 3 version. Python 2.7 is still supported (issue #355, PR #360).
    • The required version of scikit-learn has been bumped up to 0.19.1 (issue #328, PR #330).
    • The learning curve y-limits are now computed a bit more intelligently (issue #389, PR #390).
    • Raise a warning if the ablation flag is used for an experiment that uses train_file/test_file, since this is not supported (issue #313, PR #392).
    • Raise a warning if both fixed_parameters and param_grids are specified (issue #185, PR #297).
    • Disable grid search if no default parameter grids are available in SKLL and the user doesn't provide parameter grids either (issue #376, PR #378).
    • SKLL has a copy of scikit-learn's DictVectorizer because it needs some custom functionality. Most (but not all) of our modifications have now been merged into scikit-learn so our custom version is now significantly condensed down to just a single method (issue #263, PR #374).
    • Improved outputs for cross-validation tasks (issues #349 & #371, PRs #365 & #372).
      • When a folds file is specified, the log no longer erroneously shows the full dictionary.
      • Report the number of cross-validation folds in results as coming from the folds file if a folds file is specified.
      • Report the grid search folds in results as coming from the folds file if the grid search ends up using the folds file.
      • Do not show the stratified folds information in results when a folds file is specified.
      • Show the value of use_folds_file_for_grid_search in results when appropriate.
      • Show grid search related information in results only when we are actually doing grid search.
    • The Travis CI plan was broken up into multiple jobs in order to get around the 50 minute limit (issue #385, PR #387).
    • For the conda package, some of the dependencies are now sourced from the conda-forge channel.
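The two grid-search rules in the list above (warn when both fixed_parameters and param_grids are given, and disable grid search when no grid is available) can be sketched as a small decision function. This is only an illustration of the described behavior, not SKLL's implementation: the function plan_grid_search, its arguments, and the contents of DEFAULT_GRIDS are all hypothetical.

```python
import warnings

# Made-up table of default grids; not SKLL's real defaults.
DEFAULT_GRIDS = {"BayesianRidge": {"alpha_1": [1e-6, 1e-4]}}


def plan_grid_search(learner, grid_search=True, param_grid=None,
                     fixed_parameters=None):
    """Return (do_grid_search, grid) following the rules listed above."""
    if param_grid and fixed_parameters:
        warnings.warn("both fixed_parameters and param_grids were specified")

    # Prefer the user's grid; otherwise fall back to a default, if any.
    grid = param_grid or DEFAULT_GRIDS.get(learner)

    if grid_search and grid is None:
        warnings.warn(f"no parameter grid for {learner}; disabling grid search")
        return False, None
    if not grid_search:
        return False, None
    return True, grid


# A learner with no entry in the made-up table loses its grid search.
print(plan_grid_search("DummyRegressor"))
# A learner with a default grid keeps it.
print(plan_grid_search("BayesianRidge"))
```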

    Bugfixes

    • Fix the bug that was causing the inner grid-search loop of a cross-validation experiment to use a single job instead of the number specified via grid_search_jobs (issue #363, PR #367).
    • Fix unbound variable in readers.py (issue #340, PR #392).
    • Fix bug when running a learning curve experiment via gridmap (issue #386, PR #390).
    • Fix a mismatch between the default number of grid search folds and the default number of slots requested via gridmap (issue #342, PR #367).

    Documentation

    • Update documentation and tests for all of the above changes and new features.
    • Update tutorial and installation instructions (issues #383 and #394, PR #399).
    • Standardize all of the function and method docstrings to be NumPy style. Add docstrings where missing (issue #373, PR #397).