H2O v3.22.0.1 Release Notes

  • Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xia/1/index.html

    Bug

    [PUBDEV-5023] - In Python, the metalearner method is now available only for Stacked Ensembles.
    [PUBDEV-5658] - Fixed an issue that caused micro benchmark tests to fail to run in the jmh directory.
    [PUBDEV-5663] - Fixed an issue that caused H2O to fail to export dataframes to S3.
    [PUBDEV-5745] - Added the keep_cross_validation_models argument to Grid Search.
    [PUBDEV-5746] - Improved efficiency of the keep_cross_validation_models parameter in AutoML.
    [PUBDEV-5777] - Simplified the comparison of H2OXGBoost with native XGBoost when using the Python client.
    [PUBDEV-5780] - Fixed JDBC ingestion for Teradata databases.
    [PUBDEV-5824] - In the Python client and the Java API, multiple runs of the same AutoML instance no longer fail when training new "Best Of Family" SE models that would include the newly generated models.
    [PUBDEV-5873] - Fixed an issue that resulted in an AssertionError when calling cbind from the Python client.
    [PUBDEV-5881] - AutoML now enforces case for the sort_metric option when using the Java API.
    [PUBDEV-5903] - In AutoML, Stacked Ensemble models are now always trained, even if the max_runtime_secs limit has been reached.
    [PUBDEV-5904] - In the R client, added documentation for helper functions.
    [PUBDEV-5922] - Renamed x to X in the H2O-sklearn fit method to be consistent with the sklearn API.
    [PUBDEV-5924] - Merging datasets now works correctly.
    [PUBDEV-5931] - Building on Maven with h2o-ext-xgboost on versions later than 3.18.0.11 no longer results in a dependency error.
    [PUBDEV-5933] - Fixed a Java 11 ORC file parsing failure.
    [PUBDEV-5954] - Upgraded the version of the lodash package used in H2O Flow.
    [PUBDEV-5967] - -ip localhost now works correctly on WSL.
    [PUBDEV-5971] - CSV/ARFF Parser no longer treats blank lines as data lines with NAs.
    [PUBDEV-5976] - Starting h2o-3 from the Python Client no longer fails on Java 10.0.2.
    [PUBDEV-5995] - Fixed an issue that caused a StackedEnsemble MOJO model to return an "IllegalArgumentException: categorical value out of range" message.
    [PUBDEV-5996] - Removed the "nclasses" parameter from tree traversal routines.
    [PUBDEV-5998] - Exposed H2OXGBoost parameters used to train a model to the Python API. Previously, this information was visible in the Java backend but was not passed back to the Python API.
    [PUBDEV-5999] - Removed "illegal reflective access" warnings when starting H2O-3 with Java 10.
    [PUBDEV-6004] - In Stacked Ensembles, changes made to data during scoring now apply to all models.
    [PUBDEV-6005] - When running AutoML in Flow, updated the list of algorithms that can be selected in the "Exclude These Algorithms" section.
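
The blank-line behaviour described in PUBDEV-5971 can be illustrated with a small standalone sketch. This is not H2O's parser; the function name and the rule used to recognise a blank line are assumptions made for illustration only. It uses Python's standard csv module and shows the difference between a blank line (skipped) and a data line with an empty field (kept, with the empty field treated as missing).

```python
import csv
import io


def parse_csv_skipping_blank_lines(text):
    """Illustrative parser (hypothetical helper, not H2O code).

    Blank lines are skipped instead of becoming rows of missing values;
    empty fields inside a real data line are mapped to None.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        # csv.reader yields [] for an empty line and a single
        # whitespace-only field for a line that contains only spaces.
        is_blank = len(record) == 0 or (
            len(record) == 1 and record[0].strip() == ""
        )
        if is_blank:
            continue
        rows.append([field if field != "" else None for field in record])
    return header, rows


header, rows = parse_csv_skipping_blank_lines("a,b\n1,2\n\n3,\n")
print(header, rows)
```

Note that in a single-column file a blank line and an empty value look identical, so this sketch treats both as blank; a real parser has to make its own choice there.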

    New Feature

    [PUBDEV-5170] - Individual predictions of GBM trees are now exposed in the MOJO API.
    [PUBDEV-5378] - Exposed target encoding in the Java API.
    [PUBDEV-5399] - The keep_cross_validation_fold_assignment option is now available in AutoML.
    [PUBDEV-5609] - Added support for the Isolation Forest algorithm in H2O-3. Note that this is a Beta version of the algorithm.
    [PUBDEV-5668] - Added the keep_cross_validation_fold_assignment option to AutoML in Flow.
    [PUBDEV-5681] - h2o.connect no longer ignores strict_version_check=FALSE when connecting to a Steam cluster.
    [PUBDEV-5695] - Created an R demo for CoxPH.
    [PUBDEV-5775] - It is now possible to combine two models into one MOJO, with the second model using the prediction from the first model as a feature. These models can be from any algorithm or combination of algorithms except Word2Vec.
    [PUBDEV-5852] - Implemented h2oframe.fillna(method='backward').
    [PUBDEV-5977] - Sped up AutoML training on smaller datasets in client mode (Sparkling Water).
    [PUBDEV-5979] - Exposed Java Target Encoding in the Python client.
    [PUBDEV-5988] - Users can now specify a -features parameter when starting h2o from the command line. This allows users to remove experimental or beta algorithms when starting H2O-3. Available options for this parameter include beta, stable, and experimental.
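
To make the semantics of the backward fill added in PUBDEV-5852 concrete, here is a minimal pure-Python sketch that works on a single column represented as a list. It is not the H2O implementation. The function name is hypothetical, and the maxlen limit on consecutive fills is modelled on the client's fillna signature as an assumption.

```python
def backward_fill(values, maxlen=1):
    """Fill each None with the next non-missing value to its right.

    Illustrative only (hypothetical helper): at most `maxlen`
    consecutive missing cells before a valid value are filled.
    """
    filled = list(values)
    next_valid = None  # closest non-missing value seen to the right
    gap = 0            # missing cells passed since that value
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is not None:
            next_valid = filled[i]
            gap = 0
        else:
            if next_valid is not None and gap < maxlen:
                filled[i] = next_valid
            gap += 1
    return filled


print(backward_fill([1, None, None, 4]))
print(backward_fill([1, None, None, 4], maxlen=2))
```

Trailing missing values stay missing because there is nothing to their right to copy from.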

    Task

    [PUBDEV-4507] - Added XGBoost to AutoML.
    [PUBDEV-5696] - Added an option that lets users supply their own JDBC driver.
    [PUBDEV-5722] - Exposed pr_auc wherever AUC is reported, including scoring_history and the model summary. Also added h2o.pr_auc() in R.
    [PUBDEV-5901] - Added support for Java 11.
    [PUBDEV-6001] - Improved the AutoML documentation in the User Guide.
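
For readers unfamiliar with the pr_auc metric mentioned in PUBDEV-5722, the sketch below computes one common estimate of the area under the precision-recall curve, namely average precision. This is an assumption made for illustration: the release notes do not say which estimator H2O uses, the function name is hypothetical, and tied scores are not given any special treatment.

```python
def average_precision(labels, scores):
    """Average precision as a step-wise estimate of the PR curve area.

    labels: iterable of 0/1 ground-truth values
    scores: iterable of predicted scores (higher means more positive)
    """
    labels = list(labels)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank observations from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            total += true_positives / rank
    return total / n_pos


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Unlike ROC AUC, this quantity depends on the share of positives, which is why it is often preferred for imbalanced problems.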

    Improvement

    [PUBDEV-5590] - Added a MAX_USR_CONNECTIONS_KEY argument to limit the number of sessions for import_sql_table.
    [PUBDEV-5669] - Narrowed the performance gap when importing data using Hive2.
    [PUBDEV-5719] - Improved and cleaned up output for the h2o.mojo_predict_csv and h2o.mojo_predict_df functions.
    [PUBDEV-5743] - Users can now visualize XGBoost trees when running predictions.
    [PUBDEV-5761] - Added weights to partial dependence plots. Also added a level for missing values.
    [PUBDEV-5822] - Users can now download the genmodel.jar in Flow for completed models.
    [PUBDEV-5886] - In AutoML, changed the default for keep_cross_validation_models and keep_cross_validation_predictions from True to False.
    [PUBDEV-5888] - Added support for predicting using the XGBoost Predictor.
    [PUBDEV-5909] - In XGBoost, optimized the matrix exchange between Java and native C++ code.
    [PUBDEV-5913] - Improved the h2o-3 README for installing in R and IntelliJ IDEA.
    [PUBDEV-5927] - Introduced a simple "streaming" mode that allows H2O to read from a table using basic SQL:92 constructs.
    [PUBDEV-5929] - In AutoML, stopping_metric is now based on sort_metric.
    [PUBDEV-5952] - The requirements.txt file now includes the Colorama version.
    [PUBDEV-5961] - In lockable.java, delete is now final in order to prevent inconsistent overrides.
    [PUBDEV-5964] - Reverted AutoML naming change from Auto.Algo to Auto.algo.
    [PUBDEV-6000] - In AutoML, automatic partitioning of the validation frame now uses 10% of the training data instead of 20%.
    [PUBDEV-6002] - Changed model and grid indexing in autogenerated model names in AutoML to be 1-indexed instead of 0-indexed.
    [PUBDEV-6017] - Public access to H2O instances started from R/Python can now be allowed through the new bind_to_localhost (Boolean) parameter, which can be specified in h2o.init().
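
The effect of the new 10% default in PUBDEV-6000 can be pictured with a small holdout split. The sketch below is an assumption about the general idea only: AutoML's internal partitioning may differ (for example, it need not produce exact counts), and the function name, the seed, and the shuffle-based approach are all hypothetical.

```python
import random


def holdout_split(rows, validation_ratio=0.10, seed=42):
    """Split rows into (training, validation) using a seeded shuffle.

    Illustrative only: 10% of the rows go to validation by default,
    mirroring the new ratio, compared with 20% previously.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * validation_ratio))
    return shuffled[n_valid:], shuffled[:n_valid]


train, valid = holdout_split(range(100))
print(len(train), len(valid))

# The previous behaviour corresponds to validation_ratio=0.20.
old_train, old_valid = holdout_split(range(100), validation_ratio=0.20)
print(len(old_train), len(old_valid))
```

The practical consequence is that more rows remain available for training when no validation frame is supplied.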

    Docs

    [PUBDEV-4505] - Added Scala and Java examples to the Building and Extracting a MOJO topic.
    [PUBDEV-4590] - Added a Scala example to the Stacked Ensembles topic.
    [PUBDEV-5949] - Added the Tree class method to the Python module documentation.
    [PUBDEV-5641] - Removed references to UDP in the documentation.
    [PUBDEV-5664] - Removed Sparkling Water topics from the H2O-3 User Guide. These are in the Sparkling Water User Guide.
    [PUBDEV-5674] - Added a Resources section to the Overview and included links to the awesome-h2o repository, H2O.ai blogs, and customer use cases.
    [PUBDEV-5693] - Updated GCP Installation documentation with information about quota limits.
    [PUBDEV-5709] - Updated Gains/Lift documentation. 16 groups are now used by default.
    [PUBDEV-5756] - Added Python examples to the Cross-Validation topic in the User Guide.
    [PUBDEV-5762] - Added loss_by_col and loss_by_col_idx to the list of GLRM parameters.
    [PUBDEV-5810] - Updated documentation for class_sampling_factors. balance_classes must be enabled when using class_sampling_factors.
    [PUBDEV-5839] - Added a Python example for initializing and starting h2o-3 in Docker.
    [PUBDEV-5857] - Updated the Admin menu documentation in Flow after adding the "Download Gen Model" option.
    [PUBDEV-5905] - In GBM and DRF, enum_limited is a supported option for categorical_encoding.
    [PUBDEV-5962] - Added the -notify_local flag to the list of flags available when starting H2O-3 from the command line.
    [PUBDEV-5982] - Added documentation for Isolation Forest (beta).
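
As a rough illustration of what a Gains/Lift table with 16 groups (PUBDEV-5709) contains, the sketch below ranks observations by predicted score, cuts them into equally sized groups, and reports the lift of each group over the overall response rate. It is a simplified assumption, not H2O's computation: the function name is hypothetical, and equal-count slicing is used here where a real implementation may derive groups from score quantiles.

```python
def gains_lift(labels, scores, groups=16):
    """Build a simple gains/lift table (hypothetical helper).

    Each entry holds the group number, its size, the share of
    positives in the group, and the lift over the overall rate.
    """
    ranked = [
        label
        for _, label in sorted(
            zip(scores, labels), key=lambda p: p[0], reverse=True
        )
    ]
    n = len(ranked)
    positives = sum(ranked)
    if n == 0 or positives == 0:
        raise ValueError("need at least one row and one positive label")
    overall_rate = positives / n
    table = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer rows than groups
        rate = sum(chunk) / len(chunk)
        table.append(
            {
                "group": g + 1,
                "size": len(chunk),
                "response_rate": rate,
                "lift": rate / overall_rate,
            }
        )
    return table


# 64 rows whose 16 positives all receive the highest scores.
labels = [1] * 16 + [0] * 48
scores = [1 - i / 64 for i in range(64)]
for row in gains_lift(labels, scores)[:4]:
    print(row)
```

A lift above 1 in the top groups indicates that the model concentrates positives among its highest-scored observations.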