H2O v3.24.0.1 Release Notes

  • Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html

    Bug

    [PUBDEV-6159] - The AutoMLTest.java test suite now runs correctly on a local machine.
    [PUBDEV-6189] - Fixed an issue in as_date that occurred when the column included NAs.
    [PUBDEV-6208] - AutoML no longer fails if one of the Stacked Ensemble models is deleted.
    [PUBDEV-6230] - Removed ellipses after the H2O server link when launching the Python client.
    [PUBDEV-6231] - In Deep Learning, fixed an issue that occurred when running one-hot encoding on categoricals.
    [PUBDEV-6262] - When running GBM in R without specifically setting a seed, users can now extract the seed that was used to build the model and reproduce that model.
    [PUBDEV-6266] - In predictions, fixed an issue that resulted in a "Categorical value out of bounds error" when calling a model.
    [PUBDEV-6284] - The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
    [PUBDEV-6346] - In R, fixed an issue that caused group_by mean to calculate only one column when multiple columns were specified.
    [PUBDEV-6350] - Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
    [PUBDEV-6357] - Fixed an issue that resulted in a "Categorical value out of bounds error" when calling a model using Python.
    [PUBDEV-6360] - Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
    [PUBDEV-6367] - Rows that start with a # symbol are no longer dropped during the import process.
    [PUBDEV-6368] - Fixed an SVM import failure.
    [PUBDEV-6376] - Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
    [PUBDEV-6379] - Fixed handling of BAD state in CategoricalWrapperVec.

    New Feature

    [PUBDEV-4680] - Added Blending mode to Stacked Ensembles, which can be specified with the blending_frame parameter. With Blending mode, you do not use cross-validation predictions to train the metalearner. Instead, you score the base models on a holdout set and use those predicted values.
    [PUBDEV-5801] - Model output now includes column names and types.
    [PUBDEV-5809] - AutoML now includes a max_runtime_secs_per_model option.
    [PUBDEV-5925] - In GLM, added support for the negative binomial family.
    [PUBDEV-5980] - Exposed Java target encoding to R.
    [PUBDEV-6056] - For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
    [PUBDEV-6136] - Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
    [PUBDEV-6180] - Added the blending_frame parameter to Stacked Ensembles in Flow.
    [PUBDEV-6196] - Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
    [PUBDEV-6339] - In the R and Python clients, added a function that calculates the chunk size based on the raw size of the data, the number of CPU cores, and the number of nodes.
    [PUBDEV-6344] - Added the ability to import from Hive using metadata from Metastore.
    [PUBDEV-6358] - Users can now choose the database where import_sql_select creates a temporary table.
    [PUBDEV-6365] - Added support for monotonicity constraints for binomial GBMs.
    [PUBDEV-6374] - Users can now define custom HTTP headers using an -add_http_header option.
    [PUBDEV-6386] - XGBoost MOJO now uses the Java predictor by default.
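
    The Blending mode described in PUBDEV-4680 can be illustrated with a small, library-free sketch. This is not H2O's implementation: the two base models, the convex-combination metalearner, and the toy data are all hypothetical choices made only to show the flow (train base models on the training data, score them on a holdout set, and fit the metalearner on those holdout predictions instead of cross-validation predictions).

```python
# Conceptual sketch of blending; NOT H2O's implementation.
# All models, names, and data here are hypothetical illustrations.
from statistics import mean


def fit_mean_model(xs, ys):
    """Base model 1: always predicts the mean of the training response."""
    m = mean(ys)
    return lambda x: m


def fit_linear_model(xs, ys):
    """Base model 2: one-feature ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_blend_weight(p1, p2, ys):
    """Metalearner: weight w minimizing sum((y - (w*p1 + (1-w)*p2))**2)."""
    num = sum((y - b) * (a - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den if den else 0.5


# Toy data following y = 2x + 1, split into a training part and a holdout
# ("blending") part that the base models never see during training.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
blend_x, blend_y = [4.0, 5.0, 6.0], [9.0, 11.0, 13.0]

# 1. Train the base models on the training data only.
base_mean = fit_mean_model(train_x, train_y)
base_linear = fit_linear_model(train_x, train_y)

# 2. Score the base models on the holdout set.
p_mean = [base_mean(x) for x in blend_x]
p_linear = [base_linear(x) for x in blend_x]

# 3. Train the metalearner on the holdout predictions.
w_mean = fit_blend_weight(p_mean, p_linear, blend_y)


def ensemble_predict(x):
    """Combine the base model predictions using the learned weight."""
    return w_mean * base_mean(x) + (1.0 - w_mean) * base_linear(x)
```

    Because the holdout rows were not used to train the base models, the metalearner sees honest out-of-sample predictions without the cost of cross-validation. In this toy case it puts essentially all of its weight on the linear base model.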

    Task

    [PUBDEV-4982] - Fixed an issue that caused the pyunit_lending_club_munging_assembly_large.py and pyunit_assembly_munge_large.py tests to sometimes fail when run inside a Docker container.
    [PUBDEV-5876] - Simplified and improved the GLM COD implementation.

    Improvement

    [PUBDEV-5491] - SQLite support is available via any JDBC driver in streaming mode.
    [PUBDEV-5993] - Updated Retrofit and okHttp dependencies.
    [PUBDEV-6129] - Target Encoding is now available in the Python client.
    [PUBDEV-6176] - Moved StackedEnsembleModel to hex.ensemble packages. In prior versions, this was in a root hex package.
    [PUBDEV-6188] - Secret key ID and secret key are available for the s3:// AWS protocol. In the R client, set them using h2o.setS3Credentials(accessKeyId, accesSecretKey). In the Python client, use from h2o.persist import set_s3_credentials and then call set_s3_credentials(access_key_id, secret_access_key).
    [PUBDEV-6217] - Users can now specify AWS credentials at runtime.
    [PUBDEV-6254] - The new blending_frame parameter is now available in AutoML.
    [PUBDEV-6334] - Fixed an error in the Javadoc for the Frame.java sort function.
    [PUBDEV-6363] - Fixed Hive delegation token generation.
    [PUBDEV-6388] - Reordered the algorithms trained in AutoML and prioritized hardcoded XGBoost models.

    Docs

    [PUBDEV-4977] - Removed FAQ indicating that Java 9 was not yet supported.
    [PUBDEV-6136] - Added a "Generic Models" chapter to the Algorithms section.
    [PUBDEV-6179] - Added the blending_frame parameter to Stacked Ensembles documentation.
    [PUBDEV-6280] - Added information about the Negative Binomial family to the GLM booklet and the user guide.
    [PUBDEV-6289] - Improved the R and Python client documentation for the sum function.
    [PUBDEV-6331] - Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
    [PUBDEV-6362] - In the User Guide and R and Python documentation, replaced references to "H2O Cloud" with "H2O Cluster".
    [PUBDEV-6375] - Added information about predict_contributions to the Performance and Prediction chapter.
    [PUBDEV-6381] - In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
    Improved the GBM Reproducibility FAQ.
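
    For readers new to the feature contributions (SHAP values) mentioned under PUBDEV-6056 and PUBDEV-6375, the sketch below shows the idea they rest on: each feature receives a contribution, and the contributions plus a bias term add up to the model's output for that row. It is a brute-force illustration of the Shapley definition on a hypothetical toy model with a single all-zero background row. It does not call H2O and is not the algorithm H2O uses for tree models.

```python
# Brute-force Shapley values for one row of a toy model.
# Illustration only: the model, the row, and the single background row are
# hypothetical, and this is not how H2O computes contributions.
from itertools import permutations


def shapley_values(predict, row, background):
    """Average each feature's marginal effect over all feature orderings.

    Features that have not been "revealed" yet keep the background value.
    Returns (per-feature contributions, bias), where bias is the prediction
    for the background row.
    """
    n = len(row)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(background)
        previous = predict(current)
        for j in order:
            current[j] = row[j]          # reveal feature j
            value = predict(current)
            contributions[j] += value - previous
            previous = value
    bias = predict(list(background))
    return [c / len(orderings) for c in contributions], bias


def toy_model(x):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2.0 * x[0] + x[1] * x[2]


row = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
contribs, bias = shapley_values(toy_model, row, background)

# Additivity: contributions plus bias reproduce the model output for the row.
reconstructed = sum(contribs) + bias
```

    The interaction term is split evenly between the two features involved, which is the symmetry property of Shapley values. Enumerating every ordering is exponential in the number of features, so this approach is practical only for tiny examples like this one.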