H2O v3.18.0.1 Release Notes

  • Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-wolpert/1/index.html

    Bug

    • [PUBDEV-4585] - Fixed an issue that caused XGBoost binary save/load to fail.
    • [PUBDEV-4593] - Fixed an issue that caused a Levenshtein Distance Normalization Error. Levenshtein distance is now implemented directly in H2O.
    • [PUBDEV-5112] - The Word2Vec Python API for pretrained models no longer requires a training frame. In addition, a new from_external option was added, which creates a new H2OWord2vecEstimator based on an external model.
    • [PUBDEV-5128] - Fixed an issue in which the show function of metrics base did not check for the custom_metric_name key and raised an exception.
    • [PUBDEV-5129] - The fold column in Kmeans is no longer required to be in x.
    • [PUBDEV-5130] - Dates are now parsed correctly when parsed from H2O-R.
    • [PUBDEV-5133] - In Flow, the scoring history plot is now available for GLM models.
    • [PUBDEV-5135] - The Parquet parser no longer fails if one of the files to parse has no records.
    • [PUBDEV-5145] - Added error checking and logging on all the uses of water.util.JSONUtils.parse().
    • [PUBDEV-5155] - In AutoML, fixed an exception in the Python binding that occurred when the leaderboard was empty.
    • [PUBDEV-5156] - In AutoML, fixed an exception in the R binding that occurred when the leaderboard was empty.
    • [PUBDEV-5159] - Removed the Pandas dependency for AutoML in Python.
    • [PUBDEV-5167] - In PySparkling, reading Parquet/Orc data with time type now works correctly in H2O.
    • [PUBDEV-5174] - Fixed a maximum recursion depth error when using isin in the H2O Python client.
    • [PUBDEV-5175] - When running getJobs in Flow, fixed a ClassNotFoundException that occurred when AutoML jobs existed.
    • [PUBDEV-5179] - Fixed an issue that caused a list of columns to be truncated in PySparkling. The light endpoint now returns all columns.
    • [PUBDEV-5186] - In AutoML, fixed a deadlock issue that occurred when two AutoML runs came in the same second, resulting in matching timestamps.
    • [PUBDEV-5191] - The offset_column and distribution parameters are no longer available in Random Forest.
    • [PUBDEV-5195] - Fixed an issue in XGBoost that caused MOJOs to fail to work without manually adding the Commons Logging dependency.
    • [PUBDEV-5203] - Fixed an issue that caused XGBoost to mangle the domain levels for datasets that have string response domains.
    • [PUBDEV-5213] - In Flow, the separator drop down now shows 3-digit decimal values instead of 2.
    • [PUBDEV-5215] - Users can now specify interactions when running GLM in Flow.
    • [PUBDEV-5228] - FrameMetadata code no longer uses hardcoded keys. Also fixed an issue that caused AutoML to fail when multiple AutoML runs were executed simultaneously.
    • [PUBDEV-5229] - A frame can potentially have a null key. If there is a Frame with a null key (just a container for vecs), H2O no longer attempts to track a null key.
    • [PUBDEV-5256] - Users can now successfully build an XGBoost model as compile chain. XGBoost no longer fails to provide the compatible artifact for an Oracle Linux environment.
    • [PUBDEV-5265] - GLM no longer fails when a categorical column exists in the dataset along with an empty value on at least one row.
    • [PUBDEV-5286] - Fixed an issue that caused GBM grid to fail on some datasets when specifying sample_rate in the grid.
    • [PUBDEV-5287] - The x argument is no longer required when performing a grid search.
    • [PUBDEV-5297] - Fixed an issue that caused the Parquet parser to fail on Spark 2.0 (SW-707).
    • [PUBDEV-5315] - Fixed an issue that caused XGBoost OpenMP to fail on Ubuntu 14.04.
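
For readers unfamiliar with the metric behind PUBDEV-4593, the following is a minimal pure-Python sketch of Levenshtein distance and one common way to normalize it. It is not H2O's implementation. The function names and the choice to normalize by the longer string's length are assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string, columns over the shorter
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest
```

Guarding the empty-string case avoids a division by zero, which is a typical source of normalization errors in this kind of calculation.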

    New Feature

    • [PUBDEV-4111] - Added support for INT96 timestamp to the Parquet parser.
    • [PUBDEV-4652] - Added support for XGBoost multinode training in H2O. Note that this is still a BETA feature.
    • [PUBDEV-4980] - Users can now specify a list of algorithms to exclude during an AutoML run. This is done using the new exclude_algos parameter.
    • [PUBDEV-5204] - In GLM, users can now specify a list of interaction terms to include when building a model instead of relying on the default action of including all interactions.
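
As a rough illustration of the exclude_algos parameter from PUBDEV-4980, the sketch below shows how it might be passed from the Python client. The settings values, the helper name run_automl, and the choice of excluded algorithms are assumptions for the example. The h2o import is deferred so the settings can be inspected without a running cluster.

```python
# Hypothetical settings for illustration; adjust to your own run.
AUTOML_SETTINGS = {
    "max_models": 5,
    "seed": 1,
    "exclude_algos": ["DeepLearning", "GLM"],  # algorithms AutoML should skip
}


def run_automl(train, response, settings=AUTOML_SETTINGS):
    """Train AutoML on an H2OFrame and return the leaderboard.

    Requires the h2o package and a cluster started with h2o.init().
    """
    from h2o.automl import H2OAutoML  # deferred import: needs h2o installed

    aml = H2OAutoML(**settings)
    aml.train(y=response, training_frame=train)
    return aml.leaderboard
```

After calling h2o.init() and importing a frame, run_automl(frame, "response_column") would return a leaderboard that contains no models from the excluded algorithms.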

    Task

    • [PUBDEV-5230] - The Python PCA code examples in github and in the User Guide now use the h2o.estimators.pca.H2OPrincipalComponentAnalysisEstimator method instead of the h2o.transforms.decomposition.H2OPCA method.
    • [PUBDEV-5251] - Upgraded the XGBoost version. This now supports RHEL 6.

    Improvement

    • [PUBDEV-5086] - Stacked Ensemble allows you to specify the metalearning algorithm to use when training the ensemble. When an algorithm is specified, Stacked Ensemble runs with the specified algorithm's default hyperparameter values. The new metalearner_params option allows you to pass in a dictionary/list of hyperparameters to use for that algorithm instead of the defaults.
    • [PUBDEV-5224] - Users can now specify a seed parameter in Stacked Ensemble.
    • [PUBDEV-5310] - Documented clouding behavior of an H2O cluster. This is available at https://github.com/h2oai/h2o-3/blob/master/h2o-docs/devel/h2o_clouding.rst.
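
The Stacked Ensemble options described above (metalearner_algorithm, metalearner_params, and seed) could be combined roughly as follows from Python. This is a sketch under assumptions: the base models, their hyperparameters, and the helper name train_ensemble are illustrative and not taken from the release itself.

```python
# Hypothetical metalearner configuration for illustration.
ENSEMBLE_SETTINGS = {
    "metalearner_algorithm": "gbm",
    "metalearner_params": {"ntrees": 50, "max_depth": 3},  # overrides GBM defaults
    "seed": 42,
}


def train_ensemble(train, x, y, settings=ENSEMBLE_SETTINGS):
    """Train two cross-validated base models and stack them.

    Requires the h2o package and a cluster started with h2o.init().
    """
    from h2o.estimators import (  # deferred import: needs h2o installed
        H2OGradientBoostingEstimator,
        H2ORandomForestEstimator,
        H2OStackedEnsembleEstimator,
    )

    # Base models must share the fold setup and keep their CV predictions.
    cv_args = dict(
        nfolds=5,
        fold_assignment="Modulo",
        keep_cross_validation_predictions=True,
        seed=settings["seed"],
    )
    gbm = H2OGradientBoostingEstimator(ntrees=20, **cv_args)
    gbm.train(x=x, y=y, training_frame=train)
    drf = H2ORandomForestEstimator(ntrees=20, **cv_args)
    drf.train(x=x, y=y, training_frame=train)

    ensemble = H2OStackedEnsembleEstimator(
        base_models=[gbm.model_id, drf.model_id], **settings
    )
    ensemble.train(x=x, y=y, training_frame=train)
    return ensemble
```

Omitting metalearner_params leaves the chosen metalearner on its default hyperparameters, which matches the behavior described in PUBDEV-5086.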

    Docs

    • [PUBDEV-5149] - Updated the documentation to indicate that datetime parsing from R and Flow is now UTC by default.
    • [PUBDEV-5151] - R documentation on docs.h2o.ai is now available in HTML format.
    • [PUBDEV-5172] - Added a new Cloud Integration topic for using H2O with AWS.
    • [PUBDEV-5221] - In the XGBoost chapter, added that XGBoost in H2O supports multicore.
    • [PUBDEV-5242] - Added interaction_pairs to the list of GLM parameters.
    • [PUBDEV-5283] - Added metalearner_algorithm and metalearner_params to the Stacked Ensembles chapter.
    • [PUBDEV-5311] - The H2O-3 download site now includes a link to the HTML version of the R documentation.
    • [PUBDEV-5312] - Updated the XGBoost documentation to indicate that multinode support is now available as a Beta feature.
    • [PUBDEV-5314] - Added the seed parameter to the Stacked Ensembles section of the User Guide.
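
The interaction_pairs GLM parameter listed above might be used from Python along these lines. This is a minimal sketch: the column names, the gaussian family, and the helper name train_glm are hypothetical and chosen only for illustration.

```python
# Hypothetical column pairs; each tuple requests one pairwise interaction term.
INTERACTION_PAIRS = [("age", "income"), ("age", "region")]


def train_glm(train, x, y, pairs=INTERACTION_PAIRS):
    """Fit a GLM that includes only the listed interaction terms.

    Requires the h2o package and a cluster started with h2o.init().
    """
    from h2o.estimators import H2OGeneralizedLinearEstimator  # deferred import

    glm = H2OGeneralizedLinearEstimator(
        family="gaussian",
        interaction_pairs=pairs,
    )
    glm.train(x=x, y=y, training_frame=train)
    return glm
```

Listing pairs explicitly keeps the model from expanding every possible interaction among the chosen columns, which is the motivation given in PUBDEV-5204.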