PaddlePaddle v2.0.0-beta0 Release Notes

Release Date: 2020-09-15
  • 🚀 2.0-beta Release Note

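    Important Update

    This version is the beta release of PaddlePaddle Framework v2.0. The most important changes are a full upgrade of the API system and a comprehensive improvement of the imperative programming (dynamic graph) capability. This version systematically reorganizes the directory structure of the PaddlePaddle basic APIs, fixes long-standing issues, and adds many APIs, in particular more complete high-level APIs. It also adds support for quantization-aware training and mixed precision training in dynamic graph mode. Dynamic-to-static conversion now has complete syntax support and is much easier to use. Dynamic graph functionality is nearing completeness, and dynamic graph mode is recommended. In addition, the C++ APIs of the inference library are upgraded and optimized, and both the inference library's support for quantized models and its inference performance are enhanced.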

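    Training Framework

    Basic APIs

    Compatibility Description

    • For Paddle 2.x, users are recommended to use the APIs in the paddle root directory. All Paddle 1.x APIs are retained in the paddle.fluid directory. By design, training code written for Paddle 1.x runs on Paddle 2.x without modification, and models saved by Paddle 1.x training can be used for inference with Paddle 2.x.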

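    Directory Structure Adjustment

    • Based on the 2.0-alpha version, this version makes some adjustments to the directory structure. The adjusted directory structure is as follows:
    Directory                       Functions and included APIs
    paddle.*                        Aliases of commonly used APIs; currently includes all APIs in the paddle.tensor and paddle.framework directories
    paddle.tensor                   APIs for tensor operations, such as creation (zeros), matrix operations (matmul), transformation (concat), computation (add), and search (argmax)
    paddle.nn                       Networking APIs, such as Linear, Conv2d, loss functions, convolution, LSTM, and activation functions
    paddle.static.nn                APIs specific to networking in static graph mode, such as the input placeholder data, the fully connected layer fc, and the control flow APIs while_loop/cond
    paddle.static                   Basic framework APIs for static graph mode, such as Variable, Program, and Executor
    paddle.framework                General framework APIs and imperative mode APIs, such as to_tensor
    paddle.optimizer
    paddle.optimizer.lr_scheduler
    paddle.metric                   APIs for computing evaluation metrics, such as accuracy and auc
    paddle.io                       APIs for data input and output, such as Dataset and DataLoader
    paddle.device                   APIs for device management, such as CPUPlace and CUDAPlace
    paddle.distributed              Basic distributed APIs
    paddle.distributed.fleet        High-level distributed APIs
    paddle.vision                   Vision domain APIs, such as datasets, data processing, and commonly used basic network structures like resnet
    paddle.text                     NLP domain APIs, such as datasets, data processing, and commonly used network structures like transformer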

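    API Alias Rules

    • For convenience, APIs are aliased under different paths, for example paddle.add -> paddle.tensor.add. Users are recommended to use the shorter path, paddle.add.
    • All APIs in the framework and tensor directories are aliased in the paddle root directory. Apart from a few special cases, no other APIs are aliased in the paddle root directory.
    • All APIs under the subdirectories of paddle.nn, except the functional directory, are aliased in paddle.nn. APIs in the functional directory have no aliases in paddle.nn.
    • The following are some special alias relations. The names on the left are recommended: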
      • paddle.sigmoid -> paddle.tensor.sigmoid -> paddle.nn.functional.sigmoid
      • paddle.tanh -> paddle.tensor.tanh -> paddle.nn.functional.tanh
      • paddle.remainder -> paddle.mod -> paddle.floor_mod
      • paddle.divide -> paddle.true_divide
      • paddle.rand -> paddle.uniform
      • paddle.randn -> paddle.standard_normal
      • Optimizer.clear_grad -> Optimizer.clear_gradients
      • Optimizer.set_state_dict -> Optimizer.set_dict
      • Optimizer.get_lr -> Optimizer.current_step_lr
      • Layer.clear_grad -> Layer.clear_gradients
      • Layer.set_state_dict -> Layer.set_dict

    Name Changes of Commonly Used APIs

    • This version uses Tensor to represent data. The tensor-creation API paddle.fluid.dygraph.to_variable is renamed to paddle.to_tensor
    • Addition, subtraction, multiplication, and division use full names, not abbreviations
    • Element-wise operations no longer carry the elementwise prefix
    • Operations along an axis no longer carry the reduce prefix
    • Conv, Pool, Dropout, BatchNorm, and Pad networking APIs take a 1d, 2d, or 3d suffix according to the dimensionality of the input data
    Paddle 1.8                              Paddle 2.0-beta
    paddle.fluid.layers.elementwise_add     paddle.add
    paddle.fluid.layers.elementwise_sub     paddle.subtract
    paddle.fluid.layers.elementwise_mul     paddle.multiply
    paddle.fluid.layers.elementwise_div     paddle.divide
    paddle.fluid.layers.elementwise_max     paddle.maximum
    paddle.fluid.layers.elementwise_min     paddle.minimum
    paddle.fluid.layers.reduce_sum          paddle.sum
    paddle.fluid.layers.reduce_prod         paddle.prod
    paddle.fluid.layers.reduce_max          paddle.max
    paddle.fluid.layers.reduce_min          paddle.min
    paddle.fluid.layers.reduce_all          paddle.all
    paddle.fluid.layers.reduce_any          paddle.any
    paddle.fluid.dygraph.Conv2D             paddle.nn.Conv2d
    paddle.fluid.dygraph.Conv2DTranspose    paddle.nn.ConvTranspose2d
    paddle.fluid.dygraph.Pool2D             paddle.nn.MaxPool2d, paddle.nn.AvgPool2d

    Added APIs

    • Added a total of 140 APIs. See the link and the API documentation
      • Added environment setting APIs: paddle.set_default_dtype, paddle.get_default_dtype, paddle.set_device, paddle.get_device, paddle.manual_seed
      • Added tensor operation APIs: numel, chunk, masked_select, isfinite, isinf, isnan, sort, topk, Flatten, dim, tile
      • Added networking APIs: Linear, Bilinear, Embedding, linear, bilinear, embedding
      • Added vision networking APIs: Conv1d, ConvTranspose1d, MaxPool1d, MaxPool2d, MaxPool3d, AvgPool1d, AvgPool2d, AvgPool3d, AdaptiveMaxPool1d, AdaptiveMaxPool2d, AdaptiveMaxPool3d, ReflectionPad1d, ReflectionPad2d, ReflectionPad3d, ReplicationPad1d, ReplicationPad2d, ReplicationPad3d, ZeroPad2d, ConstantPad1d, ConstantPad2d, ConstantPad3d, PixelShuffle, Upsample, UpsamplingNearest2d, UpsamplingBilinear2d, conv1d, conv_transpose1d, avg_pool1d, avg_pool2d, avg_pool3d, max_pool1d, max_pool2d, max_pool3d, adaptive_max_pool1d, adaptive_max_pool2d, adaptive_max_pool3d, adaptive_avg_pool1d, adaptive_avg_pool3d
      • Added text processing networking APIs: SimpleRNN, LSTM, GRU, MultiHeadAttention, Transformer, TransformerEncoder, TransformerEncoderLayer, TransformerDecoder, TransformerDecoderLayer
      • Added activation APIs: ELU, Hardshrink, Hardtanh, PReLU, ReLU6, Tanh, Tanhshrink, Softmax
      • Added normalization APIs: BatchNorm1d, BatchNorm2d, BatchNorm3d, SyncBatchNorm, InstanceNorm1d, InstanceNorm2d, InstanceNorm3d, weight_norm, remove_weight_norm, batch_norm, instance_norm, layer_norm, normalize
      • Added dropout APIs: Dropout2d, Dropout3d, AlphaDropout, dropout, dropout2d, dropout3d
      • Added similarity and loss function APIs: CosineSimilarity, PairwiseDistance, CTCLoss, KLDivLoss, BCEWithLogitsLoss, MarginRankingLoss, SmoothL1Loss, cosine_similarity, binary_cross_entropy, binary_cross_entropy_with_logits, cross_entropy, ctc_loss, l1_loss, mse_loss, margin_ranking_loss, nll_loss, smooth_l1_loss
      • Added distributed communication APIs: broadcast, all_reduce, reduce, all_gather, scatter, barrier
      • Added probability distribution APIs: Distribution, normal, bernoulli
      • Added optimizer-related APIs: step, AdamW
      • Added dataset-related APIs: Dataset, IterableDataset, TensorDataset, Sampler, RandomSampler, BatchSampler, DistributedBatchSampler

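    Fixed and Improved APIs

    • Modified and improved a total of 155 APIs. See the link and the API documentation
    • Fixed APIs related to random number generation, including seed setting, paddle.rand, randn, randint, randperm, dropout, Uniform, and Normal
    • Upgraded the underlying C++ operators of the following APIs. The upgrades are designed to be compatible, but a small number of incompatibilities cannot be ruled out: linspace, concat, gather, gather_nd, split, squeeze, unsqueeze, clip, argmax, argmin, mean, norm, unique, cumsum, LeakyReLU, leaky_relu, hardshrink, embedding, margin_ranking_loss, grid_sample, affine_grid
    • Added oneDNN support for the relu6 and Sigmoid activation functions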

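    Multi-device/Distributed Training APIs

    Single-Machine Multi-Card Training in Dynamic Graph Mode

    • Added paddle.distributed.spawn(func, args=(), nprocs=-1, join=True, daemon=False, **options), which starts multi-card training in dynamic graph mode.
    • Added paddle.distributed.init_parallel_env(), which initializes the environment for multi-card training in dynamic graph mode.
    • Added paddle.distributed.get_rank(), which returns the rank of the current process during multi-card training.
    • Added paddle.distributed.get_world_size(), which returns the total number of processes participating in multi-card training.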

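    Distributed Collective Communication

    • Added paddle.distributed.broadcast(tensor, src, group=0), which broadcasts the tensor of the specified process to all processes.
    • Added paddle.distributed.all_reduce(tensor, op=ReduceOp.SUM, group=0), which performs the reduce operation on the specified Tensor of all processes and returns the result to all processes.
    • Added paddle.distributed.reduce(tensor, dst, op=ReduceOp.SUM, group=0), which performs the reduce operation on the specified Tensor of all processes and returns the result to the specified process.
    • Added paddle.distributed.all_gather(tensor_list, tensor, group=0), which gathers the specified Tensor of all processes and returns the result to all processes.
    • Added paddle.distributed.scatter(tensor, tensor_list=None, src=0, group=0), which distributes the Tensors in the Tensor list of the specified process to all processes.
    • Added paddle.distributed.barrier(group=0), which synchronizes all processes.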

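    High-level APIs

    • Added PaddlePaddle high-level APIs, which encapsulate common operations in model development such as networking, training, evaluation, inference, and saving/loading, enabling low-code development. For the MNIST handwritten digit recognition task, the high-level APIs reduce execution-related code by 80% compared with the imperative programming implementation.
    • Data Management
      • Unified the way data is loaded
      • Datasets are defined by inheriting from paddle.io.Dataset.
      • Multi-process data loading uses paddle.io.DataLoader.
      • Added paddle.io.IterableDataset for streaming datasets; paddle.io.DataLoader supports loading it concurrently.
      • Added paddle.io.get_worker_info, which is used in paddle.io.IterableDataset to divide data among worker processes.
    • Model Networking
      • Added encapsulation of the common loss APIs paddle.nn.loss.* and the metric APIs paddle.metric.*
      • Released 12 models implemented with the high-level APIs
      • Transformer, Seq2seq, LAC, BMN, ResNet, YOLOv3, VGG, MobileNet, TSM, CycleGAN, Bert, OCR
      • Published in the examples directory of the PaddlePaddle/hapi repository
    • Model Execution
      • Added the paddle.Model class, which encapsulates the basic functions commonly used in model development, including:
      • Model.summary, to view the network structure and the number of parameters of a dynamic graph network.
      • Model.prepare, to specify the loss function and the optimization algorithm.
      • Model.fit, to run training and evaluation; callbacks can run user-defined logic such as model saving during training.
      • Model.evaluate, to run inference and compute evaluation metrics on the evaluation set.
      • Model.predict, to run inference on the given test data.
      • Model.train_batch, to train on a single batch of data.
      • Model.eval_batch, to evaluate on a single batch of data.
      • Model.test_batch, to test on a single batch of data.
      • Model.save/Model.load, which support saving an inference model in dynamic graph training mode.
      • Added the callback APIs paddle.callbacks.*, which are used with the model execution APIs for logging, checkpoint saving, and similar tasks. Users can define custom callbacks by inheriting from paddle.callbacks.Callback.
    • Domain APIs
      • Added the computer vision (CV) APIs paddle.vision
      • Added the dataset APIs paddle.vision.datasets.*, which encapsulate commonly used datasets and support random access to data
      • Added 24 common data preprocessing APIs, such as Resize and Normalize, in paddle.vision.transforms.*
      • Added image classification backbone networks and pretrained parameters
        • paddle.vision.models.lenet or paddle.vision.lenet
        • paddle.vision.models.vgg or paddle.vision.vgg
        • paddle.vision.models.resnet or paddle.vision.resnet
        • paddle.vision.models.mobilenetv1 or paddle.vision.mobilenetv1
        • paddle.vision.models.mobilenetv2 or paddle.vision.mobilenetv2
      • Added the natural language processing (NLP) APIs paddle.text
      • Added the dataset APIs paddle.text.datasets.*, which encapsulate commonly used datasets and support random access to data
      • Added the domain networking APIs paddle.text.*
    • Automatic Breakpoint Restart
      • Added the train_epoch_range API, which implements epoch-level automatic checkpoint saving and loading in static graph mode and supports automatic restart from a breakpoint.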

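    Function Optimization (Including Distributed)

    Dynamic Graph to Static Graph

    • Added Syntax Support for ProgramTranslator
      • Added dynamic-to-static support for the return statement, so that a function can return early inside if-elif-else branches or loops, and can return tensors of different types or None.
      • Added dynamic-to-static support for the print statement, so that print(tensor) also prints the tensor after conversion.
      • Added dynamic-to-static support for iterating over a Tensor with for, iterating over a Tensor with for and enumerate, iterating over a TensorList with for, and iterating over a TensorList with for and enumerate, so that loops over tensors can be used flexibly after conversion.
      • Added dynamic-to-static support for the assert statement, so that assert tensor also checks after conversion that the tensor is True (bool type) or non-zero (other data types).
      • Added support for transcribing type casts, so that dynamic graph type conversions such as float(tensor) and int(tensor) are also performed in the static graph.
    • ProgramTranslator Usability Optimization
      • Changed the return type of dynamic-to-static conversion from a callable function to the class StaticLayer. Converted static graph information is easier to obtain through its .code, .main_program, and other interfaces.
      • Added the set_verbosity and set_code_level APIs, which let users set the log level to view logs of the dynamic-to-static process or the code of intermediate conversion states.
      • Added InputSpec, which specifies the shape and data type of input Tensor variables for dynamic-to-static conversion.
      • Improved the error messages shown when dynamic-to-static execution fails: errors raised while running the converted static graph are mapped back to the corresponding line of the original dynamic graph code, and the dynamic-to-static frames are removed from the Python stack so that the message relates more closely to user code.
      • Added support for breakpoint debugging with pdb.set_trace() in dynamic-to-static conversion.
    • Optimized Model Saving and Loading APIs for Deployment
      • Added the paddle.jit.save API for saving dynamic-to-static models, which is easier to use; removed the old API ProgramTranslator.save_inference_model.
      • Added the paddle.jit.load API for loading inference models saved in static graph format, including models saved by paddle.jit.save and paddle.io.save_inference_model. After loading, a model can be used for inference or fine-tuning in dynamic graph mode.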

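    Mixed Precision Training

    • Added support for mixed precision training in dynamic graph mode. On V100, the ResNet-50 model trains 2.6 times as fast with mixed precision as with fp32.

    Quantization-Aware Training

    • Added the ImperativeQuantAware class, which provides quantization-aware training in dynamic graph mode. Quantization of Conv2D, Linear, and other layers is currently supported. Supported model types include MobileNetV1, MobileNetV2, and ResNet50.
    • A quantized model saved with the ImperativeQuantAware.save_quantized_model API after dynamic graph quantization-aware training can be deployed for inference with the Paddle-Lite inference library.
    • Static graph quantization supports Conv2d_transpose quantization and per-channel quantization of Linear.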

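    Performance Optimization (Including Distributed)

    • Simplified the underlying DataLoader implementation in dynamic graph mode and reduced the overhead of the reading thread, which further improves data reading efficiency and overall training speed. In tests, the overall training speed of MobileNetV1 on a single V100 card with BatchSize=128 increased by 34%.
    • Upgraded and optimized the dynamic graph networking APIs. Many dynamic graph APIs now call automatically generated Pybind interfaces directly, which improves performance.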

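    Basic Functions for Dynamic Graph

    • Added support for sparse parameter gradient updates for Embedding and other APIs during multi-card training.
    • Added over 120 Tensor member functions, including Tensor().abs(), Tensor().add(), and Tensor().cos().
    • Added the dir() API for Layer, which makes it easy to view the attributes and functions of a Layer.
    • Added the optimizer.set_lr() API, which lets users adjust the learning rate flexibly in dynamic graph mode.
    • Added the set_global_initializer API, which defines a global parameter initialization method.
    • Added oneDNN (formerly MKL-DNN) support for dynamic graph training and inference. ResNet50 oneDNN dynamic graph training is enabled (on the MNIST dataset).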

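    Debugging Analysis

    • Replaced about 100 uses of LOG(FATAL) for throwing exceptions in the framework with PADDLE_THROW, and improved the format and content of errors raised when the framework does not support a behavior.
    • Improved the Signal Handler implementation in the framework, and improved the format and content of errors raised when a system signal error occurs during execution.
    • Improved the framework error stack format: the Python error stack from compile time is moved below the native error stack, which makes error messages easier to read.
    • Further refined the error types and messages of about 1,300 check errors in the framework, improving overall debugging usability.
    • Systematically enhanced the error messages of the Pybind layer in dynamic graph mode to improve user experience.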

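    Bug Fixing

    • Fixed the problem that AttributeError could unexpectedly occur when a dynamic graph Layer uses the add_parameter API; enhanced the input check.
    • Fixed the problem that tensors of int_8 and uint_8 types could not be printed, so that the data is now output normally.

    Dependency Library Upgrading

    • Upgraded oneDNN (formerly MKL-DNN) from version 1.3 to version 1.5.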

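    Inference

    Paddle Inference

    API

    • Fully upgraded the inference C++ APIs; the new APIs are recommended. The original APIs are retained for now but emit a warning when used, and are planned for removal in the future. The upgrade mainly standardizes naming and simplifies usage. Important changes include:
      • Added the paddle_infer namespace to the C++ APIs, which contains the inference-related APIs.
      • Renamed ZeroCopyTensor to Tensor, which is the default input/output representation of the inference APIs.
      • Simplified CreatePaddlePredictor to CreatePredictor, which supports only AnalysisConfig and no longer supports other Config types.
      • Added service-related utility classes such as PredictorPool, which is convenient when creating multiple predictors.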

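    Functional Upgrading

    • Upgraded the operator version compatibility registry to support more precise Op version information and improve inference compatibility.
    • Added support for TRT 7.1.
    • Enhanced Paddle-TensorRT support for PaddleSlim quantized models, covering multiple CV tasks such as detection, classification, and segmentation.
    • Added support for user-defined operators in Python-side inference.
    • Added INT8 oneDNN (formerly MKL-DNN) kernel support for elementwise_add and elementwise_mul on the CPU side.
    • Improved the usability of testing quantized models on the CPU side; the original model and the quantized model can be tested and compared at the same time.
    • Added support for Jetson Nx hardware.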

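    Performance Optimization

    • Added the conv + affine_op pass; MASK-RCNN fp32 single-thread performance is improved by 26% on a 6248 machine.
    • Added the fc + gru pass and the oneDNN (formerly MKL-DNN) GRU fp32 kernel, speeding up 4-thread inference of the GRU fp32 model by 20% on an Intel Xeon 6248 machine.
    • Added oneDNN inplace support for many operators (2% speedup for the face feature fp32 model).
    • Optimized the oneDNN LRN operator (1% speedup for the GoogleNet fp32 model).
    • Upgraded the conversion and optimization of quantized models.
    • Optimized the CUDA ArgMin and ArgMax operators, reducing their binary size from 60 M to 1.3 M.

    Bug Fixing

    • Fixed the mask-rcnn inference error on CPU.
    • Fixed errors that occurred with quantized models and inference under CPU multithreading.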

    🚀 2.0-beta Release Note

    Important Update

    This version is the beta release of PaddlePaddle Framework v2.0. The most important changes are a full upgrade of the API system and a comprehensive improvement of the imperative programming (dynamic graph) capability. This version systematically reorganizes the directory structure of the PaddlePaddle basic APIs, fixes long-standing issues, and adds many APIs, in particular more complete high-level APIs. It also adds support for quantization-aware training and mixed precision training in dynamic graph mode. Dynamic-to-static conversion now has complete syntax support and is much easier to use. Dynamic graph functionality is nearing completeness, and dynamic graph mode is recommended. In addition, the C++ APIs of the inference library are upgraded and optimized, and both the inference library's support for quantized models and its inference performance are enhanced.

    Training Framework

    Basic APIs

    Compatibility Description

    • For Paddle 2.x, users are recommended to use the APIs in the paddle root directory. All Paddle 1.x APIs are retained in the paddle.fluid directory. By design, training code written for Paddle 1.x runs on Paddle 2.x without modification, and models saved by Paddle 1.x training can be used for inference with Paddle 2.x.

    Directory Structure Adjustment

    • Based on the 2.0-alpha version, this version makes some adjustments to the directory structure. The adjusted directory structure is as follows:
    Directory                       Functions and included APIs
    paddle.*                        Aliases of commonly used APIs; currently includes all APIs in the paddle.tensor and paddle.framework directories
    paddle.tensor                   APIs for tensor operations, such as creation (zeros), matrix operations (matmul), transformation (concat), computation (add), and search (argmax)
    paddle.nn                       Networking APIs, such as Linear, Conv2d, loss functions, convolution, LSTM, and activation functions
    paddle.static.nn                APIs specific to networking in static graph mode, such as the input placeholder data, the fully connected layer fc, and the control flow APIs while_loop/cond
    paddle.static                   Basic framework APIs for static graph mode, such as Variable, Program, and Executor
    paddle.framework                General framework APIs and imperative mode APIs, such as to_tensor
    paddle.optimizer
    paddle.optimizer.lr_scheduler
    paddle.metric                   APIs for computing evaluation metrics, such as accuracy and auc
    paddle.io                       APIs for data input and output, such as Dataset and DataLoader
    paddle.device                   APIs for device management, such as CPUPlace and CUDAPlace
    paddle.distributed              Basic distributed APIs
    paddle.distributed.fleet        High-level distributed APIs
    paddle.vision                   Vision domain APIs, such as datasets, data processing, and commonly used basic network structures like resnet
    paddle.text                     NLP domain APIs, such as datasets, data processing, and commonly used network structures like transformer

    API Alias Rules

    • For convenience, APIs are aliased under different paths, for example paddle.add -> paddle.tensor.add. Users are recommended to use the shorter path, paddle.add.
    • All APIs in the framework and tensor directories are aliased in the paddle root directory. Apart from a few special cases, no other APIs are aliased in the paddle root directory.
    • All APIs under the subdirectories of paddle.nn, except the functional directory, are aliased in paddle.nn. APIs in the functional directory have no aliases in paddle.nn.
    • The following are some special alias relations. It is recommended to use the names on the left.
      • paddle.sigmoid -> paddle.tensor.sigmoid -> paddle.nn.functional.sigmoid
      • paddle.tanh -> paddle.tensor.tanh -> paddle.nn.functional.tanh
      • paddle.remainder -> paddle.mod -> paddle.floor_mod
      • paddle.divide -> paddle.true_divide
      • paddle.rand -> paddle.uniform
      • paddle.randn -> paddle.standard_normal
      • Optimizer.clear_grad -> Optimizer.clear_gradients
      • Optimizer.set_state_dict -> Optimizer.set_dict
      • Optimizer.get_lr -> Optimizer.current_step_lr
      • Layer.clear_grad -> Layer.clear_gradients
      • Layer.set_state_dict -> Layer.set_dict
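
    The alias chains above can be read as a lookup from any name in a chain to the leftmost, recommended name. The following is a minimal sketch in plain Python that does not import paddle; ALIAS_CHAINS and preferred_name are hypothetical names used only for illustration, and the chains are copied from the module-level entries of the list above.

```python
# Alias chains copied from the release notes; the leftmost name in each
# chain is the recommended one.
ALIAS_CHAINS = [
    ["paddle.sigmoid", "paddle.tensor.sigmoid", "paddle.nn.functional.sigmoid"],
    ["paddle.tanh", "paddle.tensor.tanh", "paddle.nn.functional.tanh"],
    ["paddle.remainder", "paddle.mod", "paddle.floor_mod"],
    ["paddle.divide", "paddle.true_divide"],
    ["paddle.rand", "paddle.uniform"],
    ["paddle.randn", "paddle.standard_normal"],
]

# Map every name in a chain to the chain's leftmost (preferred) name.
PREFERRED = {name: chain[0] for chain in ALIAS_CHAINS for name in chain}


def preferred_name(name):
    """Return the recommended spelling for `name`, or `name` itself if it is not listed."""
    return PREFERRED.get(name, name)
```

    For example, preferred_name("paddle.floor_mod") yields "paddle.remainder", while a name that is not part of any chain is returned unchanged.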

    Name Change of Commonly Used APIs

    • This version uses Tensor to represent data. The tensor-creation API paddle.fluid.dygraph.to_variable is renamed to paddle.to_tensor
    • Addition, subtraction, multiplication, and division use full names, not abbreviations
    • Element-wise operations no longer carry the elementwise prefix
    • Operations along an axis no longer carry the reduce prefix
    • Conv, Pool, Dropout, BatchNorm, and Pad networking APIs take a 1d, 2d, or 3d suffix according to the dimensionality of the input data
    Paddle 1.8                              Paddle 2.0-beta
    paddle.fluid.layers.elementwise_add     paddle.add
    paddle.fluid.layers.elementwise_sub     paddle.subtract
    paddle.fluid.layers.elementwise_mul     paddle.multiply
    paddle.fluid.layers.elementwise_div     paddle.divide
    paddle.fluid.layers.elementwise_max     paddle.maximum
    paddle.fluid.layers.elementwise_min     paddle.minimum
    paddle.fluid.layers.reduce_sum          paddle.sum
    paddle.fluid.layers.reduce_prod         paddle.prod
    paddle.fluid.layers.reduce_max          paddle.max
    paddle.fluid.layers.reduce_min          paddle.min
    paddle.fluid.layers.reduce_all          paddle.all
    paddle.fluid.layers.reduce_any          paddle.any
    paddle.fluid.dygraph.Conv2D             paddle.nn.Conv2d
    paddle.fluid.dygraph.Conv2DTranspose    paddle.nn.ConvTranspose2d
    paddle.fluid.dygraph.Pool2D             paddle.nn.MaxPool2d, paddle.nn.AvgPool2d
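
    The rename table lends itself to a mechanical first pass over existing code. The sketch below is one possible way to do that with the standard re module; RENAMES holds only a few rows copied from the table, and migrate is a hypothetical helper, not a tool shipped with Paddle. Pool2D is left out on purpose because it maps to two different classes and needs a manual choice.

```python
import re

# A few rows copied from the table above (Paddle 1.8 name -> 2.0-beta name).
RENAMES = {
    "paddle.fluid.layers.elementwise_add": "paddle.add",
    "paddle.fluid.layers.elementwise_mul": "paddle.multiply",
    "paddle.fluid.layers.elementwise_div": "paddle.divide",
    "paddle.fluid.layers.reduce_sum": "paddle.sum",
    "paddle.fluid.dygraph.Conv2D": "paddle.nn.Conv2d",
    "paddle.fluid.dygraph.Conv2DTranspose": "paddle.nn.ConvTranspose2d",
}

# Try longer names first and require that no identifier character follows,
# so Conv2D never matches inside Conv2DTranspose.
_PATTERN = re.compile(
    "(?:"
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")(?!\w)"
)


def migrate(source):
    """Rewrite known Paddle 1.8 API names in `source` to their 2.0-beta names."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], source)
```

    This only rewrites names. Argument lists are left untouched and should be checked against the API documentation after the rewrite.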

    Added APIs

    • Added a total of 140 APIs. See Link and the API document
      • Added environment setting APIs: paddle.set_default_dtype, paddle.get_default_dtype, paddle.set_device, paddle.get_device, paddle.manual_seed
      • Added tensor operation APIs: numel, chunk, masked_select, isfinite, isinf, isnan, sort, topk, Flatten, dim, tile
      • Added networking APIs: Linear, Bilinear, Embedding, linear, bilinear, embedding
      • Added vision networking APIs: Conv1d, ConvTranspose1d, MaxPool1d, MaxPool2d, MaxPool3d, AvgPool1d, AvgPool2d, AvgPool3d, AdaptiveMaxPool1d, AdaptiveMaxPool2d, AdaptiveMaxPool3d, ReflectionPad1d, ReflectionPad2d, ReflectionPad3d, ReplicationPad1d, ReplicationPad2d, ReplicationPad3d, ZeroPad2d, ConstantPad1d, ConstantPad2d, ConstantPad3d, PixelShuffle, Upsample, UpsamplingNearest2d, UpsamplingBilinear2d, conv1d, conv_transpose1d, avg_pool1d, avg_pool2d, avg_pool3d, max_pool1d, max_pool2d, max_pool3d, adaptive_max_pool1d, adaptive_max_pool2d, adaptive_max_pool3d, adaptive_avg_pool1d, adaptive_avg_pool3d
      • Added text processing networking APIs: SimpleRNN, LSTM, GRU, MultiHeadAttention, Transformer, TransformerEncoder, TransformerEncoderLayer, TransformerDecoder, TransformerDecoderLayer
      • Added activation APIs: ELU, Hardshrink, Hardtanh, PReLU, ReLU6, Tanh, Tanhshrink, Softmax
      • Added normalization APIs: BatchNorm1d, BatchNorm2d, BatchNorm3d, SyncBatchNorm, InstanceNorm1d, InstanceNorm2d, InstanceNorm3d, weight_norm, remove_weight_norm, batch_norm, instance_norm, layer_norm, normalize
      • Added dropout APIs: Dropout2d, Dropout3d, AlphaDropout, dropout, dropout2d, dropout3d
      • Added similarity and loss function APIs: CosineSimilarity, PairwiseDistance, CTCLoss, KLDivLoss, BCEWithLogitsLoss, MarginRankingLoss, SmoothL1Loss, cosine_similarity, binary_cross_entropy, binary_cross_entropy_with_logits, cross_entropy, ctc_loss, l1_loss, mse_loss, margin_ranking_loss, nll_loss, smooth_l1_loss
      • Added distributed communication APIs: broadcast, all_reduce, reduce, all_gather, scatter, barrier
      • Added probability distribution APIs: Distribution, normal, bernoulli
      • Added optimizer-related APIs: step, AdamW
      • Added dataset-related APIs: Dataset, IterableDataset, TensorDataset, Sampler, RandomSampler, BatchSampler, DistributedBatchSampler

    Fixed and Improved APIs

    • Modified and improved a total of 155 APIs. See Link and the API document
    • Fixed APIs related to random number generation, including seed setting, paddle.rand, randn, randint, randperm, dropout, Uniform, and Normal
    • Upgraded the underlying C++ operators of the following APIs. The upgrades are designed to be compatible, but a small number of incompatibilities cannot be ruled out: linspace, concat, gather, gather_nd, split, squeeze, unsqueeze, clip, argmax, argmin, mean, norm, unique, cumsum, LeakyReLU, leaky_relu, hardshrink, embedding, margin_ranking_loss, grid_sample, affine_grid
    • Added oneDNN support for the relu6 and Sigmoid activation functions

    Multi-device/Distributed Training APIs

    Single-Machine Multi-Card Training Under a Dynamic Graph

    • Added paddle.distributed.spawn(func, args=(), nprocs=-1, join=True, daemon=False, **options), which starts multi-card training in dynamic graph mode.
    • Added paddle.distributed.init_parallel_env(), which initializes the environment for multi-card training in dynamic graph mode.
    • Added paddle.distributed.get_rank(), which returns the rank of the current process during multi-card training.
    • Added paddle.distributed.get_world_size(), which returns the total number of processes participating in multi-card training.
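
    The rank and the world size are typically used together to give each process its own slice of the work. A minimal framework-free sketch of that idea follows; shard_indices is a hypothetical helper, and its rank and world_size arguments stand in for the values that get_rank() and get_world_size() would return inside a process started by spawn.

```python
def shard_indices(num_samples, rank, world_size):
    """Return the sample indices handled by process `rank` out of `world_size` processes."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Strided split: process r takes samples r, r + world_size, r + 2 * world_size, ...
    return list(range(rank, num_samples, world_size))
```

    With 10 samples and 4 processes, rank 0 handles samples 0, 4, and 8, and every sample is handled by exactly one rank.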

    Distributed Collective Communication

    • Added paddle.distributed.broadcast(tensor, src, group=0), which broadcasts a tensor of a specified process to all the processes.
    • Added paddle.distributed.all_reduce(tensor, op=ReduceOp.SUM, group=0), which performs the reduce operation on specified tensors of all the processes and returns results to all the processes.
    • Added paddle.distributed.reduce(tensor, dst, op=ReduceOp.SUM, group=0), which performs the reduce operation on specified tensors of all the processes and returns the result to the specified process.
    • Added paddle.distributed.all_gather(tensor_list, tensor, group=0), which gathers specified tensors of all the processes and returns results to all the processes.
    • Added paddle.distributed.scatter(tensor, tensor_list=None, src=0, group=0), which distributes tensors in a specified tensor list to all the processes.
    • Added paddle.distributed.barrier(group=0), which synchronizes all the processes.
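
    The data movement of these collectives can be illustrated without any framework. The sketch below is a single-process simulation, not a use of paddle.distributed: each process is modeled as one entry of a Python list indexed by rank, plain numbers stand in for tensors, and what non-destination ranks hold after reduce is a choice made for this sketch only.

```python
import operator
from functools import reduce as _fold


def broadcast(values, src):
    """Every rank ends up with the value held by rank `src`."""
    return [values[src] for _ in values]


def all_reduce(values, op=operator.add):
    """Every rank ends up with the reduction of all ranks' values."""
    total = _fold(op, values)
    return [total for _ in values]


def reduce(values, dst, op=operator.add):
    """Only rank `dst` receives the reduction; in this sketch other ranks keep their value."""
    total = _fold(op, values)
    return [total if rank == dst else value for rank, value in enumerate(values)]


def all_gather(values):
    """Every rank ends up with the list of all ranks' values."""
    return [list(values) for _ in values]


def scatter(tensor_list):
    """Rank i receives the i-th element of the list held by the source rank."""
    return list(tensor_list)
```

    With four ranks holding 1, 2, 3, and 4, all_reduce leaves 10 on every rank, while reduce with dst=0 leaves 10 on rank 0 only.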

    High-level APIs

    • Added PaddlePaddle high-level APIs, which encapsulate common operations in model development such as networking, training, evaluation, inference, and saving/loading, enabling low-code development. For the MNIST handwritten digit recognition task, the high-level APIs reduce execution-related code by 80% compared with the imperative programming implementation.
    • Data Management
      • Unified the way data is loaded
      • Datasets are defined by inheriting from paddle.io.Dataset.
      • Multi-process data loading uses paddle.io.DataLoader.
      • Added paddle.io.IterableDataset for streaming datasets; paddle.io.DataLoader supports loading it concurrently.
      • Added paddle.io.get_worker_info, which is used in paddle.io.IterableDataset to divide data among worker processes.
    • Model Networking
      • Added the encapsulation of the common loss API paddle.nn.loss.* and metric API paddle.metric.*
      • Released 12 models based on high-level API implementations, including Transformer, Seq2seq, LAC, BMN, ResNet, YOLOv3, VGG, MobileNet, TSM, CycleGAN, Bert, OCR. The code can be found in PaddlePaddle/hapi examples.
    • Model Execution
      • Added the paddle.Model class, which encapsulates the basic functions commonly used in model development, including:
      • Model.summary, to view the network structure and the number of parameters of a dynamic graph network.
      • Model.prepare, to specify the loss function and the optimization algorithm.
      • Model.fit, to run training and evaluation; callbacks can run user-defined logic such as model saving during training.
      • Model.evaluate, to run inference and compute evaluation metrics on the evaluation set.
      • Model.predict, to run inference on the given test data.
      • Model.train_batch, to train on a single batch of data.
      • Model.eval_batch, to evaluate on a single batch of data.
      • Model.test_batch, to test on a single batch of data.
      • Model.save/Model.load, which support saving an inference model in dynamic graph training mode.
      • Added the callback APIs paddle.callbacks.*, which are used with the model execution APIs for logging, checkpoint saving, and similar tasks. Users can define custom callbacks by inheriting from paddle.callbacks.Callback.
    • Domain APIs
      • Added computer vision (CV) APIs paddle.vision
      • Added dataset API paddle.vision.datasets.*, which encapsulates common public datasets and supports random access to data.
      • Added 24 common data preprocessing APIs paddle.vision.transforms.* such as Resize, Normalize, etc.
      • Added image classification backbone networks and pretrained parameters:
        • paddle.vision.models.lenet or paddle.vision.lenet
        • paddle.vision.models.vgg or paddle.vision.vgg
        • paddle.vision.models.resnet or paddle.vision.resnet
        • paddle.vision.models.mobilenetv1 or paddle.vision.mobilenetv1
        • paddle.vision.models.mobilenetv2 or paddle.vision.mobilenetv2
      • Added natural language processing (NLP) APIs paddle.text.
      • Added dataset API paddle.text.datasets.*, which encapsulates commonly-used datasets and supports random access to data.
      • Added networking API paddle.text.*.
    • Automatic Breakpoint Restart
      • Added API train_epoch_range, which implements the epoch-level checkpoint autosave and autoloading functions on a static graph and supports automatic breakpoint restart.
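
    Epoch-level checkpointing with automatic resume can be pictured as an epoch iterator that records its progress. The sketch below is plain Python and not Paddle's train_epoch_range; epoch_range, the JSON file layout, and the next_epoch key are assumptions made for illustration, and a real implementation would also save model and optimizer state.

```python
import json
import os


def epoch_range(max_epoch, ckpt_path):
    """Yield epoch numbers, resuming after the last epoch recorded in `ckpt_path`."""
    start = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            start = json.load(f)["next_epoch"]
    for epoch in range(start, max_epoch):
        yield epoch
        # Execution resumes here only after the caller's loop body for this
        # epoch has finished, so an interrupted epoch is never recorded.
        with open(ckpt_path, "w") as f:
            json.dump({"next_epoch": epoch + 1}, f)
```

    If a run stops in the middle of epoch 2, the file still records 2 as the next epoch, so the following run starts again at epoch 2.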

    Function Optimization (Including Distributed)

    Dynamic Graph to Static Graph

    • Added Syntax Support for ProgramTranslator
      • Added dynamic-to-static support for the return statement, so that a function can return early inside if-elif-else branches or loops, and can return tensors of different types or None.
      • Added dynamic-to-static support for the print statement, so that print(tensor) also prints the tensor after conversion.
      • Added dynamic-to-static support for iterating over a Tensor with for, iterating over a Tensor with for and enumerate, iterating over a TensorList with for, and iterating over a TensorList with for and enumerate, so that loops over tensors can be used flexibly after conversion.
      • Added dynamic-to-static support for the assert statement, so that assert tensor also checks after conversion that the tensor is True (bool type) or non-zero (other data types).
      • Added support for transcribing type casts, so that dynamic graph type conversions such as float(tensor) and int(tensor) are also performed in the static graph.
    • ProgramTranslator Usability Optimization
      • Changed the return type of dynamic-to-static conversion from a callable function to the class StaticLayer. Converted static graph information is easier to obtain through its .code, .main_program, and other interfaces.
      • Added the set_verbosity and set_code_level APIs, which let users set the log level to view logs of the dynamic-to-static process or the code of intermediate conversion states.
      • Added InputSpec, which specifies the shape and data type of input Tensor variables for dynamic-to-static conversion.
      • Improved the error messages shown when dynamic-to-static execution fails: errors raised while running the converted static graph are mapped back to the corresponding line of the original dynamic graph code, and the dynamic-to-static frames are removed from the Python stack so that the message relates more closely to user code.
      • Added support for breakpoint debugging with pdb.set_trace() in dynamic-to-static conversion.
    • Optimized Model Saving and Loading APIs for Deployment
      • Added the paddle.jit.save API for saving dynamic-to-static models, which is easier to use; removed the old API ProgramTranslator.save_inference_model.
      • Added the paddle.jit.load API for loading inference models saved in static graph format, including models saved by paddle.jit.save and paddle.io.save_inference_model. After loading, a model can be used for inference or fine-tuning in dynamic graph mode.
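
    Syntax support of this kind comes down to rewriting the Python source of a function before it runs. The sketch below shows that general idea for print with the standard ast module. It is not ProgramTranslator's implementation: static_print is a hypothetical placeholder for whatever graph-mode operation a real translator would emit, and ast.unparse requires Python 3.9 or later.

```python
import ast


class PrintRewriter(ast.NodeTransformer):
    """Replace calls to print(...) with calls to static_print(...)."""

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            node.func = ast.copy_location(
                ast.Name(id="static_print", ctx=ast.Load()), node.func
            )
        return node


def rewrite_prints(source):
    """Return `source` with every print call renamed to static_print."""
    tree = PrintRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

    Viewing the rewritten source is the kind of inspection that the set_code_level API described above is meant to offer for the real translator.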

    Mixed Precision Training

    • Added support for mixed precision training in dynamic graph mode. On V100, the ResNet-50 model trains 2.6 times as fast with mixed precision as with fp32.

    Quantization-Aware Training

    • Added the ImperativeQuantAware class, which provides quantization-aware training in dynamic graph mode. Quantization of Conv2D, Linear, and other layers is currently supported. Supported model types include MobileNetV1, MobileNetV2, and ResNet50.
    • A quantized model saved with the ImperativeQuantAware.save_quantized_model API after dynamic graph quantization-aware training can be deployed for inference with the Paddle-Lite inference library.
    • Static graph quantization supports Conv2d_transpose quantization and per-channel quantization of Linear.
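
    Per-channel quantization keeps one scale per output channel instead of a single scale for the whole weight tensor. The sketch below uses symmetric abs-max int8 quantization as an assumed scheme, since the release note does not say which scheme Paddle applies; quantize_per_tensor and quantize_per_channel are hypothetical helpers, and a weight is a plain nested list with one row per output channel.

```python
def _quantize(values, max_abs, qmax=127):
    """Map floats in [-max_abs, max_abs] to integers in [-qmax, qmax]."""
    if max_abs == 0:
        return [0 for _ in values]
    return [round(v * qmax / max_abs) for v in values]


def quantize_per_tensor(weight):
    """Use one scale, taken from the largest magnitude in the whole weight."""
    max_abs = max(abs(v) for row in weight for v in row)
    return [_quantize(row, max_abs) for row in weight]


def quantize_per_channel(weight):
    """Use one scale per row (output channel), taken from that row's largest magnitude."""
    return [_quantize(row, max(abs(v) for v in row)) for row in weight]
```

    For a weight whose second channel is much smaller than its first, such as [[1.0, -0.25], [0.02, 0.005]], the per-tensor variant squeezes the second row into the values 3 and 1, while the per-channel variant spreads it over the full integer range.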

    Performance Optimization (Including Distributed)

    • Simplified the underlying DataLoader implementation in dynamic graph mode and reduced the overhead of the reading thread, which further improves data reading efficiency and overall training speed. In tests, the overall training speed of MobileNetV1 on a single V100 card with BatchSize=128 increased by 34%.
    • Upgraded and optimized the dynamic graph networking APIs. Many dynamic graph APIs now call automatically generated Pybind interfaces directly, which improves performance.

    Basic Functions for Dynamic Graph

    • Added support for sparse parameter gradient updates for Embedding and other APIs during multi-card training.
    • Added over 120 Tensor member functions, including Tensor().abs(), Tensor().add(), and Tensor().cos().
    • Added the dir() API for Layer, which makes it easy to view the attributes and functions of a Layer.
    • Added the optimizer.set_lr() API, which lets users adjust the learning rate flexibly in dynamic graph mode.
    • Added the set_global_initializer API, which defines a global parameter initialization method.
    • Added oneDNN (formerly MKL-DNN) support for dynamic graph training and inference. ResNet50 oneDNN dynamic graph training is enabled (on the MNIST dataset).

    Debugging Analysis

    • Replaced about 100 uses of LOG(FATAL) for throwing exceptions in the framework with PADDLE_THROW, and improved the format and content of errors raised when the framework does not support a behavior.
    • Improved the Signal Handler implementation in the framework, and improved the format and content of errors raised when a system signal error occurs during execution.
    • Improved the framework error stack format: the Python error stack from compile time is moved below the native error stack, which makes error messages easier to read.
    • Further refined the error types and messages of about 1,300 check errors in the framework, improving overall debugging usability.
    • Systematically enhanced the error messages of the Pybind layer in dynamic graph mode to improve user experience.

    Bug Fixing

    • Fixed the problem that AttributeError could unexpectedly occur when a dynamic graph Layer uses the add_parameter API; enhanced the input check.
    • Fixed the problem that tensors of int_8 and uint_8 types could not be printed, so that the data is now output normally.

    Dependency Library Upgrading

    • Upgraded oneDNN (formerly MKL-DNN) from version 1.3 to version 1.5.

    Inference

    Paddle Inference

    API

    • Fully upgraded the inference C++ APIs; the new APIs are recommended. The original APIs are retained for now but emit a warning when used, and are planned for removal in the future. The upgrade mainly standardizes naming and simplifies usage. Important changes include:
      • Added the paddle_infer namespace to the C++ APIs, which contains the inference-related APIs.
      • Renamed ZeroCopyTensor to Tensor, which is the default input/output representation of the inference APIs.
      • Simplified CreatePaddlePredictor to CreatePredictor, which supports only AnalysisConfig and no longer supports other Config types.
      • Added service-related utility classes such as PredictorPool, which is convenient when creating multiple predictors.

    Functional Upgrading

    • Upgraded the operator version compatibility registry to support more precise Op version information and improve inference compatibility.
    • Added support for TRT 7.1.
    • Enhanced Paddle-TensorRT support for PaddleSlim quantized models, covering multiple CV tasks such as detection, classification, and segmentation.
    • Added support for user-defined operators in Python-side inference.
    • Added INT8 oneDNN (formerly MKL-DNN) kernel support for elementwise_add and elementwise_mul on the CPU side.
    • Improved the usability of testing quantized models on the CPU side; the original model and the quantized model can be tested and compared at the same time.
    • Added support for Jetson Nx hardware.

    Performance Optimization

    • Added the conv + affine_op pass; MASK-RCNN fp32 single-thread performance is improved by 26% (1.26x) on a 6248 machine.
    • Added the fc + gru fuse pass and enabled the oneDNN GRU fp32 kernel, speeding up GRU fp32 model inference on 4 CPU threads by 20% (1.2x) on an Intel Xeon 6248 machine.
    • Added oneDNN inplace support for many operators (2% speedup for the face feature fp32 model).
    • Optimized the oneDNN LRN operator (1% speedup for the GoogleNet fp32 model).
    • Improved the conversion and optimization of quantized models.
    • Optimized the CUDA ArgMin and ArgMax operators, reducing their binary size from 60 M to 1.3 M.

    Bug Fixing

    • Fixed the mask-rcnn inference error on CPU.
    • Fixed CPU multithreaded inference on oneDNN quantized INT8 models.