pytext v0.3.3 Release Notes

Release Date: 2020-06-08
  • New features

    • Add XLM-R document classification server + console (#1358)
    • MLP layer embed for float tensors and FloatListSeqTensorizer for List[List[float]] features (#1374)
    • Add class_accuracy in MultiLabelSoftClassificationMetrics (#1371)
    • Add an option to skip test run after models have been trained (#1372)
    • Support DP in PyText (#1366)
    • Support torchscriptify in multi_label_classification_layer (#1350)
    • Add custom metric class for reporting Joint model metrics (#1339)
    • MultiLabel-MultiClass Model for Joint Sequence Tagging (#1335)
    • Scripted tokenizer support for DocModel (#1314)

    Bugfixes

    • Fixed metric reporter aggregation and output layer for the multi-label classification
    • Remove move_state_dict_to_gpu, which is causing CUDA OOM (#1367)
    • Fix Flow's default conversion of dict to AttrDict
    • Fix bug in ClassificationOutputLayer that pad_idx is never respected (#1347)
    • Serializing/Deserializing type Any: bugfix and simplification (#1344)
    • Fix RoBERTa Q&A Training Bug with multiple BoS tokens. (#1343)

    Other

    • Better error message for misconfigured data fields
    • Replace deprecated integer division with floor division operator
    • Add informative prints to assert statements (#1360)
    • TorchScript: Put dense tensor on the same device with other input tensors (#1361)
    • Update PyTorch + ONNX (#1340)
    • Update PyTorch + ONNX (#1340) - binary ONNX
    • Update PR Template (#1349)
    • Reduce memory request for pytext train operator
    • Add 'contrib' directory for experimental code (#1333)
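
The floor division item above can be illustrated with a minimal sketch in plain Python. This is not the PyText change itself, and the note does not say whether the change touched Python integers or tensors; the helper name `num_full_batches` is hypothetical and exists only for this example.

```python
def num_full_batches(num_examples: int, batch_size: int) -> int:
    """Count complete batches using the floor division operator."""
    # `//` rounds toward negative infinity and keeps int operands as int.
    return num_examples // batch_size


full = num_full_batches(10, 4)   # floor division on ints returns an int
ratio = 10 / 4                   # true division always returns a float
negative = -7 // 2               # floors toward negative infinity, not toward zero
```

Note that floor division differs from truncation for negative operands, which is worth checking wherever an older integer division was replaced.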

Previous changes from v0.3.2

  • New features

    • Add Roberta model into BertPairwiseModel (#1336)
    • Support read file from http URL (#1317)
    • add a new PyText get_num_examples_from_batch function in model (#1319)
    • Add support for length label smoothing (#1308)
    • Add new metrics type for Masked Seq2Seq Joint Model (#1304)
    • Add mask generator and strategy (#1302)
    • Add separate logging for label loss and length loss (#1294)
    • Add tensorizer support for masking of target tokens (#1297)
    • Add length prediction and basic masked generator (#1290)
    • Add self attention option to conv_encoder and conv_decoder (#1291)
    • Entity Saliency modeling on PyText: EntitySalienceMetricReporter/EntitySalienceTask
    • In-batch negative training for BertPairwiseModel
    • Support embedding from decoder (#1284)
    • Add dense features to Roberta
    • Add projection layer to HuggingFace encoder (#1273)
    • add PyText Embedding TorchScript Wrapper
    • Add option to pad missing label in LabelListTensorizer (#1269)
    • Integrate PET and Introduce ElasticTrainer (#1266)
    • support PoolingType in DocNN. (#1259)
    • Added WordSeqEmbedding (#1255)
    • Open source Assistant NLU seq2seq model (#1236)
    • Support multi label classification
    • BART in decoupled model

    Bug fixes

    • Fix Incorrect State Dict Assumption (#1326)
    • Bug fix for "RoBERTaTensorizer object has no attribute is_input" (#1334)
    • Cast model output to cpu (#1329)
    • Fix OSS predict-py API (#1320)
    • Fix "calling median on empty tensor" issue in MR (#1322)
    • Add ScriptDoNothingTokenizer so that torchscriptification of SPM does not fail (#1316)
    • Fix creating generator every time (#1301)
    • fix dense feature for fp16
    • Avoid edge cases with quantization by setting a known seed (#1295)
    • Make torchscript predictions even on empty text / token inputs
    • fix dense feature TorchScript typing (#1281)
    • avoid zero division error in metrics reporter (#1271)
    • Fix contiguous issue in bilstm export (#1270)
    • fix debug file generation for multilabel classification (#1247)
    • Fix fp16 optimizer attribute name
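
As an illustration of the kind of guard behind the zero division item above, here is a minimal sketch in plain Python. It is an assumption about the general technique, not the code from #1271; the function names `safe_division` and `precision` are chosen for this example only.

```python
def safe_division(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator != 0 else 0.0


def precision(true_positives: int, false_positives: int) -> float:
    """Precision that reports 0.0 instead of raising when nothing was predicted."""
    # With no predicted positives the denominator is 0, so the guard applies.
    return safe_division(true_positives, true_positives + false_positives)


empty_case = precision(0, 0)    # no predictions: guarded, no ZeroDivisionError
normal_case = precision(3, 1)   # 3 / (3 + 1)
```

Returning 0.0 is one possible convention; a reporter could equally return NaN or skip the metric, depending on how downstream aggregation treats missing values.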

    Other

    • Simplify contextual embedding dimension computation in PyText (#1331)
    • New Debug File for masked seq2seq
    • Move MockConfigLoader to OSS (#1324)
    • Pass in optimizer config instead of create_optimizer to trainer
    • Remove unnecessary torch.no_grad() block (#1323)
    • Fix Memory Issues in Metric Reporter for Classification Tasks over large Label Spaces
    • Add contextual embedding support to OS seq2seq model (#1299)
    • recover xlm_r tutorial notebook (#1305)
    • Enable controlling bias in MLP decoder
    • Migrate serving tutorial to TorchScript (#1310)
    • delete caffe2 export (#1307)
    • add whitelist for ONNX export
    • Use dynamic quantization api for BeamSearch (#1303)
    • Remove requirement that eos/bos be supplied for sequence export. (#1300)
    • Multicolumn support
    • Multicolumn support in torchscriptify
    • Add caching support to RawExample and batch predict API (#1298)
    • Add save-pytext-snapshot command to PyText cmdline (#1285)
    • Update with Whatsapp calling data + support dictionary features (#1293)
    • add arrange_caffe2_model_inputs in BaseModel (#1292)
    • Replace unit-tests on LMModel and FLLanguageModelingTask by LiteLMModel and FLLiteLMTask (#1296)
    • changes to make mbart work (#1911)
    • handle encoder and decoder embedding
    • Add tutorial for semantic parsing. (#1288)
    • Add new fb beam search with fused operator (#1287)
    • Move generator builder to constructor so that it can be easily overridden. (#1286)
    • Torchscriptify ELTensorizer (#1282)
    • Torchscript export for Seq2Seq model (#1265)
    • Change Seq2Seq model from_config() to a more general api (#1280)
    • add max_seq_len to DocNN TorchScript model (#1279)
    • support XLM-R model Embedding in TorchScript (#1278)
    • Generic PyText Checkpoint Manager Interface (#1267)
    • Fix backward compatibility issue of pad_missing in LabelListTensorizer (#1277)
    • Update mean reduction in NLLLoss (#1272)
    • migrate pages.integrity.scam.docnn_models.xxx (#1275)
    • Unify model input for ByteTokensDocumentModel (#1274)
    • Torchscriptify TokenTensorizer
    • Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)
    • Make WordSeqEmbedding ONNX compatible
    • If the snapshot path provided is not valid, throw error (#1268)
    • support vocab filter by min count
    • Unify input for TorchScript Tensorizers and Models (#1256)
    • Torchscriptify XLM-R
    • Add class logging to task (#1264)
    • Add usage logging to exporter (#1262)
    • Add usage logging across models (#1263)
    • Usage logging on data classes (#1261)
    • GPT2 BPE add lower casing support (#1260)
    • FAISS Embedding Search Space [3/5]
    • Return len of tokens of each sequence in SeqTokenTensorizer (#1254)
    • Vocab Limited Pretrained Embedding [2/5] (#1248)
    • add Stage.OTHERS and allow TB to print to a separate prefix not in (TRAIN, TEST, EVAL) (#1258)
    • Add option to skip 2 stage tokenizer and bpe decode sequences in the debug file (#1257)
    • Add Testcase for Wordpiece Tokenizer (#1249)
    • modify accuracy calculation for multi-label classification (#1244)
    • Enable tests in pytext/config:pytext_all_config_test
    • Introduce Class Usage Logging (#1243)
    • Make PyText compatible with Any type (#1242)
    • Make dict_embedding Torchscript friendly (#1240)
    • Support MultipleData for export and kd generation
    • delete flaky/broken tests (#1238)
    • Add support for returning start & end indices.