pytext v0.3.2 Release Notes
Release Date: 2020-04-27
New features
- Add Roberta model into BertPairwiseModel (#1336)
- Support reading files from HTTP URLs (#1317)
- add a new PyText get_num_examples_from_batch function in model (#1319)
- Add support for length label smoothing (#1308)
- Add new metrics type for Masked Seq2Seq Joint Model (#1304)
- Add mask generator and strategy (#1302)
- Add separate logging for label loss and length loss (#1294)
- Add tensorizer support for masking of target tokens (#1297)
- Add length prediction and basic masked generator (#1290)
- Add self attention option to conv_encoder and conv_decoder (#1291)
- Entity Saliency modeling on PyText: EntitySalienceMetricReporter/EntitySalienceTask
- In-batch negative training for BertPairwiseModel
- Support embedding from decoder (#1284)
- Add dense features to Roberta
- Add projection layer to HuggingFace encoder (#1273)
- add PyText Embedding TorchScript Wrapper
- Add option to pad missing label in LabelListTensorizer (#1269)
- Integrate PET and Introduce ElasticTrainer (#1266)
- support PoolingType in DocNN. (#1259)
- Added WordSeqEmbedding (#1255)
- Open source Assistant NLU seq2seq model (#1236)
- Support multi label classification
- BART in decoupled model
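
The length label smoothing entry (#1308) names a technique that can be illustrated generically. What follows is a minimal sketch of standard uniform label smoothing in plain Python, not PyText's implementation. The names `smooth_labels` and `cross_entropy` are hypothetical, and the variant added in #1308 may spread the probability mass differently.

```python
import math
from typing import List, Sequence


def smooth_labels(target: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Build a smoothed target distribution (hypothetical helper).

    The true class gets 1 - epsilon; the remaining epsilon is shared
    equally among the other classes.
    """
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")
    if not 0 <= target < num_classes:
        raise ValueError("target index out of range")
    off_value = epsilon / (num_classes - 1)
    dist = [off_value] * num_classes
    dist[target] = 1.0 - epsilon
    return dist


def cross_entropy(log_probs: Sequence[float], target_dist: Sequence[float]) -> float:
    """Cross entropy between predicted log-probabilities and a soft target."""
    return -sum(t * lp for t, lp in zip(target_dist, log_probs))


# Usage: smooth a length label of 2 over 5 candidate lengths and score
# a uniform prediction against it.
soft_target = smooth_labels(target=2, num_classes=5, epsilon=0.1)
uniform_log_probs = [math.log(1.0 / 5)] * 5
loss = cross_entropy(uniform_log_probs, soft_target)
```

With `epsilon=0.0` the target reduces to a one-hot vector, so the smoothed loss is a strict generalization of the usual negative log-likelihood.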
Bug fixes
- Fix Incorrect State Dict Assumption (#1326)
- Bug fix for "RoBERTaTensorizer object has no attribute is_input" (#1334)
- Cast model output to cpu (#1329)
- Fix OSS predict-py API (#1320)
- Fix "calling median on empty tensor" issue in MR (#1322)
- Add ScriptDoNothingTokenizer so that torchscriptification of SPM does not fail (#1316)
- Fix creating generator every time (#1301)
- fix dense feature for fp16
- Avoid edge cases with quantization by setting a known seed (#1295)
- Make torchscript predictions even on empty text / token inputs
- fix dense feature TorchScript typing (#1281)
- avoid zero division error in metrics reporter (#1271)
- Fix contiguous issue in bilstm export (#1270)
- fix debug file generation for multilabel classification (#1247)
- Fix fp16 optimizer attribute name
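
Two of the fixes above (#1271 and #1322) concern metric code that receives empty inputs. The general guard pattern can be sketched as follows. This is an illustration in plain Python under assumed names (`safe_div`, `precision_recall_f1`, and `safe_median` are hypothetical), not the code changed in those pull requests.

```python
from statistics import median
from typing import Sequence, Tuple


def safe_div(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Divide, returning `default` instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else default


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 without failing on empty counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def safe_median(values: Sequence[float], default: float = 0.0) -> float:
    """Median that falls back to `default` for an empty sequence."""
    return median(values) if len(values) > 0 else default
```

Returning a default keeps a reporting pass alive when a label has no predictions or no examples; whether `0.0` is the right fallback depends on how the metric is consumed downstream.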
Other
- Simplify contextual embedding dimension computation in PyText (#1331)
- New Debug File for masked seq2seq
- Move MockConfigLoader to OSS (#1324)
- Pass in optimizer config instead of create_optimizer to trainer
- Remove unnecessary torch.no_grad() block (#1323)
- Fix Memory Issues in Metric Reporter for Classification Tasks over large Label Spaces
- Add contextual embedding support to OS seq2seq model (#1299)
- recover xlm_r tutorial notebook (#1305)
- Enable controlling bias in MLP decoder
- Migrate serving tutorial to TorchScript (#1310)
- delete caffe2 export (#1307)
- add whitelist for ONNX export
- Use dynamic quantization api for BeamSearch (#1303)
- Remove requirement that eos/bos be supplied for sequence export. (#1300)
- Multicolumn support
- Multicolumn support in torchscriptify
- Add caching support to RawExample and batch predict API (#1298)
- Add save-pytext-snapshot command to PyText cmdline (#1285)
- Update with WhatsApp calling data + support dictionary features (#1293)
- add arrange_caffe2_model_inputs in BaseModel (#1292)
- Replace unit-tests on LMModel and FLLanguageModelingTask by LiteLMModel and FLLiteLMTask (#1296)
- changes to make mbart work (#1911)
- handle encoder and decoder embedding
- Add tutorial for semantic parsing. (#1288)
- Add new fb beam search with fused operator (#1287)
- Move generator builder to constructor so that it can be easily overridden. (#1286)
- Torchscriptify ELTensorizer (#1282)
- Torchscript export for Seq2Seq model (#1265)
- Change Seq2Seq model from_config() to a more general api (#1280)
- add max_seq_len to DocNN TorchScript model (#1279)
- support XLM-R model Embedding in TorchScript (#1278)
- Generic PyText Checkpoint Manager Interface (#1267)
- Fix backward compatibility issue of pad_missing in LabelListTensorizer (#1277)
- Update mean reduction in NLLLoss (#1272)
- migrate pages.integrity.scam.docnn_models.xxx (#1275)
- Unify model input for ByteTokensDocumentModel (#1274)
- Torchscriptify TokenTensorizer
- Allow dictionaries to overwrite entries with #fairseq:overwrite comment (#1073)
- Make WordSeqEmbedding ONNX compatible
- If the snapshot path provided is not valid, throw error (#1268)
- support vocab filter by min count
- Unify input for TorchScript Tensorizers and Models (#1256)
- Torchscriptify XLM-R
- Add class logging to task (#1264)
- Add usage logging to exporter (#1262)
- Add usage logging across models (#1263)
- Usage logging on data classes (#1261)
- GPT2 BPE add lower casing support (#1260)
- FAISS Embedding Search Space [3/5]
- Return len of tokens of each sequence in SeqTokenTensorizer (#1254)
- Vocab Limited Pretrained Embedding [2/5] (#1248)
- add Stage.OTHERS and allow TB to print to a separate prefix not in (TRAIN, TEST, EVAL) (#1258)
- Add option to skip 2 stage tokenizer and bpe decode sequences in the debug file (#1257)
- Add Testcase for Wordpiece Tokenizer (#1249)
- modify accuracy calculation for multi-label classification (#1244)
- Enable tests in pytext/config:pytext_all_config_test
- Introduce Class Usage Logging (#1243)
- Make PyText compatible with Any type (#1242)
- Make dict_embedding Torchscript friendly (#1240)
- Support MultipleData for export and kd generation
- delete flaky/broken tests (#1238)
- Add support for returning start & end indices.