Questions
- RoBERTa uses a byte-level byte-pair encoding tokenizer. (True/False)
- A trained Hugging Face tokenizer produces `merges.txt` and `vocab.json`. (True/False)
- RoBERTa does not use token-type IDs. (True/False)
- DistilBERT has 6 layers and 12 heads. (True/False)
- A transformer model with 80 million parameters is enormous. (True/False)
- We cannot train a tokenizer. (True/False)
- A BERT-like model has 6 decoder layers. (True/False)
- Masked Language Modeling (MLM) predicts a word contained in a mask token in a sentence. (True/False)
- A BERT-like model has no self-attention sublayers. (True/False)
- Data collators are helpful for backpropagation. (True/False)
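Several of the questions above concern training a tokenizer and the merges it learns. As a toy illustration that a tokenizer can indeed be trained, here is a minimal byte-pair-encoding merge loop in plain Python — a sketch of the idea only, not the Hugging Face implementation (a real trained tokenizer would serialize its learned merges to `merges.txt` and its vocabulary to `vocab.json`):

```python
from collections import Counter

def train_bpe(corpus, num_merges):
    """Toy BPE trainer: repeatedly merge the most frequent
    adjacent symbol pair, as a BPE tokenizer does during training."""
    # Start with each word as a tuple of single characters.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair merged into one symbol.
        merged = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges

# Hypothetical miniature corpus for illustration.
merges = train_bpe("low low low lower lowest", 3)
print(merges)  # e.g. [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```

Each learned merge rule corresponds to one line of the `merges.txt` file that a byte-level BPE tokenizer such as RoBERTa's produces when trained.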