Questions
- A tokenizer's dictionary contains every word that exists in a language. (True/False)
- Pretrained tokenizers can encode any dataset. (True/False)
- It is good practice to check a dataset before using it. (True/False)
- It is good practice to eliminate obscene data from datasets. (True/False)
- It is good practice to delete data containing discriminatory assertions. (True/False)
- Raw datasets may sometimes create relationships between noisy content and useful content. (True/False)
- A standard pretrained tokenizer contains the English vocabulary of the past 700 years. (True/False)
- Old English can create problems when encoding data with a tokenizer trained on modern English. (True/False)
- Medical and other types of jargon can create problems when encoding data with a tokenizer trained on modern English. (True/False)
- Controlling the output of the encoded data produced by a pretrained tokenizer is good practice. (True/False...