References
- Al-Sabbagh, K.W., et al. Selective regression testing based on big data: comparing feature extraction techniques. in 2020 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW). 2020. IEEE.
- Staron, M., et al. Improving Quality of Code Review Datasets–Token-Based Feature Extraction Method. in Software Quality: Future Perspectives on Software Engineering Quality: 13th International Conference, SWQD 2021, Vienna, Austria, January 19–21, 2021, Proceedings 13. 2021. Springer.
- Sennrich, R., B. Haddow, and A. Birch, Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909, 2015.
- Gage, P., A new algorithm for data compression. C Users Journal, 1994. 12(2): p. 23-38.
- Kudo, T. and J. Richardson, SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. arXiv preprint arXiv:1808.06226, 2018.