Tokenization is the process of breaking a text into smaller units, called tokens, such as words, phrases, or sentences.
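As a rough illustration, a minimal word-level tokenizer can be written with a regular expression. The `tokenize` helper below is a hypothetical sketch for demonstration only, not a production tokenizer:

```python
import re

def tokenize(text: str) -> list[str]:
    # Match runs of word characters as tokens; keep punctuation as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Tokenization breaks text into smaller units."))
# ['Tokenization', 'breaks', 'text', 'into', 'smaller', 'units', '.']
```

Real-world tokenizers are usually more involved, handling contractions, hyphenation, or subword units, but the basic idea is the same: map a string to a sequence of tokens.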