The process of generating spoken language from text, using techniques such as text-to-speech (TTS) synthesis and voice cloning, in which the output imitates a particular speaker's voice.