Welcome to our interactive demo on tokenization!
You can enter text in the input below and then click the "Tokenize Text" button to see how LLMs view the text you send them.
Tokenization is the process of converting a string of text into smaller, manageable units called tokens. These tokens are the basic building blocks that machines use to understand and generate human language.
When you type text into the input field and click the "Tokenize Text" button, we use the tiktoken library behind the scenes to convert your input into tokens.
This process mimics how LLMs process text when we submit it to them. By seeing your text split into tokens, you can get a clearer picture of how these models interpret and work with language.
Run it yourself
This demo is open-source and available here. Feel free to browse the code, run it yourself, or modify the project as you see fit.