OpenAI tokenizer
OpenAiTokenizer
Bases: BaseTokenizer
Source code in griptape/griptape/tokenizers/openai_tokenizer.py
DEFAULT_ENCODING = 'cl100k_base'
class-attribute
instance-attribute
DEFAULT_MAX_TOKENS = 2049
class-attribute
instance-attribute
DEFAULT_OPENAI_GPT_3_CHAT_MODEL = 'gpt-3.5-turbo'
class-attribute
instance-attribute
DEFAULT_OPENAI_GPT_3_COMPLETION_MODEL = 'text-davinci-003'
class-attribute
instance-attribute
DEFAULT_OPENAI_GPT_4_MODEL = 'gpt-4'
class-attribute
instance-attribute
EMBEDDING_MODELS = ['text-embedding-ada-002', 'text-embedding-ada-001']
class-attribute
instance-attribute
MODEL_PREFIXES_TO_MAX_TOKENS = {
    'gpt-4-1106': 128000,
    'gpt-4-32k': 32768,
    'gpt-4': 8192,
    'gpt-3.5-turbo-16k': 16384,
    'gpt-3.5-turbo': 4096,
    'gpt-35-turbo-16k': 16384,
    'gpt-35-turbo': 4096,
    'text-davinci-003': 4097,
    'text-davinci-002': 4097,
    'code-davinci-002': 8001,
    'text-embedding-ada-002': 8191,
    'text-embedding-ada-001': 2046,
}
class-attribute
instance-attribute
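The table above lists more specific prefixes before more general ones (for example 'gpt-4-32k' before 'gpt-4'), which suggests a first-match prefix lookup. The sketch below illustrates that idea under that assumption; `lookup_max_tokens` is a hypothetical helper, not part of the class, and the table is abbreviated.

```python
# Hypothetical illustration of a first-match prefix lookup.
# The table is a shortened copy of MODEL_PREFIXES_TO_MAX_TOKENS.
MODEL_PREFIXES_TO_MAX_TOKENS = {
    "gpt-4-1106": 128000,
    "gpt-4-32k": 32768,
    "gpt-4": 8192,
    "gpt-3.5-turbo-16k": 16384,
    "gpt-3.5-turbo": 4096,
    "text-davinci-003": 4097,
}
DEFAULT_MAX_TOKENS = 2049


def lookup_max_tokens(model: str) -> int:
    """Return the limit of the first prefix that matches the model name."""
    # Dicts preserve insertion order, so specific prefixes listed first win.
    for prefix, limit in MODEL_PREFIXES_TO_MAX_TOKENS.items():
        if model.startswith(prefix):
            return limit
    # Assumed fallback for unknown models.
    return DEFAULT_MAX_TOKENS
```

With this ordering, a name such as 'gpt-4-32k-0613' resolves to the 32k limit rather than the plain 'gpt-4' limit, which is why the order of the entries matters.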
TOKEN_OFFSET = 8
class-attribute
instance-attribute
encoding: tiktoken.Encoding
property
max_tokens: int
property
model: str = field(kw_only=True)
class-attribute
instance-attribute
count_tokens(text, model=None)
Handles the special case of ChatML. Implementation adapted from the official OpenAI notebook: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
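As a rough illustration of the ChatML accounting described in that notebook, the sketch below adds a fixed overhead per message, the encoded length of each field, an extra token when a "name" field is present, and a fixed amount for priming the reply. `count_chat_tokens` and its parameters are hypothetical names, not this class's method. The default overheads (3, 1, 3) follow the notebook's values for recent gpt-3.5-turbo and gpt-4 snapshots and may differ for other models.

```python
from typing import Callable, Dict, List, Sequence


def count_chat_tokens(
    messages: List[Dict[str, str]],
    encode: Callable[[str], Sequence],
    tokens_per_message: int = 3,  # assumed per-message framing overhead
    tokens_per_name: int = 1,     # assumed extra cost when "name" is set
    reply_priming: int = 3,       # assumed tokens that prime the reply
) -> int:
    """Estimate the token count of a list of chat messages."""
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            # Every field value (role, name, content) is encoded and counted.
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name
    return total + reply_priming
```

The `encode` argument is any callable that turns a string into a sequence of tokens. With tiktoken installed, `tiktoken.get_encoding("cl100k_base").encode` can be passed here; a whitespace splitter such as `str.split` is enough to exercise the arithmetic.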