# Tokenizers

`__all__ = ['BaseTokenizer', 'OpenAiTokenizer', 'CohereTokenizer', 'HuggingFaceTokenizer', 'AnthropicTokenizer', 'BedrockTitanTokenizer', 'BedrockJurassicTokenizer', 'BedrockClaudeTokenizer']` (module attribute)
## AnthropicTokenizer

Bases: `BaseTokenizer`

Source code in `griptape/griptape/tokenizers/anthropic_tokenizer.py`

- `DEFAULT_MAX_TOKENS = 100000` (class attribute)
- `DEFAULT_MODEL = 'claude-2'` (class attribute)
- `max_tokens: int` (property)
- `model: str = field(kw_only=True)` (instance attribute)
## BaseTokenizer

Bases: `ABC`

Source code in `griptape/griptape/tokenizers/base_tokenizer.py`

- `max_tokens: int` (abstract property)
- `stop_sequences: list[str] = field(default=Factory(lambda: [utils.constants.RESPONSE_STOP_SEQUENCE]), kw_only=True)` (instance attribute)
- `count_tokens(text)` (abstract method)
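To illustrate the contract `BaseTokenizer` defines (an abstract `max_tokens` property plus an abstract `count_tokens` method), here is a minimal stdlib-only sketch. The `WhitespaceTokenizer` subclass and the `count_tokens_left` helper are hypothetical, added only to show how the two abstract members compose; griptape's real subclasses wrap model-specific encoders.

```python
from abc import ABC, abstractmethod


class BaseTokenizer(ABC):
    # Sketch of the interface above; griptape declares stop_sequences via attrs.
    def __init__(self, stop_sequences=None):
        self.stop_sequences = stop_sequences or []

    @property
    @abstractmethod
    def max_tokens(self) -> int:
        """Maximum context size of the underlying model."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Return the number of tokens that `text` encodes to."""

    def count_tokens_left(self, text: str) -> int:
        # Hypothetical helper built on the two abstract members.
        return max(self.max_tokens - self.count_tokens(text), 0)


class WhitespaceTokenizer(BaseTokenizer):
    """Hypothetical subclass: treats whitespace-separated words as tokens."""

    @property
    def max_tokens(self) -> int:
        return 4096

    def count_tokens(self, text: str) -> int:
        return len(text.split())
```

Any concrete subclass must override both abstract members before it can be instantiated; helpers like the budget calculation above then work uniformly across providers.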
## BedrockClaudeTokenizer

Bases: `AnthropicTokenizer`

Source code in `griptape/griptape/tokenizers/bedrock_claude_tokenizer.py`

- `DEFAULT_MAX_TOKENS = 8192` (class attribute)
- `DEFAULT_MODEL = 'anthropic.claude-v2'` (class attribute)
## BedrockJurassicTokenizer

Bases: `BaseTokenizer`

Source code in `griptape/griptape/tokenizers/bedrock_jurassic_tokenizer.py`

- `DEFAULT_MAX_TOKENS = 8192` (class attribute)
- `DEFAULT_MODEL = 'ai21.j2-ultra-v1'` (class attribute)
- `bedrock_client: Any = field(default=Factory(lambda self: self.session.client('bedrock-runtime'), takes_self=True), kw_only=True)` (instance attribute)
- `max_tokens: int` (property)
- `model: str = field(kw_only=True)` (instance attribute)
- `session: boto3.Session = field(default=Factory(lambda: import_optional_dependency('boto3').Session()), kw_only=True)` (instance attribute)
- `count_tokens(text)` (method)
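Jurassic models expose no local tokenizer, so `count_tokens` has to go through the Bedrock runtime. The sketch below is a guess at that round trip, not griptape's actual implementation: the `maxTokens: 0` request body and the `prompt.tokens` response field follow AI21's documented Jurassic response shape, and the stub client is hypothetical, standing in for boto3's `bedrock-runtime` client.

```python
import io
import json


class StubBedrockClient:
    """Hypothetical stand-in for boto3's 'bedrock-runtime' client."""

    def invoke_model(self, modelId, body, accept, contentType):
        prompt = json.loads(body)["prompt"]
        # Fake the AI21 response shape: echo the tokenized prompt back.
        payload = {"prompt": {"tokens": [{"text": w} for w in prompt.split()]}}
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}


def count_tokens(bedrock_client, model, text):
    # Assumption: ask the service for zero completion tokens, then count
    # the tokenized prompt it returns in the response body.
    response = bedrock_client.invoke_model(
        modelId=model,
        body=json.dumps({"prompt": text, "maxTokens": 0}),
        accept="application/json",
        contentType="application/json",
    )
    body = json.loads(response["body"].read())
    return len(body["prompt"]["tokens"])
```

Counting tokens this way costs a network round trip per call, which is why the session and client are lazily built attrs defaults rather than module-level singletons.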
## BedrockTitanTokenizer

Bases: `BaseTokenizer`

Source code in `griptape/griptape/tokenizers/bedrock_titan_tokenizer.py`

- `DEFAULT_EMBEDDING_MODELS = 'amazon.titan-embed-text-v1'` (class attribute)
- `DEFAULT_MAX_TOKENS = 4096` (class attribute)
- `DEFAULT_MODEL = 'amazon.titan-text-express-v1'` (class attribute)
- `bedrock_client: Any = field(default=Factory(lambda self: self.session.client('bedrock-runtime'), takes_self=True), kw_only=True)` (instance attribute)
- `max_tokens: int` (property)
- `model: str = field(kw_only=True)` (instance attribute)
- `session: boto3.Session = field(default=Factory(lambda: import_optional_dependency('boto3').Session()), kw_only=True)` (instance attribute)
- `stop_sequences: list[str] = field(factory=list, kw_only=True)` (instance attribute)
- `count_tokens(text)` (method)
## CohereTokenizer

Bases: `BaseTokenizer`

Source code in `griptape/griptape/tokenizers/cohere_tokenizer.py`

- `DEFAULT_MODEL = 'command'` (class attribute)
- `MAX_TOKENS = 2048` (class attribute)
- `client: Client = field(kw_only=True)` (instance attribute)
- `max_tokens: int` (property)
- `model: str = field(kw_only=True)` (instance attribute)
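Unlike the other tokenizers, `CohereTokenizer` requires you to pass in a `Client` yourself. The sketch below assumes `count_tokens` delegates to the client's `tokenize` endpoint and returns the length of the token list; that delegation, and the whitespace-splitting stub client, are illustrative assumptions rather than the library's actual code.

```python
from types import SimpleNamespace


class StubCohereClient:
    """Hypothetical stand-in for cohere.Client: splits on whitespace."""

    def tokenize(self, text):
        # Real responses carry token ids/strings; a namespace with a
        # .tokens attribute is enough for the sketch.
        return SimpleNamespace(tokens=text.split())


def count_tokens(client, text):
    # Assumption: delegate tokenization to the client and count the result.
    return len(client.tokenize(text).tokens)
```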
## HuggingFaceTokenizer

Bases: `BaseTokenizer`

Source code in `griptape/griptape/tokenizers/hugging_face_tokenizer.py`

- `max_tokens: int = field(default=Factory(lambda self: self.tokenizer.model_max_length, takes_self=True), kw_only=True)` (instance attribute)
- `tokenizer: PreTrainedTokenizerBase = field(kw_only=True)` (instance attribute)
## OpenAiTokenizer

Bases: `BaseTokenizer`

Source code in `griptape/griptape/tokenizers/openai_tokenizer.py`
- `DEFAULT_ENCODING = 'cl100k_base'` (class attribute)
- `DEFAULT_MAX_TOKENS = 2049` (class attribute)
- `DEFAULT_OPENAI_GPT_3_CHAT_MODEL = 'gpt-3.5-turbo'` (class attribute)
- `DEFAULT_OPENAI_GPT_3_COMPLETION_MODEL = 'text-davinci-003'` (class attribute)
- `DEFAULT_OPENAI_GPT_4_MODEL = 'gpt-4'` (class attribute)
- `EMBEDDING_MODELS = ['text-embedding-ada-002', 'text-embedding-ada-001']` (class attribute)
- `MODEL_PREFIXES_TO_MAX_TOKENS = {'gpt-4-1106': 128000, 'gpt-4-32k': 32768, 'gpt-4': 8192, 'gpt-3.5-turbo-16k': 16384, 'gpt-3.5-turbo': 4096, 'gpt-35-turbo-16k': 16384, 'gpt-35-turbo': 4096, 'text-davinci-003': 4097, 'text-davinci-002': 4097, 'code-davinci-002': 8001, 'text-embedding-ada-002': 8191, 'text-embedding-ada-001': 2046}` (class attribute)
- `TOKEN_OFFSET = 8` (class attribute)
- `encoding: tiktoken.Encoding` (property)
- `max_tokens: int` (property)
- `model: str = field(kw_only=True)` (instance attribute)
- `count_tokens(text, model=None)` (method): Handles the special case of ChatML. Implementation adapted from the official OpenAI cookbook notebook: https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
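`MODEL_PREFIXES_TO_MAX_TOKENS` is keyed by model-name prefixes rather than exact names, so a dated variant like `gpt-4-0613` can resolve through the `gpt-4` entry. A stdlib sketch of longest-prefix resolution follows; the `max_tokens_for` helper name is hypothetical, and whether the real `max_tokens` property additionally subtracts `TOKEN_OFFSET` for some models is not shown here.

```python
# Table and fallback copied from the class attributes documented above.
MODEL_PREFIXES_TO_MAX_TOKENS = {
    "gpt-4-1106": 128000,
    "gpt-4-32k": 32768,
    "gpt-4": 8192,
    "gpt-3.5-turbo-16k": 16384,
    "gpt-3.5-turbo": 4096,
    "gpt-35-turbo-16k": 16384,
    "gpt-35-turbo": 4096,
    "text-davinci-003": 4097,
    "text-davinci-002": 4097,
    "code-davinci-002": 8001,
    "text-embedding-ada-002": 8191,
    "text-embedding-ada-001": 2046,
}
DEFAULT_MAX_TOKENS = 2049  # fallback for models not in the table


def max_tokens_for(model: str) -> int:
    # Longest matching prefix wins, so "gpt-4-32k-0613" resolves through
    # the "gpt-4-32k" entry rather than the shorter "gpt-4" one.
    matches = [p for p in MODEL_PREFIXES_TO_MAX_TOKENS if model.startswith(p)]
    if not matches:
        return DEFAULT_MAX_TOKENS
    return MODEL_PREFIXES_TO_MAX_TOKENS[max(matches, key=len)]
```

Ordering the match by prefix length matters: several keys are prefixes of each other (`gpt-4` vs `gpt-4-32k`), so a naive first-match scan over the dict could return the wrong context size.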