OpenAI Toolkit

OpenAI Fine Tuning API

class FineTuningAPI(openai_key=None, enable_timeouts=False, timeouts_options=None)[source]

Bases: object

A class to interact with the OpenAI API, specifically for fine-tuning operations.

This class initializes a client for the OpenAI API, handling the API key validation and configuring timeout options for the API requests. It is designed to work with fine-tuning tasks, providing an interface to interact with OpenAI’s fine-tuning capabilities.

Parameters:
  • openai_key (str, optional) – The OpenAI API key. If not provided, it defaults to the value of the environment variable ‘OPENAI_API_KEY’.

  • enable_timeouts (bool, optional) – Flag to enable custom timeout settings for API requests. If False, the default timeout settings are used. Defaults to False.

  • timeouts_options (dict, optional) – A dictionary of custom timeout settings, required if ‘enable_timeouts’ is True. It should contain the keys ‘total’, ‘read’, ‘write’, and ‘connect’ with timeout values in seconds.

Raises:

ValueError – If no valid OpenAI API key is provided or found in the environment variable
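The shape of timeouts_options follows from the parameter description above; a minimal sketch of a plausible dictionary (the key names come from this documentation, the values are illustrative assumptions):

```python
# Custom timeout settings, in seconds. The required keys ('total', 'read',
# 'write', 'connect') come from the timeouts_options description above;
# the values here are illustrative assumptions.
timeouts_options = {
    "total": 120,   # overall deadline for the request
    "read": 10,     # reading the response
    "write": 10,    # sending the request body
    "connect": 5,   # establishing the connection
}

# Hypothetical usage (requires a valid OpenAI API key):
# api = FineTuningAPI(enable_timeouts=True, timeouts_options=timeouts_options)
```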

cancel_fine_tuning_job(job_id: str) Dict[source]

Cancel a specific fine-tuning job.

This method allows for the cancellation of a fine-tuning job identified by its job ID. It interacts with the OpenAI API to send a cancellation request and handles various potential errors that might occur during this process.

Method cancel_fine_tuning_job:

Cancels a fine-tuning job.

Parameters:

job_id (str) – The ID of the fine-tuning job to cancel.

Returns:

Confirmation of the cancellation.

Return type:

Dict

Raises:
  • ValueError – If the job_id is not provided.

  • openai.APIConnectionError – If there’s a connection error with the API.

  • openai.RateLimitError – If the request is rate-limited by the API.

  • openai.APIStatusError – If there’s a status error from the API.

  • AttributeError – If an attribute error occurs during the process.

  • Exception – For any other exceptions that occur during the cancellation process.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> cancellation_result = api.cancel_fine_tuning_job(job_id="ft-xyz789")
>>> print(cancellation_result)
create_fine_tune_file(file_path: str, purpose: str | None = 'fine-tune') str[source]

Uploads a specified file to OpenAI for fine-tuning purposes and returns the file’s identifier.

This method is integral for preparing datasets for language model fine-tuning on OpenAI’s platform. It takes a local file path, uploads the file, and returns the unique identifier of the uploaded file. The method is robust, encapsulating error handling for file accessibility and API interaction issues.

Parameters:
  • file_path (str) – Absolute or relative path to the JSONL file designated for fine-tuning.

  • purpose (str, optional) – Intended use of the uploaded file, influencing how OpenAI processes the file. Defaults to ‘fine-tune’.

Returns:

Unique identifier of the uploaded file, typically used for subsequent API interactions.

Return type:

str

Raises:
  • FileNotFoundError – Raised when the specified file_path does not point to an existing file.

  • PermissionError – Raised when access to the specified file is restricted due to insufficient permissions.

  • Exception – Generic exception for capturing and signaling failures during the API upload process.

Example:

>>> api = FineTuningAPI(openai_key="sk-your-api-key")
>>> file_id = api.create_fine_tune_file("/path/to/your/dataset.jsonl")
>>> print(file_id)
'file-xxxxxxxxxxxxxxxxxxxxx'
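The uploaded JSONL file contains one training example per line; a hedged sketch of preparing such a file in OpenAI's published chat fine-tuning record format (the file path and example contents are assumptions):

```python
import json
import os
import tempfile

# One chat-format training example per JSONL line (OpenAI's published
# fine-tuning record shape); the contents here are illustrative.
records = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4."},
    ]},
]

path = os.path.join(tempfile.gettempdir(), "dataset.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Hypothetical usage (requires a valid API key):
# file_id = api.create_fine_tune_file(path)
```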
create_fine_tuning_job(training_file: str, model: str, suffix: str | None = None, batch_size: str | int | None = 'auto', learning_rate_multiplier: str | float | None = 'auto', n_epochs: str | int | None = 'auto', validation_file: str | None = None) dict[source]

Start a fine-tuning job using the OpenAI Python SDK.

This method initiates a fine-tuning job with the specified model and training file. It allows customization of additional parameters such as batch size, learning rate multiplier, number of epochs, and the validation file.

Method create_fine_tuning_job:

Initiates a fine-tuning job for a model.

Parameters:
  • training_file (str) – The file ID of the training data uploaded to the OpenAI API.

  • model (str) – The name of the model to fine-tune.

  • suffix (str, optional) – A suffix to append to the fine-tuned model’s name.

  • batch_size (str or int, optional) – Number of examples in each batch; either a specific number or ‘auto’.

  • learning_rate_multiplier (str or float, optional) – Scaling factor for the learning rate; either a specific number or ‘auto’.

  • n_epochs (str or int, optional) – The number of epochs to train the model for; either a specific number or ‘auto’.

  • validation_file (str, optional) – The file ID of the validation data uploaded to the OpenAI API.

Returns:

A dictionary containing information about the fine-tuning job, including its ID.

Return type:

dict

Raises:
  • ValueError – If the training_file is not provided.

  • Exception – If an error occurs during the creation of the fine-tuning job.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> job_info = api.create_fine_tuning_job(
...     training_file="file-abc123",
...     model="gpt-3.5-turbo",
...     suffix="custom-model-name",
...     batch_size=4,
...     learning_rate_multiplier=0.1,
...     n_epochs=2,
...     validation_file="file-def456",
... )
>>> print(job_info)
{'id': 'ft-xyz789', ...}
delete_fine_tuned_model(model_id: str) Dict[source]

Delete a fine-tuned model. The caller must be the owner of the organization the model was created in.

This method facilitates the deletion of a fine-tuned model identified by its model ID. It manages the API interaction to delete the model and handles various potential errors that might occur during this process.

Method delete_fine_tuned_model:

Deletes a fine-tuned model.

Parameters:

model_id (str) – The ID of the fine-tuned model to delete.

Returns:

Confirmation of the deletion.

Return type:

Dict

Raises:
  • ValueError – If the model_id is not provided.

  • openai.APIConnectionError – If there’s a connection error with the API.

  • openai.RateLimitError – If the request is rate-limited by the API.

  • openai.APIStatusError – If there’s a status error from the API.

  • AttributeError – If an attribute error occurs during the process.

  • Exception – For any other exceptions that occur during the process.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> deletion_result = api.delete_fine_tuned_model(model_id="ft-model-12345")
>>> print(deletion_result)
list_events_fine_tuning_job(fine_tuning_job_id: str, limit: int = 10) List[Dict][source]

List up to a specified number of events from a fine-tuning job.

This method retrieves a list of events associated with a specific fine-tuning job, identified by its job ID. It allows setting a limit on the number of events to be returned and handles various potential errors that might occur during the API interaction.

Method list_events_fine_tuning_job:

Retrieves a list of events from a specified fine-tuning job.

Parameters:
  • fine_tuning_job_id (str) – The ID of the fine-tuning job to list events from.

  • limit (int, optional) – The maximum number of events to return, defaults to 10.

Returns:

A list of dictionaries, each representing an event from the fine-tuning job.

Return type:

List[Dict]

Raises:
  • ValueError – If the fine_tuning_job_id is not provided.

  • openai.APIConnectionError – If there’s a connection error with the API.

  • openai.RateLimitError – If the request is rate-limited by the API.

  • openai.APIStatusError – If there’s a status error from the API.

  • AttributeError – If an attribute error occurs during the process.

  • Exception – For any other exceptions that occur during the process.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> events = api.list_events_fine_tuning_job(fine_tuning_job_id="ft-xyz789", limit=5)
>>> for event in events:
...     print(event)
list_fine_tune_files() List[Dict][source]

List files that have been uploaded to OpenAI for fine-tuning.

This method allows the retrieval of a list of files uploaded to the OpenAI API, primarily for the purpose of fine-tuning models. The list includes comprehensive details such as file IDs, creation dates, and the purposes of the files.

Method list_fine_tune_files:

Retrieves a list of uploaded files for fine-tuning.

Returns:

A list of dictionaries, each containing details of an uploaded file.

Return type:

List[Dict]

Raises:

Exception – If an error occurs during the API request.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> files = api.list_fine_tune_files()
>>> for file in files:
...     print(file)
list_fine_tuning_jobs(limit: int = 10) List[Dict][source]

List the fine-tuning jobs with an option to limit the number of jobs returned.

This method retrieves a list of fine-tuning jobs. An optional parameter ‘limit’ can be set to restrict the number of jobs returned. It interacts with the OpenAI API and processes the response to provide a concise list of fine-tuning jobs.

Method list_fine_tuning_jobs:

Retrieves a list of fine-tuning jobs.

Parameters:

limit (int, optional) – The maximum number of fine-tuning jobs to return, defaults to 10.

Returns:

A list of dictionaries, each representing a fine-tuning job.

Return type:

List[Dict]

Raises:

openai.OpenAIError – If an error occurs with the OpenAI API request.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> jobs = api.list_fine_tuning_jobs(limit=5)
>>> for job in jobs:
...     print(job)
retrieve_fine_tuning_job(job_id: str) Dict[source]

Retrieve the state of a specific fine-tuning job.

This method is used to obtain detailed information about a specific fine-tuning job, identified by its job ID. It interacts with the OpenAI API to retrieve and present the state and other relevant details of the requested fine-tuning job.

Method retrieve_fine_tuning_job:

Retrieves details of a specific fine-tuning job.

Parameters:

job_id (str) – The ID of the fine-tuning job to retrieve.

Returns:

A dictionary containing details about the fine-tuning job.

Return type:

Dict

Raises:
  • ValueError – If the job_id is not provided.

  • openai.OpenAIError – If an error occurs with the OpenAI API request.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> job_details = api.retrieve_fine_tuning_job(job_id="ft-xyz789")
>>> print(job_details)
run_dashboard()[source]

This method runs a dashboard for various fine-tuning operations related to a model.

Method run_dashboard:

Launches an interactive dashboard allowing the user to perform various operations related to fine-tuning a model.

The dashboard interactively prompts for the following values, depending on the operation selected:

  • choice – The user’s choice from the dashboard menu of operations.

  • file_path – File path for creating a fine-tune file.

  • purpose – Purpose of the file, either fine-tuning or another purpose.

  • training_file – ID of the training file used for creating a fine-tuning job.

  • model – Name of the model used for fine-tuning.

  • suffix – Suffix for the fine-tuned model name.

  • batch_size – Batch size for training, either ‘auto’ or a specific number.

  • learning_rate_multiplier – Learning rate multiplier, either ‘auto’ or a specific number.

  • n_epochs – Number of epochs for training, either ‘auto’ or a specific number.

  • validation_file – ID of the validation file, if provided.

  • job_id – ID of the fine-tuning job for retrieving state, cancelling, or listing events.

  • model_name – Name of the fine-tuned model to use.

  • user_prompt – User prompt for testing the fine-tuned model.

  • system_prompt – System prompt for testing the fine-tuned model.

  • model_id – ID of the fine-tuned model to be deleted.

Returns:

None

use_fine_tuned_model(model_name: str, user_prompt: str, system_prompt='You are a helpful assistant.') str[source]

This method enables interaction with a fine-tuned model to generate responses based on provided messages.

Method use_fine_tuned_model:

Uses a specified fine-tuned model to generate responses to messages.

Parameters:
  • model_name (str) – The name of the fine-tuned model used for generating responses.

  • user_prompt (str) – The user’s message prompt for the model.

  • system_prompt (str, optional) – A predefined system message prompt, defaulting to “You are a helpful assistant.”

Returns:

The response generated by the fine-tuned model.

Return type:

str

Raises:

Exception – If an error occurs during the API request or while processing the response.

Example:

>>> api = FineTuningAPI(openai_key="your-api-key")
>>> response = api.use_fine_tuned_model(
...     "ft:gpt-3.5-turbo:my-org:custom_suffix:id",
...     user_prompt="Hello!",
...     system_prompt="You are a helpful assistant."
... )
>>> print(response)
'Response from the model...'

OpenAI Pricing

class OpenAIPricing(json_data: dict)[source]

Bases: object

A class dedicated to handling and accessing OpenAI’s pricing data for various models and services.

This class simplifies the process of accessing detailed pricing information for OpenAI’s diverse range of models, including language models, assistants API, fine-tuning models, embedding models, base models, image models, and audio models.

The pricing data is structured as follows:

  • Language Models: Pricing for various language models like GPT-4, GPT-3.5, etc., including input and output token costs.

  • Assistants API: Pricing for tools such as Code Interpreter and Retrieval, including cost per session and any special notes.

  • Fine Tuning Models: Costs associated with training and input/output usage for models like gpt-3.5-turbo and davinci-002.

  • Embedding Models: Usage costs for models such as ada v2.

  • Base Models: Token usage costs for base models like davinci-002 and babbage-002.

  • Image Models: Pricing for different resolutions and quality levels in image models like DALL·E 3 and DALL·E 2.

  • Audio Models: Usage costs for models like Whisper and Text-To-Speech (TTS) models.

This information provides a comprehensive overview of the pricing structure, aiding users in making informed decisions based on their specific needs.

Method __init__:

Initialize the OpenAIPricing instance with JSON-formatted pricing data.

Parameters:

json_data (dict) – The JSON data containing the pricing information of OpenAI models and services.

Returns:

An instance of OpenAIPricing with parsed pricing data.

Return type:

OpenAIPricing

Example:

>>> json_data = {
...     "release_date": "2023-11-15",
...     "pricing": {
...         "language_models": {
...             "GPT-4 Turbo": {
...                 "context": "128k context, fresher knowledge ...",
...                 "models": {
...                     "gpt-4-1106-preview": {"input": 0.01, "output": 0.03}
...                 }
...             }
...         }
...     }
... }
>>> pricing = OpenAIPricing(json_data=json_data)
>>> print(type(pricing))
<class 'OpenAIPricing'>
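Given the nesting shown above, individual token rates can also be read by walking the dict directly; a sketch using the same sample json_data (assuming, as is conventional, that the rates are USD per 1K tokens):

```python
# Same sample pricing data as in the example above.
json_data = {
    "release_date": "2023-11-15",
    "pricing": {
        "language_models": {
            "GPT-4 Turbo": {
                "context": "128k context, fresher knowledge ...",
                "models": {
                    "gpt-4-1106-preview": {"input": 0.01, "output": 0.03}
                },
            }
        }
    },
}

# Walk pricing -> category -> family -> models to reach one model's rates.
rates = json_data["pricing"]["language_models"]["GPT-4 Turbo"]["models"][
    "gpt-4-1106-preview"
]

# Assuming rates are USD per 1K tokens: cost of 1,000 input + 500 output tokens.
cost = (1000 / 1000) * rates["input"] + (500 / 1000) * rates["output"]
```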
calculate_token_usage_for_dataset(dataset_path, model='gpt-3.5-turbo-0613')[source]

Calculates the total number of tokens used by a dataset, based on the provided model’s tokenization scheme.

Method calculate_token_usage_for_dataset:

Estimate the total token count for a dataset using a specified model.

Parameters:
  • dataset_path (str) – The file path of the dataset for which token usage is to be calculated.

  • model (str, optional) – Identifier of the model for estimating token count. Defaults to “gpt-3.5-turbo-0613”.

Returns:

The total token count for the dataset as per the model’s encoding scheme.

Return type:

int

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.calculate_token_usage_for_dataset("dataset.jsonl")
# Assuming the model 'gpt-3.5-turbo-0613', this returns the total token count for the dataset.
calculate_token_usage_for_messages(messages, model='gpt-3.5-turbo-0613')[source]

Calculates the total number of tokens used by a list of messages, considering the specified model’s tokenization scheme.

Method calculate_token_usage_for_messages:

Determine the total token count for a given set of messages based on the model’s encoding.

Parameters:
  • messages (list of dict) – A list of dictionaries representing messages, with keys such as ‘role’, ‘name’, and ‘content’.

  • model (str, optional) – Identifier of the model for estimating token count. Defaults to “gpt-3.5-turbo-0613”.

Returns:

The total token count for the provided messages as encoded by the specified model.

Return type:

int

Raises:
  • KeyError – If the token encoding for the specified model is not found in the encoding data.

  • NotImplementedError – If the function does not support token counting for the given model.

Example:

>>> messages = [{"role": "user", "content": "Hello!"},
...             {"role": "assistant", "content": "Hi there!"}]
>>> pricing_data.calculate_token_usage_for_messages(messages)
# Assuming the model 'gpt-3.5-turbo-0613', this returns the total token count for the messages.
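For the 0613-era chat models, the commonly published counting scheme is a fixed per-message overhead, plus the encoded length of every field, plus an extra token when a ‘name’ field is present, plus a reply-priming constant. A sketch parameterized over an encode function so it assumes no particular tokenizer (the helper name and the 3/1/3 constants are the widely published gpt-3.5-turbo-0613 values, not taken from this library):

```python
def estimate_message_tokens(messages, encode, tokens_per_message=3, tokens_per_name=1):
    """Rough token estimate for a chat transcript.

    `encode` maps a string to a list of tokens (e.g. tiktoken's encode);
    the default constants match the published gpt-3.5-turbo-0613 scheme.
    """
    total = 0
    for message in messages:
        total += tokens_per_message          # per-message framing overhead
        for key, value in message.items():
            total += len(encode(value))      # tokens in the field's text
            if key == "name":
                total += tokens_per_name     # surcharge when a name is set
    return total + 3                         # every reply is primed with 3 tokens
```

With tiktoken installed, encode would typically be tiktoken.encoding_for_model("gpt-3.5-turbo-0613").encode.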
estimate_finetune_training_cost(number_of_tokens: int, model_name: str = 'gpt-3.5-turbo') float[source]

Estimates the cost of training or fine-tuning a model based on token count and model selection.

Method estimate_finetune_training_cost:

Calculate the estimated cost for training or fine-tuning.

Parameters:
  • number_of_tokens (int) – The total number of tokens that will be processed during training or fine-tuning.

  • model_name (str, optional) – The name of the AI model for which the training or fine-tuning is being estimated. Defaults to ‘gpt-3.5-turbo’ if not specified.

Returns:

The calculated cost of training or fine-tuning for the given number of tokens and the specified model.

Return type:

float

Raises:

ValueError – If the pricing data for the specified model is unavailable, resulting in an inability to estimate the cost.

Example:

>>> pricing_data.estimate_finetune_training_cost(10000, "gpt-3.5-turbo")
# Assuming a hypothetical cost calculation, this could return a float representing the cost.
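Assuming the training rate is quoted in USD per 1K tokens (as the fine-tuning pricing figures in this document suggest), the estimate presumably reduces to a one-line formula; a hedged sketch with a hypothetical helper name:

```python
def training_cost(number_of_tokens, training_rate_per_1k):
    # Cost scales linearly with token count; the rate is USD per 1,000 tokens.
    return (number_of_tokens / 1000) * training_rate_per_1k

# 10,000 tokens at the 0.008/1K training rate listed for gpt-3.5-turbo
# elsewhere in this document (treated here as an assumption).
cost = training_cost(10_000, 0.008)
```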
estimate_inference_cost(input_tokens: int, output_tokens: int, model_name: str = 'gpt-3.5-turbo') float[source]

Estimates the cost of inference operations based on the number of input and output tokens, as well as the chosen model.

Method estimate_inference_cost:

Calculate the estimated cost for inference operations.

Parameters:
  • input_tokens (int) – The number of tokens to be processed as input during the inference.

  • output_tokens (int) – The number of tokens expected to be generated as output during the inference.

  • model_name (str, optional) – The name of the AI model used for the inference operation. Defaults to ‘gpt-3.5-turbo’ if not specified.

Returns:

The calculated cost of inference for the specified number of input and output tokens with the given model.

Return type:

float

Raises:

ValueError – If there is no pricing information available for the specified model, thus hindering the cost estimation.

Example:

>>> pricing_data.estimate_inference_cost(100, 50, "gpt-3.5-turbo")
# Assuming hypothetical cost rates, this will return a float representing the estimated cost for 100 input tokens and 50 output tokens using 'gpt-3.5-turbo'.
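Since input and output tokens are billed at separate rates, the estimate is presumably the sum of two linear terms; a hedged sketch (hypothetical helper name; the 0.003/0.006 per-1K rates mirror the gpt-3.5-turbo input_usage/output_usage figures shown later in this document and are assumptions here):

```python
def inference_cost(input_tokens, output_tokens, input_rate_per_1k, output_rate_per_1k):
    # Input and output tokens are billed at separate per-1K rates.
    return (input_tokens / 1000) * input_rate_per_1k + (
        output_tokens / 1000
    ) * output_rate_per_1k

# 100 input + 50 output tokens at assumed gpt-3.5-turbo-style rates.
cost = inference_cost(100, 50, 0.003, 0.006)
```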
get_assistants_api_pricing(tool_name=None)[source]

Retrieves pricing information for the Assistants API tools from the stored pricing data.

This method provides the ability to access pricing details for specific tools or all tools within the Assistants API category. When a specific tool name is provided, it returns the pricing for that tool. If no tool name is specified, the method returns pricing information for all tools under the Assistants API.

Parameters:

tool_name (str, optional) – The name of the specific Assistants API tool for which pricing information is required. If None, returns pricing for all tools.

Returns:

A dictionary containing the pricing information. If a specific tool name is given, the dictionary includes the tool’s cost and any additional notes. If no tool name is specified, it returns a dictionary with tool names as keys and their respective pricing information as values.

Return type:

dict

Raises:

ValueError – If a specific tool_name is provided but not found in the Assistants API pricing data.

Example:

>>> pricing_data.get_assistants_api_pricing('Code interpreter')
{'input': 0.03, 'note': 'Free until 11/17/2023'}
get_audio_model_pricing(model_name=None)[source]

Retrieves pricing information for audio models from the stored pricing data.

This method is intended to provide detailed pricing information for audio-related models, including services like transcription and text-to-speech. Users can specify a particular audio model to get its pricing details. If no model name is provided, the method returns pricing information for all available audio models.

Parameters:

model_name (str, optional) – The name of the specific audio model for which pricing information is sought. If None, the method returns pricing information for all audio models.

Returns:

A dictionary containing the pricing information for the requested audio model(s). Each key represents a model name, with corresponding pricing details. Returns None if the requested model name does not exist in the pricing data.

Return type:

dict or None

Raises:

ValueError – If the specified model_name is not found in the audio models pricing data.

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.get_audio_model_pricing('Whisper')
{'usage': 0.006}
>>> pricing_data.get_audio_model_pricing()
{
    'Whisper': {'usage': 0.006},
    'TTS': {'usage': 0.015},
    'TTS HD': {'usage': 0.03}
}
get_base_model_pricing(model_name=None)[source]

Retrieves pricing information for base models from the stored pricing data.

This method facilitates access to cost details associated with the use of base language models. Users can obtain pricing information for a specific model if the model name is provided. If no model name is given, the method returns pricing details for all base models.

Parameters:

model_name (str, optional) – The name of the specific base model for which pricing information is sought. If None, the method returns pricing information for all base models.

Returns:

A dictionary containing pricing details. If a specific model name is provided, it returns a dictionary with the usage cost for that model. If no model name is specified, it returns a dictionary with each model name as a key and its respective pricing details as values.

Return type:

dict

Raises:

ValueError – If a model_name is provided but not found in the base models pricing data.

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.get_base_model_pricing('davinci-002')
{'usage': 0.002}
>>> pricing_data.get_base_model_pricing()
{'davinci-002': {'usage': 0.002}, 'babbage-002': {'usage': 0.0004}}
get_embedding_model_pricing(model_name=None)[source]

Retrieves pricing information for embedding models from the stored pricing data.

This method allows for fetching pricing details of specific embedding models or all embedding models, based on the provided model name. If a model name is specified, it returns the pricing for that particular embedding model. If no model name is given, it returns pricing details for all embedding models in a comprehensive dictionary.

Parameters:

model_name (str, optional) – The name of the specific embedding model for which pricing information is desired. If None, pricing information for all embedding models is returned.

Returns:

A dictionary containing the pricing information. If a specific model name is provided, the dictionary includes the ‘usage’ cost for that model. If no model name is specified, it returns a dictionary with model names as keys and their respective pricing information as values.

Return type:

dict

Raises:

ValueError – If a model_name is specified but not found in the embedding models pricing data.

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.get_embedding_model_pricing('ada v2')
{'usage': 0.0001}
get_fine_tuning_model_pricing(model_name=None)[source]

Retrieves pricing information for fine-tuning models from the stored pricing data.

This method is designed to access the cost associated with training, input usage, and output usage for fine-tuning models. Users can specify a particular model to obtain its specific pricing details. If no model name is provided, the method returns comprehensive pricing information for all available fine-tuning models.

Parameters:

model_name (str, optional) – The name of the specific fine-tuning model for which pricing information is desired. If None, pricing information for all fine-tuning models is returned. Defaults to None.

Returns:

A dictionary containing the pricing information. If a specific model name is given, the dictionary includes the keys ‘training’, ‘input_usage’, and ‘output_usage’ with their respective costs. If no model name is specified, it returns a dictionary with model names as keys and their respective pricing details as values.

Return type:

dict

Raises:

ValueError – If the provided model_name does not exist in the fine-tuning models pricing data.

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.get_fine_tuning_model_pricing('gpt-3.5-turbo')
{'training': 0.008, 'input_usage': 0.003, 'output_usage': 0.006}
get_image_model_pricing(model_name=None)[source]

Retrieves pricing information for image models from the stored pricing data.

This method is designed to provide users with pricing details for various image generation models offered by OpenAI. If a specific model or category name is given, it returns the pricing specific to that model or category. If no model name is specified, the method returns pricing information for all available image models.

Parameters:

model_name (str, optional) – The name of the specific image model or category for which pricing information is sought. If None, pricing information for all image models is returned.

Returns:

A dictionary containing the pricing information for the requested image model(s). Each key represents a model name or category, with corresponding pricing details in a nested dictionary. Returns None if the requested model name does not exist in the pricing data.

Return type:

dict or None

Raises:

ValueError – If the specified model_name is not found in the image models pricing data.

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.get_image_model_pricing('DALL·E 3')
{
    'Standard': {
        '1024x1024': 0.04,
        '1024x1792_1792x1024': 0.08
    },
    'HD': {
        '1024x1024': 0.08,
        '1024x1792_1792x1024': 0.12
    }
}
>>> pricing_data.get_image_model_pricing()
{
    'DALL·E 3': { ... },
    'DALL·E 2': { ... }
}
get_language_model_pricing(model_name=None)[source]

Retrieves pricing information for language models based on the provided model name.

This method accesses the stored pricing data to return details for a specific language model or for all language models. If a model name is specified, it fetches pricing for that particular model. If no model name is given, it returns comprehensive pricing details for all language models.

Parameters:

model_name (Optional[str]) – The name of a specific language model for which pricing information is required. If None, pricing information for all language models is returned.

Returns:

A dictionary containing the pricing information. If a specific model name is provided, it returns a dictionary with ‘input’ and ‘output’ costs. Otherwise, it returns a dictionary with each model category as keys and their respective pricing information as values.

Return type:

dict

Raises:

ValueError – If a specified model_name is not found in the stored language models pricing data.

Example:

>>> pricing_data = OpenAIPricing(json_data)
>>> pricing_data.get_language_model_pricing('GPT-4')
{'input': 0.03, 'output': 0.06}
get_release_date()[source]

Retrieves the release date of the pricing data.

Returns:

The release date of the pricing data.

Return type:

str

load_dataset(file_path)[source]

Loads a dataset from a specified file, supporting CSV, JSON, or JSONL formats.

Method load_dataset:

Read and load data from a file into an appropriate data structure.

Parameters:

file_path (str) – The path to the file containing the dataset.

Returns:

A list of dictionaries where each dictionary represents a record in the dataset.

Return type:

list of dict

Raises:

ValueError – If the file format is not supported. Only CSV, JSON, or JSONL formats are acceptable.

Example:

>>> pricing_data.load_dataset("data.csv")
# This will return the contents of 'data.csv' as a list of dictionaries.
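A minimal sketch of what such a format-dispatching loader might look like; it mirrors the documented behaviour (CSV, JSON, and JSONL in, list of dicts out) but is an assumption, not the library's actual implementation:

```python
import csv
import json
import os

def load_records(file_path):
    """Load a CSV, JSON, or JSONL file into a list of dicts."""
    ext = os.path.splitext(file_path)[1].lower()
    with open(file_path, encoding="utf-8") as f:
        if ext == ".csv":
            return list(csv.DictReader(f))      # one dict per row
        if ext == ".json":
            data = json.load(f)
            return data if isinstance(data, list) else [data]
        if ext == ".jsonl":
            return [json.loads(line) for line in f if line.strip()]
    raise ValueError("Unsupported file format: use CSV, JSON, or JSONL.")
```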