Document Loaders

Context Loaders

class ContextExtractor

Bases: object

Extracts a single named column from a CSV file as a pandas Series or DataFrame.

This class provides methods to read a specified column from a CSV file and return it as either a pandas Series or DataFrame. It’s useful for data processing tasks where only specific column data is required from a larger dataset.

Method from_csv_as_series:

Reads a CSV file and returns the specified column as a pandas Series.

Method from_csv_as_df:

Reads a CSV file and returns the specified column as a pandas DataFrame.
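The behaviour described above can be approximated with plain pandas. The following is only a minimal sketch under stated assumptions, not the library's actual source: the class name ContextExtractorSketch is hypothetical, and the use of pandas.read_csv with usecols, as well as the header check that produces the ValueError, are guesses at one reasonable implementation.

```python
import os
import tempfile

import pandas as pd


class ContextExtractorSketch:
    """Hypothetical stand-in for ContextExtractor; not the library's source."""

    def from_csv_as_df(self, csv_file_path, column_name="context", encoding="utf-8"):
        # Read only the header row so a missing column can be reported clearly.
        # pandas raises FileNotFoundError here if the path does not exist.
        header = pd.read_csv(csv_file_path, encoding=encoding, nrows=0)
        if column_name not in header.columns:
            raise ValueError(f"Column {column_name!r} not found in {csv_file_path}")
        # usecols limits parsing to the requested column (one-column DataFrame).
        return pd.read_csv(csv_file_path, encoding=encoding, usecols=[column_name])

    def from_csv_as_series(self, csv_file_path, column_name="context", encoding="utf-8"):
        # Selecting the column by name turns the one-column DataFrame into a Series.
        return self.from_csv_as_df(csv_file_path, column_name, encoding)[column_name]


# Demo on a throwaway CSV file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    pd.DataFrame({"id": [1, 2], "context": ["alpha", "beta"]}).to_csv(path, index=False)

    sketch = ContextExtractorSketch()
    df = sketch.from_csv_as_df(path)
    series = sketch.from_csv_as_series(path)
```

In this sketch both methods share one code path, so the two return types differ only in the final column selection.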

from_csv_as_df(csv_file_path, column_name='context', encoding='utf-8')

Reads a CSV file and returns the specified column as a pandas DataFrame.

This method extracts a single column from a CSV file and presents it as a pandas DataFrame. It’s particularly useful when only one column of data is needed for analysis or processing.

Parameters:
  • csv_file_path (str) – The path to the CSV file.

  • column_name (str, optional) – The name of the column to extract. Default is “context”.

  • encoding (str, optional) – The encoding of the CSV file. Default is “utf-8”.

Returns:

The specified column as a pandas DataFrame.

Return type:

pandas.DataFrame

Raises:
  • FileNotFoundError – If the CSV file does not exist.

  • ValueError – If the specified column does not exist in the CSV.

  • Exception – If any other error occurs while reading or parsing the file.

Example:

>>> extractor = ContextExtractor()
>>> dataframe = extractor.from_csv_as_df("data.csv", "context")
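To illustrate what a one-column DataFrame result looks like, the sketch below builds an equivalent object directly with pandas rather than calling the extractor. The CSV text and the variable names are made up for illustration, and it assumes the returned object is an ordinary pandas.DataFrame containing only the requested column, as the Returns section states.

```python
import io

import pandas as pd

# Made-up CSV content standing in for data.csv.
csv_text = "id,context\n1,first passage\n2,second passage\n"

# Stand-in for the documented return value: a DataFrame holding only "context".
dataframe = pd.read_csv(io.StringIO(csv_text), usecols=["context"])

print(dataframe.shape)          # (number of rows, number of columns)
print(list(dataframe.columns))  # the single column name

# A plain Python list of the column's values, e.g. for passing to other code.
contexts = dataframe["context"].tolist()
```

Keeping the result as a DataFrame preserves the column label, which can be convenient when the data is later joined or concatenated with other frames.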

from_csv_as_series(csv_file_path, column_name='context', encoding='utf-8')

Reads a CSV file and returns the specified column as a pandas Series.

This method is designed to extract a single column from a CSV file and present it as a pandas Series, which can be useful for further data analysis or processing.

Parameters:
  • csv_file_path (str) – The path to the CSV file.

  • column_name (str, optional) – The name of the column to extract. Default is “context”.

  • encoding (str, optional) – The encoding of the CSV file. Default is “utf-8”.

Returns:

The specified column as a pandas Series.

Return type:

pandas.Series

Raises:
  • FileNotFoundError – If the CSV file does not exist.

  • ValueError – If the specified column does not exist in the CSV.

  • Exception – If any other error occurs while reading or parsing the file.

Example:

>>> extractor = ContextExtractor()
>>> series = extractor.from_csv_as_series("data.csv", "context")
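The documented exceptions can be handled at the call site. The sketch below is an illustration only: load_or_none and standin_loader are hypothetical names, and standin_loader uses pandas directly in place of extractor.from_csv_as_series, on the assumption that the real method raises FileNotFoundError and ValueError in the same situations.

```python
import os
import tempfile

import pandas as pd


def load_or_none(loader, csv_file_path, column_name="context"):
    """Hypothetical helper: return the loader's result, or None on a documented error."""
    try:
        return loader(csv_file_path, column_name)
    except FileNotFoundError:
        print(f"No such file: {csv_file_path}")
    except ValueError as err:
        print(f"Could not read column {column_name!r}: {err}")
    return None


def standin_loader(csv_file_path, column_name):
    # Stand-in for extractor.from_csv_as_series (assumption, not the library's code).
    return pd.read_csv(csv_file_path, usecols=[column_name])[column_name]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("id,context\n1,first passage\n")

    found = load_or_none(standin_loader, path)                      # a one-entry Series
    missing_column = load_or_none(standin_loader, path, "summary")  # column absent
    missing_file = load_or_none(standin_loader, os.path.join(tmp, "absent.csv"))
```

With the real class, extractor.from_csv_as_series could be passed as the loader argument in place of standin_loader.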