opencf.io_handlers package

Submodules

PDF File I/O Handlers

This module provides classes for reading PDF files and writing DOCX files using the Aspose.Words library. It converts between PDF files, Aspose.Words Document objects, and DOCX files.

class opencf.io_handlers.aspose.AsposeReader

Bases: Reader

Reads content from a PDF file and returns it as an Aspose.Words Document.

_check_input_format(content: Document) bool

Checks if the provided content is an aw.Document object.

Parameters:

content (aw.Document) – The content to be checked.

Returns:

True if the content is an aw.Document object, False otherwise.

Return type:

bool

_read_content(input_path: Path) Document

Reads and returns the content from the given PDF file path.

Parameters:

input_path (Path) – The path to the PDF file.

Returns:

The Aspose.Words Document object read from the PDF file.

Return type:

aw.Document
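The reader hooks above separate producing content from validating it. The `Reader` base class itself is not documented in this section, so the following is only a sketch under an assumption: that a public `read()` calls `_read_content` and then rejects the result if `_check_input_format` returns False. `SketchReader` and `LinesReader` are hypothetical names for illustration, not opencf classes.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, List


class SketchReader(ABC):
    """Hypothetical stand-in for opencf's Reader base class (assumed behavior)."""

    def read(self, input_path: Path) -> Any:
        content = self._read_content(Path(input_path))
        # Reject content that does not match the format this reader promises.
        if not self._check_input_format(content):
            raise ValueError(f"{type(self).__name__} produced unexpected content")
        return content

    @abstractmethod
    def _read_content(self, input_path: Path) -> Any: ...

    @abstractmethod
    def _check_input_format(self, content: Any) -> bool: ...


class LinesReader(SketchReader):
    """Toy reader: returns a text file as a list of lines."""

    def _read_content(self, input_path: Path) -> List[str]:
        return input_path.read_text(encoding="utf-8").splitlines()

    def _check_input_format(self, content: Any) -> bool:
        return isinstance(content, list) and all(isinstance(line, str) for line in content)
```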

class opencf.io_handlers.aspose.AsposeWriter

Bases: Writer

Writes content from an Aspose.Words Document to a DOCX file.

_check_output_format(content: Document) bool

Checks if the provided content is an Aspose.Words Document.

Parameters:

content (aw.Document) – The content to be checked.

Returns:

True if the content is an Aspose.Words Document, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Document])

Writes the provided Aspose.Words Document to the given DOCX file path.

Parameters:
  • output_path (Path) – The path to the DOCX file.

  • output_content (aw.Document) – The Aspose.Words Document to be written to the file.
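On the writing side, the spreadsheet example further down calls `writer.write(path, content)`. A minimal sketch of how such a method might combine the two writer hooks, assuming validation runs before anything is written; `SketchWriter` and `LinesWriter` are hypothetical and not part of opencf.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, List


class SketchWriter(ABC):
    """Hypothetical stand-in for opencf's Writer base class (assumed behavior)."""

    def write(self, output_path: Path, content: Any) -> None:
        # Validate first so that nothing is written for unsupported content.
        if not self._check_output_format(content):
            raise ValueError(f"{type(self).__name__} cannot write {type(content).__name__}")
        self._write_content(Path(output_path), content)

    @abstractmethod
    def _check_output_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _write_content(self, output_path: Path, output_content: Any) -> None: ...


class LinesWriter(SketchWriter):
    """Toy writer: saves a list of strings as a text file, one per line."""

    def _check_output_format(self, content: Any) -> bool:
        return isinstance(content, list) and all(isinstance(line, str) for line in content)

    def _write_content(self, output_path: Path, output_content: List[str]) -> None:
        output_path.write_text("\n".join(output_content) + "\n", encoding="utf-8")
```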

File: img_opencv.py
Author: Hermann Agossou
Description: This module provides classes for reading and writing images using OpenCV.

class opencf.io_handlers.opencv.FramesToGIFWriterWithImageIO

Bases: Writer

Writes a list of frames to a GIF file using imageio.

_check_output_format(content) bool

Validates if the provided content is a list of frames.

Parameters:

content – The content to be validated.

Returns:

True if the content is a list of frames, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Mat | ndarray])

Writes the provided list of frames to the given output GIF file.

Parameters:
  • output_path (Path) – The path to the output GIF file.

  • output_content (List[MatLike]) – The list of frames to be written to the GIF file.
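`_check_output_format` only needs to confirm that it received a list of frames. The duck-typed sketch below assumes a frame is any object with a 2-D (grayscale) or 3-D (color) `shape`, which NumPy arrays returned by OpenCV satisfy; `is_frame_list` is a hypothetical helper, and rejecting an empty list is an assumption of this sketch rather than documented behavior.

```python
from typing import Any


def is_frame_list(content: Any) -> bool:
    """Return True for a non-empty list whose items look like image arrays."""
    if not isinstance(content, list) or not content:
        return False
    for frame in content:
        shape = getattr(frame, "shape", None)
        # Grayscale frames are (H, W); color frames are (H, W, C).
        if shape is None or len(shape) not in (2, 3):
            return False
    return True
```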

class opencf.io_handlers.opencv.ImageToOpenCVReader

Bases: Reader

Reads an image file and returns an OpenCV image object.

_check_input_format(content: Mat | ndarray) bool

Validates if the provided content is an OpenCV image object.

Parameters:

content (opencv_image) – The content to be validated.

Returns:

True if the content is an OpenCV image object, False otherwise.

Return type:

bool

_read_content(input_path: Path) Mat | ndarray

Reads and returns the content from the given input path as an OpenCV image object.

Parameters:

input_path (Path) – The path to the input image file.

Returns:

The OpenCV image object read from the input file.

Return type:

opencv_image
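OpenCV's `cv2.imread` returns `None` rather than raising when a file is missing or cannot be decoded, so a reader built on it benefits from an explicit check. A minimal sketch; `load_image` is a hypothetical helper, and the loader is passed in so it can be exercised without OpenCV (in practice you would pass `cv2.imread`).

```python
from pathlib import Path
from typing import Any, Callable


def load_image(input_path: Path, imread: Callable[[str], Any]) -> Any:
    """Load an image with an imread-like function, failing loudly on None."""
    # cv2.imread takes a str path and returns None on failure instead of raising.
    image = imread(str(input_path))
    if image is None:
        raise OSError(f"could not read or decode image: {input_path}")
    return image
```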

class opencf.io_handlers.opencv.OpenCVToImageWriter

Bases: Writer

Writes an OpenCV image object to an image file.

_check_output_format(content: Mat | ndarray) bool

Validates if the provided content is an OpenCV image object.

Parameters:

content – The content to be validated.

Returns:

True if the content is an OpenCV image object, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content)

Writes the provided OpenCV image object to the given output path as an image file.

Parameters:
  • output_path (Path) – The path to the output image file.

  • output_content – The OpenCV image object to be written to the output file.

class opencf.io_handlers.opencv.VideoArrayWriter

Bases: Writer

Writes a video to a file using a list of image arrays.

_check_output_format(content: Mat | ndarray | List[Mat | ndarray]) bool

Validates if the provided content is suitable for writing as a video.

Parameters:

content – Content to be validated.

Returns:

True if the content is suitable for writing as a video, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: Mat | ndarray | List[Mat | ndarray], fps: int = 15) bool

Writes a video to a file using a list of image arrays.

Parameters:
  • output_path (Path) – Path to save the video.

  • output_content (Union[MatLike, List[MatLike]]) – Video frames as a single NumPy array or a list of NumPy arrays.

  • fps (int, optional) – Frames per second. Defaults to 15.

Returns:

True if the video is successfully written, False otherwise.

Return type:

bool
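`cv2.VideoWriter` is opened with a single `frameSize` given as `(width, height)`, so every frame passed to a writer like this should share the same dimensions. The sketch below normalizes `output_content` to a list and derives that size; `prepare_frames` is hypothetical, and treating a single array as an `(N, H, W, C)` stack of frames is an assumption.

```python
from typing import Any, List, Tuple


def prepare_frames(content: Any) -> Tuple[List[Any], Tuple[int, int]]:
    """Normalize content to a list of frames and return their (width, height)."""
    # A list is already a sequence of frames; a single 4-D array (N, H, W, C)
    # is assumed to hold one frame per index along its first axis.
    frames = list(content)
    if not frames:
        raise ValueError("no frames to write")
    height, width = frames[0].shape[:2]
    for index, frame in enumerate(frames):
        if tuple(frame.shape[:2]) != (height, width):
            raise ValueError(f"frame {index} has size {frame.shape[:2]}, expected {(height, width)}")
    # cv2.VideoWriter expects frameSize as (width, height).
    return frames, (width, height)
```

With OpenCV installed, the returned size could then be passed as `cv2.VideoWriter(str(output_path), cv2.VideoWriter_fourcc(*"mp4v"), fps, size)`; the `"mp4v"` codec is an illustrative choice, not necessarily what opencf uses.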

class opencf.io_handlers.opencv.VideoToFramesReaderWithOpenCV

Bases: Reader

Reads a video file and returns a list of frames in MatLike format.

_check_input_format(content: List[Mat | ndarray]) bool

Validates if the provided content is a list of MatLike objects.

Parameters:

content (List[MatLike]) – The content to be validated.

Returns:

True if the content is a list of MatLike objects, False otherwise.

Return type:

bool

_read_content(input_path: Path) List[Mat | ndarray]

Reads and returns the frames from the given video file as a list of MatLike objects.

Parameters:

input_path (Path) – The path to the input video file.

Returns:

A list containing frames read from the video file.

Return type:

List[MatLike]

input_format

alias of List[Mat | ndarray]
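Reading every frame of a video with OpenCV usually means calling `read()` on a `cv2.VideoCapture` until its success flag is False. A sketch of that loop, written against any object with `read()` and `release()` so it runs without OpenCV; `read_all_frames` is a hypothetical helper, and in practice `capture` would be `cv2.VideoCapture(str(input_path))`.

```python
from typing import Any, List


def read_all_frames(capture: Any) -> List[Any]:
    """Drain a cv2.VideoCapture-like object into a list of frames."""
    frames: List[Any] = []
    try:
        while True:
            ok, frame = capture.read()  # ok is False once no more frames can be read
            if not ok:
                break
            frames.append(frame)
    finally:
        capture.release()  # free the underlying file handle even on errors
    return frames
```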

class opencf.io_handlers.pdf2docx.Pdf2DocxReader

Bases: Reader

Reads content from a PDF file and returns it as a pdf2docx Converter object.

_check_input_format(content: Converter) bool

Checks if the provided content is a pdf2docx.Converter object.

Parameters:

content (pdf2docx.Converter) – The content to be checked.

Returns:

True if the content is a pdf2docx.Converter object, False otherwise.

Return type:

bool

_read_content(input_path: Path) Converter

Reads and returns the content from the given PDF file path.

Parameters:

input_path (Path) – The path to the PDF file.

Returns:

The pdf2docx.Converter object created for the PDF file.

Return type:

pdf2docx.Converter

class opencf.io_handlers.pdf2docx.Pdf2DocxWriter

Bases: Writer

Writes content from a PDF to DOCX using pdf2docx.

_check_output_format(content: List[Converter]) bool

Checks if the provided content is a list of pdf2docx.Converter objects.

Parameters:

content (List[pdf2docx.Converter]) – The content to be checked.

Returns:

True if the content is a list of pdf2docx.Converter objects, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Converter])

Writes the first provided pdf2docx.Converter object to the given DOCX file path.

Parameters:
  • output_path (Path) – The path to the DOCX file.

  • output_content (List[pdf2docx.Converter]) – The pdf2docx.Converter objects; only the first one is written to the file.
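Since only the first Converter is written, the write step reduces to one `convert` call. A sketch, assuming pdf2docx's `Converter.convert(docx_filename, ...)` and `Converter.close()` methods; `write_first_converter` is hypothetical, and closing the converter afterwards is a design choice of this sketch, not documented opencf behavior.

```python
from pathlib import Path
from typing import Any, List


def write_first_converter(output_path: Path, converters: List[Any]) -> None:
    """Convert with the first pdf2docx-style Converter and ignore the rest."""
    if not converters:
        raise ValueError("expected at least one Converter")
    converter = converters[0]
    try:
        # pdf2docx: Converter.convert(docx_filename, start=0, end=None, ...)
        converter.convert(str(output_path))
    finally:
        converter.close()
```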

Image File I/O Handlers

This module provides classes for reading and writing image files using the Pillow library. It includes abstract base classes and concrete implementations for converting between image files and Pillow Image objects.

class opencf.io_handlers.pillow.ImageToPillowReader

Bases: Reader

Reads an image file and returns a Pillow Image object.

_check_input_format(content: Image) bool

Validates if the provided content is a Pillow Image object.

Parameters:

content (PillowImage.Image) – The content to be validated.

Returns:

True if the content is a Pillow Image object, False otherwise.

Return type:

bool

_read_content(input_path: Path) Image

Reads and returns the content from the given input path as a Pillow Image object.

Parameters:

input_path (Path) – The path to the input image file.

Returns:

The Pillow Image object read from the input file.

Return type:

PillowImage.Image

class opencf.io_handlers.pillow.PillowToImageWriter

Bases: Writer

Writes a Pillow Image object to an image file.

_check_output_format(content: Image) bool

Validates if the provided content is a Pillow Image object.

Parameters:

content (PillowImage.Image) – The content to be validated.

Returns:

True if the content is a Pillow Image object, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: Image)

Writes the provided Pillow Image object to the given output path as an image file.

Parameters:
  • output_path (Path) – The path to the output image file.

  • output_content (PillowImage.Image) – The Pillow Image object to be written to the output file.

class opencf.io_handlers.pillow.PillowToPDFWriter

Bases: Writer

Writes a list of Pillow Image objects to a PDF file.

_check_output_format(content: List[Image]) bool

Validates if the provided content is a list of Pillow Image objects.

Parameters:

content (List[PillowImage.Image]) – The content to be validated.

Returns:

True if the content is a list of Pillow Image objects, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Image])

Writes the provided PillowImage.Image objects to the given output path as a PDF file.

Parameters:
  • output_path (Path) – The path to the output PDF file.

  • output_content (List[PillowImage.Image]) – The PillowImage.Image objects to be written to the output file.
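Pillow can write several images into one PDF by saving the first image with `save_all=True` and passing the remaining ones as `append_images`. A minimal sketch of that approach; `save_images_as_pdf` is a hypothetical helper and may differ from how `PillowToPDFWriter` is actually implemented.

```python
from pathlib import Path
from typing import Any, List


def save_images_as_pdf(output_path: Path, images: List[Any]) -> None:
    """Save Pillow-style images as one multi-page PDF."""
    if not images:
        raise ValueError("expected at least one image")
    first, *rest = images
    # Pillow writes the extra pages via save_all=True plus append_images.
    first.save(str(output_path), format="PDF", save_all=True, append_images=rest)
```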

class opencf.io_handlers.pymupdf.PdfToPymupdfReader

Bases: Reader

Reads content from a PDF file and returns it as a fitz.Document object.

_check_input_format(content: Path) bool

Checks if the provided content is a valid PDF file.

Parameters:

content (Path) – The path to the PDF file to be checked.

Returns:

True if the content is a valid PDF file, False otherwise.

Return type:

bool

_read_content(input_path: Path) Document

Opens the PDF file at the given input path and returns it as a fitz.Document.

Parameters:

input_path (Path) – The path to the input PDF file.

Returns:

The fitz.Document opened from the input PDF file.

Return type:

fitz.Document

class opencf.io_handlers.pymupdf.PymupdfToImageExtractorWriter

Bases: Writer

Extracts images from a fitz.Document and saves them as image files.

_check_output_format(content: List[Document]) bool

Checks if the provided content is a list of fitz.Document objects.

Parameters:

content (List[fitz.Document]) – The content to be checked.

Returns:

True if the content is a list of fitz.Document objects, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Document])

Extracts the images embedded in the provided documents and saves them under the given output path.

Parameters:
  • output_path (Path) – The path to the output file.

  • output_content (List[fitz.Document]) – The documents whose images are extracted.

class opencf.io_handlers.pymupdf.PymupdfToImageWriter

Bases: Writer

Writes PDF pages as images to a specified folder.

_check_output_format(content: List[Page]) bool

Checks if the provided content is a list of fitz Page objects.

Parameters:

content (List[fitz.Page]) – The content to be checked.

Returns:

True if the content is a list of fitz Page objects, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Document])

Writes the provided PDF pages as images to the specified folder.

Parameters:
  • output_path (Path) – The path to the output folder.

  • output_content (List[fitz.Page]) – The list of PDF pages to be written as images.
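With PyMuPDF, a page is typically rendered with `Page.get_pixmap()` and the resulting pixmap saved to disk. A sketch of writing one image per page into the output folder; `save_pages_as_images`, the 150 dpi default, and the `page_001.png` naming scheme are assumptions for illustration.

```python
from pathlib import Path
from typing import Any, List


def save_pages_as_images(output_dir: Path, pages: List[Any], dpi: int = 150) -> List[Path]:
    """Render each PyMuPDF-style page to a PNG file inside output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for number, page in enumerate(pages, start=1):
        target = output_dir / f"page_{number:03d}.png"  # hypothetical naming scheme
        pixmap = page.get_pixmap(dpi=dpi)  # PyMuPDF: Page.get_pixmap(dpi=...)
        pixmap.save(str(target))
        written.append(target)
    return written
```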

PDF File I/O Handlers

This module provides classes for reading and writing PDF files using the PyPDF library. It includes abstract base classes and concrete implementations for converting between PDF files and PyPDF PdfReader objects.

class opencf.io_handlers.pypdf.PdfToPyPdfReader

Bases: Reader

Reads a PDF file and returns a [PyPDF PdfReader object](https://pypdf.readthedocs.io/en/4.2.0/modules/PdfReader.html).

_check_input_format(content: PdfReader) bool

Validates if the provided content is a PyPDF PdfReader object.

Parameters:

content (PdfReader) – The content to be validated.

Returns:

True if the content is a PyPDF PdfReader object, False otherwise.

Return type:

bool

_read_content(input_path: Path) PdfReader

Reads and returns the content from the given input path as a PyPDF PdfReader object.

Parameters:

input_path (Path) – The path to the input PDF file.

Returns:

The PyPDF PdfReader object read from the input file.

Return type:

PdfReader

class opencf.io_handlers.pypdf.PillowToPDFWriterwithPypdf

Bases: Writer

Writes a collection of Pillow images to a PDF file using PyPDF.

_check_output_format(content: List[Image]) bool

Checks if the provided content is a list of Pillow images.

Parameters:

content (List[PillowImage.Image]) – The content to be checked.

Returns:

True if the content is a list of Pillow images, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Image])

Writes the provided list of Pillow images to the given PDF file path.

Parameters:
  • output_path (Path) – The path to the PDF file.

  • output_content (List[PillowImage.Image]) – The list of Pillow images to be written to the file.

class opencf.io_handlers.pypdf.PyPdfToPdfWriter

Bases: Writer

Writes a [PyPDF PdfWriter object](https://pypdf.readthedocs.io/en/4.2.0/modules/PdfWriter.html) to a PDF file.

_check_output_format(content: PdfWriter) bool

Validates if the provided content is a PyPDF PdfWriter object.

Parameters:

content (PdfWriter) – The content to be validated.

Returns:

True if the content is a PyPDF PdfWriter object, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: PdfWriter)

Writes the provided PyPDF PdfWriter object to the given output path as a PDF file.

Parameters:
  • output_path (Path) – The path to the output PDF file.

  • output_content (PdfWriter) – The PyPDF PdfWriter object to be written to the output file.

class opencf.io_handlers.pypdf.PypdfToImageExtractorWriter

Bases: Writer

Extracts the images embedded in PyPDF PdfReader objects and saves them as image files.

_check_output_format(content: List[PdfReader]) bool

Validates if the provided content is a list of PyPDF PdfReader objects.

Parameters:

content (List[PdfReader]) – The content to be validated.

Returns:

True if the content is a list of PyPDF PdfReader objects, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[PdfReader])

Extracts the images embedded in the provided PdfReader objects and writes them to the given output folder.

See the [pypdf image extraction guide](https://pypdf.readthedocs.io/en/4.2.0/user/extract-images.html) for details.

Parameters:
  • output_path (Path) – The path to the output folder.

  • output_content (List[PdfReader]) – The PdfReader objects whose images are extracted.
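The linked pypdf guide iterates over `reader.pages` and, for each page, over `page.images`, writing each image's `data` bytes under its `name`. A sketch of that loop across several readers; `extract_images` and the index-prefixed file names (used because image names can repeat across pages) are hypothetical choices, not opencf's documented behavior.

```python
from pathlib import Path
from typing import Any, List


def extract_images(output_dir: Path, readers: List[Any]) -> List[Path]:
    """Write every image embedded in pypdf-style readers to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for r_index, reader in enumerate(readers):
        for p_index, page in enumerate(reader.pages):
            for image in page.images:  # pypdf image objects expose .name and .data
                # Prefix with reader/page indices because names can repeat across pages.
                target = output_dir / f"{r_index}_{p_index}_{image.name}"
                target.write_bytes(image.data)
                written.append(target)
    return written
```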

Spreadsheet I/O Handlers

This module provides classes for reading and writing XLSX spreadsheet files using the pandas library, converting between XLSX files and lists of dictionaries.

class opencf.io_handlers.spreadsheet.DictToXlsxWriter

Bases: Writer

Writes content from a list of dictionaries to an XLSX file.

Example

>>> from pathlib import Path
>>> from opencf.io_handlers.spreadsheet import DictToXlsxWriter
>>> writer = DictToXlsxWriter()
>>> content = [{'name': 'John', 'age': 30}, {'name': 'Jane', 'age': 25}]
>>> writer.write(Path('output.xlsx'), content)

_check_output_format(content: List[Dict[str, Any]]) bool

Validates the output content to ensure it is a list of dictionaries.

Parameters:

content (List[Dict[str, Any]]) – The content to validate.

Returns:

True if the content is a list of dictionaries, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Dict[str, Any]]) None

Writes the list of dictionaries to an XLSX file at the given path.

Parameters:
  • output_path (Path) – The path to the XLSX file.

  • output_content (List[Dict[str, Any]]) – The list of dictionaries to write.

output_format

alias of List[Dict[str, Any]]
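The documented format check and the header row a writer needs can both be derived from the list of dictionaries. A stdlib-only sketch; `is_list_of_dicts` and `column_order` are hypothetical helpers. Note that `all()` over an empty list is True, so this check accepts `[]`.

```python
from typing import Any, Dict, List


def is_list_of_dicts(content: Any) -> bool:
    """Mirror of the documented check: a list whose items are all dicts."""
    return isinstance(content, list) and all(isinstance(row, dict) for row in content)


def column_order(rows: List[Dict[str, Any]]) -> List[str]:
    """Union of all keys in first-seen order, usable as a header row."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return columns
```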

class opencf.io_handlers.spreadsheet.XlsxToDictReader

Bases: Reader

Reads content from an XLSX file and returns it as a list of dictionaries.

Example

>>> from pathlib import Path
>>> from opencf.io_handlers.spreadsheet import XlsxToDictReader
>>> reader = XlsxToDictReader()
>>> content = reader.read(Path('input.xlsx'))
>>> print(content)
[{'name': 'John', 'age': 30}, {'name': 'Jane', 'age': 25}]

_check_input_format(content: List[Dict[str, Any]]) bool

Validates the input content to ensure it is a list of dictionaries.

Parameters:

content (List[Dict[str, Any]]) – The content to validate.

Returns:

True if the content is a list of dictionaries, False otherwise.

Return type:

bool

_read_content(input_path: Path) List[Dict[str, Any]]

Reads and parses the content from the XLSX file at the given path.

Parameters:

input_path (Path) – The path to the XLSX file.

Returns:

The parsed content as a list of dictionaries.

Return type:

List[Dict[str, Any]]

input_format

alias of List[Dict[str, Any]]
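The returned structure is what you get by zipping each data row with the header row; with pandas, `pandas.read_excel(path).to_dict(orient="records")` typically yields the same shape. A stdlib-only sketch with a hypothetical `rows_to_records` helper, reproducing the example output of `XlsxToDictReader`:

```python
from typing import Any, Dict, List, Sequence


def rows_to_records(rows: List[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn a header row plus data rows into a list of dictionaries."""
    if not rows:
        return []
    header, *data = rows
    # zip() pairs each column name with the cell in the same position.
    return [dict(zip(header, row)) for row in data]
```

`zip()` stops at the shorter sequence, so a row with missing trailing cells simply omits those keys.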

Module contents