opencf.io_handlers package
Submodules
Aspose.Words I/O Handlers
This module provides classes for converting documents with the Aspose.Words library. It includes a reader that loads a PDF file into an Aspose.Words Document and a writer that saves such a Document to a DOCX file.
- class opencf.io_handlers.aspose.AsposeReader
Bases: Reader
Reads content from a PDF file and returns it as an Aspose.Words Document.
- _check_input_format(content: Document) → bool
Checks if the provided content is an aw.Document object.
- Parameters:
content (aw.Document) – The content to be checked.
- Returns:
True if the content is an aw.Document object, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → Document
Reads and returns the content from the given PDF file path.
- Parameters:
input_path (Path) – The path to the PDF file.
- Returns:
The Aspose.Words Document object read from the PDF file.
- Return type:
aw.Document
- class opencf.io_handlers.aspose.AsposeWriter
Bases: Writer
Writes content from an Aspose.Words Document to a DOCX file.
- _check_output_format(content: Document) → bool
Checks if the provided content is an Aspose.Words Document.
- Parameters:
content (aw.Document) – The content to be checked.
- Returns:
True if the content is an Aspose.Words Document, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Document])
Writes the provided Aspose.Words Document to the given DOCX file path.
- Parameters:
output_path (Path) – The path to the DOCX file.
output_content (List[aw.Document]) – The Aspose.Words Document content to be written to the file.
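Both Aspose classes above, like every handler in this package, split their work into an I/O hook (`_read_content` or `_write_content`) and a validation hook (`_check_input_format` or `_check_output_format`). The `Reader` and `Writer` base classes are not documented in this section, so the following is only a minimal sketch of how such a contract could be wired, assuming that the public `read()` and `write()` methods used in the spreadsheet examples validate content around the I/O hooks. `SketchReader`, `SketchWriter`, `TextReader`, and `TextWriter` are hypothetical names.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchReader(ABC):
    """Hypothetical stand-in for the Reader base class."""

    def read(self, input_path: Path) -> Any:
        content = self._read_content(input_path)
        # Reject content that does not match the reader's declared format.
        if not self._check_input_format(content):
            raise ValueError(f"Unexpected content read from {input_path}")
        return content

    @abstractmethod
    def _read_content(self, input_path: Path) -> Any: ...

    @abstractmethod
    def _check_input_format(self, content: Any) -> bool: ...


class SketchWriter(ABC):
    """Hypothetical stand-in for the Writer base class."""

    def write(self, output_path: Path, output_content: Any) -> None:
        # Validate before touching the file system.
        if not self._check_output_format(output_content):
            raise ValueError("Content does not match the writer's output format")
        self._write_content(output_path, output_content)

    @abstractmethod
    def _write_content(self, output_path: Path, output_content: Any) -> None: ...

    @abstractmethod
    def _check_output_format(self, content: Any) -> bool: ...


class TextReader(SketchReader):
    """Toy reader: plain-text file -> str."""

    def _read_content(self, input_path: Path) -> str:
        return input_path.read_text(encoding="utf-8")

    def _check_input_format(self, content: Any) -> bool:
        return isinstance(content, str)


class TextWriter(SketchWriter):
    """Toy writer: str -> plain-text file."""

    def _write_content(self, output_path: Path, output_content: str) -> None:
        output_path.write_text(output_content, encoding="utf-8")

    def _check_output_format(self, content: Any) -> bool:
        return isinstance(content, str)
```

With this split, supporting a new format only means implementing the two hooks, while the validate-then-act order stays in one place.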
File: img_opencv.py
Author: Hermann Agossou
Description: This module provides classes for reading and writing images and videos using OpenCV; GIF output uses imageio.
- class opencf.io_handlers.opencv.FramesToGIFWriterWithImageIO
Bases: Writer
Writes a list of frames to a GIF file using imageio.
- _check_output_format(content) → bool
Validates if the provided content is a list of frames.
- Parameters:
content – The content to be validated.
- Returns:
True if the content is a list of frames, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Mat | ndarray])
Writes the provided list of frames to the given output GIF file.
- Parameters:
output_path (Path) – The path to the output GIF file.
output_content (List[MatLike]) – The list of frames to be written to the GIF file.
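The documented check only asks whether the content is a list of frames. A minimal sketch of such a check, assuming frames are NumPy arrays, is shown below; requiring every frame to share one shape is an extra assumption of this sketch, not necessarily what `FramesToGIFWriterWithImageIO` enforces. `is_frame_list` is a hypothetical helper.

```python
from typing import Any

import numpy as np


def is_frame_list(content: Any) -> bool:
    """Return True for a non-empty list of image arrays that all share one shape."""
    if not isinstance(content, list) or not content:
        return False
    if not all(isinstance(frame, np.ndarray) for frame in content):
        return False
    # Assumption: animation frames should agree on height, width and channels.
    first_shape = content[0].shape
    return all(frame.ndim in (2, 3) and frame.shape == first_shape for frame in content)
```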
- class opencf.io_handlers.opencv.ImageToOpenCVReader
Bases: Reader
Reads an image file and returns an OpenCV image object.
- _check_input_format(content: Mat | ndarray) → bool
Validates if the provided content is an OpenCV image object.
- Parameters:
content (MatLike) – The content to be validated.
- Returns:
True if the content is an OpenCV image object, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → Mat | ndarray
Reads and returns the content from the given input path as an OpenCV image object.
- Parameters:
input_path (Path) – The path to the input image file.
- Returns:
The OpenCV image object read from the input file.
- Return type:
MatLike
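OpenCV's `cv2.imread` does not raise on a missing or undecodable file; it returns `None`. A reader built on it therefore needs an explicit check. The sketch below takes the loader as a parameter so the logic can be exercised without OpenCV installed; `read_image_checked` is a hypothetical helper, not part of `ImageToOpenCVReader`.

```python
from pathlib import Path
from typing import Any, Callable


def read_image_checked(input_path: Path, imread: Callable[[str], Any]) -> Any:
    """Load an image with an OpenCV-style loader, raising instead of returning None."""
    # cv2.imread takes a string path and returns None when it cannot read the file.
    image = imread(str(input_path))
    if image is None:
        raise FileNotFoundError(f"Could not read image: {input_path}")
    return image
```

In practice it would be called as `read_image_checked(Path("photo.jpg"), cv2.imread)`. By default `cv2.imread` returns color images in BGR channel order, which matters when handing them to libraries that expect RGB.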
- class opencf.io_handlers.opencv.OpenCVToImageWriter
Bases: Writer
Writes an OpenCV image object to an image file.
- _check_output_format(content: Mat | ndarray) → bool
Validates if the provided content is an OpenCV image object.
- Parameters:
content – The content to be validated.
- Returns:
True if the content is an OpenCV image object, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content)
Writes the provided OpenCV image object to the given output path as an image file.
- Parameters:
output_path (Path) – The path to the output image file.
output_content – The OpenCV image object to be written to the output file.
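`cv2.imwrite` chooses the encoder from the extension of the output path and returns a boolean success flag. Before writing, a handler can reject arrays that OpenCV cannot treat as an image. The heuristic below is a sketch under the assumption that a valid image is a non-empty 2-D (grayscale) array or a 3-D array with 1, 3 (BGR), or 4 (BGRA) channels; `is_writable_image` is a hypothetical name, and the real `_check_output_format` may be looser or stricter.

```python
from typing import Any

import numpy as np


def is_writable_image(content: Any) -> bool:
    """Heuristic check for an array that cv2.imwrite could encode."""
    if not isinstance(content, np.ndarray) or content.size == 0:
        return False
    if content.ndim == 2:  # single-channel (grayscale) image
        return True
    # Height x width x channels, with 1, 3 (BGR) or 4 (BGRA) channels.
    return content.ndim == 3 and content.shape[2] in (1, 3, 4)
```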
- class opencf.io_handlers.opencv.VideoArrayWriter
Bases: Writer
Writes a video to a file using a list of image arrays.
- _check_output_format(content: Mat | ndarray | List[Mat | ndarray]) → bool
Validates if the provided content is suitable for writing as a video.
- Parameters:
content – Content to be validated.
- Returns:
True if the content is suitable for writing as a video, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: Mat | ndarray | List[Mat | ndarray], fps: int = 15) → bool
Writes a video to a file using a list of image arrays.
- Parameters:
output_path (Path) – Path to save the video.
output_content (Union[MatLike, List[MatLike]]) – Video frames as a NumPy array or a list of NumPy arrays.
fps (int, optional) – Frames per second. Defaults to 15.
- Returns:
True if the video is successfully written, False otherwise.
- Return type:
bool
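`VideoArrayWriter._write_content` accepts either one array or a list of arrays, plus an `fps` that defaults to 15. The sketch below shows one way to normalize that input before handing frames to `cv2.VideoWriter`, assuming a single array is a stack of frames shaped `(N, H, W)` or `(N, H, W, C)`. It also shows that `cv2.VideoWriter` takes its frame size as `(width, height)` and how `fps` sets the playback length. All three helpers are hypothetical.

```python
from typing import List, Tuple, Union

import numpy as np


def normalize_frames(content: Union[np.ndarray, List[np.ndarray]]) -> List[np.ndarray]:
    """Turn a stacked frame array or a list of frames into a list of frames."""
    if isinstance(content, np.ndarray):
        # Assumption: a single array is a stack of frames, (N, H, W) or (N, H, W, C).
        if content.ndim not in (3, 4):
            raise ValueError("Expected an array shaped (N, H, W) or (N, H, W, C)")
    frames = list(content)  # iterating over an array yields its frames along axis 0
    if not frames:
        raise ValueError("No frames to write")
    # cv2.VideoWriter is opened with one fixed frame size, so every frame should match.
    first_shape = frames[0].shape
    if any(frame.shape != first_shape for frame in frames):
        raise ValueError("All frames must have the same shape")
    return frames


def frame_size(frame: np.ndarray) -> Tuple[int, int]:
    """cv2.VideoWriter expects frameSize as (width, height), the reverse of shape[:2]."""
    height, width = frame.shape[:2]
    return width, height


def duration_seconds(n_frames: int, fps: int = 15) -> float:
    """Playback length in seconds; fps defaults to 15 like _write_content."""
    return n_frames / fps
```

For example, 30 frames at the default 15 fps play for 2 seconds.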
- class opencf.io_handlers.opencv.VideoToFramesReaderWithOpenCV
Bases: Reader
Reads a video file and returns a list of frames in MatLike format.
- _check_input_format(content: List[Mat | ndarray]) → bool
Validates if the provided content is a list of MatLike objects.
- Parameters:
content (List[MatLike]) – The content to be validated.
- Returns:
True if the content is a list of MatLike objects, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → List[Mat | ndarray]
Reads and returns the frames from the given video file as a list of MatLike objects.
- Parameters:
input_path (Path) – The path to the input video file.
- Returns:
A list containing frames read from the video file.
- Return type:
List[MatLike]
- input_format
alias of List[Mat | ndarray]
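Reading a video into a list of frames is usually a loop over `cv2.VideoCapture.read()`, which returns an `(ok, frame)` pair and reports `ok` as false once no more frames can be read. The sketch below drains any object with that `read()`/`release()` interface; `read_all_frames` is a hypothetical helper and not necessarily how `VideoToFramesReaderWithOpenCV` is implemented.

```python
from typing import Any, List


def read_all_frames(capture: Any) -> List[Any]:
    """Drain a cv2.VideoCapture-like object into a list of frames."""
    frames = []
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or decode failure
                break
            frames.append(frame)
    finally:
        # Release the capture even if reading raised.
        capture.release()
    return frames
```

With OpenCV, `read_all_frames(cv2.VideoCapture(str(path)))` would return every frame. Because the whole video is held in memory, long or high-resolution videos can use a lot of RAM.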
- class opencf.io_handlers.pdf2docx.Pdf2DocxReader
Bases: Reader
Reads content from a PDF file and returns it as a pdf2docx Converter object.
- _check_input_format(content: Converter) → bool
Checks if the provided content is a pdf2docx.Converter object.
- Parameters:
content (pdf2docx.Converter) – The content to be checked.
- Returns:
True if the content is a pdf2docx.Converter object, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → Converter
Reads and returns the content from the given PDF file path.
- Parameters:
input_path (Path) – The path to the PDF file.
- Returns:
The pdf2docx.Converter object created from the PDF file.
- Return type:
pdf2docx.Converter
- class opencf.io_handlers.pdf2docx.Pdf2DocxWriter
Bases: Writer
Writes content from a PDF to DOCX using pdf2docx.
- _check_output_format(content: List[Converter]) → bool
Checks if the provided content is a list of pdf2docx.Converter objects.
- Parameters:
content (List[pdf2docx.Converter]) – The content to be checked.
- Returns:
True if the content is a list of pdf2docx.Converter objects, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Converter])
Writes the first provided pdf2docx.Converter object to the given DOCX file path.
- Parameters:
output_path (Path) – The path to the DOCX file.
output_content (List[pdf2docx.Converter]) – The pdf2docx.Converter objects; only the first one is written to the file.
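`Pdf2DocxWriter` writes only the first `Converter` in the list. With pdf2docx, a conversion is typically `Converter(pdf_path)`, then `convert(docx_path)`, then `close()`. The sketch below follows that sequence for the first element; closing every converter, including unused ones, is an assumption of this sketch, and `write_first_converter` is a hypothetical helper. Converters are duck-typed so the logic can be checked without pdf2docx installed.

```python
from pathlib import Path
from typing import Any, List


def write_first_converter(output_path: Path, converters: List[Any]) -> None:
    """Convert with the first pdf2docx.Converter in the list, then close all of them."""
    if not converters:
        raise ValueError("Expected at least one Converter")
    try:
        # Converter.convert takes the target DOCX file name as its first argument.
        converters[0].convert(str(output_path))
    finally:
        # Assumption: unused converters are closed too, so none keeps its PDF open.
        for converter in converters:
            converter.close()
```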
Image File I/O Handlers
This module provides classes for reading and writing image files using the Pillow library. It includes abstract base classes and concrete implementations for converting between image files and Pillow Image objects.
- class opencf.io_handlers.pillow.ImageToPillowReader
Bases: Reader
Reads an image file and returns a Pillow Image object.
- _check_input_format(content: Image) → bool
Validates if the provided content is a Pillow Image object.
- Parameters:
content (PillowImage.Image) – The content to be validated.
- Returns:
True if the content is a Pillow Image object, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → Image
Reads and returns the content from the given input path as a Pillow Image object.
- Parameters:
input_path (Path) – The path to the input image file.
- Returns:
The Pillow Image object read from the input file.
- Return type:
PillowImage.Image
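`PIL.Image.open` is lazy: it reads the file header and leaves the file open until the pixel data is actually needed. A reader that closes the file before returning the image should call `load()` first. A minimal sketch, with the opener passed in (normally `PIL.Image.open`) and `open_loaded` a hypothetical helper:

```python
from pathlib import Path
from typing import Any, Callable


def open_loaded(input_path: Path, opener: Callable[[Path], Any]) -> Any:
    """Open an image, read its pixel data into memory, then close the file."""
    # The with-block closes the underlying file on exit; load() runs first so
    # the returned image no longer depends on that file.
    with opener(input_path) as image:
        image.load()
    return image
```

With Pillow installed this would be called as `open_loaded(Path("photo.png"), Image.open)`.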
- class opencf.io_handlers.pillow.PillowToImageWriter
Bases: Writer
Writes a Pillow Image object to an image file.
- _check_output_format(content: Image) → bool
Validates if the provided content is a Pillow Image object.
- Parameters:
content (PillowImage.Image) – The content to be validated.
- Returns:
True if the content is a Pillow Image object, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: Image)
Writes the provided Pillow Image object to the given output path as an image file.
- Parameters:
output_path (Path) – The path to the output image file.
output_content (PillowImage.Image) – The Pillow Image object to be written to the output file.
- class opencf.io_handlers.pillow.PillowToPDFWriter
Bases: Writer
Writes a list of Pillow Image objects to a PDF file.
- _check_output_format(content: List[Image]) → bool
Validates if the provided content is a list of Pillow Image objects.
- Parameters:
content (List[PillowImage.Image]) – The content to be validated.
- Returns:
True if the content is a list of Pillow Image objects, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Image])
Writes the provided PillowImage.Image objects to the given output path as a PDF file.
- Parameters:
output_path (Path) – The path to the output PDF file.
output_content (List[PillowImage.Image]) – The PillowImage.Image objects to be written to the output file.
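Pillow can write a multi-page PDF by saving the first image with `save_all=True` and passing the remaining images as `append_images`. The sketch below assumes this approach for a list of images (it is not confirmed to be how `PillowToPDFWriter` works) and converts non-RGB images to RGB first as a common precaution, since not every Pillow mode can be written to PDF. `save_images_as_pdf` is a hypothetical helper; images are duck-typed so the logic runs without Pillow.

```python
from pathlib import Path
from typing import Any, List


def save_images_as_pdf(images: List[Any], output_path: Path) -> None:
    """Save Pillow images as one multi-page PDF, one image per page."""
    if not images:
        raise ValueError("At least one image is required")
    # Precaution: normalize every page to RGB before PDF output.
    pages = [img if img.mode == "RGB" else img.convert("RGB") for img in images]
    # The first image carries the save call; the rest are appended as extra pages.
    pages[0].save(output_path, format="PDF", save_all=True, append_images=pages[1:])
```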
- class opencf.io_handlers.pymupdf.PdfToPymupdfReader
Bases: Reader
Reads content from a PDF file and returns it as a fitz.Document object.
- _check_input_format(content: Path) → bool
Checks if the provided content is a valid PDF file.
- Parameters:
content (Path) – The path to the PDF file to be checked.
- Returns:
True if the content is a valid PDF file, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → Document
Reads and returns the content from the given input path.
- Parameters:
input_path (Path) – The path to the input PDF file.
- Returns:
The fitz.Document object read from the input PDF file.
- Return type:
fitz.Document
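Here `_check_input_format` receives a `Path` rather than a loaded document. A cheap pre-check, sketched below under the assumption that a `.pdf` suffix plus the `%PDF-` header at the start of the file is good enough, avoids handing obviously wrong files to `fitz.open`. It is not PyMuPDF's own validation, and `looks_like_pdf` is a hypothetical name. Some PDF readers tolerate bytes before the header, so this check is stricter than they are.

```python
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Cheap check: a .pdf suffix and the '%PDF-' header at the start of the file."""
    if path.suffix.lower() != ".pdf" or not path.is_file():
        return False
    with path.open("rb") as handle:
        # PDF files begin with a header line such as b"%PDF-1.7".
        return handle.read(5) == b"%PDF-"
```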
- class opencf.io_handlers.pymupdf.PymupdfToImageExtractorWriter
Bases: Writer
Extracts images from a fitz.Document and saves them as image files.
- _check_output_format(content: List[Document]) → bool
Checks if the provided content is a list of fitz.Document objects.
- Parameters:
content (List[fitz.Document]) – The content to be checked.
- Returns:
True if the content is a list of fitz.Document objects, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Document])
Extracts the images embedded in the provided fitz.Document objects and saves them under the given output path.
- Parameters:
output_path (Path) – The output path under which the extracted images are saved.
output_content (List[fitz.Document]) – The documents whose embedded images are extracted.
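With PyMuPDF, embedded images are listed per page by `page.get_images(full=True)`, whose tuples start with the image's xref, and `doc.extract_image(xref)` returns a dict holding the raw bytes (`"image"`) and the file extension (`"ext"`). The sketch below combines these calls; the output folder layout and file names are assumptions, and `extract_embedded_images` is hypothetical. Documents and pages are duck-typed so the logic runs without PyMuPDF.

```python
from pathlib import Path
from typing import Any, List


def extract_embedded_images(documents: List[Any], output_dir: Path) -> List[Path]:
    """Save every embedded image of fitz.Document-like objects into output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for doc_index, doc in enumerate(documents):
        for page_index, page in enumerate(doc):  # iterating a Document yields pages
            for image_index, info in enumerate(page.get_images(full=True)):
                xref = info[0]  # first tuple item is the image's cross-reference number
                extracted = doc.extract_image(xref)  # dict with "image" bytes and "ext"
                name = f"doc{doc_index}_page{page_index}_img{image_index}.{extracted['ext']}"
                target = output_dir / name
                target.write_bytes(extracted["image"])
                written.append(target)
    return written
```

An image reused on several pages keeps the same xref and is saved once per page here; tracking already-seen xrefs would avoid those duplicates.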
- class opencf.io_handlers.pymupdf.PymupdfToImageWriter
Bases: Writer
Writes PDF pages as images to a specified folder.
- _check_output_format(content: List[Page]) → bool
Checks if the provided content is a list of fitz Page objects.
- Parameters:
content (List[fitz.Page]) – The content to be checked.
- Returns:
True if the content is a list of fitz Page objects, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Document])
Writes the provided PDF pages as images to the specified folder.
- Parameters:
output_path (Path) – The path to the output folder.
output_content (List[fitz.Page]) – The list of PDF pages to be written as images.
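Rendering pages, as opposed to extracting embedded images, rasterizes each `fitz.Page` with `page.get_pixmap(dpi=...)` and saves the resulting pixmap. The sketch below assumes PNG output named `page_0001.png`, `page_0002.png`, and so on, which is a hypothetical naming scheme; `render_pages` is likewise hypothetical. The `dpi` argument requires a reasonably recent PyMuPDF release.

```python
from pathlib import Path
from typing import Any, List


def render_pages(pages: List[Any], output_dir: Path, dpi: int = 150) -> List[Path]:
    """Rasterize fitz.Page-like objects to numbered PNG files in output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, page in enumerate(pages, start=1):
        pixmap = page.get_pixmap(dpi=dpi)  # render the page at the requested resolution
        target = output_dir / f"page_{number:04d}.png"
        pixmap.save(str(target))  # the output format follows the file suffix
        written.append(target)
    return written
```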
PDF File I/O Handlers
This module provides classes for reading and writing PDF files using the PyPDF library. It includes abstract base classes and concrete implementations for converting between PDF files and PyPDF PdfReader objects.
- class opencf.io_handlers.pypdf.PdfToPyPdfReader
Bases: Reader
Reads a PDF file and returns a [PyPDF PdfReader object](https://pypdf.readthedocs.io/en/4.2.0/modules/PdfReader.html).
- _check_input_format(content: PdfReader) → bool
Validates if the provided content is a PyPDF PdfReader object.
- Parameters:
content (PdfReader) – The content to be validated.
- Returns:
True if the content is a PyPDF PdfReader object, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → PdfReader
Reads and returns the content from the given input path as a PyPDF PdfReader object.
- Parameters:
input_path (Path) – The path to the input PDF file.
- Returns:
The PyPDF PdfReader object read from the input file.
- Return type:
PdfReader
- class opencf.io_handlers.pypdf.PillowToPDFWriterwithPypdf
Bases: Writer
Writes a collection of Pillow images to a PDF file using PyPDF.
- _check_output_format(content: List[Image]) → bool
Checks if the provided content is a list of Pillow images.
- Parameters:
content (List[PillowImage.Image]) – The content to be checked.
- Returns:
True if the content is a list of Pillow images, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Image])
Writes the provided list of Pillow images to the given PDF file path.
- Parameters:
output_path (Path) – The path to the PDF file.
output_content (List[PillowImage.Image]) – The list of Pillow images to be written to the file.
- class opencf.io_handlers.pypdf.PyPdfToPdfWriter
Bases: Writer
Writes the provided [PyPDF PdfWriter object](https://pypdf.readthedocs.io/en/4.2.0/modules/PdfWriter.html) to a PDF file.
- _check_output_format(content: PdfWriter) → bool
Validates if the provided content is a PyPDF PdfWriter object.
- Parameters:
content (PdfWriter) – The content to be validated.
- Returns:
True if the content is a PyPDF PdfWriter object, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: PdfWriter)
Writes the provided PyPDF PdfWriter object to the given output path as a PDF file.
- Parameters:
output_path (Path) – The path to the output PDF file.
output_content (PdfWriter) – The PyPDF PdfWriter object to be written to the output file.
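`PyPdfToPdfWriter` expects a ready-made `PdfWriter`. One common way to build one is to copy pages from existing `PdfReader` objects with `PdfWriter.add_page`; the sketch below does that for a list of readers. `merge_into_writer` is a hypothetical helper, and the objects are duck-typed so the logic runs without pypdf installed.

```python
from typing import Any, List


def merge_into_writer(readers: List[Any], writer: Any) -> Any:
    """Copy every page of the given PdfReader-like objects into a PdfWriter-like object."""
    for reader in readers:
        for page in reader.pages:  # pages are appended in reader order
            writer.add_page(page)
    return writer
```

With pypdf, `merge_into_writer([PdfReader("a.pdf"), PdfReader("b.pdf")], PdfWriter())` returns a `PdfWriter` holding all pages of both files, in order.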
- class opencf.io_handlers.pypdf.PypdfToImageExtractorWriter
Bases: Writer
Extracts the images embedded in PyPDF PdfReader objects and saves them to a folder.
- _check_output_format(content: List[PdfReader]) → bool
Validates if the provided content is a list of PyPDF PdfReader objects.
- Parameters:
content (List[PdfReader]) – The content to be validated.
- Returns:
True if the content is a list of PyPDF PdfReader objects, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[PdfReader])
Extracts the images from the provided PdfReader objects and writes them to the given output folder.
Read more in the [PyPDF image extraction guide](https://pypdf.readthedocs.io/en/4.2.0/user/extract-images.html).
- Parameters:
output_path (Path) – The path to the output folder.
output_content (List[PdfReader]) – The PdfReader objects whose images are extracted to the output folder.
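The linked pypdf guide extracts images by iterating `reader.pages` and then `page.images`, where each image object exposes a `name` and its raw `data`. The sketch below applies that loop to a list of readers and prefixes a running counter to each file name, because image names can repeat across pages. `extract_pypdf_images` and the naming scheme are hypothetical; readers are duck-typed so the logic runs without pypdf.

```python
from pathlib import Path
from typing import Any, List


def extract_pypdf_images(readers: List[Any], output_dir: Path) -> List[Path]:
    """Write the images embedded in PdfReader-like objects into output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    count = 0
    for reader in readers:
        for page in reader.pages:
            for image in page.images:  # each image exposes .name and .data
                # Prefix a running counter so repeated names do not overwrite each other.
                target = output_dir / f"{count}_{image.name}"
                target.write_bytes(image.data)
                written.append(target)
                count += 1
    return written
```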
Spreadsheet I/O Handlers
This module provides classes for reading and writing spreadsheet files using the pandas library. It includes abstract base classes and concrete implementations for converting between spreadsheet files and pandas DataFrame objects.
- class opencf.io_handlers.spreadsheet.DictToXlsxWriter
Bases: Writer
Writes content from a list of dictionaries to an XLSX file.
Example
>>> from pathlib import Path
>>> from opencf.io_handlers.spreadsheet import DictToXlsxWriter
>>> writer = DictToXlsxWriter()
>>> content = [{'name': 'John', 'age': 30}, {'name': 'Jane', 'age': 25}]
>>> writer.write(Path('output.xlsx'), content)
- _check_output_format(content: List[Dict[str, Any]]) → bool
Validates the output content to ensure it is a list of dictionaries.
- Parameters:
content (List[Dict[str, Any]]) – The content to validate.
- Returns:
True if the content is a list of dictionaries, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Dict[str, Any]]) → None
Writes the list of dictionaries to an XLSX file at the given path.
- Parameters:
output_path (Path) – The path to the XLSX file.
output_content (List[Dict[str, Any]]) – The list of dictionaries to write.
- output_format
alias of List[Dict[str, Any]]
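The writer's validation only has to confirm that the content is a list of dictionaries. Because the dictionaries need not share the same keys, the XLSX columns are effectively the union of all keys. The sketch below shows a plausible version of both steps; `is_records` and `column_order` are hypothetical, and treating an empty list as valid is an assumption. pandas behaves similarly when building a `DataFrame` from such a list: keys missing from a row become empty (NaN) cells.

```python
from typing import Any, Dict, List


def is_records(content: Any) -> bool:
    """True when content is a list whose items are all dicts (an empty list passes)."""
    return isinstance(content, list) and all(isinstance(row, dict) for row in content)


def column_order(records: List[Dict[str, Any]]) -> List[str]:
    """Union of all keys, in the order they are first seen."""
    columns: Dict[str, None] = {}  # dicts preserve insertion order
    for row in records:
        for key in row:
            columns.setdefault(key, None)
    return list(columns)
```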
- class opencf.io_handlers.spreadsheet.XlsxToDictReader
Bases: Reader
Reads content from an XLSX file and returns it as a list of dictionaries.
Example
>>> from pathlib import Path
>>> from opencf.io_handlers.spreadsheet import XlsxToDictReader
>>> reader = XlsxToDictReader()
>>> content = reader.read(Path('input.xlsx'))
>>> print(content)
[{'name': 'John', 'age': 30}, {'name': 'Jane', 'age': 25}]
- _check_input_format(content: List[Dict[str, Any]]) → bool
Validates the input content to ensure it is a list of dictionaries.
- Parameters:
content (List[Dict[str, Any]]) – The content to validate.
- Returns:
True if the content is a list of dictionaries, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) → List[Dict[str, Any]]
Reads and parses the content from the XLSX file at the given path.
- Parameters:
input_path (Path) – The path to the XLSX file.
- Returns:
The parsed content as a list of dictionaries.
- Return type:
List[Dict[str, Any]]
- input_format
alias of List[Dict[str, Any]]
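The reader's output has the same shape as pandas' `DataFrame.to_dict(orient="records")`: one dictionary per data row, keyed by the header row. The sketch below rebuilds that shape from plain header and row sequences, for example the tuples yielded by openpyxl's `worksheet.iter_rows(values_only=True)`. `rows_to_records` is hypothetical, and padding short rows with `None` is an assumption.

```python
from typing import Any, Dict, List, Sequence


def rows_to_records(header: Sequence[str], rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Pair each data row with the header row, one dict per row."""
    records = []
    for row in rows:
        # Short rows get None for missing trailing cells; extra cells are dropped by zip.
        padded = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records
```

For the example above, `rows_to_records(["name", "age"], [["John", 30], ["Jane", 25]])` yields the same list of dictionaries that `XlsxToDictReader` prints.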