opencf.converters package

Submodules

Conversion Handlers - Document

This module provides classes for converting between different document file formats. It includes concrete implementations of conversion classes for various file types (pdf, docx, epub, …).

class opencf.converters.document.ImageToPDFConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts image files to PDF format.

file_reader: FileReader = <opencf.io_handlers.img_pillow.ImageToPillowReader object>
file_writer: FileWriter = None
folder_as_output: bool = False
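
As an illustration of the image-to-PDF step, here is a minimal sketch that uses Pillow directly. It is not the converter's actual implementation: the function name `images_to_pdf` is hypothetical, and the only assumption carried over from the listing above is that images are read with Pillow.

```python
from pathlib import Path
from typing import Sequence


def images_to_pdf(image_paths: Sequence[Path], output_pdf: Path) -> Path:
    """Write the given images into one PDF, one image per page (hypothetical helper)."""
    if len(image_paths) == 0:
        raise ValueError("at least one input image is required")

    # Imported lazily so the argument check above works without Pillow installed.
    from PIL import Image

    # Converting to RGB avoids modes (such as RGBA) that Pillow's PDF writer may reject.
    pages = [Image.open(path).convert("RGB") for path in image_paths]
    first, rest = pages[0], pages[1:]

    # save_all + append_images makes Pillow emit a multi-page PDF.
    first.save(output_pdf, "PDF", save_all=True, append_images=rest)
    return Path(output_pdf)
```

A call such as `images_to_pdf([Path("a.png"), Path("b.jpg")], Path("out.pdf"))` would produce a two-page PDF, assuming both files exist and Pillow is installed.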
class opencf.converters.document.ImageToPDFConverterWithPyPdf2(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts image files to PDF format using PyPDF2.

file_reader: FileReader = <opencf.io_handlers.img_pillow.ImageToPillowReader object>
file_writer: FileWriter = <opencf.io_handlers.pdf.PyPdfToPdfWriter object>
class opencf.converters.document.PDFToDocxConvertor(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts PDF files to DOCX format using [pdf2docx](https://github.com/ArtifexSoftware/pdf2docx), as recommended [here](https://stackoverflow.com/a/65932031/16668046). There is also a command-line interface, described in the [pdf2docx documentation](https://pdf2docx.readthedocs.io/en/latest/quickstart.cli.html).
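
For orientation, a PDF-to-DOCX conversion with pdf2docx can be sketched as follows. The wrapper name `pdf_to_docx` is hypothetical, and this is only a guess at what `_convert` might do internally; the `Converter` class with its `convert` and `close` methods comes from pdf2docx itself.

```python
from pathlib import Path
from typing import Union


def pdf_to_docx(pdf_path: Union[str, Path], docx_path: Union[str, Path]) -> Path:
    """Convert one PDF file to DOCX with pdf2docx (hypothetical wrapper)."""
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(pdf_path)

    # Imported lazily so the path check above works without pdf2docx installed.
    from pdf2docx import Converter

    converter = Converter(str(pdf_path))
    try:
        # With no page range given, pdf2docx converts every page.
        converter.convert(str(docx_path))
    finally:
        # Always release the underlying PDF handle.
        converter.close()
    return Path(docx_path)
```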

_convert(input_contents: List[Path], output_file: Path)
file_reader: FileReader = None
file_writer: FileWriter = None
folder_as_output: bool = False
class opencf.converters.document.PDFToHTML(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Intended to convert PDF files to HTML format. The [pdftohtml](https://linux.die.net/man/1/pdftohtml) tool could be used to do it.
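
One way such a converter could delegate to the `pdftohtml` command-line tool is sketched below. Nothing here is taken from the package: the helper names are hypothetical, and the sketch assumes `pdftohtml` is installed and on `PATH`. The `-noframes` option asks the tool for a single HTML document instead of a frameset.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_pdftohtml_command(pdf_path: PathLike, html_path: PathLike) -> List[str]:
    """Build the argument list for pdftohtml (hypothetical helper)."""
    return ["pdftohtml", "-noframes", str(pdf_path), str(html_path)]


def pdf_to_html(pdf_path: PathLike, html_path: PathLike) -> Path:
    """Run pdftohtml on one file; raises if the tool is missing or fails."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml executable not found on PATH")
    # check=True turns a non-zero exit status into CalledProcessError.
    subprocess.run(build_pdftohtml_command(pdf_path, html_path), check=True)
    return Path(html_path)
```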

class opencf.converters.document.PDFToImageConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts PDF files to image format. Not implemented yet.

file_reader: FileReader = <opencf.io_handlers.pdf.PdfToPyPdfReader object>
file_writer: FileWriter = None
folder_as_output: bool = True
class opencf.converters.document.PDFToImageExtractor(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts PDF files to image format.

_convert(input_contents: List[PdfReader], output_folder: Path)
file_reader: FileReader = <opencf.io_handlers.pdf.PdfToPyPdfReader object>
file_writer: FileWriter = None
folder_as_output: bool = True

Conversion Handlers - Textual/Markup

This module provides classes for converting between different markup file formats. It includes concrete implementations of conversion classes for various file types (txt, md, html, …).

class opencf.converters.markup.TXTToMDConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: TextToTextConverter

Converts text files to Markdown format.

class opencf.converters.markup.TextToTextConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts text files to text format.

file_reader: FileReader = <opencf_core.io_handler.TxtToStrReader object>
file_writer: FileWriter = <opencf_core.io_handler.StrToTxtWriter object>
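
The pairing of a `file_reader` and a `file_writer` on a converter class can be pictured with the self-contained sketch below. Every class and method name in it (`SketchTxtReader`, `SketchTxtWriter`, `SketchTextConverter`, `read`, `write`, `run`) is a hypothetical stand-in; the real reader, writer, and `BaseConverter` interfaces live in opencf_core and may differ.

```python
import tempfile
from pathlib import Path
from typing import List


class SketchTxtReader:
    """Hypothetical stand-in for a reader: file path -> str."""

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")


class SketchTxtWriter:
    """Hypothetical stand-in for a writer: str -> file on disk."""

    def write(self, path: Path, content: str) -> None:
        path.write_text(content, encoding="utf-8")


class SketchTextConverter:
    """Hypothetical converter wiring a reader, a transform, and a writer."""

    file_reader = SketchTxtReader()
    file_writer = SketchTxtWriter()

    def __init__(self, input_files: List[Path], output_file: Path) -> None:
        self.input_files = input_files
        self.output_file = output_file

    def _convert(self, input_contents: List[str]) -> str:
        # Text-to-text: join all inputs, separated by a newline.
        return "\n".join(input_contents)

    def run(self) -> Path:
        contents = [self.file_reader.read(path) for path in self.input_files]
        self.file_writer.write(self.output_file, self._convert(contents))
        return self.output_file


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "notes.txt"
    source.write_text("# Title\nbody", encoding="utf-8")
    target = SketchTextConverter([source], Path(tmp) / "notes.md").run()
    result = target.read_text(encoding="utf-8")
```

In this sketch a subclass only needs to override `_convert`, which is one plausible reason a class like TXTToMDConverter can inherit from TextToTextConverter without declaring its own reader or writer.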

Conversion Handlers - Structured

This module provides classes for converting between different structured file formats. It includes concrete implementations of conversion classes for various file types (xml, json, xlsx, csv, …).

class opencf.converters.structured.CSVToXMLConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts CSV files to XML format.

file_reader: FileReader = <opencf_core.io_handler.CsvToListReader object>
file_writer: FileWriter = <opencf_core.io_handler.StrToXmlWriter object>
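
To make the CSV-to-XML mapping concrete, here is a minimal standard-library sketch. It is an assumption about one reasonable mapping (one element per row, one child element per column), not the converter's actual output layout; `rows_to_xml` and the tag names `rows` and `row` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List


def rows_to_xml(rows: List[Dict[str, str]], root_tag: str = "rows", row_tag: str = "row") -> str:
    """Serialise CSV rows as XML: one element per row, one child per column."""
    root = ET.Element(root_tag)
    for row in rows:
        row_element = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Column names are used as tag names, so they must be valid XML names.
            ET.SubElement(row_element, column).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = "name,age\nAda,36\nAlan,41\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))
xml_text = rows_to_xml(rows)
```

Headers containing spaces or starting with a digit would need to be sanitised first, since they are not valid element names.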
class opencf.converters.structured.JSONToCSVConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts JSON files to CSV format.

file_reader: FileReader = <opencf_core.io_handler.JsonToDictReader object>
file_writer: FileWriter = <opencf_core.io_handler.ListToCsvWriter object>
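
A JSON-to-CSV conversion for a list of flat records can be sketched with the standard library alone. This illustrates the general technique and is not the converter's code; `records_to_csv` is a hypothetical name, and nested JSON values are out of scope here.

```python
import csv
import io
import json
from typing import Any, Dict, List


def records_to_csv(records: List[Dict[str, Any]]) -> str:
    """Render flat JSON records as CSV text with a header row."""
    # Column order follows the first appearance of each key across all records.
    fieldnames: List[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a record are written as empty cells.
    writer.writerows(records)
    return buffer.getvalue()


json_text = '[{"name": "Ada", "age": 36}, {"name": "Alan", "city": "London"}]'
csv_text = records_to_csv(json.loads(json_text))
```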
class opencf.converters.structured.XLSXToCSVConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts Excel files to CSV format.

file_reader: FileReader = <opencf.io_handlers.spreadsheet.SpreadsheetToPandasReader object>
file_writer: FileWriter = <opencf_core.io_handler.ListToCsvWriter object>
class opencf.converters.structured.XMLToJSONConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts XML files to JSON format.

file_reader: FileReader = <opencf_core.io_handler.XmlToStrReader object>
file_writer: FileWriter = <opencf_core.io_handler.DictToJsonWriter object>
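
One simple XML-to-JSON mapping is sketched below with the standard library. It is an assumption, not the converter's actual mapping: `element_to_dict` is a hypothetical helper that ignores attributes, turns leaf elements into strings, and collects repeated child tags into lists.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(element: ET.Element) -> Any:
    """Map an element to a str (leaf) or a dict of its children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()

    result: Dict[str, Any] = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_text = "<people><person><name>Ada</name></person><person><name>Alan</name></person></people>"
root = ET.fromstring(xml_text)
json_text = json.dumps({root.tag: element_to_dict(root)})
```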

Conversion Handlers - Video

This module provides classes for converting between different video file formats. It includes concrete implementations of conversion classes for various file types (images, mp4, avi, gif, …).

class opencf.converters.video.ImageToVideoConverterWithOpenCV(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts image files to video format.

_convert(input_contents: List[ndarray])

Converts a list of image files to a video file.

Parameters:
  • input_contents (List[np.ndarray]) – List of input images.

  • output_file (Path) – Output video file path.

file_reader: FileReader = <opencf.io_handlers.img_opencv.ImageToOpenCVReader object>
file_writer: FileWriter = <opencf.io_handlers.video.VideoArrayWriter object>
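
Writing a list of frames to a video file with OpenCV can be sketched as follows. This is not the package's `VideoArrayWriter`; the function name `frames_to_video`, the default frame rate, and the `mp4v` codec are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Sequence, Union


def frames_to_video(frames: Sequence, output_file: Union[str, Path], fps: float = 24.0) -> Path:
    """Write same-sized BGR frames (numpy arrays) to a video file (hypothetical helper)."""
    if len(frames) == 0:
        raise ValueError("at least one frame is required")
    height, width = frames[0].shape[:2]
    if any(tuple(frame.shape[:2]) != (height, width) for frame in frames):
        raise ValueError("all frames must share the same height and width")

    # Imported lazily so the checks above work without OpenCV installed.
    import cv2

    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    # VideoWriter expects the frame size as (width, height).
    writer = cv2.VideoWriter(str(output_file), fourcc, fps, (width, height))
    try:
        for frame in frames:
            writer.write(frame)
    finally:
        writer.release()
    return Path(output_file)
```

The size check matters because `cv2.VideoWriter` is created for one fixed frame size, so mixed-size inputs would need resizing beforehand.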
class opencf.converters.video.ImageToVideoConverterWithPillow(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts image files to video format.

_convert(input_contents: List[Image])

Converts a list of image files to a video file.

Parameters:
  • input_contents (List[PillowImage.Image]) – List of input images.

  • output_file (Path) – Output video file path.

file_reader: FileReader = <opencf.io_handlers.img_pillow.ImageToPillowReader object>
file_writer: FileWriter = <opencf.io_handlers.video.VideoArrayWriter object>
class opencf.converters.video.VideoToGIFConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter

Converts a video file to GIF format.

_convert(input_contents: List[List[Mat | ndarray]])

Converts a list of video frames to a GIF.

Parameters:
  • input_contents (List[MatLike]) – List of video frames.

Returns:
  The converted GIF content.

Return type:
  bytes

file_reader: FileReader = <opencf.io_handlers.video.VideoToFramesReaderWithOpenCV object>
file_writer: FileWriter = <opencf.io_handlers.video.FramesToGIFWriterWithImageIO object>

Module contents

Conversion Handlers

This module provides classes for converting between different file formats. It includes concrete implementations of conversion classes for various file types.