opencf.converters package

Submodules

Conversion Handlers - Document

This module provides classes for converting between different document file formats. It includes concrete implementations of conversion classes for various file types (pdf, docx, epub, …).

class opencf.converters.document.PDFToDocxConvertorwithPdf2docx(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts PDF files to docx format using [pdf2docx](https://github.com/ArtifexSoftware/pdf2docx), as recommended [here](https://stackoverflow.com/a/65932031/16668046). pdf2docx also offers a command-line interface, described in [its online documentation](https://pdf2docx.readthedocs.io/en/latest/quickstart.cli.html).

file_reader: Reader | None = <opencf.io_handlers.pdf2docx.Pdf2DocxReader object>
file_writer: Writer | None = <opencf.io_handlers.pdf2docx.Pdf2DocxWriter object>
folder_as_output: bool | None = False
class opencf.converters.document.PDFToDocxWithAspose(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts PDF files to docx format using Aspose.Words for Python.

file_reader: Reader | None = <opencf.io_handlers.aspose.AsposeReader object>
file_writer: Writer | None = <opencf.io_handlers.aspose.AsposeWriter object>
folder_as_output: bool | None = False
class opencf.converters.document.PDFToHTML(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts PDF files to HTML format. The [pdftohtml](https://linux.die.net/man/1/pdftohtml) tool is a candidate for performing this conversion.

Conversion Handlers - Textual/Markup

This module provides classes for converting between different markup file formats. It includes concrete implementations of conversion classes for various file types (txt, md, html, …).

class opencf.converters.markup.TextToTextConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

A converter class for converting text-based files to text format.

file_reader: Reader | None = <opencf_core.io_handler.TxtToStrReader object>
file_writer: Writer | None = <opencf_core.io_handler.StrToTxtWriter object>
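The converters in this package appear to share a reader → `_convert` → writer shape: `file_reader` loads each input file, `_convert` transforms the loaded contents, and `file_writer` saves the result. The sketch below illustrates that flow for plain text using only the standard library. `SketchTextConverter`, its `convert` method, `read_txt` and `write_txt` are hypothetical names, not part of opencf or opencf_core, and concatenating several text inputs in order is an assumption about how multiple files would be combined.

```python
from pathlib import Path
from typing import Any, Callable, List


class SketchTextConverter:
    """Hypothetical reader -> _convert -> writer pipeline (not the opencf API)."""

    def __init__(self, reader: Callable[[Path], str], writer: Callable[[Path, str], None]):
        self.file_reader = reader  # plays the role of file_reader
        self.file_writer = writer  # plays the role of file_writer

    def _convert(self, input_contents: List[str], args: Any = None) -> str:
        # Assumption: several text inputs are concatenated in input order.
        return "".join(input_contents)

    def convert(self, input_files: List[Path], output_file: Path) -> None:
        # Read every input, transform the loaded contents, then write once.
        contents = [self.file_reader(path) for path in input_files]
        self.file_writer(output_file, self._convert(contents))


def read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def write_txt(path: Path, text: str) -> None:
    path.write_text(text, encoding="utf-8")
```

Keeping reading and writing in separate callables is what lets the same conversion step be reused with different file backends.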

Conversion Handlers - PDF

This module provides classes for manipulating PDF files. It includes concrete implementations of conversion classes between PDF files and raster images, …

class opencf.converters.pdf.ImageToPDFConverterWithPillow(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts image files to PDF format using Pillow.

file_reader: Reader | None = <opencf.io_handlers.pillow.ImageToPillowReader object>
file_writer: Writer | None = <opencf.io_handlers.pillow.PillowToPDFWriter object>
folder_as_output: bool | None = False
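As a rough illustration of what a Pillow-based image-to-PDF writer can do, the following sketch saves a list of Pillow images as one multi-page PDF held in memory, using `Image.save(..., format="PDF", save_all=True, append_images=...)`. It is not the implementation of `PillowToPDFWriter`; the helper name `images_to_pdf_bytes` and the RGB normalization step are assumptions.

```python
from io import BytesIO
from typing import List

from PIL import Image


def images_to_pdf_bytes(images: List[Image.Image]) -> bytes:
    """Save one or more Pillow images as a multi-page PDF (hypothetical helper)."""
    if not images:
        raise ValueError("at least one image is required")
    # Assumption: normalize every page to RGB, a mode Pillow's PDF writer supports.
    pages = [img.convert("RGB") for img in images]
    buffer = BytesIO()
    # The first image becomes page 1; the rest are appended as further pages.
    pages[0].save(buffer, format="PDF", save_all=True, append_images=pages[1:])
    return buffer.getvalue()
```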
class opencf.converters.pdf.ImageToPDFConverterWithPyPdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts image files to PDF format using PyPDF.

file_reader: Reader | None = <opencf.io_handlers.pillow.ImageToPillowReader object>
file_writer: Writer | None = <opencf.io_handlers.pypdf.PillowToPDFWriterwithPypdf object>
folder_as_output: bool | None = False
class opencf.converters.pdf.MergePDFswithPypdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Merges multiple PDF files into a single PDF.

file_reader: Reader | None = <opencf.io_handlers.pypdf.PdfToPyPdfReader object>
file_writer: Writer | None = <opencf.io_handlers.pypdf.PyPdfToPdfWriter object>
folder_as_output: bool | None = False
class opencf.converters.pdf.PDFToImageConverterwithPymupdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts PDF files to image format using PyMuPDF.

file_reader: Reader | None = <opencf.io_handlers.pymupdf.PdfToPymupdfReader object>
file_writer: Writer | None = <opencf.io_handlers.pymupdf.PymupdfToImageWriter object>
folder_as_output: bool | None = True
class opencf.converters.pdf.PDFToImageExtractorwithPymupdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Extracts images from PDF files using PyMuPDF.

file_reader: Reader | None = <opencf.io_handlers.pymupdf.PdfToPymupdfReader object>
file_writer: Writer | None = <opencf.io_handlers.pymupdf.PymupdfToImageExtractorWriter object>
folder_as_output: bool | None = True
class opencf.converters.pdf.PDFToImageExtractorwithPypdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Extracts images from PDF files using pypdf.

file_reader: Reader | None = <opencf.io_handlers.pypdf.PdfToPyPdfReader object>
file_writer: Writer | None = <opencf.io_handlers.pypdf.PypdfToImageExtractorWriter object>
folder_as_output: bool | None = True

Conversion Handlers - Image

This module provides classes for converting between different image file formats. It includes concrete implementations of conversion classes for various file types (jpg, png, svg, …).

Conversion Handlers - Structured

This module provides classes for converting between different structured file formats. It includes concrete implementations of conversion classes for various file types (xml, json, xlsx, csv, …).

class opencf.converters.spreadsheet.CSVToXLSXConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts CSV files to XLSX format.

file_reader: CsvToDictReader = <opencf_core.io_handler.CsvToDictReader object>
file_writer: DictToXlsxWriter = <opencf.io_handlers.spreadsheet.DictToXlsxWriter object>
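The name `CsvToDictReader` suggests that CSV input is loaded as a list of row dictionaries before being written to XLSX. The sketch below shows that reading step with the standard `csv.DictReader`; `csv_to_dicts` is a hypothetical helper, and the XLSX side (which would need a library such as openpyxl) is not shown.

```python
import csv
import io
from typing import Dict, List


def csv_to_dicts(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per data row, keyed by the header row."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

All values come back as strings; any type conversion would have to happen in a later step.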
class opencf.converters.spreadsheet.CSVToXMLConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts CSV files to XML format.

_convert(input_contents: List[Dict[str, Any]], args=None) → Element

Converts the list of dictionaries to an XML ElementTree element.

Parameters:
  • input_contents (List[Dict[str, Any]]) – The list of dictionaries to convert.

  • args (optional) – Additional arguments for the conversion process.

Returns:

The resulting XML ElementTree element.

Return type:

ET.Element

file_reader: CsvToDictReader = <opencf_core.io_handler.CsvToDictReader object>
file_writer: TreeToXmlWriter = <opencf_core.io_handler.TreeToXmlWriter object>
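One way `_convert` could map rows to an ElementTree element is sketched below with `xml.etree.ElementTree`. The tag names `root` and `row`, the one-child-per-column layout, and the helper name `dicts_to_xml` are assumptions rather than the converter's documented output; CSV headers must also be valid XML tag names for the result to be well-formed.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def dicts_to_xml(rows: List[Dict[str, Any]], root_tag: str = "root", row_tag: str = "row") -> ET.Element:
    """Build <root><row><col>value</col>...</row>...</root> from a list of dicts."""
    root = ET.Element(root_tag)
    for row in rows:
        row_elem = ET.SubElement(root, row_tag)
        for key, value in row.items():
            # Assumption: each CSV header is a valid XML tag name.
            child = ET.SubElement(row_elem, key)
            child.text = "" if value is None else str(value)
    return root
```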
class opencf.converters.spreadsheet.XLSXToCSVConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts XLSX files to CSV format.

_convert(input_contents: List[Dict[str, Any]], args=None) → List[Dict[str, Any]]

Converts the list of dictionaries from XLSX to the format suitable for CSV.

Parameters:
  • input_contents (List[Dict[str, Any]]) – The list of dictionaries to convert.

  • args (optional) – Additional arguments for the conversion process.

Returns:

The resulting list of dictionaries for CSV.

Return type:

List[Dict[str, Any]]

file_reader: XlsxToDictReader = <opencf.io_handlers.spreadsheet.XlsxToDictReader object>
file_writer: DictToCsvWriter = <opencf_core.io_handler.DictToCsvWriter object>
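What "a format suitable for CSV" involves is not specified here. A plausible sketch, assuming it means giving every row the same set of columns and turning spreadsheet values (including empty cells read as `None`) into strings, is shown below. `normalize_rows` and `rows_to_csv` are hypothetical helpers; `rows_to_csv` uses the standard `csv.DictWriter`, which ends lines with `\r\n` by default.

```python
import csv
import io
from typing import Any, Dict, List


def normalize_rows(rows: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Give every row the same columns (first-seen order) and string values."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    # Missing cells and None (empty spreadsheet cells) become empty strings.
    return [
        {name: "" if row.get(name) is None else str(row[name]) for name in fieldnames}
        for row in rows
    ]


def rows_to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize normalized rows to CSV text with a header line."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```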
class opencf.converters.spreadsheet.XMLToJSONConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts XML files to JSON format.

file_reader: XmlToTreeReader = <opencf_core.io_handler.XmlToTreeReader object>
file_writer: DictToJsonWriter = <opencf_core.io_handler.DictToJsonWriter object>
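The XML-to-JSON path reads an element tree and writes a dictionary, so some tree-to-dict mapping sits in between. The sketch below uses one common convention, assumed here rather than taken from opencf: each element becomes `{tag: content}`, leaf elements map to their stripped text, repeated sibling tags are collected into a list, and attributes are ignored for brevity. `element_to_dict` and `xml_to_json` are hypothetical names.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(elem: ET.Element) -> Dict[str, Any]:
    """Recursively map an element to {tag: text or nested dict}; attributes are ignored."""
    children = list(elem)
    if not children:
        return {elem.tag: (elem.text or "").strip()}
    content: Dict[str, Any] = {}
    for child in children:
        tag, value = next(iter(element_to_dict(child).items()))
        if tag in content:
            # Repeated sibling tags are gathered into a list.
            if not isinstance(content[tag], list):
                content[tag] = [content[tag]]
            content[tag].append(value)
        else:
            content[tag] = value
    return {elem.tag: content}


def xml_to_json(xml_text: str) -> str:
    return json.dumps(element_to_dict(ET.fromstring(xml_text)))
```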

Conversion Handlers - Video

This module provides classes for converting between different video file formats. It includes concrete implementations of conversion classes for various file types (images, mp4, avi, gif, …).

class opencf.converters.video.ImageToVideoConverterWithOpenCV(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts image files to video format.

_convert(input_contents: List[Mat | ndarray], args: None) → List[Mat | ndarray]

Converts a list of image files to a video file.

Parameters:
  • input_contents (List[MatLike]) – List of input images.

  • args (None) – Additional arguments for the conversion process.

file_reader: Reader | None = <opencf.io_handlers.opencv.ImageToOpenCVReader object>
file_writer: Writer | None = <opencf.io_handlers.opencv.VideoArrayWriter object>
class opencf.converters.video.ImageToVideoConverterWithPillow(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts image files to video format.

_convert(input_contents: List[Image], args: None) → List[ndarray]

Converts a list of image files to a video file.

Parameters:
  • input_contents (List[Image]) – List of input images.

  • args (None) – Additional arguments for the conversion process.

file_reader: Reader | None = <opencf.io_handlers.pillow.ImageToPillowReader object>
file_writer: Writer | None = <opencf.io_handlers.opencv.VideoArrayWriter object>
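This converter reads Pillow images but hands its result to the OpenCV-based `VideoArrayWriter`, so `_convert` has to turn `Image` objects into NumPy arrays (`List[ndarray]`). A sketch of that step follows. The hypothetical `pillow_to_bgr_frames` assumes that the writer expects OpenCV's usual BGR channel order and that all frames must share the first image's size, since a video stream has a fixed frame size; neither assumption is confirmed by this page.

```python
from typing import List

import numpy as np
from PIL import Image


def pillow_to_bgr_frames(images: List[Image.Image]) -> List[np.ndarray]:
    """Convert Pillow images to uint8 BGR arrays of a common size."""
    if not images:
        return []
    # Assumption: the first image fixes the frame size for the whole video.
    width, height = images[0].size
    frames = []
    for img in images:
        rgb = img.convert("RGB").resize((width, height), Image.NEAREST)
        # Pillow gives RGB (H, W, 3); OpenCV expects BGR, so reverse the channel axis.
        frames.append(np.ascontiguousarray(np.asarray(rgb)[:, :, ::-1]))
    return frames
```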
class opencf.converters.video.VideoToGIFConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: WriterBasedConverter

Converts a video file to GIF format.

_convert(input_contents: List[List[Mat | ndarray]], args: None)

Converts a list of video frames to a GIF.

Parameters:

input_contents (List[List[MatLike]]) – Lists of video frames.

Returns:

The converted GIF content.

Return type:

bytes

file_reader: Reader | None = <opencf.io_handlers.opencv.VideoToFramesReaderWithOpenCV object>
file_writer: Writer | None = <opencf.io_handlers.opencv.FramesToGIFWriterWithImageIO object>

Module contents

Conversion Handlers

This module provides classes for converting between different file formats. It includes concrete implementations of conversion classes for various file types.