opencf.converters package
Submodules
Conversion Handlers - Document
This module provides classes for converting between different document file formats. It includes concrete implementations of conversion classes for various file types (pdf, docx, epub, …).
- class opencf.converters.document.PDFToDocxConvertorwithPdf2docx(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts PDF files to docx format using [pdf2docx](https://github.com/ArtifexSoftware/pdf2docx), as recommended [here](https://stackoverflow.com/a/65932031/16668046). There is also a CLI, as presented in [their online documentation](https://pdf2docx.readthedocs.io/en/latest/quickstart.cli.html).
- file_reader: Reader | None = <opencf.io_handlers.pdf2docx.Pdf2DocxReader object>
- file_writer: Writer | None = <opencf.io_handlers.pdf2docx.Pdf2DocxWriter object>
- folder_as_output: bool | None = False
- class opencf.converters.document.PDFToDocxWithAspose(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts PDF files to docx format using Aspose.Words for Python.
- file_reader: Reader | None = <opencf.io_handlers.aspose.AsposeReader object>
- file_writer: Writer | None = <opencf.io_handlers.aspose.AsposeWriter object>
- folder_as_output: bool | None = False
- class opencf.converters.document.PDFToHTML(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
The [pdftohtml](https://linux.die.net/man/1/pdftohtml) tool could be used for this conversion.
Conversion Handlers - Textual/Markup
This module provides classes for converting between different markup file formats. It includes concrete implementations of conversion classes for various file types (txt, md, html, …).
- class opencf.converters.markup.TextToTextConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
A converter class for converting text-based files to text format.
- file_reader: Reader | None = <opencf_core.io_handler.TxtToStrReader object>
- file_writer: Writer | None = <opencf_core.io_handler.StrToTxtWriter object>
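The entries on this page all pair a `file_reader` with a `file_writer` around a conversion step. The following is a minimal, standard-library-only sketch of that pattern, not the actual `opencf` or `opencf_core` implementation. The names `SketchTextReader`, `SketchTextWriter`, `SketchTextToText`, `run` and `run_demo`, and the `read`/`write` method signatures, are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import List


class SketchTextReader:
    """Hypothetical reader: loads a text file into a str."""

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")


class SketchTextWriter:
    """Hypothetical writer: saves a str to a text file."""

    def write(self, path: Path, content: str) -> None:
        path.write_text(content, encoding="utf-8")


class SketchTextToText:
    """Illustrative converter that wires a reader and a writer together."""

    # Class-level handlers, mirroring the file_reader/file_writer attributes
    # listed in this reference (the real interfaces may differ).
    file_reader = SketchTextReader()
    file_writer = SketchTextWriter()

    def __init__(self, input_files: List[Path], output_file: Path) -> None:
        self.input_files = input_files
        self.output_file = output_file

    def _convert(self, input_contents: List[str]) -> str:
        # Join the contents of all inputs into a single text payload.
        return "\n".join(input_contents)

    def run(self) -> None:
        # Read every input, convert in memory, then write one output file.
        contents = [self.file_reader.read(path) for path in self.input_files]
        self.file_writer.write(self.output_file, self._convert(contents))


def run_demo() -> str:
    """Convert two temporary text files and return the merged output."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp, "a.txt")
        second = Path(tmp, "b.txt")
        first.write_text("alpha", encoding="utf-8")
        second.write_text("beta", encoding="utf-8")
        output = Path(tmp, "out.txt")
        SketchTextToText([first, second], output).run()
        return output.read_text(encoding="utf-8")
```

Keeping reading, converting and writing in separate objects means a new format pair only needs a different reader or writer, which is presumably why the converters above differ mainly in those two attributes.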
Conversion Handlers - PDF
This module provides classes for manipulating the PDF file format. It includes concrete implementations of conversion classes between PDF and raster images, …
- class opencf.converters.pdf.ImageToPDFConverterWithPillow(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts image files to PDF format using Pillow.
- file_reader: Reader | None = <opencf.io_handlers.pillow.ImageToPillowReader object>
- file_writer: Writer | None = <opencf.io_handlers.pillow.PillowToPDFWriter object>
- folder_as_output: bool | None = False
- class opencf.converters.pdf.ImageToPDFConverterWithPyPdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts image files to PDF format using PyPDF.
- file_reader: Reader | None = <opencf.io_handlers.pillow.ImageToPillowReader object>
- file_writer: Writer | None = <opencf.io_handlers.pypdf.PillowToPDFWriterwithPypdf object>
- folder_as_output: bool | None = False
- class opencf.converters.pdf.MergePDFswithPypdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Merges multiple PDF files into a single PDF.
- file_reader: Reader | None = <opencf.io_handlers.pypdf.PdfToPyPdfReader object>
- file_writer: Writer | None = <opencf.io_handlers.pypdf.PyPdfToPdfWriter object>
- folder_as_output: bool | None = False
- class opencf.converters.pdf.PDFToImageConverterwithPymupdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts PDF files to image format using PyMuPDF.
- file_reader: Reader | None = <opencf.io_handlers.pymupdf.PdfToPymupdfReader object>
- file_writer: Writer | None = <opencf.io_handlers.pymupdf.PymupdfToImageWriter object>
- folder_as_output: bool | None = True
- class opencf.converters.pdf.PDFToImageExtractorwithPymupdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts PDF files to image format.
- file_reader: Reader | None = <opencf.io_handlers.pymupdf.PdfToPymupdfReader object>
- file_writer: Writer | None = <opencf.io_handlers.pymupdf.PymupdfToImageExtractorWriter object>
- folder_as_output: bool | None = True
- class opencf.converters.pdf.PDFToImageExtractorwithPypdf(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts PDF files to image format using pypdf.
- file_reader: Reader | None = <opencf.io_handlers.pypdf.PdfToPyPdfReader object>
- file_writer: Writer | None = <opencf.io_handlers.pypdf.PypdfToImageExtractorWriter object>
- folder_as_output: bool | None = True
Conversion Handlers - Image
This module provides classes for converting between different image file formats. It includes concrete implementations of conversion classes for various file types (jpg, png, svg, …).
Conversion Handlers - Structured
This module provides classes for converting between different structured file formats. It includes concrete implementations of conversion classes for various file types (xml, json, xlsx, csv, …).
- class opencf.converters.spreadsheet.CSVToXLSXConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts CSV files to XLSX format.
- file_reader: CsvToDictReader = <opencf_core.io_handler.CsvToDictReader object>
- file_writer: DictToXlsxWriter = <opencf.io_handlers.spreadsheet.DictToXlsxWriter object>
- class opencf.converters.spreadsheet.CSVToXMLConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts CSV files to XML format.
- _convert(input_contents: List[Dict[str, Any]], args=None) Element
Converts the list of dictionaries to an XML ElementTree element.
- Parameters:
input_contents (List[Dict[str, Any]]) – The list of dictionaries to convert.
args (optional) – Additional arguments for the conversion process.
- Returns:
The resulting XML ElementTree element.
- Return type:
ET.Element
- file_reader: CsvToDictReader = <opencf_core.io_handler.CsvToDictReader object>
- file_writer: TreeToXmlWriter = <opencf_core.io_handler.TreeToXmlWriter object>
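As a rough illustration of the `_convert` step described for `CSVToXMLConverter`, the sketch below turns a list of dictionaries into an `ET.Element` using only the standard library. It is not the library's code: the function name `rows_to_xml` and the element names `rows` and `row` are assumptions, and the real converter may lay out the tree differently.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def rows_to_xml(
    rows: List[Dict[str, Any]],
    root_tag: str = "rows",  # assumed root element name
    row_tag: str = "row",  # assumed per-row element name
) -> ET.Element:
    """Build one child element per row and one grandchild per column."""
    root = ET.Element(root_tag)
    for row in rows:
        row_element = ET.SubElement(root, row_tag)
        for key, value in row.items():
            # Column names become tag names, so they must be valid XML names.
            cell = ET.SubElement(row_element, key)
            cell.text = str(value)
    return root


sample_rows = [{"name": "Ada", "age": 36}, {"name": "Alan", "age": 41}]
xml_text = ET.tostring(rows_to_xml(sample_rows), encoding="unicode")
```

Values are stringified with `str`, so type information such as the difference between `36` and `"36"` is lost in the XML output.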
- class opencf.converters.spreadsheet.XLSXToCSVConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts XLSX files to CSV format.
- _convert(input_contents: List[Dict[str, Any]], args=None) List[Dict[str, Any]]
Converts the list of dictionaries from XLSX to the format suitable for CSV.
- Parameters:
input_contents (List[Dict[str, Any]]) – The list of dictionaries to convert.
args (optional) – Additional arguments for the conversion process.
- Returns:
The resulting list of dictionaries for CSV.
- Return type:
List[Dict[str, Any]]
- file_reader: XlsxToDictReader = <opencf.io_handlers.spreadsheet.XlsxToDictReader object>
- file_writer: DictToCsvWriter = <opencf_core.io_handler.DictToCsvWriter object>
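To make the "list of dictionaries suitable for CSV" shape concrete, here is a small sketch that serializes such rows with the standard `csv` module. It does not reproduce `DictToCsvWriter`; the function name `rows_to_csv` and the choice to take the header from the union of all row keys are assumptions.

```python
import csv
import io
from typing import Any, Dict, List


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize dictionaries to CSV text, one line per dictionary."""
    if not rows:
        return ""
    # Collect column names in first-seen order across all rows.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a row are written as empty cells (restval default).
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv([{"a": 1, "b": 2}, {"a": 3, "c": 4}])
```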
- class opencf.converters.spreadsheet.XMLToJSONConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts XML files to JSON format.
- file_reader: XmlToTreeReader = <opencf_core.io_handler.XmlToTreeReader object>
- file_writer: DictToJsonWriter = <opencf_core.io_handler.DictToJsonWriter object>
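The reference does not say how `XMLToJSONConverter` maps an element tree onto a dictionary, so the sketch below shows one common convention rather than the library's behavior. The function names and the `@attributes` and `#text` keys are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(element: ET.Element) -> Dict[str, Any]:
    """Recursively map an element to a dict (assumed convention)."""
    node: Dict[str, Any] = {}
    if element.attrib:
        node["@attributes"] = dict(element.attrib)
    for child in element:
        # Children with the same tag are grouped into a list under that tag.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json_obj(xml_source: str) -> Dict[str, Any]:
    """Parse XML text and wrap the converted root under its tag name."""
    root = ET.fromstring(xml_source)
    return {root.tag: element_to_dict(root)}


json_text = json.dumps(xml_to_json_obj('<a x="1"><b>1</b><b>2</b></a>'))
```

Always storing children as lists keeps the output shape stable whether a tag occurs once or many times, at the cost of more verbose JSON.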
Conversion Handlers - Video
This module provides classes for converting between different video file formats. It includes concrete implementations of conversion classes for various file types (images, mp4, avi, gif, …).
- class opencf.converters.video.ImageToVideoConverterWithOpenCV(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts image files to video format.
- _convert(input_contents: List[Mat | ndarray], args: None) List[Mat | ndarray]
Converts a list of image files to a video file.
- Parameters:
input_contents (List[MatLike]) – List of input images.
output_file (Path) – Output video file path.
- file_reader: Reader | None = <opencf.io_handlers.opencv.ImageToOpenCVReader object>
- file_writer: Writer | None = <opencf.io_handlers.opencv.VideoArrayWriter object>
- class opencf.converters.video.ImageToVideoConverterWithPillow(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts image files to video format.
- _convert(input_contents: List[Image], args: None) List[ndarray]
Converts a list of image files to a video file.
- Parameters:
input_contents (List[PillowReader]) – List of input images.
output_file (Path) – Output video file path.
- file_reader: Reader | None = <opencf.io_handlers.pillow.ImageToPillowReader object>
- file_writer: Writer | None = <opencf.io_handlers.opencv.VideoArrayWriter object>
- class opencf.converters.video.VideoToGIFConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
WriterBasedConverter
Converts a video file to GIF format.
- _convert(input_contents: List[List[Mat | ndarray]], args: None)
Converts a list of video frames to a GIF.
- Parameters:
input_contents (List[MatLike]) – List of video frames.
- Returns:
The converted GIF content.
- Return type:
bytes
- file_reader: Reader | None = <opencf.io_handlers.opencv.VideoToFramesReaderWithOpenCV object>
- file_writer: Writer | None = <opencf.io_handlers.opencv.FramesToGIFWriterWithImageIO object>
Module contents
Conversion Handlers
This module provides classes for converting between different file formats. It includes concrete implementations of conversion classes for various file types.