opencf.converters package
Submodules
Conversion Handlers - Document
This module provides classes for converting between different document file formats. It includes concrete implementations of conversion classes for various file types (pdf, docx, epub, …).
- class opencf.converters.document.ImageToPDFConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts image files to PDF format.
- file_reader: FileReader = <opencf.io_handlers.img_pillow.ImageToPillowReader object>
- file_writer: FileWriter = None
- folder_as_output: bool = False
- class opencf.converters.document.ImageToPDFConverterWithPyPdf2(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts image files to PDF format using PyPDF2.
- file_reader: FileReader = <opencf.io_handlers.img_pillow.ImageToPillowReader object>
- file_writer: FileWriter = <opencf.io_handlers.pdf.PyPdfToPdfWriter object>
- class opencf.converters.document.PDFToDocxConvertor(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts PDF files to DOCX format using [pdf2docx](https://github.com/ArtifexSoftware/pdf2docx), as recommended [here](https://stackoverflow.com/a/65932031/16668046). pdf2docx also provides a command-line interface, described in [its online documentation](https://pdf2docx.readthedocs.io/en/latest/quickstart.cli.html).
- _convert(input_contents: List[Path], output_file: Path)
Read more [here](https://pypdf2.readthedocs.io/en/3.0.0/user/extract-images.html).
- file_reader: FileReader = None
- file_writer: FileWriter = None
- folder_as_output: bool = False
- class opencf.converters.document.PDFToHTML(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Intended to convert PDF files to HTML format. The [pdftohtml](https://linux.die.net/man/1/pdftohtml) tool could be used to do it.
- class opencf.converters.document.PDFToImageConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts PDF files to image format. Not implemented yet.
- file_reader: FileReader = <opencf.io_handlers.pdf.PdfToPyPdfReader object>
- file_writer: FileWriter = None
- folder_as_output: bool = True
- class opencf.converters.document.PDFToImageExtractor(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts PDF files to image format.
- _convert(input_contents: List[PdfReader], output_folder: Path)
Read more [here](https://pypdf2.readthedocs.io/en/3.0.0/user/extract-images.html).
- file_reader: FileReader = <opencf.io_handlers.pdf.PdfToPyPdfReader object>
- file_writer: FileWriter = None
- folder_as_output: bool = True
Conversion Handlers - Textual/Markup
This module provides classes for converting between different markup file formats. It includes concrete implementations of conversion classes for various file types (txt, md, html, …).
- class opencf.converters.markup.TXTToMDConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: TextToTextConverter
Converts text files to Markdown format.
- class opencf.converters.markup.TextToTextConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts text files to text format.
- file_reader: FileReader = <opencf_core.io_handler.TxtToStrReader object>
- file_writer: FileWriter = <opencf_core.io_handler.StrToTxtWriter object>
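The entries in this reference all pair a `file_reader` with a `file_writer` around a conversion step. As a rough illustration of that pattern only, the sketch below uses standard-library code and hypothetical names (`TxtReader`, `TxtWriter`, `SketchTextConverter`, `run`, `demo`); it does not reproduce the opencf classes or their actual method names.

```python
import tempfile
from pathlib import Path
from typing import List


class TxtReader:
    """Hypothetical reader: loads a text file into a str."""

    def read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")


class TxtWriter:
    """Hypothetical writer: saves a str to a text file."""

    def write(self, path: Path, content: str) -> None:
        path.write_text(content, encoding="utf-8")


class SketchTextConverter:
    """Illustrative stand-in for a text-to-text converter (not opencf code)."""

    # Class-level reader and writer, mirroring the attributes listed above.
    file_reader = TxtReader()
    file_writer = TxtWriter()

    def __init__(self, input_paths: List[Path], output_path: Path) -> None:
        self.input_paths = input_paths
        self.output_path = output_path

    def _convert(self, input_contents: List[str]) -> str:
        # Join the contents of all inputs, separated by a newline.
        return "\n".join(input_contents)

    def run(self) -> None:
        # Read every input, convert in memory, then write the result.
        contents = [self.file_reader.read(p) for p in self.input_paths]
        self.file_writer.write(self.output_path, self._convert(contents))


def demo() -> str:
    """Convert two temporary text files and return the written output."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "a.txt"
        second = Path(tmp) / "b.txt"
        out = Path(tmp) / "out.md"
        first.write_text("first", encoding="utf-8")
        second.write_text("second", encoding="utf-8")
        SketchTextConverter([first, second], out).run()
        return out.read_text(encoding="utf-8")
```

Keeping the reader and writer as separate objects means the conversion step only deals with in-memory values, which is presumably why the classes here expose them as swappable attributes.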
Conversion Handlers - Structured
This module provides classes for converting between different structured file formats. It includes concrete implementations of conversion classes for various file types (xml, json, xlsx, csv, …).
- class opencf.converters.structured.CSVToXMLConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts CSV files to XML format.
- file_reader: FileReader = <opencf_core.io_handler.CsvToListReader object>
- file_writer: FileWriter = <opencf_core.io_handler.StrToXmlWriter object>
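The reader and writer names suggest the conversion step receives CSV rows as a list and produces an XML string. A minimal sketch of such a step, using only the standard library, might look like the following. The function name `rows_to_xml`, the `root`/`row` tag names, and the assumption that the first row is a header of valid XML tag names are all illustrative choices, not taken from opencf.

```python
import xml.etree.ElementTree as ET
from typing import List


def rows_to_xml(rows: List[List[str]], root_tag: str = "root", row_tag: str = "row") -> str:
    """Turn CSV rows (first row = header) into an XML string.

    Assumes every header cell is a valid XML tag name.
    """
    root = ET.Element(root_tag)
    if rows:
        header, *records = rows
        for record in records:
            row_el = ET.SubElement(root, row_tag)
            # One child element per column, named after the header cell.
            for name, value in zip(header, record):
                ET.SubElement(row_el, name).text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml([["name", "age"], ["Ada", "36"]])
print(xml_text)  # <root><row><name>Ada</name><age>36</age></row></root>
```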
- class opencf.converters.structured.JSONToCSVConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts JSON files to CSV format.
- file_reader: FileReader = <opencf_core.io_handler.JsonToDictReader object>
- file_writer: FileWriter = <opencf_core.io_handler.ListToCsvWriter object>
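To make the JSON-to-CSV step concrete, here is a standard-library sketch that flattens parsed JSON into a list of rows suitable for a CSV writer. It assumes the JSON document is a list of flat objects. The helper name `records_to_rows` is hypothetical, and the actual opencf conversion may handle other shapes differently.

```python
import csv
import io
import json
from typing import Any, Dict, List


def records_to_rows(records: List[Dict[str, Any]]) -> List[List[str]]:
    """Build a header row plus one row per record (assumes flat objects)."""
    # Collect column names in first-seen order across all records.
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows: List[List[str]] = [columns]
    for record in records:
        # Missing keys become empty cells.
        rows.append([str(record.get(col, "")) for col in columns])
    return rows


data = json.loads('[{"name": "Ada", "age": 36}, {"name": "Alan"}]')
rows = records_to_rows(data)

# Serialize the rows with the csv module to check the result.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
```

Here `rows` ends up as `[["name", "age"], ["Ada", "36"], ["Alan", ""]]`.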
- class opencf.converters.structured.XLSXToCSVConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts Excel files to CSV format.
- file_reader: FileReader = <opencf.io_handlers.spreadsheet.SpreadsheetToPandasReader object>
- file_writer: FileWriter = <opencf_core.io_handler.ListToCsvWriter object>
- class opencf.converters.structured.XMLToJSONConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts XML files to JSON format.
- file_reader: FileReader = <opencf_core.io_handler.XmlToStrReader object>
- file_writer: FileWriter = <opencf_core.io_handler.DictToJsonWriter object>
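The reader and writer listed for `XMLToJSONConverter` imply a step that turns an XML string into a dictionary. One possible mapping is sketched below with the standard library. The helper names `element_to_value` and `xml_to_dict` are hypothetical, and the rules used (attributes ignored, repeated tags collected into a list) are assumptions for illustration rather than the behaviour of opencf.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, Union


def element_to_value(element: ET.Element) -> Union[str, Dict[str, Any]]:
    """Map a leaf element to its text and a parent element to a dict."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result: Dict[str, Any] = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    return result


def xml_to_dict(xml_text: str) -> Dict[str, Any]:
    """Parse an XML string into a dict keyed by the root tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}


doc = xml_to_dict("<root><row><name>Ada</name></row><row><name>Alan</name></row></root>")
json_text = json.dumps(doc)
```

With this input, `doc` is `{"root": {"row": [{"name": "Ada"}, {"name": "Alan"}]}}`.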
Conversion Handlers - Video
This module provides classes for converting between different video file formats. It includes concrete implementations of conversion classes for various file types (images, mp4, avi, gif, …).
- class opencf.converters.video.ImageToVideoConverterWithOpenCV(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts image files to video format.
- _convert(input_contents: List[ndarray])
Converts a list of image files to a video file.
- Parameters:
input_contents (List[np.ndarray]) – List of input images.
output_file (Path) – Output video file path.
- file_reader: FileReader = <opencf.io_handlers.img_opencv.ImageToOpenCVReader object>
- file_writer: FileWriter = <opencf.io_handlers.video.VideoArrayWriter object>
- class opencf.converters.video.ImageToVideoConverterWithPillow(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts image files to video format.
- _convert(input_contents: List[Image])
Converts a list of image files to a video file.
- Parameters:
input_contents (List[PillowImage.Image]) – List of input images.
output_file (Path) – Output video file path.
- file_reader: FileReader = <opencf.io_handlers.img_pillow.ImageToPillowReader object>
- file_writer: FileWriter = <opencf.io_handlers.video.VideoArrayWriter object>
- class opencf.converters.video.VideoToGIFConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: BaseConverter
Converts a video file to GIF format.
- _convert(input_contents: List[List[Mat | ndarray]])
Converts a list of video frames to a GIF.
- Parameters:
input_contents (List[MatLike]) – List of video frames.
- Returns:
The converted GIF content.
- Return type:
bytes
- file_reader: FileReader = <opencf.io_handlers.video.VideoToFramesReaderWithOpenCV object>
- file_writer: FileWriter = <opencf.io_handlers.video.FramesToGIFWriterWithImageIO object>
Module contents
Conversion Handlers
This module provides classes for converting between different file formats. It includes concrete implementations of conversion classes for various file types.