
Media & Document Processing

Complete reference of media processing and document generation capabilities.

Document Parsing

Supported Input Formats

Format | Library | Features
PDF | pdf-parse | Text extraction, multi-page
DOCX/Word | mammoth | Rich text, formatting
RTF | Native | Control code stripping
CSV | Native | Headers, row parsing
XLSX/Excel | Native | Read-only, cell extraction
JSON | Native | Structured data
YAML | Native | Configuration files
XML | Native | Structured markup
HTML | Native | Web content
Markdown | Native | Rich text
Plain Text | UTF-8 | All text files
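The CSV row above lists header and row parsing as native features. The page does not show how that parsing is implemented, so the following is only a minimal sketch of the general technique. The function names `parseCsvLine` and `parseCsv` are hypothetical, and the sketch assumes that quoted fields never contain line breaks.

```typescript
// Hypothetical helper: split one CSV line into fields, honoring double quotes.
// A doubled quote ("") inside a quoted field is read as a literal quote.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        field += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Hypothetical helper: treat the first non-empty line as headers and
// return one record per remaining line. Missing cells become "".
function parseCsv(text: string): Record<string, string>[] {
  const lines = text.split(/\r?\n/).filter((l) => l.length > 0);
  const [headerLine, ...rows] = lines;
  if (headerLine === undefined) return [];
  const headers = parseCsvLine(headerLine);
  return rows.map((row) => {
    const values = parseCsvLine(row);
    const record: Record<string, string> = {};
    headers.forEach((h, idx) => {
      record[h] = values[idx] ?? "";
    });
    return record;
  });
}
```

Calling `parseCsv("name,note\nAda,hello")` yields a single record whose `name` is `Ada` and whose `note` is `hello`.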

Code File Parsing

Code files in the following languages and formats are parsed with syntax detection:

TypeScript, JavaScript, Python, Java, Go, Rust, C, C++, C#,
Ruby, PHP, Swift, Kotlin, Scala, Shell, SQL, HTML, CSS,
JSON, YAML, XML, Markdown, Dockerfile, Makefile
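The page does not describe how syntax detection works. One common approach, sketched here purely as an assumption, is to look at the file name: files such as Dockerfile and Makefile are matched by base name, and everything else by extension. The maps and the `detectLanguage` function are hypothetical, and the extension list is illustrative rather than complete.

```typescript
// Hypothetical extension table; entries are illustrative, not exhaustive.
const LANGUAGE_BY_EXTENSION = new Map<string, string>([
  ["ts", "TypeScript"], ["js", "JavaScript"], ["py", "Python"],
  ["java", "Java"], ["go", "Go"], ["rs", "Rust"], ["c", "C"],
  ["cpp", "C++"], ["cs", "C#"], ["rb", "Ruby"], ["php", "PHP"],
  ["swift", "Swift"], ["kt", "Kotlin"], ["scala", "Scala"],
  ["sh", "Shell"], ["sql", "SQL"], ["html", "HTML"], ["css", "CSS"],
  ["json", "JSON"], ["yaml", "YAML"], ["yml", "YAML"], ["xml", "XML"],
  ["md", "Markdown"],
]);

// Files that are identified by their whole name rather than an extension.
const LANGUAGE_BY_BASENAME = new Map<string, string>([
  ["Dockerfile", "Dockerfile"],
  ["Makefile", "Makefile"],
]);

function detectLanguage(path: string): string | undefined {
  // Take the last path segment, accepting both "/" and "\" separators.
  const base = path.split(/[\\/]/).pop() ?? "";
  const byName = LANGUAGE_BY_BASENAME.get(base);
  if (byName) return byName;
  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".env"
  return LANGUAGE_BY_EXTENSION.get(base.slice(dot + 1).toLowerCase());
}
```

A production detector would usually also inspect file contents (for example a shebang line), which this sketch leaves out.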

Document Generation

PDF Generation

Libraries: pdf-lib, Playwright

Feature | Status
Multi-page documents | Supported
Custom fonts | Supported (Helvetica default)
Text wrapping | Supported
Headings and body styles | Supported
Color customization (RGB) | Supported
Page sizes (A4, Letter, Legal) | Supported
Landscape/Portrait | Supported
Headers/Footers | Supported
Print backgrounds | Supported

Example:

// Generate PDF from HTML sections
const pdf = await pdfGenerator.generate({
  sections: [
    { type: "heading", content: "Report Title" },
    { type: "body", content: "Report content..." }
  ]
});
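The feature table lists text wrapping, but the page does not say how lines are broken. The sketch below shows greedy word wrapping, one plausible approach and not necessarily the one the generator uses. The `Section` type is inferred from the example above, while `wrapText` and its `measure` callback are hypothetical. With pdf-lib, `measure` could be built on the font's `widthOfTextAtSize` method; here it is left abstract so the sketch has no dependencies.

```typescript
// Section shape inferred from the example above; other section types may exist.
type Section =
  | { type: "heading"; content: string }
  | { type: "body"; content: string };

// Greedy wrap: keep adding words to the current line until the measured
// width would exceed maxWidth, then start a new line. A single word wider
// than maxWidth is placed on its own line rather than split.
function wrapText(
  text: string,
  maxWidth: number,
  measure: (s: string) => number,
): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (current && measure(candidate) > maxWidth) {
      lines.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) lines.push(current);
  return lines;
}

// Usage: wrap a body section, measuring width as a character count.
const bodySection: Section = { type: "body", content: "the quick brown fox" };
const wrappedLines = wrapText(bodySection.content, 10, (s) => s.length);
```

Measuring by character count is only suitable for monospaced output. Proportional fonts need a real width function.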

Word Documents (DOCX)

Library: docx

Feature | Status
Heading levels | Supported
Text runs | Supported
Paragraphs | Supported
Metadata | Supported

Screenshots (PNG)

Library: Playwright

Feature | Status
Full page capture | Supported
Element screenshots | Supported
Quality settings | Supported
Viewport configuration | Supported

Other Formats

Format | Status | Use Case
SVG | Supported | Vector diagrams
CSV | Supported | Spreadsheet data
JSON | Supported | Structured data
Markdown | Supported | Rich text
Plain Text | Supported | Simple output

Planned Document Formats

High Priority

Format | Status | Use Case
PPTX/PowerPoint | Planned | Presentations
XLSX/Excel (write) | Planned | Spreadsheets
LaTeX | Planned | Academic papers

Medium Priority

Format | Status | Use Case
ODT (OpenDocument) | Planned | LibreOffice docs
ODS (OpenDocument) | Planned | LibreOffice sheets
EPUB | Planned | E-books
RTF (write) | Planned | Rich text

OCR (Optical Character Recognition)

Active Providers

Provider | Status | Features
Tesseract.js | Supported | LSTM neural network, multi-language

OCR Features

Feature | Status
Block classification | Supported
Confidence scoring | Supported
Bounding boxes | Supported
Word positioning | Supported
Multi-image batch | Supported
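Confidence scores and bounding boxes are typically used together: low-confidence words are dropped, and the boxes of the remaining words are merged to locate a block. The sketch below illustrates this under assumptions. The `OcrWord` and `BBox` shapes (confidence on a 0 to 100 scale, corner coordinates `x0`, `y0`, `x1`, `y1`) and both function names are hypothetical and may not match what this platform returns.

```typescript
// Assumed result shapes; the actual OCR output of this platform may differ.
interface BBox { x0: number; y0: number; x1: number; y1: number }
interface OcrWord { text: string; confidence: number; bbox: BBox }

// Keep only words at or above the given confidence (assumed 0-100 scale).
function confidentWords(words: OcrWord[], minConfidence: number): OcrWord[] {
  return words.filter((w) => w.confidence >= minConfidence);
}

// Smallest box that encloses every word's box, or undefined for no words.
function unionBBox(words: OcrWord[]): BBox | undefined {
  const [first, ...rest] = words;
  if (!first) return undefined;
  return rest.reduce<BBox>(
    (acc, w) => ({
      x0: Math.min(acc.x0, w.bbox.x0),
      y0: Math.min(acc.y0, w.bbox.y0),
      x1: Math.max(acc.x1, w.bbox.x1),
      y1: Math.max(acc.y1, w.bbox.y1),
    }),
    { ...first.bbox },
  );
}
```

The confidence threshold is a tuning choice. A stricter value trades recall for fewer misread words.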

Planned OCR Providers

Provider | Status | Notes
Azure Form Recognizer | Planned | Document analysis
Google Vision API | Planned | High accuracy
AWS Textract | Planned | AWS ecosystem

Image Processing

Image Formats

Format | Read | Write
PNG | Yes | Yes
JPEG/JPG | Yes | Yes
WebP | Yes | Yes
GIF | Yes | Yes
BMP | Yes | Yes
TIFF | Yes | No
SVG | Yes | Yes

Image Operations

Operation | Tool | Description
Resize | media.transform | Change dimensions
Crop | media.transform | Extract region
Format conversion | media.transform | PNG ↔ JPEG ↔ WebP
Enhance | media.transform | Improve quality
Redact/blur | media.transform | Hide sensitive content
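The parameters of `media.transform` are not documented on this page, so the sketch below does not call it. It only illustrates the arithmetic that usually precedes a resize: computing target dimensions that fit inside a bounding box while preserving the aspect ratio. The `Size` type and `fitWithin` function are hypothetical.

```typescript
interface Size { width: number; height: number }

// Scale `source` down so it fits inside maxWidth x maxHeight while keeping
// its aspect ratio. Images that already fit are returned unchanged
// (the scale factor is capped at 1, so nothing is upscaled).
function fitWithin(source: Size, maxWidth: number, maxHeight: number): Size {
  if (source.width <= 0 || source.height <= 0 || maxWidth <= 0 || maxHeight <= 0) {
    throw new RangeError("all dimensions must be positive");
  }
  const scale = Math.min(maxWidth / source.width, maxHeight / source.height, 1);
  return {
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}
```

For example, a 4000x3000 image fitted into a 1024x1024 box becomes 1024x768.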

Image Annotations

Type | Status
Highlight areas | Supported
Arrows/pointers | Supported
Text labels | Supported
Rectangles | Supported
Circles/ellipses | Supported
Freehand drawing | Supported
Blur/redact | Supported
Pin markers | Supported

Image Generation

Active Providers

Provider | Status | Output
DALL-E (OpenAI) | Supported | PNG 1024x1024

Planned Providers

Provider | Status | Notes
Stable Diffusion | Planned | Open source
Flux | Planned | High quality
Midjourney | Planned | Artistic style

Generation Tools

Tool | Description
media.create | Generate from text prompt
media.transform | Modify existing image
media.analyze | Analyze image content

Video Processing

Video Formats

Format | Status
MP4 | Supported
WebM | Supported
OGG | Supported
MOV/QuickTime | Supported
AVI | Supported
MKV/Matroska | Supported

Video Features

Library: FFmpeg/FFprobe

Feature | Status
Frame extraction | Supported
Metadata extraction | Supported
Duration detection | Supported
Codec information | Supported
Resolution detection | Supported
FPS detection | Supported
Key frame extraction | Supported
Thumbnails | Supported
Transcoding | Planned
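Duration, codec, resolution, and FPS can all be read from FFprobe's JSON output, for example from `ffprobe -v quiet -print_format json -show_format -show_streams input.mp4`. How this platform invokes FFprobe is not documented here, so the sketch below only covers parsing that JSON. The `VideoInfo` shape and the function names are hypothetical.

```typescript
// Subset of ffprobe's JSON output used by this sketch.
interface ProbeStream {
  codec_type?: string;
  codec_name?: string;
  width?: number;
  height?: number;
  r_frame_rate?: string; // a fraction such as "30000/1001"
}
interface ProbeOutput {
  streams?: ProbeStream[];
  format?: { duration?: string }; // seconds, encoded as a string
}

// Hypothetical summary shape for illustration.
interface VideoInfo {
  durationSeconds: number;
  codec: string;
  width: number;
  height: number;
  fps: number;
}

// Convert a "numerator/denominator" rate into frames per second.
function parseFrameRate(rate: string): number {
  const parts = rate.split("/");
  const num = Number(parts[0]);
  const den = parts.length > 1 ? Number(parts[1]) : 1;
  return den === 0 ? 0 : num / den;
}

// Summarize the first video stream; throws if the file has none.
function summarizeProbe(json: string): VideoInfo {
  const probe = JSON.parse(json) as ProbeOutput;
  const video = (probe.streams ?? []).find((s) => s.codec_type === "video");
  if (!video) throw new Error("no video stream found");
  return {
    durationSeconds: Number(probe.format?.duration ?? NaN),
    codec: video.codec_name ?? "unknown",
    width: video.width ?? 0,
    height: video.height ?? 0,
    fps: parseFrameRate(video.r_frame_rate ?? "0/1"),
  };
}
```

Frame rates are kept as fractions by FFprobe because common rates such as 29.97 fps are not exact decimals.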

Planned Video Generation

Provider | Status
Runway | Planned
Pika | Planned
Sora (OpenAI) | Planned

Audio Processing

Audio Formats

Format | Status
MP3 | Supported
WAV | Supported
OGG | Supported
FLAC | Supported
AAC | Supported
WebM Audio | Supported
M4A | Supported

Audio Features

Feature | Status
Metadata extraction | Supported
Duration detection | Supported
Transcription | Planned (requires speech-to-text)

Diagrams & Visualization

Diagram Formats

Format | Status | Use Case
Mermaid | Supported | Flowcharts, sequence, state, class
PlantUML | Supported | UML diagrams
Excalidraw | Supported | Hand-drawn style
Graphviz (DOT) | Supported | Graph visualization

Diagram Features

Feature | Status
Theme support | Supported (default, dark, forest, neutral)
Editable diagrams | Supported
SVG export | Supported
Source preservation | Supported
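The four theme names listed above match Mermaid's built-in themes. In Mermaid itself, a theme can be selected per diagram with an `init` directive on the first line of the source. Whether this platform applies themes that way is an assumption. The sketch below simply prepends the directive to Mermaid source, and `withTheme` and `DiagramTheme` are hypothetical names.

```typescript
// Theme names taken from the feature table above.
type DiagramTheme = "default" | "dark" | "forest" | "neutral";

// Prepend a Mermaid init directive that selects the theme.
// Applies to Mermaid source only; other diagram formats are themed differently.
function withTheme(source: string, theme: DiagramTheme): string {
  const directive = `%%{init: ${JSON.stringify({ theme })}}%%`;
  return `${directive}\n${source.trim()}\n`;
}

const themedDiagram = withTheme("flowchart LR\n  A[Upload] --> B[Parse]", "dark");
```

Because the directive lives in the diagram source, the theme choice stays with the source when it is preserved or edited later.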

Chart Types

Type | Status
Line charts | Supported
Bar charts | Supported
Pie charts | Supported
Scatter plots | Supported
Area charts | Planned
Heatmaps | Planned

Dynamic Card System

Content Types (21)

Type | Description
document | PDF, DOCX, text documents
image | PNG, JPG, SVG with annotations
video | MP4, WebM with frames
audio | MP3, WAV recordings
code | Syntax-highlighted source
markdown | Rich text documents
html | Web content
json | Structured data viewer
csv | Spreadsheet data
form | Dynamic forms
diagram | Mermaid, PlantUML, etc.
blueprint | Design blueprints
spreadsheet | Excel-like data
terminal | CLI output
diff | Code diffs
database | Query results
search | Search results
agent | Agent configs
error | Error displays
link | URL previews
preview | Web previews
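When card payloads arrive as untyped data, the 21 type names above can be modeled as a string-literal union with a runtime guard. This is a sketch of one way a client might do that. The names `CONTENT_TYPES`, `ContentType`, and `isContentType` are hypothetical and not part of any documented API.

```typescript
// The 21 content type names from the table above.
const CONTENT_TYPES = [
  "document", "image", "video", "audio", "code", "markdown", "html",
  "json", "csv", "form", "diagram", "blueprint", "spreadsheet", "terminal",
  "diff", "database", "search", "agent", "error", "link", "preview",
] as const;

// Union of the literal names, derived from the array so the two stay in sync.
type ContentType = (typeof CONTENT_TYPES)[number];

// Runtime check that narrows an arbitrary string to ContentType.
function isContentType(value: string): value is ContentType {
  return (CONTENT_TYPES as readonly string[]).includes(value);
}
```

Deriving the union from the array means a type added to the list is automatically accepted by the guard.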

Export Formats

Format | Description
PDF | Print-ready documents
DOCX | Microsoft Word
PNG | Screenshot/image
SVG | Vector graphics
CSV | Spreadsheet data
JSON | Structured data
Markdown | Rich text
Plain Text | Simple text

Cloud Storage Integration

Supported Providers

Provider | Status
Google Drive | Supported
OneDrive | Supported
Dropbox | Supported
AWS S3 | Supported

Features

Feature | Status
Upload | Supported
Download | Supported
Versioning | Supported
Sharing | Supported

Document Analysis

Layout Analysis

Feature | Status
Text block detection | Supported
Image region detection | Supported
Table detection | Supported
Form field detection | Planned

Document Tools

Tool | Description
document.read | Extract structured data
document.query | Query specific fields
document.compute | Calculations on data
document.create | Create new document
document.report | Generate report
document.blueprint | Create architecture docs
document.artifact | Create code artifacts
document.spec | Create specifications