Media & Document Processing
Complete reference for media processing and document generation capabilities.
Document Parsing
Supported Input Formats
| Format | Library | Features |
|---|---|---|
| PDF | pdf-parse | Text extraction, multi-page |
| DOCX/Word | mammoth | Rich text, formatting |
| RTF | Native | Control code stripping |
| CSV | Native | Headers, row parsing |
| XLSX/Excel | Native | Read-only, cell extraction |
| JSON | Native | Structured data |
| YAML | Native | Configuration files |
| XML | Native | Structured markup |
| HTML | Native | Web content |
| Markdown | Native | Rich text |
| Plain Text | UTF-8 | All text files |
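The "Headers, row parsing" behaviour listed for CSV can be illustrated with a small sketch. This is not the project's native parser; `parseCsv` is a hypothetical name, and the code assumes comma delimiters, double-quote escaping, and a header in the first row.

```typescript
// Hypothetical sketch of header-aware CSV parsing (not the project's actual parser).
// Assumes comma delimiters, "" as an escaped quote, and a header in the first row.
function parseCsv(text: string): Record<string, string>[] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  const endRow = (): void => {
    row.push(field);
    field = "";
    // Skip blank lines, which show up as a single empty field.
    if (!(row.length === 1 && row[0] === "")) rows.push(row);
    row = [];
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // escaped quote inside a quoted field
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one line break
      endRow();
    } else {
      field += ch;
    }
  }
  // Flush the last row when the input has no trailing newline.
  if (field !== "" || row.length > 0) endRow();

  const header = rows[0];
  if (header === undefined) return [];
  return rows.slice(1).map((cells) => {
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = cells[i] ?? "";
    });
    return record;
  });
}
```

Each data row becomes an object keyed by the header names, and missing trailing cells default to an empty string.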
Code File Parsing
All major programming languages with syntax detection:
TypeScript, JavaScript, Python, Java, Go, Rust, C, C++, C#,
Ruby, PHP, Swift, Kotlin, Scala, Shell, SQL, HTML, CSS,
JSON, YAML, XML, Markdown, Dockerfile, Makefile
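One plausible way to do syntax detection is by filename. The sketch below is an assumption about how that could work, not the project's implementation: `detectLanguage` and both lookup tables are hypothetical, and the mapping covers only part of the list above. Dockerfile and Makefile are matched by basename because they usually have no extension.

```typescript
// Hypothetical filename-based language detection (illustrative, partial mapping).
const LANGUAGE_BY_EXTENSION = new Map<string, string>([
  ["ts", "TypeScript"], ["js", "JavaScript"], ["py", "Python"],
  ["java", "Java"], ["go", "Go"], ["rs", "Rust"],
  ["c", "C"], ["cpp", "C++"], ["cs", "C#"],
  ["rb", "Ruby"], ["php", "PHP"], ["swift", "Swift"],
  ["kt", "Kotlin"], ["scala", "Scala"], ["sh", "Shell"], ["sql", "SQL"],
]);

// Files that are recognised by name rather than by extension.
const LANGUAGE_BY_BASENAME = new Map<string, string>([
  ["dockerfile", "Dockerfile"],
  ["makefile", "Makefile"],
]);

function detectLanguage(path: string): string | undefined {
  // Take the last path segment, accepting both / and \ separators.
  const base = path.split(/[\\/]/).pop() ?? "";
  const byName = LANGUAGE_BY_BASENAME.get(base.toLowerCase());
  if (byName !== undefined) return byName;

  const dot = base.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as .gitignore
  return LANGUAGE_BY_EXTENSION.get(base.slice(dot + 1).toLowerCase());
}
```

A real detector would likely also inspect shebang lines or file content; this sketch looks at the name only.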
Document Generation
PDF Generation
Libraries: pdf-lib, Playwright
| Feature | Status |
|---|---|
| Multi-page documents | Supported |
| Custom fonts | Supported (Helvetica default) |
| Text wrapping | Supported |
| Headings and body styles | Supported |
| Color customization (RGB) | Supported |
| Page sizes (A4, Letter, Legal) | Supported |
| Landscape/Portrait | Supported |
| Headers/Footers | Supported |
| Print backgrounds | Supported |
Example:
// Generate PDF from HTML sections
const pdf = await pdfGenerator.generate({
  sections: [
    { type: "heading", content: "Report Title" },
    { type: "body", content: "Report content..." }
  ]
});
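The page-size, orientation, and text-wrapping features can be made concrete with a short sketch. The dimensions are the standard sizes in PDF points (1 pt = 1/72 inch). The names `pageDimensions` and `wrapText` are hypothetical and are not part of `pdfGenerator`, and wrapping by character count is a simplifying assumption; a real generator would measure text with font metrics.

```typescript
type PageSize = "A4" | "Letter" | "Legal";
type Orientation = "portrait" | "landscape";

// Standard page sizes in PDF points, as [width, height] in portrait orientation.
const PAGE_SIZES: Record<PageSize, [number, number]> = {
  A4: [595.28, 841.89],
  Letter: [612, 792],
  Legal: [612, 1008],
};

// Hypothetical helper: landscape simply swaps width and height.
function pageDimensions(
  size: PageSize,
  orientation: Orientation = "portrait",
): { width: number; height: number } {
  const [w, h] = PAGE_SIZES[size];
  return orientation === "landscape"
    ? { width: h, height: w }
    : { width: w, height: h };
}

// Hypothetical greedy word wrap by character count (assumption: no font metrics).
// A single word longer than maxChars is kept on its own line, unbroken.
function wrapText(text: string, maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (current === "") {
      current = word;
    } else if (current.length + 1 + word.length <= maxChars) {
      current += " " + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}
```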
Word Documents (DOCX)
Library: docx
| Feature | Status |
|---|---|
| Heading levels | Supported |
| Text runs | Supported |
| Paragraphs | Supported |
| Metadata | Supported |
Screenshots (PNG)
Library: Playwright
| Feature | Status |
|---|---|
| Full page capture | Supported |
| Element screenshots | Supported |
| Quality settings | Supported |
| Viewport configuration | Supported |
Other Formats
| Format | Status | Use Case |
|---|---|---|
| SVG | Supported | Vector diagrams |
| CSV | Supported | Spreadsheet data |
| JSON | Supported | Structured data |
| Markdown | Supported | Rich text |
| Plain Text | Supported | Simple output |
Planned Document Formats
High Priority
| Format | Status | Use Case |
|---|---|---|
| PPTX/PowerPoint | Planned | Presentations |
| XLSX/Excel (write) | Planned | Spreadsheets |
| LaTeX | Planned | Academic papers |
Medium Priority
| Format | Status | Use Case |
|---|---|---|
| ODT (OpenDocument) | Planned | LibreOffice docs |
| ODS (OpenDocument) | Planned | LibreOffice sheets |
| EPUB | Planned | E-books |
| RTF (write) | Planned | Rich text |
OCR (Optical Character Recognition)
Active Providers
| Provider | Status | Features |
|---|---|---|
| Tesseract.js | Supported | LSTM neural network, multi-language |
OCR Features
| Feature | Status |
|---|---|
| Block classification | Supported |
| Confidence scoring | Supported |
| Bounding boxes | Supported |
| Word positioning | Supported |
| Multi-image batch | Supported |
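As a sketch of how confidence scores and bounding boxes might be used together, the snippet below filters words by confidence and computes the box enclosing the survivors. The `OcrWord` and `BoundingBox` shapes are assumptions for illustration, as is the 0 to 100 confidence scale; they are not the project's actual result types.

```typescript
// Hypothetical result shapes; field names are assumptions for illustration.
interface BoundingBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

interface OcrWord {
  text: string;
  confidence: number; // assumed to be on a 0-100 scale
  bbox: BoundingBox;
}

// Keep only words at or above the given confidence threshold.
function filterByConfidence(words: OcrWord[], minConfidence: number): OcrWord[] {
  return words.filter((w) => w.confidence >= minConfidence);
}

// Smallest box enclosing all given words, or undefined for an empty list.
function unionBox(words: OcrWord[]): BoundingBox | undefined {
  if (words.length === 0) return undefined;
  const box: BoundingBox = { x0: Infinity, y0: Infinity, x1: -Infinity, y1: -Infinity };
  for (const w of words) {
    box.x0 = Math.min(box.x0, w.bbox.x0);
    box.y0 = Math.min(box.y0, w.bbox.y0);
    box.x1 = Math.max(box.x1, w.bbox.x1);
    box.y1 = Math.max(box.y1, w.bbox.y1);
  }
  return box;
}
```

Filtering before merging boxes keeps low-confidence noise from inflating the detected text region.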
Planned OCR Providers
| Provider | Status | Notes |
|---|---|---|
| Azure Form Recognizer | Planned | Document analysis |
| Google Vision API | Planned | High accuracy |
| AWS Textract | Planned | AWS ecosystem |
Image Processing
Image Formats
| Format | Read | Write |
|---|---|---|
| PNG | Yes | Yes |
| JPEG/JPG | Yes | Yes |
| WebP | Yes | Yes |
| GIF | Yes | Yes |
| BMP | Yes | Yes |
| TIFF | Yes | No |
| SVG | Yes | Yes |
Image Operations
| Operation | Tool | Description |
|---|---|---|
| Resize | media.transform | Change dimensions |
| Crop | media.transform | Extract region |
| Format conversion | media.transform | PNG ↔ JPEG ↔ WebP |
| Enhance | media.transform | Improve quality |
| Redact/blur | media.transform | Hide sensitive content |
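For the resize operation, a common approach is to fit the image inside a bounding box while preserving its aspect ratio. The sketch below shows only that arithmetic; `fitWithin` is a hypothetical helper and says nothing about the actual parameters of `media.transform`.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: scale `source` down to fit inside `max`, keeping the
// aspect ratio. Images that already fit are returned at their original size.
function fitWithin(source: Size, max: Size): Size {
  const scale = Math.min(max.width / source.width, max.height / source.height, 1);
  return {
    // Clamp to at least 1 pixel so extreme aspect ratios never round to 0.
    width: Math.max(1, Math.round(source.width * scale)),
    height: Math.max(1, Math.round(source.height * scale)),
  };
}
```

Capping the scale at 1 avoids upscaling, which usually degrades quality; drop the cap if enlargement is wanted.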
Image Annotations
| Type | Status |
|---|---|
| Highlight areas | Supported |
| Arrows/pointers | Supported |
| Text labels | Supported |
| Rectangles | Supported |
| Circles/ellipses | Supported |
| Freehand drawing | Supported |
| Blur/redact | Supported |
| Pin markers | Supported |
Image Generation
Active Providers
| Provider | Status | Output |
|---|---|---|
| DALL-E (OpenAI) | Supported | PNG 1024x1024 |
Planned Providers
| Provider | Status | Notes |
|---|---|---|
| Stable Diffusion | Planned | Open source |
| Flux | Planned | High quality |
| Midjourney | Planned | Artistic style |
Generation Tools
| Tool | Description |
|---|---|
| media.create | Generate from text prompt |
| media.transform | Modify existing image |
| media.analyze | Analyze image content |
Video Processing
Video Formats
| Format | Status |
|---|---|
| MP4 | Supported |
| WebM | Supported |
| OGG | Supported |
| MOV/QuickTime | Supported |
| AVI | Supported |
| MKV/Matroska | Supported |
Video Features
Library: FFmpeg/FFprobe
| Feature | Status |
|---|---|
| Frame extraction | Supported |
| Metadata extraction | Supported |
| Duration detection | Supported |
| Codec information | Supported |
| Resolution detection | Supported |
| FPS detection | Supported |
| Key frame extraction | Supported |
| Thumbnails | Supported |
| Transcoding | Planned |
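Metadata such as codec, resolution, FPS, and duration can be read from FFprobe's JSON output (for example `ffprobe -v quiet -print_format json -show_format -show_streams input.mp4`). The sketch below assumes that output has already been parsed into an object. FFprobe reports `r_frame_rate` as a rational string such as `30000/1001` and `format.duration` as a string of seconds. The `summarizeVideo` helper and the `VideoSummary` shape are hypothetical.

```typescript
// Subset of FFprobe's JSON output that this sketch reads.
interface ProbeStream {
  codec_type?: string;
  codec_name?: string;
  width?: number;
  height?: number;
  r_frame_rate?: string; // rational string, e.g. "30000/1001"
}

interface ProbeOutput {
  streams?: ProbeStream[];
  format?: { duration?: string }; // seconds, as a string
}

// Hypothetical summary shape, not the project's actual type.
interface VideoSummary {
  codec?: string;
  width?: number;
  height?: number;
  fps?: number;
  durationSeconds?: number;
}

// Convert "num/den" (or a plain number string) to frames per second.
function parseFrameRate(rational: string): number | undefined {
  const parts = rational.split("/");
  const numerator = Number(parts[0]);
  const denominator = parts.length > 1 ? Number(parts[1]) : 1;
  if (!isFinite(numerator) || !isFinite(denominator) || denominator === 0) {
    return undefined; // covers "0/0", which FFprobe emits for unknown rates
  }
  return numerator / denominator;
}

function summarizeVideo(probe: ProbeOutput): VideoSummary {
  // Use the first video stream; audio and subtitle streams are ignored.
  const video = (probe.streams ?? []).filter((s) => s.codec_type === "video")[0];
  const duration = Number(probe.format?.duration);
  return {
    codec: video?.codec_name,
    width: video?.width,
    height: video?.height,
    fps: video?.r_frame_rate ? parseFrameRate(video.r_frame_rate) : undefined,
    durationSeconds: isFinite(duration) ? duration : undefined,
  };
}
```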
Planned Video Generation
| Provider | Status |
|---|---|
| Runway | Planned |
| Pika | Planned |
| Sora (OpenAI) | Planned |
Audio Processing
Audio Formats
| Format | Status |
|---|---|
| MP3 | Supported |
| WAV | Supported |
| OGG | Supported |
| FLAC | Supported |
| AAC | Supported |
| WebM Audio | Supported |
| M4A | Supported |
Audio Features
| Feature | Status |
|---|---|
| Metadata extraction | Supported |
| Duration detection | Supported |
| Transcription | Planned (requires speech-to-text) |
Diagrams & Visualization
Diagram Formats
| Format | Status | Use Case |
|---|---|---|
| Mermaid | Supported | Flowcharts, sequence, state, class |
| PlantUML | Supported | UML diagrams |
| Excalidraw | Supported | Hand-drawn style |
| Graphviz (DOT) | Supported | Graph visualization |
Diagram Features
| Feature | Status |
|---|---|
| Theme support | Supported (default, dark, forest, neutral) |
| Editable diagrams | Supported |
| SVG export | Supported |
| Source preservation | Supported |
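The listed themes match Mermaid's built-in theme names, so one way to apply a theme while keeping the diagram source intact is to prepend a Mermaid `init` directive. This is a sketch under that assumption; `withTheme` is a hypothetical helper, and the project may configure themes differently.

```typescript
type MermaidTheme = "default" | "dark" | "forest" | "neutral";

// Hypothetical helper: prepend a Mermaid init directive that selects the theme.
// The original diagram source is kept unchanged below the directive.
function withTheme(source: string, theme: MermaidTheme): string {
  const directive = `%%{init: {"theme": "${theme}"}}%%`;
  return `${directive}\n${source.trim()}\n`;
}

const themed = withTheme("graph TD\n  A-->B", "dark");
```

Because the directive is a single leading line, stripping it recovers the original source, which fits the source-preservation feature.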
Chart Types
| Type | Status |
|---|---|
| Line charts | Supported |
| Bar charts | Supported |
| Pie charts | Supported |
| Scatter plots | Supported |
| Area charts | Planned |
| Heatmaps | Planned |
Dynamic Card System
Content Types (21)
| Type | Description |
|---|---|
| document | PDF, DOCX, text documents |
| image | PNG, JPG, SVG with annotations |
| video | MP4, WebM with frames |
| audio | MP3, WAV recordings |
| code | Syntax-highlighted source |
| markdown | Rich text documents |
| html | Web content |
| json | Structured data viewer |
| csv | Spreadsheet data |
| form | Dynamic forms |
| diagram | Mermaid, PlantUML, etc. |
| blueprint | Design blueprints |
| spreadsheet | Excel-like data |
| terminal | CLI output |
| diff | Code diffs |
| database | Query results |
| search | Search results |
| agent | Agent configs |
| error | Error displays |
| link | URL previews |
| preview | Web previews |
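The 21 content types map naturally onto a string-literal union. The sketch below shows one way to validate untrusted input against that list. `CONTENT_TYPES`, `ContentType`, and `isContentType` are hypothetical names, not the card system's actual exports.

```typescript
// The 21 content types from the table above, as a readonly tuple.
const CONTENT_TYPES = [
  "document", "image", "video", "audio", "code", "markdown", "html",
  "json", "csv", "form", "diagram", "blueprint", "spreadsheet", "terminal",
  "diff", "database", "search", "agent", "error", "link", "preview",
] as const;

// Union of the literal strings above: "document" | "image" | ...
type ContentType = (typeof CONTENT_TYPES)[number];

// Type guard for values arriving from outside, such as parsed JSON.
function isContentType(value: string): value is ContentType {
  return (CONTENT_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

Deriving the type from the array keeps the runtime list and the compile-time union in sync when a type is added.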
Export Formats
| Format | Description |
|---|---|
| PDF | Print-ready documents |
| DOCX | Microsoft Word |
| PNG | Screenshot/image |
| SVG | Vector graphics |
| CSV | Spreadsheet data |
| JSON | Structured data |
| Markdown | Rich text |
| Plain Text | Simple text |
Cloud Storage Integration
Supported Providers
| Provider | Status |
|---|---|
| Google Drive | Supported |
| OneDrive | Supported |
| Dropbox | Supported |
| AWS S3 | Supported |
Features
| Feature | Status |
|---|---|
| Upload | Supported |
| Download | Supported |
| Versioning | Supported |
| Sharing | Supported |
Document Analysis
Layout Analysis
| Feature | Status |
|---|---|
| Text block detection | Supported |
| Image region detection | Supported |
| Table detection | Supported |
| Form field detection | Planned |
Document Tools
| Tool | Description |
|---|---|
| document.read | Extract structured data |
| document.query | Query specific fields |
| document.compute | Calculations on data |
| document.create | Create new document |
| document.report | Generate report |
| document.blueprint | Create architecture docs |
| document.artifact | Create code artifacts |
| document.spec | Create specifications |