synthreo.ai

File To Text

File To Text node for Builder — extract readable text from PDFs, Word documents, spreadsheets, and images for LLM ingestion, search indexing, and text analysis workflows.


The FileToText node converts documents (PDFs, images, office files) into machine-readable text using OCR (Optical Character Recognition) and parsing techniques. This allows workflows to analyze, search, and process unstructured documents at scale.

Inputs

  • File Path / Folder Path (String, Optional): Path to the file or folder to process. Can be static or supplied dynamically by previous nodes.

Outputs

  • Extracted Text: The text content extracted from the file(s).
  • Structured Output: Depending on configuration, the output may also include metadata and the original file information.

| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| File Path | String | No | (empty) | Specific file path to process. Supports dynamic input. |
| Folder Path | String | No | (empty) | Base folder path for batch file processing. |
| Remove File After Processing | Boolean (Toggle) | No | Off | Deletes the original file after extraction. |
| Limit OCR | Boolean (Toggle) | No | Off | Limits OCR depth for faster but less detailed processing. |
| Option | Dropdown | No | Original with appended result column | Defines output format: keep original data + extracted text, or return text only. |
| Result Property Name | String | No | text_result | Name of the property that stores extracted text. |
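
To make the Option and Result Property Name settings concrete, here is a minimal Python sketch of how the two output modes could shape a single record. It is only an illustration of the documented behavior, not the node's implementation: the function `shape_output`, the sample row, and the exact record layout are assumptions.

```python
from typing import Any, Dict

# Option values as they appear in the node configuration.
APPEND = "Original with appended result column"
RESULT_ONLY = "Return result column only"


def shape_output(row: Dict[str, Any], text: str, option: str = APPEND,
                 result_property: str = "text_result") -> Dict[str, Any]:
    """Hypothetical helper: model the two Option modes for one input row."""
    if option == APPEND:
        # Keep every original property and add the extracted text.
        return {**row, result_property: text}
    if option == RESULT_ONLY:
        # Drop the original properties; keep only the extracted text.
        return {result_property: text}
    raise ValueError(f"unknown option: {option!r}")


# Sample row and text are made up for illustration.
row = {"file_name": "invoice_001.pdf", "source": "email"}
print(shape_output(row, "Invoice text ..."))
print(shape_output(row, "Invoice text ...", option=RESULT_ONLY))
```

Changing `result_property` to, say, `invoice_text` corresponds to editing Result Property Name in the node.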

The node supports a wide range of document and image formats:

  • Documents: PDF, DOCX, TXT, RTF
  • Spreadsheets: XLSX, CSV (basic parsing into text)
  • Images: PNG, JPG/JPEG, TIFF, BMP
  • Scanned Files: Multi-page PDFs and image-based PDFs (via OCR)

⚠️ Note: OCR quality may vary depending on scan resolution, file quality, and language.
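
When files arrive from an upstream step, it can help to filter out unsupported types before they reach the node. The sketch below maps the extensions listed above to their categories; the helper name `file_category` and the idea of pre-filtering are assumptions for illustration, not part of the node itself.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions taken from the supported-format list above.
SUPPORTED = {
    "document": {".pdf", ".docx", ".txt", ".rtf"},
    "spreadsheet": {".xlsx", ".csv"},
    "image": {".png", ".jpg", ".jpeg", ".tiff", ".bmp"},
}


def file_category(path: str) -> Optional[str]:
    """Return the format category for a path, or None if unsupported."""
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive match
    for category, extensions in SUPPORTED.items():
        if suffix in extensions:
            return category
    return None


for name in ["/uploads/invoices/scan.JPG", "report.pdf", "notes.md"]:
    print(name, "->", file_category(name))
```

Note that a PDF is reported as a document whether it is text-based or scanned; the extension alone cannot tell whether OCR will be needed.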

Example: batch invoice processing

  • Setup:
    • Folder Path = /uploads/invoices/
    • Remove File After Processing = Off
    • Limit OCR = On
    • Result Property Name = invoice_text
  • Result: Invoice text is extracted into invoice_text for automated data entry.

Example: email attachment triage

  • Setup:
    • File Path supplied dynamically from email attachments
    • produceChunksFromPdf = Off
    • Option = Return result column only
  • Result: Attachments are extracted into plain text for categorization and AI triage.

Best practices

  • Organize files in clear folder structures for easier batch processing.
  • Enable Limit OCR for high-volume, simple documents where speed matters more than detail.
  • Use chunking (produceChunksFromPdf) for large or complex PDFs that need contextual sectioning.
  • Always test with sample files before scaling up.
  • Be cautious with Remove File After Processing in compliance-heavy industries: it deletes the original file after extraction.

Test cases

  • Given: invoice.jpg with Limit OCR = On
    Expected: Extracted text with faster but less detailed results.
  • Given: Folder with 3 PDFs, Option = Return result column only
    Expected: Output array of plain text results, one per file.
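
The folder test case above can be turned into a small automated check. This is a sketch under stated assumptions: it supposes the node's output is a list of records that each contain only the result property (text_result by default), which the page does not spell out, and the helper `check_batch_output` is hypothetical.

```python
from typing import Any, Dict, List


def check_batch_output(file_names: List[str], output: List[Dict[str, Any]],
                       result_property: str = "text_result") -> List[str]:
    """Return a list of problems; an empty list means the output looks right."""
    problems = []
    # One result is expected per input file.
    if len(output) != len(file_names):
        problems.append(f"expected {len(file_names)} results, got {len(output)}")
    for index, item in enumerate(output):
        if set(item) != {result_property}:
            # With "Return result column only", no other properties are expected.
            problems.append(f"result {index} has unexpected properties: {sorted(item)}")
        elif not isinstance(item[result_property], str) or not item[result_property].strip():
            problems.append(f"result {index} has no extracted text")
    return problems


# Made-up sample data standing in for a real run on three PDFs.
files = ["a.pdf", "b.pdf", "c.pdf"]
sample_output = [{"text_result": "first"}, {"text_result": "second"}, {"text_result": "third"}]
print(check_batch_output(files, sample_output))
```

If the actual output layout differs, only the property check inside the loop needs to change.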