FileToText Node Documentation

Overview

The FileToText node converts files such as PDFs, images, and other documents into plain text that later nodes in your workflow can process. It combines OCR (Optical Character Recognition) with document parsing to extract the text, which makes it well suited to automating document-processing workflows.

What This Node Does: Takes files from your specified location and converts them into text format, allowing other nodes in your workflow to analyze, search, or manipulate the content.

Business Value: Eliminates manual document transcription, enables automated document analysis, and allows you to process hundreds of files in minutes instead of hours.

Configuration Parameters

Data Source Section

File Path

  • Field Name: getterTemplate
  • Type: Smart text field with dynamic data support
  • Default Value: Empty
  • Simple Description: Specifies the exact path or pattern to locate the files you want to convert to text
  • When to Change This: Set this to point to specific files or use dynamic values from previous workflow nodes
  • Business Impact: Accurate file paths ensure your workflow processes the right documents every time

Examples:

  • /documents/contracts/contract-001.pdf (specific file)
  • /uploads/invoices/ (all files in folder)
  • Use data from previous nodes to dynamically specify file locations

Folder Path

  • Field Name: filesFolderPath
  • Type: Smart text field with location mode
  • Default Value: Empty
  • Simple Description: Sets the base folder where your files are stored
  • When to Change This: Point to different document storage locations or use dynamic folder paths
  • Business Impact: Proper folder configuration ensures your workflow can access all relevant documents
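
For readers who script around the node, the sketch below shows one way the two path settings could combine. It assumes that a relative File Path is resolved against Folder Path and that a path naming a folder selects every file in that folder, matching the examples above. The node's exact resolution rules are not documented here, and `resolve_files` is a hypothetical name.

```python
from pathlib import Path


def resolve_files(file_path: str, folder_path: str = "") -> list[Path]:
    """Hypothetical sketch: turn File Path / Folder Path values into a list of files.

    Assumptions (not confirmed by the node's documentation):
    - a relative file_path is resolved against folder_path, while an absolute
      file_path is used as-is (this is how pathlib's "/" operator behaves);
    - if the resolved path is a folder, every regular file directly inside it is selected.
    """
    base = Path(folder_path) if folder_path else Path(".")
    target = base / file_path if file_path else base
    if target.is_dir():
        # Folder target: all files in it, sorted for a stable processing order
        return sorted(p for p in target.iterdir() if p.is_file())
    # Single-file target: return it only if it exists
    return [target] if target.is_file() else []
```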

Processing Section

Remove File After Processing

  • Field Name: removeFileAfterProcessing
  • Type: Toggle switch (On/Off)
  • Default Value: Off
  • Simple Description: Automatically deletes the original file after successfully extracting its text
  • When to Change This:
    • Turn On: When processing temporary files or when you only need the text content
    • Keep Off: When you need to preserve original files for compliance or backup purposes
  • Business Impact: Frees storage space and keeps source folders tidy, but the original files are deleted, so use this option carefully to avoid losing important documents
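
The key point of this setting is ordering: the original is deleted only after its text has been extracted. A minimal Python sketch of that rule, where `process_file` and `extract_text` are hypothetical stand-ins for the node's internal steps:

```python
from pathlib import Path
from typing import Callable


def process_file(
    path: Path,
    extract_text: Callable[[Path], str],
    remove_file_after_processing: bool = False,
) -> str:
    """Hypothetical sketch of removeFileAfterProcessing semantics.

    The file is deleted only after extract_text returns. If extraction
    raises, the exception propagates and the original is left in place.
    """
    text = extract_text(path)
    if remove_file_after_processing:
        path.unlink()  # reached only when extraction succeeded
    return text
```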

Limit OCR

  • Field Name: limitOcr
  • Type: Toggle switch (On/Off)
  • Default Value: Off
  • Simple Description: Restricts the OCR processing to improve performance and reduce processing time
  • When to Change This:
    • Turn On: When processing large volumes of files or when basic text extraction is sufficient
    • Keep Off: When you need maximum accuracy for complex documents with challenging layouts
  • Business Impact: Faster processing times but potentially lower accuracy for complex documents

Produce Chunks from PDF

  • Field Name: produceChunksFromPdf
  • Type: Toggle switch (On/Off)
  • Default Value: Off
  • Simple Description: Breaks PDF content into smaller, manageable text segments instead of one large block
  • When to Change This:
    • Turn On: When processing large PDFs that need to be analyzed in sections
    • Keep Off: When you need the complete document as one continuous text block
  • Business Impact: Chunked content is easier for AI analysis and search functions, improving downstream processing efficiency
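
This page does not specify how the node decides where one chunk ends and the next begins. Purely to illustrate the idea, the sketch below uses a common technique, fixed-size character windows with a small overlap. `chunk_text` and its default sizes are assumptions, not the node's settings.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Illustrative fixed-size chunker (not necessarily what produceChunksFromPdf does).

    Each chunk holds at most chunk_size characters; consecutive chunks share
    `overlap` characters, so text near a boundary appears in both neighbours.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Smaller chunks fit more easily into an AI model's input limits; the overlap trades a little duplicated text for continuity between sections.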

Output Section

Output Format Options

  • Field Name: outTransformId
  • Type: Dropdown menu with options:
    • Original with appended result column: Keeps all original data and adds the extracted text as a new column
    • Return result column only: Provides only the extracted text content, removing other data
  • Default Value: Original with appended result column
  • Simple Description: Controls how the extracted text is formatted in your workflow output
  • When to Change This: Choose "Return result column only" when later nodes need only the extracted text
  • Business Impact: Proper output formatting ensures compatibility with subsequent workflow nodes

Result Property Name

  • Field Name: outColumnName
  • Type: Text field
  • Default Value: "text_result"
  • Simple Description: Names the column or property that will contain your extracted text
  • When to Change This: Use descriptive names like "contract_text", "invoice_content", or "document_summary" for better workflow organization
  • Business Impact: Clear naming makes your workflow data easier to understand and maintain
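
The two output settings together decide the shape of each record the node passes on. A minimal Python sketch, treating a record as a dictionary. `build_output` is hypothetical, and the identifiers `original_with_result` and `result_only` are placeholders for the two dropdown options, whose internal `outTransformId` values are not listed on this page.

```python
def build_output(
    record: dict,
    extracted_text: str,
    out_transform_id: str = "original_with_result",  # hypothetical option ID
    out_column_name: str = "text_result",             # documented default name
) -> dict:
    """Sketch of how outTransformId and outColumnName might shape one output record."""
    if out_transform_id == "original_with_result":
        # Keep every incoming field and append the extracted text as a new column
        return {**record, out_column_name: extracted_text}
    if out_transform_id == "result_only":
        # Drop the incoming fields and keep only the extracted text
        return {out_column_name: extracted_text}
    raise ValueError(f"unknown output format: {out_transform_id!r}")


row = {"file_name": "contract-001.pdf", "client": "Acme"}
print(build_output(row, "This Agreement ..."))
# {'file_name': 'contract-001.pdf', 'client': 'Acme', 'text_result': 'This Agreement ...'}
print(build_output(row, "This Agreement ...", "result_only", "contract_text"))
# {'contract_text': 'This Agreement ...'}
```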

Real-World Use Cases

Contract Review Automation

Business Situation: A law firm receives hundreds of PDF contracts daily that need to be reviewed for specific clauses and terms.

What You'll Configure:

  • Set "Folder Path" to your contract storage directory
  • Enable "Produce chunks from PDF" to break contracts into sections
  • Choose "Original with appended result column" to keep file metadata
  • Name the result column "contract_text"

What Happens: Each PDF contract is automatically converted to searchable text, allowing subsequent nodes to find specific clauses, extract key terms, or flag important provisions.

Business Value: Reduces contract review time by 75% and ensures no important clauses are missed during initial screening.
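
Written out with the field names from Configuration Parameters, the contract setup might look like the dictionary below. The folder path and the `outTransformId` value are hypothetical placeholders, and `find_clauses` is a hypothetical downstream step that shows why chunked output helps: matches can be reported per section rather than for the whole contract.

```python
contract_config = {
    "filesFolderPath": "/documents/contracts/",  # hypothetical storage location
    "produceChunksFromPdf": True,                # split each contract into sections
    "outTransformId": "original_with_result",    # hypothetical ID for "Original with appended result column"
    "outColumnName": "contract_text",
    "removeFileAfterProcessing": False,          # default (Off), not changed in this use case
    "limitOcr": False,                           # default (Off), not changed in this use case
}


def find_clauses(chunks: list[str], keywords: list[str]) -> dict[str, list[int]]:
    """Hypothetical downstream step: indexes of chunks mentioning each keyword (case-insensitive)."""
    hits: dict[str, list[int]] = {kw: [] for kw in keywords}
    for i, chunk in enumerate(chunks):
        lowered = chunk.lower()
        for kw in keywords:
            if kw.lower() in lowered:
                hits[kw].append(i)
    return hits
```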

Invoice Processing Automation

Business Situation: An accounting department needs to extract text from scanned invoices to populate their accounting system automatically.

What You'll Configure:

  • Point "File Path" to your invoice upload folder
  • Keep "Remove file after processing" off for audit compliance
  • Enable "Limit OCR" for faster processing of standard invoice formats
  • Set result column name to "invoice_text"

What Happens: Scanned invoices are converted to text, enabling automatic extraction of vendor names, amounts, dates, and line items.

Business Value: Eliminates manual data entry, reduces processing errors by 90%, and speeds up invoice approval workflows.
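
To illustrate what automatic extraction of invoice fields might involve, here is a hypothetical downstream step that reads `invoice_text` and pulls out the total. The regular expression assumes totals are printed like `Total: $1,234.56`; real invoice layouts vary widely, so treat it as a sketch rather than a production parser. The folder path in the configuration dictionary is likewise illustrative.

```python
import re
from decimal import Decimal
from typing import Optional

invoice_config = {
    "getterTemplate": "/uploads/invoices/",  # File Path pointing at the upload folder (hypothetical location)
    "removeFileAfterProcessing": False,      # keep originals for audit compliance
    "limitOcr": True,                        # faster processing of standard invoice formats
    "outColumnName": "invoice_text",
}

# Assumes totals are printed like "Total: $1,234.56"; \b stops "Subtotal" from matching
TOTAL_PATTERN = re.compile(r"\btotal\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)


def extract_total(invoice_text: str) -> Optional[Decimal]:
    """Hypothetical downstream step: return the first total found in the text, or None."""
    match = TOTAL_PATTERN.search(invoice_text)
    if match is None:
        return None
    # Decimal avoids binary floating-point rounding for currency amounts
    return Decimal(match.group(1).replace(",", ""))
```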

Customer Support Document Analysis

Business Situation: A support team receives various document types (PDFs, images, scanned forms) that need to be analyzed for customer issues and requests.

What You'll Configure:

  • Set "File Path" to a dynamic value from a previous node (for example, the location of an email attachment)
  • Keep "Produce chunks from PDF" off for complete context
  • Choose "Return result column only" to focus on text content
  • Name the output "support_document_text"

What Happens: All document attachments are converted to text, allowing AI nodes to categorize issues, extract customer information, and route requests appropriately.

Business Value: Improves response times by 60% and ensures consistent handling of all document types.
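
Dynamic paths mean the File Path field holds a template that is filled in from each record arriving from the previous node. This page does not document the smart text placeholder syntax, so the sketch below borrows Python's `str.format` placeholders purely for illustration; `resolve_file_paths`, the `attachment_path` field, and the `outTransformId` value are hypothetical.

```python
def resolve_file_paths(getter_template: str, upstream_records: list[dict]) -> list[str]:
    """Fill a File Path template from each upstream record (illustrative placeholder syntax).

    Records missing a field the template refers to are skipped rather than raising KeyError.
    """
    paths = []
    for record in upstream_records:
        try:
            paths.append(getter_template.format(**record))
        except KeyError:
            continue  # e.g. an email that arrived without an attachment
    return paths


support_config = {
    "getterTemplate": "{attachment_path}",  # hypothetical placeholder, filled per attachment
    "produceChunksFromPdf": False,          # keep each document's full context together
    "outTransformId": "result_only",        # hypothetical ID for "Return result column only"
    "outColumnName": "support_document_text",
}
```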

Step-by-Step Configuration

Adding the Node

  1. Drag the FileToText node from the left panel onto your workflow canvas
  2. Connect it to the previous node using the arrow connector
  3. Click on the FileToText node to open the configuration panel

Setting Up Data Source

  1. In the "Data Source" section, enter your file path in the "File Path" field
  2. If processing multiple files, specify the "Folder Path" where your documents are stored
  3. Use the smart text features to include dynamic values from previous workflow steps

Configuring Processing Options

  1. In the "Processing" section, decide whether to turn on "Remove file after processing"
  2. Toggle "Limit OCR" on if you need faster processing for simple documents
  3. Enable "Produce chunks from PDF" if you're working with large PDF files that need sectioning

Setting Output Format

  1. In the "Output" section, choose your preferred output format from the dropdown
  2. Enter a descriptive name for your result column (like "extracted_text" or "document_content")
  3. Click "Save Configuration" to apply your settings

Testing Your Setup

  1. Use the "Test Configuration" button to verify your settings
  2. Check that files are being found and processed correctly
  3. Review the output format to ensure it matches your needs
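
If you export a test run's output records, a short script can automate the checks in the list above. A hypothetical sketch, assuming each output record is a dictionary and that the result column uses the default name `text_result`:

```python
from typing import Optional


def check_sample_output(
    records: list[dict],
    out_column_name: str = "text_result",
    expected_count: Optional[int] = None,
) -> list[str]:
    """Hypothetical smoke test for a test run; returns a list of problems (empty means OK)."""
    problems = []
    # Were all the sample files found? (checked only when you know how many to expect)
    if expected_count is not None and len(records) != expected_count:
        problems.append(f"expected {expected_count} records, got {len(records)}")
    for i, record in enumerate(records):
        # Does each record carry the result column under the configured name?
        if out_column_name not in record:
            problems.append(f"record {i}: missing column {out_column_name!r}")
        # Did extraction actually produce some text?
        elif not str(record[out_column_name]).strip():
            problems.append(f"record {i}: extracted text is empty")
    return problems
```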

Industry Applications

Healthcare Organizations

Common Challenge: Medical records, lab reports, and patient forms exist in various formats that need to be digitized and searchable.

How This Node Helps: Converts these records, reports, and forms into searchable text, enabling automated patient record updates and compliance reporting.

Configuration Recommendations:

  • Keep "Remove file after processing" off for HIPAA compliance
  • Use "Produce chunks from PDF" for multi-page medical reports
  • Set descriptive output names like "patient_record_text"

Results: Healthcare providers reduce record processing time by 80% and improve patient data accessibility.

Financial Services

Common Challenge: Banks and financial institutions process thousands of loan applications, statements, and compliance documents daily.

How This Node Helps: Automatically extracts text from financial documents for risk assessment, compliance checking, and data entry automation.

Configuration Recommendations:

  • Keep "Limit OCR" off for maximum accuracy with financial data
  • Maintain original files for audit trails
  • Use chunking for complex multi-page financial reports

Results: Financial institutions see 70% faster loan processing and 95% reduction in data entry errors.

Real Estate

Common Challenge: Property documents, contracts, and inspection reports need to be processed and analyzed for key information.

How This Node Helps: Converts property documents into searchable text for automated analysis of terms, conditions, and property details.

Configuration Recommendations:

  • Process contracts in chunks for easier clause analysis
  • Preserve original documents for legal compliance
  • Use descriptive output naming for different document types

Results: Real estate firms reduce document processing time by 65% and improve accuracy in property analysis.

Best Practices

File Organization

  • Organize source files in clearly named folders
  • Use consistent file naming conventions
  • Ensure files are in supported formats (PDF, images, common document types)

Performance Optimization

  • Enable "Limit OCR" for high-volume processing of simple documents
  • Use chunking for large PDFs that will be analyzed by AI nodes
  • Consider file cleanup policies based on your business requirements

Quality Assurance

  • Test with sample files before processing large batches
  • Verify output formatting meets your downstream node requirements
  • Monitor processing results for accuracy and completeness

Security Considerations

  • Be cautious with the "Remove file after processing" option for sensitive documents
  • Ensure proper access controls on source file locations
  • Consider compliance requirements when configuring file handling options

The FileToText node transforms your document processing workflows from manual, time-consuming tasks into automated, efficient operations that scale with your business needs.