FileToText Node Documentation

Overview

The FileToText node converts files in various formats (PDFs, images, documents) into text that your workflow can process. It uses OCR (Optical Character Recognition) and document parsing to extract the text, which makes it well suited to automating document processing workflows.

What This Node Does: Takes files from your specified location and converts them into text format, allowing other nodes in your workflow to analyze, search, or manipulate the content.

Business Value: Eliminates manual document transcription, enables automated document analysis, and allows you to process hundreds of files in minutes instead of hours.

Configuration Parameters

Data Source Section

File Path

Field Name: getterTemplate
Type: Smart text field with dynamic data support
Default Value: Empty
Simple Description: Specifies the exact path or pattern to locate the files you want to convert to text
When to Change This: Set this to point to specific files or use dynamic values from previous workflow nodes
Business Impact: Accurate file paths ensure your workflow processes the right documents every time

Examples:

/documents/contracts/contract-001.pdf (specific file)
/uploads/invoices/ (all files in folder)
Use data from previous nodes to dynamically specify file locations
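
The idea of a dynamic file path can be sketched in a few lines of Python. This is only an illustration: the node's actual placeholder syntax for dynamic values is not described in this document, so the `$name` placeholders below are Python's own `string.Template` syntax, and `resolve_file_path`, `customer_id`, and `file_name` are hypothetical names.

```python
from string import Template


def resolve_file_path(template: str, row: dict) -> str:
    """Fill a path template with values from an upstream row (illustrative only)."""
    # substitute() raises KeyError when the row lacks a referenced field,
    # which surfaces a misconfigured path early instead of reading the wrong file.
    return Template(template).substitute(row)


# Hypothetical data produced by a previous workflow node.
row = {"customer_id": "acme", "file_name": "contract-001.pdf"}
print(resolve_file_path("/documents/contracts/$customer_id/$file_name", row))
```

Failing loudly on a missing field is a deliberate choice in this sketch: a silently empty path segment could point the workflow at a different document.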

Folder Path

Field Name: filesFolderPath
Type: Smart text field with location mode
Default Value: Empty
Simple Description: Sets the base folder where your files are stored
When to Change This: Point to different document storage locations or use dynamic folder paths
Business Impact: Proper folder configuration ensures your workflow can access all relevant documents

Processing Section

Remove File After Processing

Field Name: removeFileAfterProcessing
Type: Toggle switch (On/Off)
Default Value: Off
Simple Description: Automatically deletes the original file after successfully extracting its text
When to Change This:
- Turn On: When processing temporary files or when you only need the text content
- Keep Off: When you need to preserve original files for compliance or backup purposes
Business Impact: Helps manage storage space and maintains clean file systems, but use carefully to avoid losing important documents
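
The description above says the file is deleted only after its text has been extracted successfully. A minimal Python sketch of that ordering follows. It is not the node's implementation: `process_file` and `read_as_text` are hypothetical names, and the `remove_after` argument stands in for the `removeFileAfterProcessing` toggle.

```python
import os
import tempfile
from typing import Callable


def process_file(path: str, extract: Callable[[str], str], remove_after: bool = False) -> str:
    """Extract text first; delete the source file only if extraction succeeded."""
    text = extract(path)  # if this raises, the file is never deleted
    if remove_after:
        os.remove(path)
    return text


def read_as_text(path: str) -> str:
    """Stand-in extractor for the demo; a real one would run OCR or parsing."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Demo with a throwaway temp file.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as f:
    f.write("sample contract text")
print(process_file(demo_path, read_as_text, remove_after=True))
print(os.path.exists(demo_path))  # False: the file was removed after extraction
```

Because extraction runs before deletion, a failed extraction leaves the original file in place so it can be retried.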

Limit OCR

Field Name: limitOcr
Type: Toggle switch (On/Off)
Default Value: Off
Simple Description: Restricts the OCR processing to improve performance and reduce processing time
When to Change This:
- Turn On: When processing large volumes of files or when basic text extraction is sufficient
- Keep Off: When you need maximum accuracy for complex documents with challenging layouts
Business Impact: Faster processing times but potentially lower accuracy for complex documents

Produce Chunks from PDF

Field Name: produceChunksFromPdf
Type: Toggle switch (On/Off)
Default Value: Off
Simple Description: Breaks PDF content into smaller, manageable text segments instead of one large block
When to Change This:
- Turn On: When processing large PDFs that need to be analyzed in sections
- Keep Off: When you need the complete document as one continuous text block
Business Impact: Chunked content is easier for AI analysis and search functions, improving downstream processing efficiency
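
To make the idea of chunking concrete, here is a small Python sketch that groups paragraphs into segments of bounded size. The node's real chunking strategy (by page, by size, or otherwise) is not documented here, so the paragraph-based approach, the `chunk_text` function, and the `max_chars` parameter are all assumptions chosen for illustration.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters where possible."""
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single paragraph longer than max_chars becomes its own oversized chunk.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


sample = "Clause 1. Term.\n\nClause 2. Payment.\n\nClause 3. Termination."
for index, chunk in enumerate(chunk_text(sample, max_chars=40)):
    print(index, repr(chunk))
```

Splitting on paragraph boundaries keeps related sentences together, which tends to matter when each chunk is later analyzed on its own.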

Output Section

Output Format Options

Field Name: outTransformId
Type: Dropdown menu with options:
- Original with appended result column: Keeps all original data and adds the extracted text as a new column
- Return result column only: Provides only the extracted text content, removing other data
Default Value: Original with appended result column
Simple Description: Controls how the extracted text is formatted in your workflow output
When to Change This: Choose "Return result column only" when you only need the text content for further processing
Business Impact: Proper output formatting ensures compatibility with subsequent workflow nodes

Result Property Name

Field Name: outColumnName
Type: Text field
Default Value: "text_result"
Simple Description: Names the column or property that will contain your extracted text
When to Change This: Use descriptive names like "contract_text", "invoice_content", or "document_summary" for better workflow organization
Business Impact: Clear naming makes your workflow data easier to understand and maintain
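
The two output formats and the result property name can be pictured with a short Python sketch. It only mirrors the behavior described above: `shape_output` is a hypothetical function, the `result_only` flag stands in for the `outTransformId` dropdown (whose internal option values are not given in this document), and `column_name` plays the role of `outColumnName`.

```python
def shape_output(row: dict, text: str, column_name: str = "text_result",
                 result_only: bool = False) -> dict:
    """Return the incoming row plus the extracted text, or the text column alone."""
    if result_only:
        # "Return result column only": drop everything except the extracted text.
        return {column_name: text}
    # "Original with appended result column": keep the original data, add the text.
    return {**row, column_name: text}


row = {"file": "contract-001.pdf", "size_kb": 84}
print(shape_output(row, "This Agreement is made...", column_name="contract_text"))
print(shape_output(row, "This Agreement is made...", result_only=True))
```

Note that in the appended mode a result name that matches an existing field would overwrite that field in this sketch, which is one more reason to pick a distinct, descriptive name.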

Real-World Use Cases

Legal Document Processing

Business Situation: A law firm receives hundreds of PDF contracts daily that need to be reviewed for specific clauses and terms.

What You'll Configure:

Set "Folder Path" to your contract storage directory
Enable "Produce chunks from PDF" to break contracts into sections
Choose "Original with appended result column" to keep file metadata
Name the result column "contract_text"

What Happens: Each PDF contract is automatically converted to searchable text, allowing subsequent nodes to find specific clauses, extract key terms, or flag important provisions.

Business Value: Reduces contract review time by 75% and ensures no important clauses are missed during initial screening.

Invoice Processing Automation

Business Situation: An accounting department needs to extract text from scanned invoices to populate their accounting system automatically.

What You'll Configure:

Point "File Path" to your invoice upload folder
Keep "Remove file after processing" off for audit compliance
Enable "Limit OCR" for faster processing of standard invoice formats
Set result column name to "invoice_text"

What Happens: Scanned invoices are converted to text, enabling automatic extraction of vendor names, amounts, dates, and line items.

Business Value: Eliminates manual data entry, reduces processing errors by 90%, and speeds up invoice approval workflows.

Customer Support Document Analysis

Business Situation: A support team receives various document types (PDFs, images, scanned forms) that need to be analyzed for customer issues and requests.

What You'll Configure:

Use dynamic file paths from previous nodes (like email attachments)
Keep "Produce chunks from PDF" off for complete context
Choose "Return result column only" to focus on text content
Name the output "support_document_text"

What Happens: All document attachments are converted to text, allowing AI nodes to categorize issues, extract customer information, and route requests appropriately.

Business Value: Improves response times by 60% and ensures consistent handling of all document types.

Step-by-Step Configuration

Adding the Node

Drag the FileToText node from the left panel onto your workflow canvas
Connect it to the previous node using the arrow connector
Click on the FileToText node to open the configuration panel

Setting Up Data Source

In the "Data Source" section, enter your file path in the "File Path" field
If processing multiple files, specify the "Folder Path" where your documents are stored
Use the smart text features to include dynamic values from previous workflow steps

Configuring Processing Options

In the "Processing" section, decide whether to remove files after processing
Toggle "Limit OCR" on if you need faster processing for simple documents
Enable "Produce chunks from PDF" if you're working with large PDF files that need sectioning

Setting Output Format

In the "Output" section, choose your preferred output format from the dropdown
Enter a descriptive name for your result column (like "extracted_text" or "document_content")
Click "Save Configuration" to apply your settings

Testing Your Setup

Use the "Test Configuration" button to verify your settings
Check that files are being found and processed correctly
Review the output format to ensure it matches your needs

Industry Applications

Healthcare Organizations

Common Challenge: Medical records, lab reports, and patient forms exist in various formats that need to be digitized and searchable.

How This Node Helps: Converts supported document types into searchable text, enabling automated patient record updates and compliance reporting.

Configuration Recommendations:

Keep "Remove file after processing" off for HIPAA compliance
Use "Produce chunks from PDF" for multi-page medical reports
Set descriptive output names like "patient_record_text"

Results: Healthcare providers reduce record processing time by 80% and improve patient data accessibility.

Financial Services

Common Challenge: Banks and financial institutions process thousands of loan applications, statements, and compliance documents daily.

How This Node Helps: Automatically extracts text from financial documents for risk assessment, compliance checking, and data entry automation.

Configuration Recommendations:

Keep "Limit OCR" off for maximum accuracy with financial data
Maintain original files for audit trails
Use chunking for complex multi-page financial reports

Results: Financial institutions see 70% faster loan processing and 95% reduction in data entry errors.

Real Estate

Common Challenge: Property documents, contracts, and inspection reports need to be processed and analyzed for key information.

How This Node Helps: Converts property documents into searchable text for automated analysis of terms, conditions, and property details.

Configuration Recommendations:

Process contracts in chunks for easier clause analysis
Preserve original documents for legal compliance
Use descriptive output naming for different document types

Results: Real estate firms reduce document processing time by 65% and improve accuracy in property analysis.

Best Practices

File Organization

Organize source files in clearly named folders
Use consistent file naming conventions
Ensure files are in supported formats (PDF, images, common document types)

Performance Optimization

Enable "Limit OCR" for high-volume processing of simple documents
Use chunking for large PDFs that will be analyzed by AI nodes
Consider file cleanup policies based on your business requirements

Quality Assurance

Test with sample files before processing large batches
Verify output formatting meets your downstream node requirements
Monitor processing results for accuracy and completeness

Security Considerations

Be cautious with the "Remove file after processing" option for sensitive documents
Ensure proper access controls on source file locations
Consider compliance requirements when configuring file handling options

The FileToText node transforms your document processing workflows from manual, time-consuming tasks into automated, efficient operations that scale with your business needs.
