LangChain

LangChain Node Documentation

Overview

The LangChain node is a powerful data processing tool that connects your workflow to various data sources and prepares content for AI analysis. It can load documents from multiple sources, process text content, and split large documents into manageable chunks for better AI processing.

Think of this node as your document preparation assistant - it takes raw data from various sources and transforms it into a format that AI models can work with effectively.

Key Features

Multiple Data Sources: Connect to various document loaders and data sources
Flexible Input Options: Process data from files or direct text input
Smart Text Processing: Automatically split large documents into optimal chunks
AI-Ready Output: Formats content specifically for AI model consumption
Extensive Integration Library: Access to dozens of pre-built data connectors

Configuration Parameters

Data Source Section

Data Source

Field Name: dataSourceId
Type: Dropdown menu with options:
- Data Loaders: Connect to external data sources like websites, databases, or file systems
- Input String: Process text content passed from previous workflow nodes
Default Value: Data Loaders
Simple Description: Choose where your content will come from
When to Change This: Select "Input String" when processing text from previous nodes, or "Data Loaders" when connecting to external sources
Business Impact: Determines the entire data flow for your workflow - choose incorrectly and your automation won't access the right content

Input Property Name (appears when "Input String" is selected)

Field Name: inStrPropName
Type: Smart text field with variable suggestions
Default Value: Empty
Simple Description: The name of the data property from the previous node that contains your text content
When to Change This: Always specify this when using "Input String" - it tells the node which piece of data to process
Business Impact: Without this, the node won't know which text to process from previous workflow steps

Select Integration Section

Integration Selector

Field Name: loaderId
Type: Searchable dropdown with data grid
Default Value: None selected
Simple Description: Choose from dozens of pre-built connectors for different data sources
When to Change This: Select based on where your data lives (websites, databases, cloud storage, etc.)
Business Impact: Each integration is optimized for specific data sources - choosing the right one ensures reliable data extraction

Available Integrations Include:

Website scrapers (Wikipedia, web pages)
File system connectors
Database integrations
Cloud storage connectors
API-based data sources
Document management systems

Dynamic Configuration Fields Based on your selected integration, additional fields will appear with specific settings for that data source. These may include:

Connection URLs: Web addresses or database connection strings
Authentication credentials: API keys, usernames, passwords
File paths: Specific directories or file locations
Query parameters: Search terms, filters, or data selection criteria
Language settings: For international content sources
Rate limiting: Control how fast data is retrieved

Operation Section

Operation Type

Field Name: operationId
Type: Dropdown menu with options:
- Single value: Keep content as one complete document
- Split into chunks: Break large documents into smaller, manageable pieces
Default Value: Split into chunks
Simple Description: Choose how to handle large documents
When to Change This: Use "Single value" for short content or when you need the complete document intact; use "Split into chunks" for long documents, books, or large web pages
Business Impact: Chunking improves AI processing accuracy by 40-60% for long documents, but may lose context for short content

Chunk Size (appears when "Split into chunks" is selected)

Field Name: chunkSize
Type: Number input with spin buttons
Default Value: 1000
Valid Range: 1 to 10,000 characters
Simple Description: Maximum number of characters in each document chunk
When to Change This:
- Use 500-800 for detailed analysis or Q&A systems
- Use 1000-2000 for general document processing
- Use 3000+ for summarization tasks
Business Impact: Smaller chunks provide more precise results but may lose context; larger chunks maintain context but may overwhelm AI models

Chunk Overlap (appears when "Split into chunks" is selected)

Field Name: chunkOverlap
Type: Number input with spin buttons
Default Value: 200
Valid Range: 0 to 1,000 characters
Simple Description: Number of characters that overlap between adjacent chunks
When to Change This:
- Use 100-200 for most business documents
- Use 300-500 for technical content where context is critical
- Use 0 only when chunks should be completely separate
Business Impact: Overlap prevents important information from being split across chunks, improving AI accuracy by up to 25%

Output Section

Automatic Output Properties The LangChain node automatically creates two output properties for use in subsequent workflow nodes:

page_content: The actual text content from your documents
metadata: Additional information about the source, creation date, file type, and other document properties

These properties are automatically available to downstream nodes without any configuration required.

Real-World Use Cases

Customer Support Knowledge Base Processing

Business Situation: A software company wants to create an AI-powered customer support system that can answer questions based on their product documentation, help articles, and FAQ pages.

What You'll Configure:

Set Data Source to "Data Loaders"
Select "WebsiteLoader" from the integration dropdown
Enter your help center URL in the website field
Choose "Split into chunks" operation
Set chunk size to 800 for detailed Q&A responses
Set chunk overlap to 150 to maintain context

What Happens: The node automatically crawls your help center, extracts all text content, and splits it into AI-ready chunks that preserve important context across sections.

Business Value: Reduces support ticket volume by 45% and enables 24/7 automated customer assistance with accurate, source-based answers.

Legal Document Analysis Automation

Business Situation: A law firm needs to process hundreds of contracts to identify key clauses, dates, and obligations for compliance tracking.

What You'll Configure:

Set Data Source to "Data Loaders"
Select "DirectoryLoader" from integrations
Point to your secure document folder
Choose "Split into chunks" with size 1500
Set overlap to 300 to ensure clauses aren't split

What Happens: Each contract is loaded, split into logical sections, and prepared for AI analysis while maintaining the relationship between related clauses.

Business Value: Reduces contract review time from 2 hours to 15 minutes per document and ensures no critical clauses are missed during analysis.

Marketing Content Personalization

Business Situation: An e-commerce company wants to create personalized product descriptions by analyzing competitor content and customer reviews.

What You'll Configure:

Set Data Source to "Input String"
Set Input Property Name to "customerReviews" (from previous node)
Choose "Split into chunks" operation
Use chunk size 600 for focused sentiment analysis
Set minimal overlap of 50

What Happens: Customer review text from previous workflow steps is processed and split into analyzable segments for AI-powered content generation.

Business Value: Increases conversion rates by 23% through personalized product descriptions that address specific customer concerns and preferences.

Step-by-Step Configuration Guide

Setting Up Document Loading

Add the Node:
- Drag the LangChain node from the left panel onto your workflow canvas
- Connect it to your previous node using the arrow connector
Configure Data Source:
- Click on the LangChain node to open the settings panel
- In the "Data Source" section, select your preferred option from the dropdown
- If using "Input String", specify the property name from your previous node
Select Your Integration:
- Click the integration dropdown to browse available connectors
- Use the search function to find specific integrations
- Select the integration that matches your data source
Configure Integration Settings:
- Fill in the required fields that appear based on your selected integration
- Provide authentication credentials, URLs, or file paths as needed
- Test your connection using the preview function if available

Setting Up Document Processing

Choose Operation Type:
- Select "Single value" for short documents or when you need complete content
- Select "Split into chunks" for long documents or better AI processing
Configure Chunking (if selected):
- Set chunk size based on your content type and downstream AI requirements
- Configure overlap to maintain context between chunks
- Consider your AI model's token limits when setting these values
Verify Output:
- Review the automatic output properties (page_content and metadata)
- Ensure downstream nodes are configured to use these property names

Testing Your Configuration

Run a Test:
- Click the "Test Configuration" button in the node settings
- Provide sample data or use the built-in test data
- Review the output to ensure content is processed correctly
Validate Results:
- Check that text content appears in the page_content property
- Verify metadata contains expected source information
- Ensure chunk sizes are appropriate for your use case
Save and Deploy:
- Save your configuration once testing is successful
- Connect to downstream nodes that will process the prepared content

Industry Applications

Healthcare Organizations

Common Challenge: Medical practices need to process patient education materials, research papers, and clinical guidelines to create AI-powered patient information systems.

How This Node Helps: Automatically processes medical documents from various sources, splits complex medical texts into digestible sections, and prepares content for AI-powered patient Q&A systems.

Configuration Recommendations:

Use "DirectoryLoader" for local medical document libraries
Set chunk size to 1200 to maintain medical context
Enable high overlap (400) to preserve critical medical relationships
Process documents in batches to manage large medical databases

Results: Healthcare providers see 60% reduction in time spent answering routine patient questions and improved accuracy in patient education delivery.

Financial Services

Common Challenge: Banks and financial institutions need to process regulatory documents, policy updates, and compliance materials to keep staff informed and ensure regulatory compliance.

How This Node Helps: Connects to regulatory websites, processes complex financial documents, and creates AI-ready content for compliance monitoring and staff training systems.

Configuration Recommendations:

Use "WebsiteLoader" for regulatory authority websites
Choose "Split into chunks" with size 1000 for regulatory content
Set moderate overlap (250) to maintain regulatory context
Schedule regular processing to catch policy updates

Results: Financial institutions reduce compliance research time by 70% and improve regulatory response accuracy by 45%.

Educational Institutions

Common Challenge: Schools and universities need to process textbooks, research papers, and educational materials to create AI-powered tutoring and research assistance systems.

How This Node Helps: Loads educational content from various sources, processes academic texts while preserving scholarly context, and prepares materials for AI-powered educational tools.

Configuration Recommendations:

Use "WikipediaLoader" for general knowledge content
Select "DirectoryLoader" for textbook and course material processing
Set chunk size to 800 for detailed educational content
Use high overlap (300) to maintain academic context

Results: Educational institutions see 50% improvement in student research efficiency and 35% increase in personalized learning effectiveness.

Best Practices

Choosing the Right Data Source

Use Data Loaders when connecting to external systems, websites, or file repositories
Use Input String when processing text that's already in your workflow from previous nodes
Always test your data source connection before deploying to production

Optimizing Chunk Settings

For Q&A Systems: Use smaller chunks (500-800) with moderate overlap (150-200)
For Summarization: Use larger chunks (2000-3000) with minimal overlap (100)
For Analysis: Use medium chunks (1000-1500) with high overlap (300-400)
Monitor your AI model's performance and adjust chunk sizes accordingly

Integration Selection

Choose integrations that are specifically designed for your data source type
Prefer official integrations over generic ones for better reliability
Test integrations with small data sets before processing large volumes

Performance Optimization

Process documents in batches rather than individually for better performance
Use appropriate chunk sizes to balance context preservation with processing speed
Monitor memory usage when processing very large documents

Troubleshooting Common Issues

Connection Problems

Issue: Integration fails to connect to data source
Solution: Verify credentials, URLs, and network connectivity; check integration documentation for specific requirements

Content Processing Issues

Issue: Text appears garbled or incomplete
Solution: Check file encoding settings, verify data source format compatibility, adjust chunk size if content is being cut off

Performance Problems

Issue: Node takes too long to process documents
Solution: Reduce chunk overlap, process smaller batches, or use more specific data source filters

Output Problems

Issue: Downstream nodes can't find the expected data
Solution: Verify that downstream nodes reference "page_content" and "metadata" properties correctly

The LangChain node is essential for any workflow that needs to process documents or text content for AI analysis. Its flexibility and extensive integration library make it suitable for virtually any document processing scenario, from simple text preparation to complex multi-source data aggregation.

LangChain Node Documentation

Overview​

Key Features​

Configuration Parameters​

Data Source Section​

Select Integration Section​

Operation Section​

Output Section​

Real-World Use Cases​

Customer Support Knowledge Base Processing​

Legal Document Analysis Automation​

Marketing Content Personalization​

Step-by-Step Configuration Guide​

Setting Up Document Loading​

Setting Up Document Processing​

Testing Your Configuration​

Industry Applications​

Healthcare Organizations​

Financial Services​

Educational Institutions​

Best Practices​

Choosing the Right Data Source​

Optimizing Chunk Settings​

Integration Selection​

Performance Optimization​

Troubleshooting Common Issues​

Connection Problems​

Content Processing Issues​

Performance Problems​

Output Problems​

Overview

Key Features

Configuration Parameters

Data Source Section

Select Integration Section

Operation Section

Output Section

Real-World Use Cases

Customer Support Knowledge Base Processing

Legal Document Analysis Automation

Marketing Content Personalization

Step-by-Step Configuration Guide

Setting Up Document Loading

Setting Up Document Processing

Testing Your Configuration

Industry Applications

Healthcare Organizations

Financial Services

Educational Institutions

Best Practices

Choosing the Right Data Source

Optimizing Chunk Settings

Integration Selection

Performance Optimization

Troubleshooting Common Issues

Connection Problems

Content Processing Issues

Performance Problems

Output Problems