Running from Source¶
This guide explains how to install and run PDF Extract with OCR directly on your host system without containers.
Prerequisites¶
Before installing, make sure you have the following:
- Python 3.8 or higher
- pip (Python package manager)
- Git (optional, for cloning the repository)
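The prerequisites above can be checked with a short standard-library script. This is a minimal sketch rather than part of the project: the function name `check_prerequisites` is hypothetical, and it assumes pip is exposed on your PATH as either `pip` or `pip3`.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a dict mapping each prerequisite to True or False."""
    return {
        "python>=3.8": tuple(version_info[:2]) >= MIN_PYTHON,
        # pip may be installed as `pip` or `pip3` depending on the platform
        "pip": which("pip") is not None or which("pip3") is not None,
        # Git is optional: it is only needed to clone the repository
        "git (optional)": which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

The `version_info` and `which` parameters exist only so the check can be exercised without depending on the machine it runs on.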
System Dependencies¶
Install the required system dependencies for your operating system. On Windows, for example:
# Using winget in PowerShell (run as Administrator)
'tesseract-ocr.tesseract', 'SQLite.SQLite' |
    ForEach-Object { winget install --id=$_ }
For Redis, choose one of the following:
- Install using Redis Windows
- Use the Windows Subsystem for Linux (WSL)
- Skip Redis by using a SQLite-based task queue
Installation Steps¶
- Clone the repository (or download the source code):
- Create a virtual environment and activate it:
- Install the required Python dependencies:
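The virtual environment step can also be done from Python itself. The sketch below uses the standard-library `venv` module, which is what `python -m venv` runs; the directory name `.venv` and the helper names `create_virtualenv` and `activation_hint` are assumptions for illustration. Installing the dependencies is left out because the name of the project's dependency file is not given here.

```python
import venv
from pathlib import Path, PurePosixPath, PureWindowsPath


def create_virtualenv(target=".venv", with_pip=True):
    """Create a virtual environment at `target` and return its path."""
    target = Path(target)
    # EnvBuilder is the standard-library API behind `python -m venv`
    venv.EnvBuilder(with_pip=with_pip).create(target)
    return target


def activation_hint(target, windows):
    """Return the shell command that activates the environment."""
    if windows:
        # Windows environments keep their scripts under Scripts\
        return str(PureWindowsPath(str(target)) / "Scripts" / "activate")
    # POSIX shells must `source` the script so it can modify the session
    return "source " + str(PurePosixPath(str(target)) / "bin" / "activate")
```

Calling `create_virtualenv()` and then running the command returned by `activation_hint(".venv", windows=False)` in your shell covers the second step; activation has to happen in the shell because a Python process cannot change its parent shell's environment.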
Configuration¶
- Create a .env file by copying the example file:
- Edit the .env file to set your configuration options:
.env
# API configuration
API_PORT=8080
UPLOADS_DIR=./uploads
# Database configuration (choose SQLite or PostgreSQL)
# For SQLite:
DATABASE_URL=sqlite:///local.db
# For PostgreSQL:
# DATABASE_URL=postgresql://user:password@localhost:5432/ocr
# Celery configuration
CELERY_BROKER_URL=redis://localhost:6379/0
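To see which values the .env settings above actually define, a small parser is enough. This is a simplified sketch, not the loader the application uses (that is not stated here): `parse_env` is a hypothetical helper that handles only plain `KEY=VALUE` lines and ignores quoting and variable expansion.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line; skip it
        settings[key.strip()] = value.strip()
    return settings


# Sample content taken from the configuration shown above
SAMPLE = """\
# API configuration
API_PORT=8080
UPLOADS_DIR=./uploads
DATABASE_URL=sqlite:///local.db
# DATABASE_URL=postgresql://user:password@localhost:5432/ocr
CELERY_BROKER_URL=redis://localhost:6379/0
"""

settings = parse_env(SAMPLE)
port = int(settings["API_PORT"])  # values are strings until converted
```

Because the PostgreSQL line is commented out, only the SQLite `DATABASE_URL` ends up in `settings`; uncommenting it below the SQLite line would override the earlier value.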
Running the Application¶
Option 1: Run the Flask application only¶
For simple usage where background processing isn't needed:
Option 2: Run with background processing (recommended)¶
For optimal performance with background task processing:
- Start Redis (if not already running):
- Start the Celery worker in a separate terminal:
- Start the Flask application:
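Before starting the worker, it can help to confirm that the broker is reachable. The sketch below sends a Redis `PING` over a plain socket, so it needs no extra packages; `redis_ping` is a hypothetical helper, and it assumes an unauthenticated Redis at the address in `CELERY_BROKER_URL` (a password-protected server replies with an error and is reported as not ready).

```python
import socket
from urllib.parse import urlparse


def redis_ping(broker_url="redis://localhost:6379/0", timeout=2.0):
    """Return True if a server at broker_url answers PING with +PONG."""
    parsed = urlparse(broker_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 6379
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Redis accepts inline commands terminated by CRLF
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        return False  # refused, timed out, or unresolvable host
    return reply.startswith(b"+PONG")
```

If `redis_ping()` returns `False`, start Redis first; the Celery worker cannot connect to a broker that is not listening.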
Accessing the Application¶
Once running, you can access:
- Web interface: http://localhost:8080 (or the port you configured)
- API endpoint: http://localhost:8080/upload
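A quick way to confirm that the application is listening on the configured port is to request the web interface URL. This is a minimal sketch using only the standard library; `base_url` and `is_reachable` are hypothetical helpers, and the check says nothing about the request format the `/upload` endpoint expects.

```python
import urllib.error
import urllib.request


def base_url(port=8080, host="localhost"):
    """Build the application's base URL from the configured port."""
    return f"http://{host}:{port}"


def is_reachable(url, timeout=3.0):
    """Return True if a server answers the request at all."""
    # An empty ProxyHandler bypasses any configured proxy for this local check
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so it is running
        return True
    except (urllib.error.URLError, OSError):
        return False
```

For example, `is_reachable(base_url(8080))` should return `True` once the Flask application has started; if you changed `API_PORT`, pass that value instead.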
Troubleshooting¶
Common Issues¶
- Tesseract not found: Ensure Tesseract is installed and in your PATH
- Redis connection errors: Verify Redis is running (redis-cli ping should return PONG)
- Database errors: Check your database configuration in .env
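The Tesseract and database checks above can be scripted as follows. This is a sketch under assumptions: the helper names are hypothetical, the Tesseract executable is assumed to be named `tesseract`, and `DATABASE_URL` is assumed to follow the `sqlite:///path` form shown in the configuration example. PostgreSQL URLs are not checked here.

```python
import shutil
import sqlite3


def tesseract_path():
    """Return the full path of the tesseract executable, or None."""
    return shutil.which("tesseract")


def sqlite_file_from_url(database_url):
    """Extract the file path from a sqlite:/// URL, or None otherwise."""
    prefix = "sqlite:///"
    if not database_url.startswith(prefix):
        return None
    return database_url[len(prefix):]


def sqlite_usable(path):
    """Return True if SQLite can open the file and read its header."""
    try:
        # Note: connecting may create the file if it does not exist yet
        conn = sqlite3.connect(path)
        try:
            # Reading the schema version fails on files that are not databases
            conn.execute("PRAGMA schema_version").fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        return False
    return True
```

A `None` result from `tesseract_path()` means the executable is not on your PATH, and `sqlite_usable(...)` returning `False` usually points to a missing directory or a file that is not a SQLite database.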
Logs¶
Check the application logs for more detailed error information:
For more help, refer to the GitHub repository or open an issue.