Extract text from scanned PDFs using AI-powered OCR — convert image-based PDFs to searchable text.
PDF OCR is a free online optical character recognition tool that extracts text from scanned and image-based PDF documents. Choose a PDF, select its language (13 supported, including English, Urdu, Hindi, Arabic and Chinese), and the tool renders each page at high resolution and runs Tesseract.js, a WebAssembly-based OCR engine, to recognize the text. Results include per-page confidence scores, and you can copy or download all extracted text. Everything runs in your browser: the file is read locally and is never sent to a server.
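To make "renders each page at high resolution" concrete, here is a minimal sketch of how a render size could be chosen. PDF page dimensions are measured in points (1/72 inch), so a target DPI translates into a scale factor of DPI / 72. The `planRender` helper, the 300 DPI default and the pixel cap are illustrative assumptions, not the tool's actual settings.

```typescript
// PDF page sizes are expressed in points; 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

interface PageSizePt {
  widthPt: number;
  heightPt: number;
}

interface RenderPlan {
  scale: number;    // multiplier applied to the page's point dimensions
  widthPx: number;  // canvas width in pixels
  heightPx: number; // canvas height in pixels
}

// Hypothetical helper: choose a canvas size for OCR.
// targetDpi = 300 and maxPixels = 16e6 are assumed defaults for illustration.
function planRender(
  page: PageSizePt,
  targetDpi: number = 300,
  maxPixels: number = 16e6
): RenderPlan {
  let scale = targetDpi / POINTS_PER_INCH;

  // Very large pages are scaled down so the canvas stays within the pixel budget.
  const pixels = page.widthPt * scale * page.heightPt * scale;
  if (pixels > maxPixels) {
    scale *= Math.sqrt(maxPixels / pixels);
  }

  return {
    scale,
    widthPx: Math.round(page.widthPt * scale),
    heightPx: Math.round(page.heightPt * scale),
  };
}

// A US Letter page is 612 x 792 points.
const letter = planRender({ widthPt: 612, heightPt: 792 });
console.log(letter.widthPx, letter.heightPx);
```

The resulting scale is the kind of value a PDF renderer would take when drawing a page to a canvas before the image is handed to the OCR engine. A higher DPI generally helps with small print at the cost of memory and processing time.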
Drop or click to upload a PDF that contains scanned images or photos of text.
Choose the primary language of the text in your document. This improves recognition accuracy.
Click the Run OCR button. Each page is rendered and processed, and progress is shown in real time.
Extracted text is shown per page with confidence scores. Green = high confidence, yellow = moderate, red = low.
Copy all text to the clipboard or download it as a .txt file. Filter by specific pages if needed.
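The review and export steps above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: confidence is assumed to be on a 0 to 100 scale, the 80/50 color thresholds are invented for the example, and the `PageResult` shape, the `1-3,5` filter syntax and the page separator format are all hypothetical.

```typescript
// Hypothetical per-page result; confidence is assumed to range from 0 to 100.
interface PageResult {
  page: number; // 1-based page number
  text: string;
  confidence: number;
}

type Level = "green" | "yellow" | "red";

// Map a confidence score to a color. The thresholds are assumptions.
function confidenceLevel(confidence: number, high = 80, moderate = 50): Level {
  if (confidence >= high) return "green";
  if (confidence >= moderate) return "yellow";
  return "red";
}

// Parse a filter such as "1-3,5" into a set of 1-based page numbers.
// Malformed parts are skipped; ranges are clamped to 1..pageCount.
function parsePageFilter(filter: string, pageCount: number): Set<number> {
  const pages = new Set<number>();
  for (const part of filter.split(",")) {
    const match = part.trim().match(/^(\d+)(?:\s*-\s*(\d+))?$/);
    if (!match) continue;
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    for (let p = Math.max(1, start); p <= Math.min(pageCount, end); p++) {
      pages.add(p);
    }
  }
  return pages;
}

// Join the selected pages into one string suitable for a .txt download.
function buildTextExport(results: PageResult[], filter?: string): string {
  const lastPage = Math.max(...results.map((r) => r.page));
  const wanted = filter ? parsePageFilter(filter, lastPage) : null;
  return results
    .filter((r) => wanted === null || wanted.has(r.page))
    .map((r) => `--- Page ${r.page} ---\n${r.text.trim()}`)
    .join("\n\n");
}

const sample: PageResult[] = [
  { page: 1, text: "Invoice 2024", confidence: 92 },
  { page: 2, text: "Total due", confidence: 65 },
  { page: 3, text: "h4ndwr1tten n0te", confidence: 30 },
];
console.log(sample.map((r) => confidenceLevel(r.confidence)));
console.log(buildTextExport(sample, "1-2"));
```

In a browser, the string returned by `buildTextExport` could be wrapped in a `Blob` and offered as a download, or passed to `navigator.clipboard.writeText` for the copy action.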
Combine multiple PDF files into one document. Drag to reorder pages before merging. 100% browser-based.
Reduce PDF file size by optimizing images and removing metadata. See before/after compression ratio.
Split a PDF into individual pages or custom page ranges. Extract specific pages instantly.
Convert PDF pages to high-quality PNG, JPEG or WebP images. Batch export all pages at once.
Convert JPEG, PNG and WebP images to a single PDF document. Custom page size and margins.
Extract all text content from PDF files. Preserves paragraphs and formatting structure.