Tesseract is an open-source optical character recognition (OCR) engine that includes a command-line program, tesseract. It supports Unicode (UTF-8) and can recognize more than 100 languages "out of the box". Results can be written in several output formats: plain text, hOCR (HTML), PDF, and TSV.
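The following minimal Python sketch shows one way to drive the command-line program from a script. It assumes tesseract is installed and on PATH, along with data for any requested languages. It relies on the CLI's usage form "tesseract imagename outputbase [options...] [configfile...]": -l selects the language(s), and config names such as txt, hocr, pdf and tsv select output formats. The helper names build_tesseract_command, expected_outputs and run_tesseract are hypothetical and not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path

# Config names that select each output format on the tesseract command line,
# mapped to the file extension tesseract appends to the output base name.
FORMATS = {"txt": ".txt", "hocr": ".hocr", "pdf": ".pdf", "tsv": ".tsv"}


def build_tesseract_command(image, output_base, langs=("eng",), formats=("txt",)):
    """Return the argument list for one tesseract run (nothing is executed)."""
    unknown = [f for f in formats if f not in FORMATS]
    if unknown:
        raise ValueError(f"unsupported output format(s): {unknown}")
    cmd = ["tesseract", str(image), str(output_base)]
    # Several languages are joined with '+', e.g. "eng+deu".
    cmd += ["-l", "+".join(langs)]
    # Config names come after the options; each one adds an output file.
    cmd += list(formats)
    return cmd


def expected_outputs(output_base, formats):
    """Paths tesseract writes: the output base plus each format's extension."""
    return [Path(str(output_base) + FORMATS[f]) for f in formats]


def run_tesseract(image, output_base, langs=("eng",), formats=("txt",)):
    """Run the CLI (hypothetical wrapper) and return the generated file paths."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    cmd = build_tesseract_command(image, output_base, langs, formats)
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(cmd, check=True, capture_output=True)
    return expected_outputs(output_base, formats)
```

For example, run_tesseract("scan.png", "scan", langs=("eng", "deu"), formats=("txt", "pdf")) would, assuming a hypothetical scan.png and installed English and German data, produce scan.txt and a searchable scan.pdf in a single pass. Building the argument list separately from running it keeps the command easy to inspect or log.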