
Document Text Extraction

tagd-ai automatically extracts text from images, PDFs, and documents, making all your content searchable.

Automatic Extraction

When you upload files to a tag, text extraction happens automatically.

Supported File Types

Type | Formats | Extraction Method
Images | PNG, JPEG, WebP, GIF, BMP, TIFF | OCR (Optical Character Recognition)
PDFs | PDF | PDF text extraction + OCR for scanned pages
Documents | DOCX, DOC | Document parsing
Spreadsheets | XLSX, XLS | Cell content extraction
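The table above amounts to a lookup from file type to extraction method. The sketch below shows that routing in Python, assuming files are routed by their extension; tagd-ai may well inspect file contents instead, and the method labels (`ocr`, `pdf_text_plus_ocr`, and so on) and the `extraction_method` function are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical routing table mirroring the Supported File Types table.
# Keys are illustrative labels, not tagd-ai identifiers.
EXTRACTION_METHODS = {
    "ocr": {".png", ".jpg", ".jpeg", ".webp", ".gif", ".bmp", ".tif", ".tiff"},
    "pdf_text_plus_ocr": {".pdf"},
    "document_parsing": {".docx", ".doc"},
    "cell_extraction": {".xlsx", ".xls"},
}


def extraction_method(filename: str) -> Optional[str]:
    """Return the extraction method for a file name, or None if unsupported."""
    suffix = Path(filename).suffix.lower()  # ".JPG" -> ".jpg"
    for method, suffixes in EXTRACTION_METHODS.items():
        if suffix in suffixes:
            return method
    return None


print(extraction_method("Receipt.JPG"))  # ocr
print(extraction_method("notes.txt"))    # None
```

Lower-casing the suffix means `Receipt.JPG` and `receipt.jpg` are treated the same, which matches how most users name photos.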

What Gets Extracted

  • Images: Any visible text in photos, screenshots, or graphics
  • PDFs: All text content, including scanned documents
  • Documents: Full document text and formatting
  • Spreadsheets: Cell values and structure

How OCR Works

For images and scanned PDFs:

  1. Upload - Add image or scanned PDF to tag
  2. Analysis - AI identifies text regions
  3. Recognition - Characters are converted to text
  4. Indexing - Text becomes searchable
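Step 4 is what makes OCR output searchable. A common way to index text is an inverted index, which maps each word to the files that contain it. The minimal sketch below assumes steps 1 to 3 have already produced plain text per file (the `recognized` dictionary stands in for that output); it is not tagd-ai's actual indexing code, and `build_index` and `search` are hypothetical names.

```python
import re
from collections import defaultdict

# Stand-in for the output of steps 1-3: file name -> recognized text.
recognized = {
    "receipt.jpg": "ACME Hardware\nTOTAL $42.17",
    "whiteboard.png": "Q3 roadmap: launch search",
    "contract.pdf": "Warranty period: 12 months from delivery",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Step 4 (indexing): map each token to the set of files containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the files that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(recognized)
print(search(index, "warranty"))    # {'contract.pdf'}
print(search(index, "acme total"))  # {'receipt.jpg'}
```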

OCR Accuracy

Best results with:

  • Clear, legible text
  • Good image quality
  • Standard fonts
  • Adequate contrast

Works well for:

  • Printed documents
  • Screenshots
  • Photos of text
  • Handwritten text (with limited accuracy)

Using Extracted Text

Search

Search across document content:

"Find invoices with amount over $500"
"Documents mentioning product warranty"
"PDFs from 2024"

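Natural-language queries like the first example above combine a keyword with a numeric condition. The sketch below shows only the filtering half, assuming the query has already been interpreted as "mentions the word invoice and contains any dollar amount above 500"; the regular expression and the `dollar_amounts` and `matching_documents` functions are illustrative assumptions, not how tagd-ai parses queries.

```python
import re

# Dollar amounts such as "$75", "$480.00", or "$1,250.00".
AMOUNT_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")


def dollar_amounts(text: str) -> list[float]:
    """Return every dollar amount found in extracted text, as floats."""
    return [float(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]


def matching_documents(docs: dict[str, str], keyword: str, threshold: float) -> list[str]:
    """Names of files whose text mentions `keyword` and any amount above `threshold`."""
    return sorted(
        name
        for name, text in docs.items()
        if keyword.lower() in text.lower()
        and any(amount > threshold for amount in dollar_amounts(text))
    )


docs = {
    "inv-101.pdf": "INVOICE #101\nTotal due: $1,250.00",
    "inv-102.pdf": "Invoice #102\nTotal due: $480.00",
    "menu.png": "Lunch special $600",
}
print(matching_documents(docs, "invoice", 500))  # ['inv-101.pdf']
```

The menu photo is excluded because it never mentions "invoice", even though its amount exceeds the threshold.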
AI Chat

Ask questions about document content:

"What are the key terms in this contract?"
"Summarize this PDF"
"What dates are mentioned in this document?"

View Extracted Text

  1. Open the tag that contains the document
  2. Click the file block
  3. Click View Extracted Text
  4. The full extracted content is displayed

Use Cases

Receipts and Invoices

Upload photos of receipts:

  • Amount is extracted
  • Vendor name captured
  • Date recognized
  • Searchable records
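As a rough illustration of how amount, vendor, and date might be pulled out of OCR text, the sketch below applies simple heuristics: the vendor is assumed to be the first non-empty line, the date is matched in ISO (`2024-03-14`) or US (`03/14/2024`) form, and the amount is taken from a line that starts with "Total". These rules and the `parse_receipt` function are hypothetical; real receipts vary widely, and tagd-ai's extraction is not documented at this level.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical heuristics; real receipts need far more robust parsing.
DATE_PATTERNS = [
    (re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"), "%Y-%m-%d"),     # 2024-03-14
    (re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b"), "%m/%d/%Y"),  # 03/14/2024 (US order)
]
# A line that starts with "Total" (so "Subtotal" is skipped), then an amount.
TOTAL_RE = re.compile(r"^\s*total\b\D*?(\d[\d,]*\.\d{2})", re.IGNORECASE | re.MULTILINE)


def parse_receipt(text: str) -> dict:
    """Extract vendor, date, and total from receipt text using simple heuristics."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    vendor = lines[0] if lines else None  # assume the store name is printed first

    found_date: Optional[date] = None
    for pattern, fmt in DATE_PATTERNS:
        match = pattern.search(text)
        if not match:
            continue
        try:
            found_date = datetime.strptime(match.group(1), fmt).date()
            break
        except ValueError:
            continue  # looked like a date but was not a valid calendar date

    total_match = TOTAL_RE.search(text)
    total = float(total_match.group(1).replace(",", "")) if total_match else None
    return {"vendor": vendor, "date": found_date, "total": total}


receipt = "ACME Hardware\n123 Main St\n03/14/2024\nSubtotal 38.99\nTax 3.18\nTOTAL $42.17\n"
print(parse_receipt(receipt))
```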

Business Cards

Photograph business cards:

  • Contact name extracted
  • Phone/email captured
  • Company identified
  • Easy to find later
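Contact details on a business card follow recognizable patterns once the text is extracted. This sketch pulls email addresses and phone numbers out of OCR text with two loose regular expressions; the patterns and the `extract_contacts` function are assumptions for illustration, and they will miss some formats (and occasionally match other long digit runs).

```python
import re

# Loose patterns for illustration; they will miss some formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Optional "+", a digit, 7+ digits or separators, then a final digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Pull email addresses and phone numbers out of OCR text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


card = "Jane Doe\nHead of Sales, Example Corp\njane.doe@example.com\n+1 (555) 010-4477"
print(extract_contacts(card))
```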

Whiteboard Photos

Capture meeting whiteboards:

  • Text becomes searchable
  • Ideas preserved
  • Notes accessible

Contracts and Agreements

Upload contracts and agreements:

  • Full text searchable
  • Find specific clauses
  • AI answers questions

Product Labels

Photograph product information:

  • Specifications extracted
  • Model numbers captured
  • Ingredients readable

Handwritten Notes

Photograph handwritten pages:

  • Text recognized (when legible)
  • Notes become searchable
  • Works best with clear writing

Best Practices

For Better Extraction

Image quality:

  • Good lighting
  • Clear focus
  • Minimal glare
  • Adequate resolution (at least 300 DPI when scanning printed pages)
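The 300 DPI guideline translates directly into pixel dimensions: pixels = inches × DPI, so a US Letter page (8.5 × 11 in) needs at least 2550 × 3300 pixels. The helpers below do that arithmetic and estimate the effective DPI of a phone photo, assuming the page fills the frame edge to edge; `min_pixels` and `effective_dpi` are hypothetical names.

```python
import math


def min_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Minimum pixel dimensions to capture a page at `dpi` (pixels = inches x DPI)."""
    return math.ceil(width_in * dpi), math.ceil(height_in * dpi)


def effective_dpi(pixel_width: int, page_width_in: float) -> float:
    """Approximate DPI of a photo, assuming the page spans the full image width."""
    return pixel_width / page_width_in


print(min_pixels(8.5, 11))              # (2550, 3300) for US Letter
print(round(effective_dpi(3024, 8.5)))  # 356: a 3024-px-wide photo of a Letter page
```

If the page only fills part of the frame, divide by the page's width in pixels rather than the full image width; cropping to the page first gives a more honest estimate.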

Document tips:

  • Standard fonts work best
  • High contrast (black on white)
  • Avoid decorative fonts
  • Clean, undamaged pages
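Contrast can be quantified. The sketch below uses the WCAG 2.x contrast-ratio formula, a web-accessibility metric swapped in here purely for illustration (tagd-ai does not state which measure, if any, it uses). Black on white scores the maximum of 21:1, while light gray text on white scores far lower. The function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (sRGB transfer function)."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """WCAG relative luminance of an sRGB color, from 0 (black) to 1 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG 2.x contrast ratio, from 1:1 (identical colors) to 21:1 (black on white)."""
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))        # 21.0
print(round(contrast_ratio((170, 170, 170), (255, 255, 255)), 1))  # light gray on white
```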

Organizing Documents

  • Name files descriptively
  • Group related documents
  • Use folders for categories
  • Add context in tag title

Multi-Language Support

OCR supports text in:

  • English
  • Spanish
  • French
  • German
  • Italian
  • Portuguese
  • Chinese (Simplified & Traditional)
  • Japanese
  • Korean
  • Arabic
  • Hebrew
  • Russian
  • And 50+ more languages

Mixed-language documents are handled automatically.
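Handling mixed-language text typically starts with noticing which writing systems appear. As an illustration only (not tagd-ai's documented method), the sketch below counts letters per script using the first word of each character's Unicode name from Python's `unicodedata` module. This is a coarse heuristic: Japanese kanji and Chinese characters, for example, both report as `CJK`. The `script_counts` function is a hypothetical name.

```python
import unicodedata
from collections import Counter


def script_counts(text: str) -> Counter:
    """Count letters per script, using the first word of each Unicode character name."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation
        name = unicodedata.name(ch, "")  # e.g. "CYRILLIC SMALL LETTER ER"
        counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts


print(script_counts("Hello Мир 世界"))
```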

Privacy & Security

Data Processing

  • Documents processed securely
  • Extracted text stored encrypted
  • Only accessible to you
  • Deleted with the file

No Data Training

  • Your documents are never used for AI training
  • Content remains private
  • Enterprise-grade security

Troubleshooting

Text Not Extracted

  1. Check file format is supported
  2. Verify image quality
  3. Ensure text is visible/legible
  4. Try higher resolution

Poor Accuracy

  1. Improve image lighting
  2. Increase resolution
  3. Crop to relevant area
  4. Avoid blurry images

Processing Failed

  1. Check file isn't corrupted
  2. Verify file size is reasonable
  3. Try re-uploading
  4. Contact support if persistent
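One quick check for a corrupted or mislabeled file is to compare its first bytes against the format's well-known signature ("magic number"). A match does not prove the file is intact, but a mismatch (for example, a `.pdf` that does not begin with `%PDF-`) is a strong hint that the file should be re-exported before re-uploading. The sketch below covers the formats listed under Supported File Types; `sniff_format` and `sniff_file` are hypothetical helpers. Note that DOCX and XLSX share the ZIP signature, while DOC and XLS share the OLE2 signature.

```python
from typing import Optional

# Well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
    (b"PK\x03\x04", "zip container (DOCX or XLSX)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (DOC or XLS)"),
    (b"BM", "bmp"),  # short signature, so checked last
]


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a file's format from its first bytes; None if nothing matches."""
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":  # WebP: RIFF....WEBP
        return "webp"
    for magic, fmt in SIGNATURES:
        if header.startswith(magic):
            return fmt
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 16 bytes of a file on disk and sniff them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))


print(sniff_format(b"%PDF-1.7\n"))   # pdf
print(sniff_format(b"<html>oops"))  # None: not a real PDF or image
```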

Slow Processing

Large documents take longer to process. Typical times:

  • Multi-page PDFs: 1-2 minutes
  • High-resolution images: 30 seconds
  • Standard documents: Under 10 seconds

Plan Limits

Plan | Pages/Month
Free | 50 pages
Pro | 500 pages
Enterprise | Unlimited

Each image counts as 1 page. PDFs count by actual page count.
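The counting rule above can be expressed in a few lines. This sketch assumes a single monthly counter per account and covers only images and PDFs, since the rule is stated for those two types; `PLAN_LIMITS`, `pages_for_upload`, and `remaining_pages` are hypothetical names, not part of a tagd-ai API.

```python
from typing import Optional

# Monthly page limits from the table above; None means unlimited.
PLAN_LIMITS = {"free": 50, "pro": 500, "enterprise": None}


def pages_for_upload(kind: str, pdf_pages: int = 0) -> int:
    """Pages counted for one upload: 1 per image, actual page count per PDF."""
    if kind == "image":
        return 1
    if kind == "pdf":
        return pdf_pages
    # The documented rule only covers images and PDFs.
    raise ValueError(f"no documented page-counting rule for {kind!r}")


def remaining_pages(plan: str, used: int) -> Optional[int]:
    """Pages left this month, or None on an unlimited plan."""
    limit = PLAN_LIMITS[plan]
    if limit is None:
        return None
    return max(limit - used, 0)


# A Free-plan month: 12 receipt photos plus one 30-page scanned PDF.
uploads = [("image", 0)] * 12 + [("pdf", 30)]
used = sum(pages_for_upload(kind, pages) for kind, pages in uploads)
print(used, remaining_pages("free", used))  # 42 8
```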

Next Steps