feat(PROJ-35): OCR & attachment full-text indexing

Asynchronous OCR for PDF and image attachments via tesseract + poppler-utils. Extracted text is stored in Manticore (attachment_text) and can be found through the regular full-text search.

- internal/ocr: ExtractText + Worker (queue + drain)
- internal/storage/ocr.go: SetOCRStatus, OCREnabled, GetMailsByOCRStatus
- emails.ocr_status (pending|done|failed|skipped|disabled)
- tenants.ocr_enabled (default TRUE, opt-out)
- Manticore: attachment_text field + UpdateAttachmentText
- Boot resume: pending jobs are re-queued automatically after a restart
- CLI: archivmail ocr-reprocess --tenant N --status pending|failed|all
- update.sh: optionally install tesseract-ocr + poppler-utils
@@ -84,6 +84,7 @@ ALTER TABLE tenants ADD COLUMN IF NOT EXISTS retention_days INT NOT NULL DEFAULT
ALTER TABLE tenants ADD COLUMN IF NOT EXISTS max_storage_bytes BIGINT;
ALTER TABLE tenants ADD COLUMN IF NOT EXISTS max_users INT;
ALTER TABLE tenants ADD COLUMN IF NOT EXISTS max_emails BIGINT;
ALTER TABLE tenants ADD COLUMN IF NOT EXISTS ocr_enabled BOOLEAN NOT NULL DEFAULT TRUE;
`
// New connects to PostgreSQL and initialises the tenant schema.