With data loss becoming more common every day, organizations have to ensure that they can thoroughly scan all content leaving the organization. Today, this means more than just text.
Most organizations routinely save or convert documents to PDF without considering the implications for their security policies. From a Data Loss Prevention (DLP) perspective, PDF documents can be sent around and outside the organization with relative impunity, because traditional DLP solutions cannot detect sensitive information held as text inside images embedded in these files. The same risk applies to all image-based files, such as screenshots, and to images (e.g. JPG, BMP, GIF, PNG and TIFF) embedded in other files such as Microsoft Office documents.
Optical Character Recognition (OCR) detects and extracts text from image-based files. Clearswift's Deep Content Inspection Engine can then scan the extracted text to ascertain whether it contains sensitive information.
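To make the two-step idea concrete, here is a minimal Python sketch of "extract text with OCR, then scan it". It is an illustration only and not Clearswift's engine: the names `SENSITIVE_PATTERNS`, `scan_text` and `scan_image` are hypothetical, the two patterns are deliberately simplistic, and the OCR step is passed in as a function so that any OCR backend could be used.

```python
import re
from typing import Callable, Dict, List

# Hypothetical, deliberately simple patterns for illustration only;
# a real DLP policy would use far richer rules and validation.
SENSITIVE_PATTERNS: Dict[str, re.Pattern] = {
    # 13 to 16 digits, optionally separated by single spaces or hyphens
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_text(text: str) -> Dict[str, List[str]]:
    """Return the matches for every pattern that fires on the text."""
    findings: Dict[str, List[str]] = {}
    for name, pattern in SENSITIVE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[name] = matches
    return findings


def scan_image(image_path: str, ocr: Callable[[str], str]) -> Dict[str, List[str]]:
    """Step 1: extract text from the image via the supplied OCR function.
    Step 2: scan the extracted text for sensitive patterns."""
    return scan_text(ocr(image_path))


# The OCR function is an assumption supplied by the caller. With the
# pytesseract and Pillow packages installed, one option would be:
#   lambda path: pytesseract.image_to_string(Image.open(path))
```

A caller would pass an image path and an OCR function, for example `scan_image("screenshot.png", my_ocr)`, and treat a non-empty result as a policy hit. Keeping the OCR step separate from the scanning step means the same text rules apply whether the content arrived as plain text or inside an image.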
Read our Solution Brief to find out more about how Clearswift uses OCR to extend its DLP capabilities to images and PDFs.