OCR Xpress delivers fast and accurate full-page optical character recognition (OCR) to software developers in .NET and ActiveX COM toolkits. Use OCR Xpress to add full-page text recognition, auto rotate, and searchable document creation to your application. This software development kit (SDK) also supports deskew, binarization, character position information, and segmentation of documents into image and text elements. It supports output to multiple text and text-plus-image formats including Microsoft® Word®-compatible RTF files and standard Adobe® PDF files.
Recognize text in thirteen languages: English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, and Finnish. OCR Xpress provides a dictionary for each language, and also supports a user-defined dictionary for words that are application-specific.
The auto rotate feature in OCR Xpress detects the correct orientation of the text in an image, and rotates the entire page accordingly. It can also deskew documents that become skewed during the scanning process.
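To make the deskew step concrete, here is a minimal sketch of one common way to estimate a skew angle: fit a straight line through points sampled along a text baseline and take the angle of that line. The text above does not say how OCR Xpress estimates skew, so this is an illustration under that assumption, and the function name `skew_angle_degrees` is hypothetical rather than part of any OCR Xpress API.

```python
import math


def skew_angle_degrees(baseline_points):
    """Estimate skew from (x, y) points sampled along one text baseline.

    Fits a least-squares line through the points and returns the angle of
    that line relative to the horizontal axis, in degrees.
    """
    count = len(baseline_points)
    if count < 2:
        raise ValueError("need at least two baseline points")

    mean_x = sum(x for x, _ in baseline_points) / count
    mean_y = sum(y for _, y in baseline_points) / count

    # Sums of squares and cross products around the mean.
    sxx = sum((x - mean_x) ** 2 for x, _ in baseline_points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in baseline_points)
    if sxx == 0:
        raise ValueError("baseline points must not all share the same x")

    # atan2(sxy, sxx) equals atan(slope) because sxx is positive here.
    return math.degrees(math.atan2(sxy, sxx))


# A baseline that rises 5 units for every 100 units of width.
points = [(0, 0), (100, 5), (200, 10)]
angle = skew_angle_degrees(points)
print(round(angle, 2))  # prints 2.86
```

Rotating the page by the negative of the estimated angle would bring the baseline back to horizontal. A real implementation would combine many baselines and handle noise; this sketch covers only the geometry.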
Character position information allows users of OCR Xpress to redact or highlight text in the original image using the included NotateXpress component. Users can also build their own PDF files, using the position information to place the hidden text in the correct location. Because OCR Xpress reports a recognition confidence for each character, it can also be used in conjunction with other OCR engines such as SmartZone to perform voting, thereby improving recognition accuracy.
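The voting idea can be sketched as a confidence-weighted choice at each character position. The example below is an assumption about how such voting might be done, not the method used by OCR Xpress or SmartZone; the `CharResult` type and `vote` function are hypothetical names, and the sketch assumes the engines' outputs are already aligned character by character.

```python
from collections import defaultdict
from typing import NamedTuple


class CharResult(NamedTuple):
    """One recognized character and the engine's confidence in it (0.0 to 1.0)."""
    char: str
    confidence: float


def vote(engine_outputs):
    """Combine aligned per-character results from several engines.

    Assumes every engine returned the same number of characters, so that
    position i in one output corresponds to position i in the others.
    """
    if not engine_outputs:
        return ""
    length = len(engine_outputs[0])
    if any(len(output) != length for output in engine_outputs):
        raise ValueError("engine outputs must be aligned to the same length")

    chosen = []
    for candidates in zip(*engine_outputs):
        # Sum the confidence each candidate character received across engines.
        scores = defaultdict(float)
        for candidate in candidates:
            scores[candidate.char] += candidate.confidence
        # Keep the character with the highest total confidence.
        chosen.append(max(scores, key=scores.get))
    return "".join(chosen)


engine_a = [CharResult("c", 0.9), CharResult("1", 0.4), CharResult("t", 0.8)]
engine_b = [CharResult("c", 0.8), CharResult("a", 0.7), CharResult("t", 0.9)]
print(vote([engine_a, engine_b]))  # prints "cat"
```

In the sample data the two engines disagree on the middle character, and the higher-confidence reading wins. Real engines rarely produce perfectly aligned output, so an alignment step would normally come first.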
OCR Xpress flags characters recognized with low confidence, allowing developers to easily build text proofing and character replacement functions into their applications. This enables users to review and make corrections to text prior to output.
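A proofing function built on flagged characters might look like the following minimal sketch. It assumes recognition results are available as (character, confidence) pairs with confidence between 0.0 and 1.0; that representation, the 0.6 threshold, and the function names are illustrative assumptions, not the OCR Xpress API.

```python
def flag_uncertain(chars, threshold=0.6):
    """Return the indexes of characters whose confidence is below the threshold."""
    return [
        index
        for index, (_, confidence) in enumerate(chars)
        if confidence < threshold
    ]


def apply_corrections(chars, corrections):
    """Return the text with reviewer replacements applied by character index."""
    return "".join(
        corrections.get(index, char) for index, (char, _) in enumerate(chars)
    )


# Hypothetical recognition result: the fourth character was read as a zero.
recognized = [
    ("i", 0.95), ("n", 0.92), ("v", 0.90), ("0", 0.35),
    ("i", 0.88), ("c", 0.91), ("e", 0.93),
]

suspects = flag_uncertain(recognized)
print(suspects)  # prints [3]

# A reviewer replaces the flagged character before output.
print(apply_corrections(recognized, {3: "o"}))  # prints "invoice"
```

An application would typically show the flagged positions to the user alongside the original image region, then write the corrected text to the output file.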
OCR Xpress includes advanced segmentation to locate regions of the input image and identify them as either images (whose color can be preserved) or areas containing recognizable text. The various regions can be accessed for individualized processing, or automatically recombined into fully formatted documents. The binarization function converts color documents to black and white to improve recognition. It does not affect non-text regions, which may be retained in full color for reinsertion into the output document.
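The combination of segmentation and binarization can be illustrated with a small sketch: only pixels inside regions classified as text are reduced to black or white, while everything else keeps its original color. This uses a simple fixed luminance threshold, which is an assumption made for illustration and not necessarily the technique OCR Xpress applies; the function names and the region format (top, left, bottom, right, with bottom and right exclusive) are hypothetical.

```python
def luminance(pixel):
    """Approximate perceived brightness of an (r, g, b) pixel, 0 to 255."""
    red, green, blue = pixel
    return 0.299 * red + 0.587 * green + 0.114 * blue


def binarize_text_regions(image, text_regions, threshold=128):
    """Binarize only the given text regions of an image.

    image is a list of rows of (r, g, b) tuples. Each region is
    (top, left, bottom, right) with bottom and right exclusive.
    Pixels outside every region are copied through unchanged.
    """
    result = [list(row) for row in image]
    for top, left, bottom, right in text_regions:
        for y in range(top, bottom):
            for x in range(left, right):
                value = 255 if luminance(image[y][x]) >= threshold else 0
                result[y][x] = (value, value, value)
    return result


# Two columns of "text" (light paper, dark ink) next to one column of a color picture.
image = [
    [(250, 250, 240), (30, 30, 40), (200, 40, 40)],
    [(245, 245, 235), (20, 25, 30), (40, 60, 200)],
]
output = binarize_text_regions(image, [(0, 0, 2, 2)])
for row in output:
    print(row)
```

In the output, the first two columns become pure white and pure black, and the third column keeps its red and blue pixels, mirroring how color picture regions can be reinserted into the final document.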
OCR Xpress complements the Pegasus Imaging product line by offering full-page OCR, auto rotate, and searchable text output capabilities. Pegasus Imaging's SmartZone product is recommended for recognition of English-language text in zones on structured forms (zonal OCR). OCR Xpress can also be used for European-language recognition in zonal OCR applications.
Included Components
Both editions of OCR Xpress use the same set of .NET and COM controls. Access to specific functions is determined by the edition.
- OCR Xpress Professional - Includes the OCR Xpress v1 component, plus ImagXpress Document v8, NotateXpress v8, ThumbnailXpress v1, TwainPRO v4, and PrintPRO v3 components.
- OCR Xpress Standard - All features of OCR Xpress Professional except for PDF output.