PDFlib pCOS provides a simple and elegant facility for retrieving any information from a PDF document that is not part of the page contents. For example, PDF metadata, interactive elements (such as links), and page dimensions can easily be queried with pCOS.
With pCOS you can extract a variety of interesting items and create output for different purposes. By processing multiple PDF documents with a single call you can easily create summaries of document info entries, page formats, fonts, or any other property. Combined with tabular output this provides a powerful PDF administration tool.
There are many every-day pCOS applications for PDF practitioners, but you can also use PDFlib pCOS as a tool for learning or debugging PDF. Here are some typical scenarios:
- Check incoming documents for predefined criteria
- Check PDFs for security problems and active content (JavaScript etc.)
- Check documents for quality assurance before publication
- Identify problem files in a large collection
- Create property summaries for document management
- Learn details of PDF data structures
PDFlib pCOS Features
Supported Input
PDFlib pCOS supports all relevant flavors of PDF input:
- All PDF versions up to PDF 1.7 (Acrobat 8)
- RC4 and AES encryption (password may be required)
- Sophisticated security model: even if you don’t know the password, you can query certain pieces of information as long as this doesn’t violate the document author’s intentions
- Damaged PDF input documents will be repaired if possible
Information Retrieval
PDFlib pCOS offers a simple query interface, without the need for low-level parser programming. With PDFlib pCOS you can extract a variety of interesting items, such as:
- Document info entries and XMP metadata
- General information: linearization and tagged PDF status, encryption details and permission settings, number of pages and fonts
- All fonts with their name, embedding status, etc.
- Images with size, bit depth, color space, compression, etc.
- Color space details for all PDF color variations
- Target URLs and coordinates of Web links
- All bookmarks along with the corresponding page numbers, e.g. to create a table of contents
- Form field data: full field names, contents, position, etc.
- Page size, CropBox, page rotation
- Status of PDF/X and PDF/A compliant files
- List or extract file attachments
- Layer names, page labels, article threads
- Annotation details
- List all comments along with the reviewer’s name
- Digital signature details: name of signature field(s), signed/unsigned, name of signer, date and reason of signature
- Extract ICC output intent profiles from PDF/X or PDF/A files
- List PDFlib block properties
- JavaScript on document, page, annotation, or field level
Output Formats
PDFlib pCOS can create output for different purposes:
- Plain text output
- Tabular output for processing with a spreadsheet/database
- Binary data for reuse, e.g. ICC profiles or file attachments
- Unicode text output in UTF-8 or UTF-16 formats
- User-defined output formats for custom post-processing
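To make the tabular output idea concrete, here is a minimal Python sketch of turning per-document property summaries into comma-separated text that a spreadsheet or database can import. It does not call pCOS: the `summaries` data, the column names, and the `to_csv` helper are hypothetical stand-ins for whatever values a query run would produce.

```python
import csv
import io


def to_csv(rows, columns):
    """Render a list of property dicts as comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Missing properties become empty cells instead of raising an error.
        writer.writerow({name: row.get(name, "") for name in columns})
    return buffer.getvalue()


# Hypothetical per-document summaries; the values are made up for illustration.
summaries = [
    {"file": "a.pdf", "pages": 12, "title": "Report, Q1"},
    {"file": "b.pdf", "pages": 3},
]

print(to_csv(summaries, ["file", "pages", "title"]))
```

Values that contain the separator, such as the title above, are quoted by the `csv` module, so the result stays importable without further escaping.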
pCOS Paths – Simple Syntax for PDF Objects
Instead of getting bogged down by complex tree structures, e.g. for bookmarks or form fields, you can easily access PDF objects by using the simple pCOS path syntax. It offers convenient shortcuts for accessing commonly used PDF objects, such as pages, fonts, bookmarks, form fields etc.
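The idea of addressing objects by path instead of walking a tree by hand can be sketched in a few lines of Python. This is only an illustration of the concept, not the pCOS implementation: the `resolve` function, the in-memory `document` structure, and the exact path spellings (`/Info/Title`, `pages[1]/width`, `length:pages`) are assumptions made for this example, and the actual syntax is defined in the pCOS documentation.

```python
import re

# One path segment: a key name optionally followed by [index] selectors.
_SEGMENT = re.compile(r"^([^\[\]]+)((?:\[\d+\])*)$")


def resolve(tree, path):
    """Resolve a slash-separated path such as 'pages[1]/width' in nested dicts/lists.

    A 'length:' prefix returns the number of entries in the addressed container.
    """
    want_length = path.startswith("length:")
    if want_length:
        path = path[len("length:"):]

    node = tree
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"malformed path segment: {segment!r}")
        key, selectors = match.groups()
        node = node[key]
        # Apply each [n] selector in order, e.g. 'kids[0][2]'.
        for index in re.findall(r"\[(\d+)\]", selectors):
            node = node[int(index)]

    return len(node) if want_length else node


# Hypothetical stand-in for a parsed document; not real pCOS output.
document = {
    "Info": {"Title": "Annual Report"},
    "pages": [
        {"width": 595, "height": 842},
        {"width": 842, "height": 595},
    ],
    "fonts": [{"name": "Helvetica", "embedded": False}],
}

print(resolve(document, "/Info/Title"))
print(resolve(document, "pages[1]/width"))
print(resolve(document, "length:pages"))
```

The caller names what it wants in a single string and never touches the intermediate dictionaries and arrays, which is the convenience the path syntax is meant to provide.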
pCOS Library or Command-Line Tool?
pCOS is available as a programming library (component) for various development environments, and as a command-line tool for batch operations. Both offer similar features, but are suitable for different deployment tasks.
The pCOS programming library is used...
...for integration into desktop or server applications. Examples for using the library with all supported language bindings are included in the pCOS package. A variety of additional examples is available in the pCOS Cookbook on the PDFlib Web site.
The pCOS command-line tool is suited...
...for batch processing PDF documents. It doesn’t require any programming, but offers powerful command-line options which can be used to integrate it into complex workflows. The pCOS command-line tool extends the features of the library:
- Simple retrieval of common PDF elements, such as bookmarks, annotations, metadata, form fields, etc.
- Extended mode for querying more complex objects and customizing the output format
- Extract data items, such as file attachments, ICC profiles, etc.
- Emit information as comma-separated values or a user-defined format for import into a spreadsheet or database
- Recursion feature for dumping composite PDF objects, such as dictionaries and arrays
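The recursion feature in the last item can be pictured with a short Python sketch that flattens nested dictionaries and arrays into one line per leaf value. It is a conceptual illustration under stated assumptions, not the tool's own code or output format: the `dump` function, the `path = value` line layout, and the sample `trailer` data are all hypothetical.

```python
def dump(obj, path="", lines=None):
    """Recursively flatten nested dicts and lists into 'path = value' lines."""
    if lines is None:
        lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            dump(value, f"{path}/{key}", lines)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            dump(value, f"{path}[{index}]", lines)
    else:
        # Leaf value: record it together with the path that leads to it.
        lines.append(f"{path} = {obj!r}")
    return lines


# Hypothetical composite object; the keys and values are made up for illustration.
trailer = {
    "Info": {"Title": "Demo", "Keywords": ["pdf", "query"]},
    "Size": 42,
}

for line in dump(trailer):
    print(line)
```

The sample data is acyclic. Real PDF object graphs can contain references back to parent objects, so a production-grade dump would need to track visited objects or limit the recursion depth.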
Supported Development Environments
PDFlib pCOS is everywhere – it runs on practically all computing platforms. We offer variants for all common flavors of Windows, Mac OS, Linux and Unix.
The pCOS core is written in highly optimized C code for maximum performance and small overhead. Via a simple API (Application Programming Interface) the pCOS functionality is accessible from a variety of development environments:
- COM for use with VB, ASP, and many other languages
- C and C++
- Java, including servlets and Java application servers
- .NET for use with C#, VB.NET, ASP.NET, etc.
- Perl
- PHP