
Text extraction API GroupDocs.Parser for Java v19.11 released: a simpler document parsing process

Original | Product update | Editor: 況魚杰 | 2019-12-23 11:39:12

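Overview: To make the API more efficient and easier for developers to use, the architecture of GroupDocs.Parser has been redesigned from scratch. This update delivers the improved, simplified API of GroupDocs.Parser for Java 19.11.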

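GroupDocs.Parser for Java is a text, image and metadata extraction API for building business applications that parse raw, structured and formatted text. It can also retrieve file metadata from the supported formats.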

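Since its release, GroupDocs.Parser for Java API has become one of the more capable document parsing APIs. It can parse and read common formats of word processing documents, spreadsheets, presentations, ebooks, emails, markup documents, notes, archives and databases. Beyond text, it can also extract images and metadata properties from many document formats, including PDF, XLS, XLSX, CSV, DOC, DOCX, PPT, PPTX, MPP, EML, MSG, OST, PST, ONE and more.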


This update adds a number of new features. The main changes are:

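Introduced the Parser class to read and extract data from documents of any supported format.
Unified the data extraction process for all data types.
Redesigned the product architecture from scratch to simplify working with the different options and classes used to process data.
Simplified retrieving document information and generating previews.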

Migration

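Because the product has been substantially redesigned, its classes, methods and usage patterns have changed. The legacy API has not been removed from the package, however; it has been moved to the com.groupdocs.parser.legacy package. After upgrading to v19.11, replace the package com.groupdocs.parser with com.groupdocs.parser.legacy throughout your project. This resolves the immediate build errors. You can then update your source code step by step to use the classes and methods of the new public API.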
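The project-wide package replacement described above is mechanical, so it can be scripted. The following is a minimal sketch, not a GroupDocs tool: MigrateImports is a hypothetical helper that rewrites source text with a regular expression and leaves references that already point at .legacy untouched, so running it twice is harmless.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of GroupDocs.Parser): rewrites references to the
// com.groupdocs.parser package so they point at com.groupdocs.parser.legacy.
final class MigrateImports {
    // Match "com.groupdocs.parser" as a whole package name that is not already
    // followed by ".legacy"; the trailing \b rejects names such as "parsers".
    private static final Pattern PARSER_PACKAGE =
            Pattern.compile("\\bcom\\.groupdocs\\.parser\\b(?!\\.legacy\\b)");

    private MigrateImports() {
    }

    // Returns the source text with every legacy-bound reference rewritten.
    static String migrate(String source) {
        return PARSER_PACKAGE.matcher(source)
                .replaceAll(Matcher.quoteReplacement("com.groupdocs.parser.legacy"));
    }
}
```

For example, "import com.groupdocs.parser.ExtractorFactory;" becomes "import com.groupdocs.parser.legacy.ExtractorFactory;". Apply it only to files that still use the legacy classes; code written against the new Parser API presumably keeps importing from com.groupdocs.parser.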

The following is a brief comparison of extracting data with the legacy and new APIs in GroupDocs.Parser for Java v19.11.

Text

Legacy API:

// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Read the text line by line from the text extractor
    String textLine = null;
    do {
        textLine = extractor.extractLine();
        if (textLine != null) {
            System.out.println(textLine);
        }
    }
    while (textLine != null);
}

New API:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract the document text into a TextReader (null if unsupported)
    try (TextReader reader = parser.getText()) {
        // Check if text extraction is supported
        if (reader == null) {
            System.out.println("Text extraction isn't supported.");
            return;
        }
        // Read the text line by line from the reader
        String textLine = null;
        do {
            textLine = reader.readLine();
            if (textLine != null) {
                System.out.println(textLine);
            }
        }
        while (textLine != null);
    }
}
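The do/while loop above keeps reading until readLine() returns null. Assuming TextReader.readLine() follows the same end-of-text convention as java.io.BufferedReader (which the null check implies), the loop can be written with the more common while idiom. In this minimal sketch a BufferedReader over a string stands in for TextReader so that it runs without GroupDocs.Parser; LineLoop is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in sketch: BufferedReader replaces GroupDocs' TextReader here.
// Both are assumed to return null from readLine() once the text is exhausted.
final class LineLoop {
    private LineLoop() {
    }

    static List<String> readAllLines(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        // Assign and test in one expression instead of a do/while with a second null check.
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}
```

With the library, the same while loop would run over the TextReader returned by parser.getText(), after the null check for unsupported formats.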

Page text

Legacy API:

// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Check if the extractor supports pagination
    IPageTextExtractor pte = extractor instanceof IPageTextExtractor
            ? (IPageTextExtractor) extractor
            : null;
    if (pte != null) {
        // Extract the first page
        System.out.println(pte.extractPage(0));
    }
}

New API:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract the first page text to the reader
    try (TextReader reader = parser.getText(0)) {
        // Check if text extraction is supported
        if (reader != null) {
            // Print the whole page text from the reader
            System.out.println(reader.readToEnd());
        }
    }
}

Search

Legacy API:

// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Check if the extractor supports search
    ISearchable se = extractor instanceof ISearchable
            ? (ISearchable) extractor
            : null;
    if (se != null) {
        // Create a handler
        ListSearchHandler handler = new ListSearchHandler();
        // Search "keyword" in the document
        se.search(new SearchOptions(null), handler, java.util.Arrays.asList(new String[]{"keyword"}));
        // Print search results
        for (SearchResult result : handler.getList()) {
            System.out.println(String.format("at %d: %s", result.getIndex(), result.getFoundText()));
        }
    }
}

New API:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Search "keyword" in the document
    Iterable<SearchResult> list = parser.search("keyword");
    // Check if search is supported
    if (list == null) {
        System.out.println("Search isn't supported.");
        return;
    }
    // Print search results
    for (SearchResult result : list) {
        System.out.println(String.format("at %d: %s", result.getPosition(), result.getText()));
    }
}
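To make the output of the new search example concrete: each SearchResult pairs a position with the matched text. The sketch below reproduces that shape with a plain, case-sensitive substring search over an in-memory string. It assumes the position is a zero-based character offset into the extracted text; GroupDocs.Parser's own matching rules (case sensitivity, whole words, regular expressions) may differ, and KeywordSearch and Hit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a search result: where the match starts and what matched.
final class Hit {
    final int position;
    final String text;

    Hit(int position, String text) {
        this.position = position;
        this.text = text;
    }
}

final class KeywordSearch {
    private KeywordSearch() {
    }

    // Case-sensitive search for every (possibly overlapping) occurrence of keyword.
    static List<Hit> search(String document, String keyword) {
        List<Hit> hits = new ArrayList<>();
        if (keyword.isEmpty()) {
            return hits; // an empty keyword would match at every position
        }
        int index = document.indexOf(keyword);
        while (index >= 0) {
            hits.add(new Hit(index, document.substring(index, index + keyword.length())));
            index = document.indexOf(keyword, index + 1);
        }
        return hits;
    }
}
```

Searching "a keyword and another keyword" for "keyword" and printing each hit with the same String.format("at %d: %s", ...) call gives "at 2: keyword" and "at 22: keyword".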

File type detection

Legacy API:

// Detect and print file type
System.out.println(CompositeMediaTypeDetector.DEFAULT.detect(filePath));

New API:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Detect and print file type
    System.out.println(parser.getDocumentInfo().getFileType());
}
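Neither version shows how the file type is determined. A common technique for content-based detection is to inspect a file's leading "magic" bytes. The sketch below illustrates that general technique only; it is not GroupDocs.Parser's detection logic. It distinguishes three container families: PDF (%PDF), ZIP-based Office Open XML files such as DOCX, XLSX and PPTX (PK\x03\x04), and OLE2 compound files used by legacy DOC, XLS, PPT and MSG (D0 CF 11 E0 A1 B1 1A E1). Telling DOCX from XLSX, or DOC from MSG, requires looking inside the container, which this sketch does not do. MagicBytes is a hypothetical name.

```java
import java.util.Arrays;

final class MagicBytes {
    private static final byte[] PDF = {'%', 'P', 'D', 'F'};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private MagicBytes() {
    }

    // Returns a coarse container family based on the file's first bytes.
    static String detect(byte[] header) {
        if (startsWith(header, PDF)) {
            return "PDF";
        }
        if (startsWith(header, ZIP)) {
            return "ZIP container (e.g. DOCX, XLSX, PPTX)";
        }
        if (startsWith(header, OLE2)) {
            return "OLE2 compound file (e.g. DOC, XLS, PPT, MSG)";
        }
        return "unknown";
    }

    // True if data is at least as long as prefix and begins with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```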

Metadata

Legacy API:

// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a metadata extractor
MetadataExtractor extractor = factory.createMetadataExtractor(filePath);
// Extract metadata
MetadataCollection metadata = extractor.extractMetadata(filePath);
// Print metadata
for (String key : metadata.getKeys()) {
    String value = metadata.get_Item(key);
    System.out.println(String.format("%s = %s", key, value));
}

New API:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract metadata
    Iterable<MetadataItem> metadata = parser.getMetadata();
    // Check if metadata extraction is supported
    if (metadata == null) {
        System.out.println("Metadata extraction isn't supported.");
        return;
    }
    // Print metadata
    for (MetadataItem item : metadata) {
        System.out.println(String.format("%s = %s", item.getName(), item.getValue()));
    }
}
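In the new API, getMetadata() returns null when metadata extraction is unsupported and otherwise an Iterable of name/value items. If you need to look values up by name rather than print them, one option is to copy the items into a map. This minimal sketch is generic so it compiles without GroupDocs.Parser; with the library you would presumably pass MetadataItem::getName and MetadataItem::getValue as the two functions. MetadataMaps is a hypothetical name, and keeping the first value when a name repeats is an arbitrary choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class MetadataMaps {
    private MetadataMaps() {
    }

    // Copies name/value pairs into an insertion-ordered map.
    // A null input (extraction unsupported) becomes Optional.empty().
    static <T> Optional<Map<String, String>> toMap(
            Iterable<T> items,
            Function<? super T, String> name,
            Function<? super T, ?> value) {
        if (items == null) {
            return Optional.empty();
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (T item : items) {
            // putIfAbsent keeps the first value seen for a repeated name.
            result.putIfAbsent(name.apply(item), String.valueOf(value.apply(item)));
        }
        return Optional.of(result);
    }
}
```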

Text structure

Legacy API:

// Create an extractor factory
ExtractorFactory factory = new ExtractorFactory();
// Create a text extractor
try (TextExtractor extractor = factory.createTextExtractor(filePath)) {
    // Check if the extractor supports text structure extraction
    IStructuredExtractor se = extractor instanceof IStructuredExtractor
            ? (IStructuredExtractor) extractor
            : null;
    if (se != null) {
        // Create a handler
        Handler handler = new Handler();
        // Extract text structure
        se.extractStructured(handler);
        // Print hyperlinks
        for (String link : handler.getLinks()) {
            System.out.println(link);
        }
    }
}
 
// Handler for the hyperlink extraction
class Handler extends StructuredHandler {
    private final java.util.List<String> links;
    public Handler() {
        links = new java.util.ArrayList<>();
    }
    public java.util.List<String> getLinks() {
        return links;
    }
    // Override the method to catch hyperlinks
    @Override
    protected void onStartHyperlink(HyperlinkProperties properties) {
        links.add(properties.getLink());
    }
}

New API:

// Create an instance of Parser class
try (Parser parser = new Parser(filePath)) {
    // Extract the text structure as an XML DOM document (org.w3c.dom.Document)
    Document document = parser.getStructure();
    // Check if text structure extraction is supported
    if (document == null) {
        System.out.println("Text structure extraction isn't supported.");
        return;
    }
    // Read XML document
    readNode(document.getDocumentElement());
}
 
void readNode(Node node) {
    NodeList nodes = node.getChildNodes();
    for (int i = 0; i < nodes.getLength(); i++) {
        Node n = nodes.item(i);
        // Compare strings with equalsIgnoreCase, not ==, which compares references
        if ("hyperlink".equalsIgnoreCase(n.getNodeName())) {
            Node a = n.getAttributes().getNamedItem("link");
            if (a != null) {
                System.out.println(a.getNodeValue());
            }
        }
        if (n.hasChildNodes()) {
            readNode(n);
        }
    }
}
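The recursive readNode method walks the DOM tree by hand. Since getStructure() returns a standard org.w3c.dom.Document, the JDK's DOM API can do the traversal instead: getElementsByTagName returns all matching descendants in document order. The sketch below parses an XML string so that it is self-contained; like the example above, it assumes hyperlinks appear as hyperlink elements carrying a link attribute. Unlike the case-insensitive name check in readNode, getElementsByTagName matches tag names case-sensitively. HyperlinkLinks is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HyperlinkLinks {
    private HyperlinkLinks() {
    }

    // Collects the "link" attribute of every <hyperlink> element, in document order.
    static List<String> collect(Document document) {
        List<String> links = new ArrayList<>();
        NodeList nodes = document.getElementsByTagName("hyperlink");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            // Skip elements without a link, like the null check in readNode.
            if (element.hasAttribute("link")) {
                links.add(element.getAttribute("link"));
            }
        }
        return links;
    }

    // Parses an XML string into a DOM Document (a stand-in for parser.getStructure()).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

With the library, collect would be called on the Document returned by parser.getStructure(), after the null check for unsupported formats.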

The online document viewer GroupDocs.Viewer has also been updated to v19.11; that release fixes a number of minor issues.

