Translation | Tutorial | Editor: 何躍 | 2022-01-24 14:36:15.267
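Overview: PDF files can store documents, images, and other data. People have long asked whether there is a simple way to extract graphics, such as charts or photos, from a PDF file. If you want to pull every image out of a PDF file, or you have hundreds or more PDF files to process, LEADTOOLS is one answer.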
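Extracting the images embedded in a PDF file is straightforward with LEADTOOLS. Below are C#, Java, and PowerShell code samples that extract images from a PDF file using LEADTOOLS.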
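
    // Required namespaces
    using System.IO;
    using System.Linq;
    using Leadtools;
    using Leadtools.Codecs;
    using Leadtools.Pdf;

    /// <summary>
    /// Extracts the images embedded in a PDF document and saves each one as a TIFF file.
    /// </summary>
    /// <param name="pdfPath">Path to the source PDF file.</param>
    private static void ExtractImagesFromPdf(string pdfPath)
    {
        // Write the output to an "images" subfolder next to the PDF.
        var destinationPath = Path.Combine(Path.GetDirectoryName(pdfPath), "images");
        Directory.CreateDirectory(destinationPath); // does nothing if the folder already exists
        var documentName = Path.GetFileNameWithoutExtension(pdfPath);

        using var pdfDocument = new PDFDocument(pdfPath);
        using var codecs = new RasterCodecs(); // one codecs instance is reused for every page

        // Parse the objects on all pages (first page 1, last page -1 meaning "to the end").
        pdfDocument.ParsePages(PDFParsePagesOptions.Objects, 1, -1);

        foreach (var page in pdfDocument.Pages)
        {
            // Keep only the image objects on this page.
            var embeddedImages = page.Objects.Where(o => o.ObjectType == PDFObjectType.Image);

            foreach (var imgObj in embeddedImages)
            {
                // File name pattern: <document>~page-<page number>~<image object number>.tif
                var destinationFilePath = Path.Combine(
                    destinationPath,
                    documentName + "~page-" + page.PageNumber + "~" + imgObj.ImageObjectNumber + ".tif");

                using var image = pdfDocument.DecodeImage(imgObj.ImageObjectNumber);
                codecs.Save(image, destinationFilePath, RasterImageFormat.TifLzw,
                    image.BitsPerPixel, 1, 1, -1, CodecsSavePageMode.Append);
            }
        }
    }
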

    import java.util.List;
    import leadtools.*;
    import leadtools.codecs.*;
    import leadtools.pdf.*;

    /**
     * Extracts the images embedded in a PDF file and saves them to an "images" subfolder,
     * e.g. images from "c:\\temp\\a.pdf" are written to "c:\\temp\\images\\".
     * getOutputFolder, getFileName and getBaseName are helper methods that are not
     * shown in this sample.
     *
     * @param pdfPath path to the source PDF file
     */
    private static void extractImagesFromPdf(String pdfPath) {
        final String destinationFolder = getOutputFolder(pdfPath);
        final String documentName = getBaseName(getFileName(pdfPath));

        final PDFDocument pdfDocument = new PDFDocument(pdfPath);
        // Parse the objects on all pages (first page 1, last page -1 meaning "to the end").
        pdfDocument.parsePages(PDFParsePagesOptions.OBJECTS.getValue(), 1, -1);

        final RasterCodecs codecs = new RasterCodecs();
        try {
            final List<PDFDocumentPage> pages = pdfDocument.getPages();
            for (PDFDocumentPage page : pages) {
                final int pageNumber = page.getPageNumber();
                for (final PDFObject object : page.getObjects()) {
                    // Skip everything that is not an image object.
                    if (object.getObjectType() == PDFObjectType.IMAGE) {
                        final String imageObjectNumber = object.getImageObjectNumber();
                        // File name pattern: <document>~page-<page number>~<image object number>.tif
                        final String destinationFilePath = destinationFolder + documentName
                                + "~page-" + pageNumber + "~" + imageObjectNumber + ".tif";

                        final RasterImage image = pdfDocument.decodeImage(imageObjectNumber);
                        try {
                            codecs.save(image, destinationFilePath, RasterImageFormat.TIFLZW,
                                    image.getBitsPerPixel(), 1, 1, -1, CodecsSavePageMode.OVERWRITE);
                        } finally {
                            image.dispose(); // release the decoded image even if the save fails
                        }
                    }
                }
            }
        } finally {
            codecs.dispose();
        }
    }

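The Java sample calls `getOutputFolder`, `getFileName`, and `getBaseName` without showing them. The following is a minimal sketch of what such helpers might look like, using only the standard `java.nio.file` API. The class name `PdfPathHelpers`, the method bodies, and the extra `buildImageFileName` helper are assumptions made for illustration; they are not part of LEADTOOLS or of the original sample.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helpers: the names match the calls in the Java sample,
// but the implementations below are assumptions.
final class PdfPathHelpers {
    private PdfPathHelpers() {}

    /** Returns the last element of a path, e.g. "a.pdf" for "docs/a.pdf". */
    static String getFileName(String path) {
        Path name = Paths.get(path).getFileName();
        return name == null ? "" : name.toString();
    }

    /** Strips the final extension, e.g. "a.pdf" becomes "a". */
    static String getBaseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    /** Returns the "images" folder next to the PDF, with a trailing separator. */
    static String getOutputFolder(String pdfPath) {
        Path parent = Paths.get(pdfPath).toAbsolutePath().getParent();
        Path base = (parent != null) ? parent : Paths.get("").toAbsolutePath();
        return base.resolve("images").toString() + File.separator;
    }

    /** Builds the file name pattern used by the samples: name~page-N~object.tif */
    static String buildImageFileName(String documentName, int pageNumber, String imageObjectNumber) {
        return documentName + "~page-" + pageNumber + "~" + imageObjectNumber + ".tif";
    }

    public static void main(String[] args) {
        String pdfPath = args.length > 0 ? args[0] : "a.pdf";
        String documentName = getBaseName(getFileName(pdfPath));
        // Prints the full path the first image object on page 1 would be saved to.
        System.out.println(getOutputFolder(pdfPath) + buildImageFileName(documentName, 1, "1"));
    }
}
```

Note that `getOutputFolder` in this sketch only computes the path. The folder has to exist before images are saved into it, for example by calling `java.nio.file.Files.createDirectories` on it first.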

    function Export-LtImagesFromPdf {
        <#
        .SYNOPSIS
            Exports images embedded in a PDF file
        .DESCRIPTION
            Exports images embedded in a PDF file
        .PARAMETER PdfPath
            File path to the PDF file that has embedded images to be exported
        .PARAMETER Path
            Folder path to export the embedded images
        .EXAMPLE
            Export-LtImagesFromPdf -PdfPath "c:\temp\a.pdf" -Path "c:\temp\images\"
        .INPUTS
            String
        .OUTPUTS
            void
        .NOTES
            Author: LEAD Technologies, Inc.
            Website: //www.leadtools.com
            Twitter: @leadtools
        #>
        [CmdletBinding()]
        param(
            [Parameter(Mandatory)] [string]$PdfPath,
            [Parameter(Mandatory)] [string]$Path
        )

        if (-not (Test-Path -Path $PdfPath -PathType Leaf)) {
            Write-Error "File does not exist."
            return
        }

        # Create the output folder if needed; Out-Null keeps the function's output empty.
        if (-not (Test-Path -Path $Path -PathType Container)) {
            New-Item -Path $Path -ItemType Directory | Out-Null
        }

        $baseFileName = (Get-Item $PdfPath).Basename
        $pdfDocument = New-Object -TypeName Leadtools.Pdf.PDFDocument -ArgumentList $PdfPath

        # Parse the objects on all pages (first page 1, last page -1 meaning "to the end").
        $pdfDocument.ParsePages([Leadtools.Pdf.PDFParsePagesOptions]::Objects, 1, -1)

        ForEach ($page in $pdfDocument.Pages) {
            ForEach ($object in $page.Objects) {
                if ($object.ObjectType -eq [Leadtools.Pdf.PDFObjectType]::Image) {
                    $imageObjectNumber = $object.ImageObjectNumber
                    $pageNumber = $page.PageNumber
                    $image = $pdfDocument.DecodeImage($imageObjectNumber)
                    $outputFilePath = Join-Path -Path $Path -ChildPath ($baseFileName + "~page#-" + $pageNumber + "~" + $imageObjectNumber + ".tif")
                    # Export-LTImage is a helper cmdlet that is not defined in this sample.
                    Export-LTImage -RasterImage $image -Path $outputFilePath -Format ([Leadtools.RasterImageFormat]::Tif)
                    $image.Dispose()
                }
            }
        }

        $pdfDocument.Dispose()
    }

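The overview mentions processing hundreds of PDF files, while each sample above handles a single file. One possible way to cover the batch case is to collect every PDF under a folder first and then call the extraction routine once per file. The sketch below shows only that collection step with the standard `java.nio.file` API; `PdfBatch` and `findPdfFiles` are hypothetical names, and the extraction call is left as a comment because it depends on the LEADTOOLS-based method.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical batch helper, not part of LEADTOOLS.
final class PdfBatch {
    private PdfBatch() {}

    /** Recursively collects regular files whose name ends in ".pdf" (case-insensitive). */
    static List<Path> findPdfFiles(Path root) throws IOException {
        // Files.walk returns a lazy stream that must be closed, hence try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString()
                            .toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path pdf : findPdfFiles(root)) {
            // Assumed integration point: extractImagesFromPdf(pdf.toString());
            System.out.println(pdf);
        }
    }
}
```

Collecting the paths before extracting keeps the folder walk separate from the image decoding, so a failure on one file can be caught and logged inside the loop without aborting the rest of the batch.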
Image extraction is only one of the PDF tasks the LEADTOOLS toolkit covers. The full LEADTOOLS SDK is available for download.