Original | Industry News | Editor: 郝浩 | 2013-08-07 12:55:32
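Summary: How do you extract text from a presentation? Using a Microsoft PowerPoint PPTX presentation as the example, this article shows how to extract its text with the Aspose.Slides component.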
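Developers often need to extract the text from a presentation. Doing so means collecting the text from every shape on every slide. Whether you need the text of a single slide or of every slide in the presentation, the static methods of the PresentationScanner class, which is packaged in the Aspose.Slides.Util namespace, cover both cases.
Aspose.Slides for .NET provides the Aspose.Slides.Util namespace, which includes the PresentationScanner class. This class exposes several overloaded static methods for extracting the text of a whole presentation or of a single slide. To extract the text from one slide of a PPTX presentation, use the overloaded static method GetAllTextBoxes, which accepts a SlideEx object as its parameter. (The code samples in this article call these methods on the SlideUtil class of the same namespace.)
When called, GetAllTextBoxes scans all the text on the slide passed in and returns an array of TextFrameEx objects. Because the result is an array of TextFrameEx objects rather than plain strings, any formatting associated with the text, such as font height and font name, is available as well. The following code, first in C# and then in Visual Basic .NET, extracts the text from the first slide: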
// Requires: using System; using Aspose.Slides.Pptx; using Aspose.Slides.Util;
// Instantiate the PresentationEx class that represents a PPTX file
using (PresentationEx pptxPresentation = new PresentationEx("d:\\pptx\\testx.pptx"))
{
    // Get an array of TextFrameEx objects from the first slide
    TextFrameEx[] textFramesSlideOne = SlideUtil.GetAllTextBoxes(pptxPresentation.Slides[0]);
    // Loop through the array of text frames
    for (int i = 0; i < textFramesSlideOne.Length; i++)
        // Loop through the paragraphs in the current text frame
        foreach (ParagraphEx para in textFramesSlideOne[i].Paragraphs)
            // Loop through the portions in the current paragraph
            foreach (PortionEx port in para.Portions)
            {
                // Display the text of the current portion
                Console.WriteLine(port.Text);
                // Display the font height of the text
                Console.WriteLine(port.FontHeight);
                // Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName);
            }
}
' Requires: Imports System, Imports Aspose.Slides.Pptx, Imports Aspose.Slides.Util
' Instantiate the PresentationEx class that represents a PPTX file
Using pptxPresentation As New PresentationEx("d:\pptx\testx.pptx")
    ' Get an array of TextFrameEx objects from the first slide
    Dim textFramesSlideOne() As TextFrameEx = SlideUtil.GetAllTextBoxes(pptxPresentation.Slides(0))
    ' Loop through the array of text frames
    For i As Integer = 0 To textFramesSlideOne.Length - 1
        ' Loop through the paragraphs in the current text frame
        For Each para As ParagraphEx In textFramesSlideOne(i).Paragraphs
            ' Loop through the portions in the current paragraph
            For Each port As PortionEx In para.Portions
                ' Display the text of the current portion
                Console.WriteLine(port.Text)
                ' Display the font height of the text
                Console.WriteLine(port.FontHeight)
                ' Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName)
            Next port
        Next para
    Next i
End Using
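The samples above print every portion on its own line, so a paragraph that contains several portions is split across several output lines. The following is a minimal sketch of one way to reassemble whole paragraphs. It uses only the calls already shown in this article; the class name and the GetSlideText helper are hypothetical names introduced for illustration, and the namespace of PresentationEx is assumed to be Aspose.Slides.Pptx. It needs a reference to the Aspose.Slides assembly to compile.

```csharp
using System;
using System.Text;
using Aspose.Slides.Pptx;   // assumed namespace of PresentationEx, SlideEx, TextFrameEx, ...
using Aspose.Slides.Util;   // SlideUtil

class SlideTextExample
{
    // Hypothetical helper (not part of Aspose.Slides): returns the text of one
    // slide with one output line per paragraph.
    static string GetSlideText(SlideEx slide)
    {
        StringBuilder text = new StringBuilder();
        foreach (TextFrameEx frame in SlideUtil.GetAllTextBoxes(slide))
        {
            foreach (ParagraphEx para in frame.Paragraphs)
            {
                // Append every portion of the paragraph without a line break
                foreach (PortionEx port in para.Portions)
                {
                    text.Append(port.Text);
                }
                // End the line only once the whole paragraph has been written
                text.AppendLine();
            }
        }
        return text.ToString();
    }

    static void Main()
    {
        // Same sample path as in the article's own examples
        using (PresentationEx pres = new PresentationEx("d:\\pptx\\testx.pptx"))
        {
            Console.Write(GetSlideText(pres.Slides[0]));
        }
    }
}
```

This variant discards the formatting information, so it fits cases where only the plain text is needed, for example for indexing or searching.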
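To scan the text of the whole presentation, use the static method GetAllTextFrames exposed by the PresentationScanner class. It takes two parameters:
1. A PresentationEx object: the PPTX presentation from which the text is extracted.
2. A Boolean value: whether the master slides are included when the presentation is scanned.
The method returns an array of TextFrameEx objects, complete with text formatting information. The following code, first in C# and then in Visual Basic .NET, scans the text and formatting information of a presentation, including its master slides.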
// Requires: using System; using Aspose.Slides.Pptx; using Aspose.Slides.Util;
// Instantiate the PresentationEx class that represents a PPTX file
using (PresentationEx pptxPresentation = new PresentationEx("d:\\pptx\\testx.pptx"))
{
    // Get an array of TextFrameEx objects from all slides in the PPTX, master slides included
    TextFrameEx[] textFramesPPTX = SlideUtil.GetAllTextFrames(pptxPresentation, true);
    // Loop through the array of text frames
    for (int i = 0; i < textFramesPPTX.Length; i++)
        // Loop through the paragraphs in the current text frame
        foreach (ParagraphEx para in textFramesPPTX[i].Paragraphs)
            // Loop through the portions in the current paragraph
            foreach (PortionEx port in para.Portions)
            {
                // Display the text of the current portion
                Console.WriteLine(port.Text);
                // Display the font height of the text
                Console.WriteLine(port.FontHeight);
                // Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName);
            }
}
' Requires: Imports System, Imports Aspose.Slides.Pptx, Imports Aspose.Slides.Util
' Instantiate the PresentationEx class that represents a PPTX file
Using pptxPresentation As New PresentationEx("d:\pptx\testx.pptx")
    ' Get an array of TextFrameEx objects from all slides in the PPTX, master slides included
    Dim textFramesPPTX() As TextFrameEx = SlideUtil.GetAllTextFrames(pptxPresentation, True)
    ' Loop through the array of text frames
    For i As Integer = 0 To textFramesPPTX.Length - 1
        ' Loop through the paragraphs in the current text frame
        For Each para As ParagraphEx In textFramesPPTX(i).Paragraphs
            ' Loop through the portions in the current paragraph
            For Each port As PortionEx In para.Portions
                ' Display the text of the current portion
                Console.WriteLine(port.Text)
                ' Display the font height of the text
                Console.WriteLine(port.FontHeight)
                ' Display the font name of the text
                Console.WriteLine(port.LatinFont.FontName)
            Next port
        Next para
    Next i
End Using
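To see what the Boolean argument of GetAllTextFrames changes for a given file, the two results can be compared directly. The sketch below is only an illustration under the same assumptions as the samples above (the sample path, the Aspose.Slides.Pptx namespace for PresentationEx, and a project reference to the Aspose.Slides assembly); the class name is hypothetical.

```csharp
using System;
using Aspose.Slides.Pptx;   // assumed namespace of PresentationEx and TextFrameEx
using Aspose.Slides.Util;   // SlideUtil

class MasterSlideComparison
{
    static void Main()
    {
        using (PresentationEx pres = new PresentationEx("d:\\pptx\\testx.pptx"))
        {
            // true: text frames on master slides are part of the result
            TextFrameEx[] withMasters = SlideUtil.GetAllTextFrames(pres, true);
            // false: master slides are left out of the scan
            TextFrameEx[] withoutMasters = SlideUtil.GetAllTextFrames(pres, false);

            Console.WriteLine("Text frames, master slides included: " + withMasters.Length);
            Console.WriteLine("Text frames, master slides excluded: " + withoutMasters.Length);
        }
    }
}
```

Passing false is usually the better fit when placeholder text from the masters should not appear in the extracted output.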
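The Aspose.Slides.Util.SlideUtil class exposes several overloaded static methods for scanning the text of a presentation or a slide, and the formatting information is extracted together with the text. If you need to extract text from presentations or face a similar problem, Aspose.Slides is worth trying.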
Article reposted from: 慧都控件網