Translation | Industry News | Editor: 胡濤 | 2024-05-20 11:29:46.270
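TX Text Control is a word-processing control with functionality similar to MS Word, covering document creation, editing, printing, mail merge, format conversion, splitting and merging, import and export, and batch generation. It is widely used in enterprise document management, website content publishing, and electronic medical record systems, for example to create record templates, write and revise records, keep a revision history, print continuously, and archive records.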
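There are many strategies for comparing documents in a document processing application. One of the most common is to compare the text of the documents word by word. This approach is simple and effective, but it does have some limitations.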
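Essentially, the word-by-word comparison algorithm walks through all paragraphs in the given order. Within each paragraph, the sentences are extracted by splitting at delimiters. Finally, the words of each sentence in the original document are compared with the words of the corresponding sentence in the revised document.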
The results are marked as tracked changes in the original document. The tracked changes are highlighted there, so users can see what was changed.
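To make the three levels concrete, the following is a minimal sketch that runs the same paragraph, sentence, and word traversal on plain strings rather than on TX Text Control documents. The class `PlainTextWalk` and the method `CountChangedWords` are hypothetical names, paragraphs are assumed to be separated by `\n`, and the sketch only counts differing word positions; it does not create tracked changes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlainTextWalk
{
    // Hypothetical helper: counts word positions that differ between two texts.
    // Assumption: paragraphs are separated by '\n'; sentences end at '.', '!' or '?'.
    public static int CountChangedWords(string original, string revised)
    {
        string[] originalParagraphs = original.Split('\n');
        string[] revisedParagraphs = revised.Split('\n');
        int changed = 0;

        // Level 1: paragraphs, compared in order; stop at the shorter document
        int paragraphCount = Math.Min(originalParagraphs.Length, revisedParagraphs.Length);
        for (int p = 0; p < paragraphCount; p++)
        {
            if (originalParagraphs[p] == revisedParagraphs[p]) continue; // identical, skip

            // Level 2: sentences, split at the delimiters
            string[] originalSentences = Regex.Split(originalParagraphs[p], @"[.!?]");
            string[] revisedSentences = Regex.Split(revisedParagraphs[p], @"[.!?]");
            int sentenceCount = Math.Min(originalSentences.Length, revisedSentences.Length);

            for (int s = 0; s < sentenceCount; s++)
            {
                // Level 3: words, compared position by position
                string[] words1 = originalSentences[s].Trim().Split(' ');
                string[] words2 = revisedSentences[s].Trim().Split(' ');
                int wordCount = Math.Max(words1.Length, words2.Length);
                for (int w = 0; w < wordCount; w++)
                {
                    bool inBoth = w < words1.Length && w < words2.Length;
                    if (!inBoth || words1[w] != words2[w]) changed++;
                }
            }
        }
        return changed;
    }

    public static void Main()
    {
        string original = "The cat sat. It was happy.\nSecond paragraph.";
        string revised = "The dog sat. It was happy.\nSecond paragraph.";
        Console.WriteLine(CountChangedWords(original, revised)); // prints 1
    }
}
```

Only the first paragraph differs, and within it only one word position ("cat" versus "dog"), so the count is 1.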
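The sample implements the class DocumentComparison, which takes two TXTextControl.TextControl instances in its constructor. You can easily rewrite this class to use non-UI TXTextControl.ServerTextControl instances instead.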
    DocumentComparison dc = new DocumentComparison(textControl1, textControl2);
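The constructor compares the two documents. It loops through all paragraphs in the original document and compares their text with the revised document. Where differences are found, the text is marked as a tracked change.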
The ExtractSentences method takes the text of the current paragraph as a string and returns a list of sentences by splitting it at typical delimiters.
    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    public static List<string> ExtractSentences(string input)
    {
        List<string> sentences = new List<string>();

        // Split at sentence delimiters; the capturing group keeps each
        // delimiter as its own entry, and white space is preserved
        string pattern = @"([.!?])";
        string[] splitSentences = Regex.Split(input, pattern);

        // Add every piece unchanged (no trimming here) so that the piece
        // lengths still add up to the original character offsets
        foreach (string sentence in splitSentences)
        {
            sentences.Add(sentence);
        }

        return sentences;
    }
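The effect of the capturing group in the pattern is easy to miss, so here is a small stand-alone sketch of it. The class `SplitDemo` and the method `SplitKeepingDelimiters` are hypothetical names used only for illustration; the pattern is the same one used by ExtractSentences.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Splits at '.', '!' and '?'; the capturing group makes Regex.Split
    // return the delimiters as separate entries instead of discarding them.
    public static string[] SplitKeepingDelimiters(string input)
    {
        return Regex.Split(input, @"([.!?])");
    }

    public static void Main()
    {
        string input = "Hello world. How are you?";
        string[] pieces = SplitKeepingDelimiters(input);

        // Print each piece in brackets so leading spaces and empty pieces are visible
        foreach (string piece in pieces)
            Console.WriteLine("[" + piece + "]");

        // Joining the pieces restores the input exactly, so no characters are lost
        Console.WriteLine(string.Concat(pieces) == input); // prints True
    }
}
```

The returned list therefore contains the delimiters, and a trailing empty string when the text ends with a delimiter, as entries of their own. Because nothing is dropped or trimmed, the lengths of the pieces can be summed to find where each piece starts in the paragraph.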
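The CompareSentences method splits each of the two given sentences into individual words and compares the words position by position. It returns a list of tuples, each with three elements: the word from sentence1, the character index at which that word starts, and the corresponding word from sentence2. Together, these tuples are the list of differences between the two sentences.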
    // Requires: using System; using System.Collections.Generic;
    private static List<(string word, int charIndex, string replacedWord)> CompareSentences(string sentence1, string sentence2)
    {
        string[] words1 = sentence1.Split(' ');
        string[] words2 = sentence2.Split(' ');

        List<(string word, int charIndex, string replacedWord)> differences =
            new List<(string word, int charIndex, string replacedWord)>();

        // Track the character index within sentence1
        int charIndex = 0;

        // Get the larger word count of the two sentences
        int maxLength = Math.Max(words1.Length, words2.Length);

        // Compare the sentences word by word
        for (int i = 0; i < maxLength; i++)
        {
            // Check if the current word exists in both sentences
            if (i < words1.Length && i < words2.Length)
            {
                // If the words differ, record the word, its character index, and the replacement
                if (words1[i] != words2[i])
                {
                    differences.Add((words1[i], charIndex, words2[i]));
                }
            }
            // Extra word only in sentence1: it is replaced by nothing
            else if (i < words1.Length)
            {
                differences.Add((words1[i], charIndex, ""));
            }
            // Extra word only in sentence2: there is no word in sentence1 to
            // replace, so record an empty original word and the added word
            else
            {
                differences.Add(("", charIndex, words2[i]));
            }

            // Update the character index for the next word
            if (i < words1.Length)
                charIndex += words1[i].Length + 1; // Add 1 for the space
        }

        return differences;
    }
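Because the words are matched purely by position, a single inserted word shifts every word after it. The following sketch illustrates this with a simplified stand-in for the comparison; the class `PositionalDiffDemo` and the method `Diff` are hypothetical and omit the character index for brevity.

```csharp
using System;
using System.Collections.Generic;

public static class PositionalDiffDemo
{
    // Simplified stand-in for a position-by-position word comparison.
    // Returns (word in a, word in b) for every index where the two differ;
    // a word that is missing on one side is reported as an empty string.
    public static List<(string original, string revised)> Diff(string a, string b)
    {
        string[] words1 = a.Split(' ');
        string[] words2 = b.Split(' ');
        var result = new List<(string original, string revised)>();

        for (int i = 0; i < Math.Max(words1.Length, words2.Length); i++)
        {
            string x = i < words1.Length ? words1[i] : "";
            string y = i < words2.Length ? words2[i] : "";
            if (x != y) result.Add((x, y));
        }
        return result;
    }

    public static void Main()
    {
        // One replaced word: exactly one difference
        Console.WriteLine(Diff("The quick brown fox", "The slow brown fox").Count);       // prints 1

        // One inserted word: every following position is reported as changed
        Console.WriteLine(Diff("The quick brown fox", "The very quick brown fox").Count); // prints 4
    }
}
```

A replaced word yields one difference, while an inserted word yields four, since "quick", "brown", and "fox" no longer line up with their counterparts. An alignment-based diff would avoid this, but that is outside the scope of this simple sample.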
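The constructor of the DocumentComparison class uses the methods above to find the differences between the given TextControl instances. The differences are marked as tracked changes in the original document.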
    // Document references used by the comparison
    private readonly TXTextControl.TextControl m_originalDocument;
    private readonly TXTextControl.TextControl m_revisedDocument;

    public DocumentComparison(TXTextControl.TextControl originalDocument, TXTextControl.TextControl revisedDocument)
    {
        // Initialize document references
        m_originalDocument = originalDocument;
        m_revisedDocument = revisedDocument;

        // Enable track changes in the original document
        m_originalDocument.IsTrackChangesEnabled = true;

        // Compare paragraphs between the original and revised documents
        for (int p = 1; p <= m_originalDocument.Paragraphs.Count; p++)
        {
            var offsetSentences = 0;

            // Retrieve the original and revised paragraphs
            Paragraph originalParagraph = m_originalDocument.Paragraphs[p];

            // Break if the revised document has fewer paragraphs than the original document
            if (p > m_revisedDocument.Paragraphs.Count)
                break;

            Paragraph revisedParagraph = m_revisedDocument.Paragraphs[p];

            // Get the start position of the original paragraph
            var startParagraph = originalParagraph.Start;
            var uncheckedOffset = 0;

            // Check if the text of the original and revised paragraphs differ
            if (originalParagraph.Text != revisedParagraph.Text)
            {
                // Extract sentences from the original and revised paragraphs
                var originalSentences = ExtractSentences(originalParagraph.Text);
                var revisedSentences = ExtractSentences(revisedParagraph.Text);

                // Compare sentences and replace words in the original document
                for (int i = 0; i < originalSentences.Count; i++)
                {
                    // Stop if the revised paragraph has fewer sentences than the original
                    if (i >= revisedSentences.Count)
                        break;

                    // Trim sentences and calculate offset
                    var originalTrimOffset = originalSentences[i].Length - originalSentences[i].Trim().Length;
                    var originalSentence = originalSentences[i].Trim();
                    var revisedSentence = revisedSentences[i].Trim();

                    // Track changes offset initialization
                    int trackedChangeOffset = 0;

                    var differences = CompareSentences(originalSentence, revisedSentence);

                    // Check if there are any differences
                    if (differences.Count == 0)
                        uncheckedOffset = originalSentences[i].Length - 1;

                    // Apply differences to the original document
                    foreach (var difference in differences)
                    {
                        m_originalDocument.Selection.Start = trackedChangeOffset + startParagraph + offsetSentences +
                            difference.charIndex + originalTrimOffset + uncheckedOffset - 1;
                        m_originalDocument.Selection.Length = difference.word.Length;
                        m_originalDocument.Selection.Text = difference.replacedWord;

                        trackedChangeOffset += difference.replacedWord.Length;
                    }

                    // Update offset for next sentence
                    offsetSentences += originalSentences[i].Length + trackedChangeOffset;
                }
            }
        }
    }
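The offset bookkeeping in the constructor is needed because each charIndex refers to the text as it was before any replacement. The sketch below shows the same idea on a plain string; the class `OffsetDemo` and the method `Apply` are hypothetical. In a plain string, each replacement shifts later positions by the difference in length. The constructor above instead adds the full length of the replacement, which would be consistent with tracked deletions remaining in the document text until they are accepted; that reading is an assumption, not something the article states.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class OffsetDemo
{
    // Hypothetical helper: applies word replacements to a plain string, left to right.
    // Each charIndex refers to the ORIGINAL text, so a running offset corrects for
    // the length changes caused by the replacements already applied.
    public static string Apply(string text, List<(string word, int charIndex, string replacedWord)> differences)
    {
        var builder = new StringBuilder(text);
        int offset = 0;

        foreach (var d in differences)
        {
            int position = d.charIndex + offset;
            builder.Remove(position, d.word.Length);
            builder.Insert(position, d.replacedWord);

            // Later words move by the difference in length
            offset += d.replacedWord.Length - d.word.Length;
        }
        return builder.ToString();
    }

    public static void Main()
    {
        var differences = new List<(string word, int charIndex, string replacedWord)>
        {
            ("quick", 4, "slow"),
            ("fox", 16, "dog")
        };
        Console.WriteLine(Apply("The quick brown fox", differences)); // prints The slow brown dog
    }
}
```

Without the running offset, the second replacement would start one character too late, because "slow" is one character shorter than "quick".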
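Comparing documents word by word is a common approach to document comparison. This example shows how to implement a simple word-by-word comparison algorithm with TX Text Control. It compares two documents and marks the differences as tracked changes in the original document.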