
Please use this identifier to cite or link to this item: http://ir.lib.stu.edu.tw:80/ir/handle/310903100/1271

Title: 搜尋引擎與資訊索引中文斷詞方法
Chinese phrase segmentation method of Information Retrieval and Search Engine
Authors: 焉德葳
Dewei Yen
Contributors: Chao-Kuei Hung;Yu-Chang Chen
Department of Information Engineering (資訊工程學系)
Keywords: Search engine; Information retrieval; Phrase segmentation; N-gram; Ozearch
Date: 2008
Issue Date: 2011-05-24 15:12:04 (UTC+8)
Publisher: Kaohsiung: [Department of Information Engineering, Shu-Te University]
Abstract: For most people, search engines are a technology that is at once familiar and unfamiliar. They are familiar because people use them constantly in their online activities, and many have heard of the well-known techniques behind them or have even studied and improved those techniques. In practice, however, few people understand how to build a complete search engine. This thesis attempts to explain the kaleidoscope of techniques and theory inside a search engine through easy-to-follow diagrams and examples. It uses Ozearch, the open source search engine built in this research, as a running example and lists the complete implementation in each part.

This research adopts a segmentation method tailored to Chinese indexing: a linking method based on the N-gram and lexicon (word-based) approaches. The main idea of this combined method is to take the strengths of both approaches by linking the segments they produce, so that the search engine effectively reduces the number of index keys used per page while maintaining good precision and recall. The thesis also lists shortcomings in the segmentation methods of current commercial search engines and proposes workable improvements.
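The abstract does not spell out how the N-gram and word-based segmentations are linked, so the following is only a minimal sketch of one plausible combination, not the method implemented in Ozearch. It assumes forward maximum matching against a small lexicon, with any characters the lexicon does not cover indexed as overlapping bigrams. The function names, the `max_word_len` parameter, and the sample lexicon are all hypothetical.

```python
def bigrams(text):
    """Return overlapping character bigrams; a lone character is kept as-is."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def hybrid_segment(text, lexicon, max_word_len=4):
    """Hypothetical hybrid segmenter (an assumption, not the thesis's algorithm).

    Lexicon words are found by forward maximum matching; runs of characters
    with no lexicon match are buffered and emitted as bigrams.
    """
    keys = []
    unknown = []  # consecutive characters not covered by any lexicon word
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first; words need at least 2 characters.
        for length in range(min(max_word_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match:
            # Flush the pending unknown run before emitting the word.
            keys.extend(bigrams("".join(unknown)))
            unknown.clear()
            keys.append(match)
            i += len(match)
        else:
            unknown.append(text[i])
            i += 1
    keys.extend(bigrams("".join(unknown)))
    return keys


# Illustrative data only; the lexicon is made up for this example.
sample_lexicon = {"中文", "搜尋", "引擎", "搜尋引擎"}
sample_text = "中文搜尋引擎斷詞方法"

print(len(bigrams(sample_text)), "keys with bigrams only")
print(len(hybrid_segment(sample_text, sample_lexicon)), "keys with the hybrid sketch")
```

Under these assumptions, text covered by the lexicon yields one key per word instead of one per character pair, while out-of-vocabulary text still falls back to bigrams and so remains searchable. That is one way a combined approach could lower the index key count without giving up recall on unknown words.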
Appears in Collections: [Department of Information Engineering (Graduate Institute)] Master's and Doctoral Theses

Files in This Item:

File | Description | Size | Format | Views
搜尋引擎與資訊索引中文斷詞方法.pdf | | 2070Kb | Adobe PDF | 1841
搜尋引擎與資訊索引中文斷詞方法__臺灣博碩士論文知識加值系統.htm | National Central Library (國圖) | 109Kb | HTML | 595


All items in STUAIR are protected by copyright, with all rights reserved.
