Web Search Project
Objective
Keyword search is the most popular information discovery method because the user does not need to know either a query language or the underlying structure of the data. The search engines available today provide keyword search on top of sets of documents. We need high-quality searching techniques to produce relevant results for user queries. In addition, the amount of data available over the Internet has increased tremendously in recent years, so we need effective summarization techniques that give users good summaries and make it easier for them to find relevant material.
People
Vagelis Hristidis, Assistant Professor, School of Computing & Information Sciences, Florida International University.
Tao Li, Assistant Professor, School of Computing & Information Sciences, Florida International University.
Ramakrishna Varadarajan, PhD student, School of Computing & Information Sciences, Florida International University.
Projects
1. Searching the Web Using Composed Pages
Given a user keyword query, current Web search engines return a list of pages ranked by their “goodness” with respect to the query. However, it is often the case that no single page contains all query keywords. We propose a technique that, given a keyword query, generates new pages on the fly, called composed pages, which contain all query keywords. The composed pages are generated by stitching together appropriate pieces from hyperlinked Web pages, and they retain links to the original Web pages. To rank the composed pages, we consider both the hyperlink structure of the original pages and the associations between the keywords within each page. Furthermore, we present and experimentally evaluate heuristic algorithms that efficiently generate the top composed pages.
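The idea of stitching hyperlinked pages together until every query keyword is covered can be illustrated with a small greedy sketch. This is not the project's heuristic algorithm, which is not specified here, and it does no ranking. The `Page` class, the `compose` method, and the sample URLs are hypothetical names introduced only for illustration. The sketch works on whole pages rather than pieces of pages, and it assumes the query keywords are already lowercase.

```java
import java.util.*;

class ComposedPageSketch {
    // Hypothetical in-memory page model: URL, text, and outgoing hyperlinks.
    static final class Page {
        final String url;
        final String text;
        final List<String> links;
        Page(String url, String text, List<String> links) {
            this.url = url;
            this.text = text;
            this.links = links;
        }
    }

    // Query keywords (lowercase) that occur in the page text.
    static Set<String> keywordsIn(Page p, Set<String> query) {
        Set<String> words = new HashSet<>(Arrays.asList(p.text.toLowerCase().split("\\W+")));
        words.retainAll(query);
        return words;
    }

    // Greedy illustration: start from each page that has at least one keyword,
    // then repeatedly add the linked page that covers the most missing keywords.
    // Returns the URLs of the smallest covering set found, or an empty list.
    static List<String> compose(Map<String, Page> web, Set<String> query) {
        List<String> best = null;
        for (Page seed : web.values()) {
            Set<String> covered = keywordsIn(seed, query);
            if (covered.isEmpty()) continue;
            List<String> pieces = new ArrayList<>(Collections.singletonList(seed.url));
            while (!covered.containsAll(query)) {
                String bestNext = null;
                int bestGain = 0;
                for (String url : pieces) {
                    for (String link : web.get(url).links) {
                        Page target = web.get(link);
                        if (target == null || pieces.contains(link)) continue;
                        Set<String> gain = keywordsIn(target, query);
                        gain.removeAll(covered);
                        if (gain.size() > bestGain) {
                            bestGain = gain.size();
                            bestNext = link;
                        }
                    }
                }
                if (bestNext == null) break; // no linked page adds a missing keyword
                pieces.add(bestNext);
                covered.addAll(keywordsIn(web.get(bestNext), query));
            }
            if (covered.containsAll(query) && (best == null || pieces.size() < best.size())) {
                best = pieces;
            }
        }
        return best == null ? Collections.<String>emptyList() : best;
    }

    public static void main(String[] args) {
        Map<String, Page> web = new LinkedHashMap<>();
        web.put("a", new Page("a", "graduate admissions office", Arrays.asList("b", "c")));
        web.put("b", new Page("b", "tuition fees and deadlines", Collections.<String>emptyList()));
        web.put("c", new Page("c", "campus map", Collections.<String>emptyList()));
        System.out.println(compose(web, new HashSet<>(Arrays.asList("admissions", "tuition"))));
    }
}
```

In this toy data no single page contains both "admissions" and "tuition", so the sketch has to combine page "a" with the page it links to, "b". A real system would additionally score each candidate by link structure and keyword associations, as described above.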
Publications
To evaluate the performance of our approach, we used a document set of approximately 25,108 Web pages from www.fiu.edu. We used a PC with a 2.44 GHz Pentium 4 processor and 256 MB of RAM running Windows XP. The algorithms were implemented in Java. To build the full-text index, we used Oracle interMedia and stored the documents in an Oracle database. JDBC was used to connect to the database system.
2. Document Summarization
Keyword search is the most popular information discovery method because the user does not need to know either a query language or the underlying structure of the data. The search engines available today provide keyword search on top of sets of documents. As the number of documents available on users’ desktops and the Internet increases, so does the need to provide high-quality summaries that allow the user to quickly locate the desired information. Most previous summarization techniques are query-independent, and they are not optimal because they ignore the inherent structure of documents. We present a method that creates query-specific summaries by adding structure to documents, extracting associations between their fragments. Our objective is to develop a technique that produces high-quality query-dependent summaries for documents.
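One way to picture "adding structure" to a document is as a graph whose nodes are text fragments and whose edges connect associated fragments. The sketch below builds such a graph under loudly stated assumptions: fragments are plain strings, and the association measure is Jaccard word overlap, chosen here only as a simple stand-in because the text above does not say how associations are extracted. The names `FragmentGraphSketch`, `association`, and `buildGraph` are hypothetical.

```java
import java.util.*;

class FragmentGraphSketch {
    // Lowercased word set of a fragment.
    static Set<String> words(String fragment) {
        Set<String> w = new HashSet<>(Arrays.asList(fragment.toLowerCase().split("\\W+")));
        w.remove("");
        return w;
    }

    // Assumed association measure: Jaccard overlap of the two word sets, in [0, 1].
    static double association(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        Set<String> inter = new HashSet<>(wa);
        inter.retainAll(wb);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // Adjacency lists over fragment indices; an edge exists when the
    // association between two fragments is at least the threshold.
    static Map<Integer, List<Integer>> buildGraph(List<String> fragments, double threshold) {
        Map<Integer, List<Integer>> graph = new TreeMap<>();
        for (int i = 0; i < fragments.size(); i++) {
            graph.put(i, new ArrayList<>());
        }
        for (int i = 0; i < fragments.size(); i++) {
            for (int j = i + 1; j < fragments.size(); j++) {
                if (association(fragments.get(i), fragments.get(j)) >= threshold) {
                    graph.get(i).add(j);
                    graph.get(j).add(i);
                }
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
            "The rover landed on Mars",
            "The Mars rover sent images",
            "Stock markets fell today");
        System.out.println(buildGraph(fragments, 0.3));
    }
}
```

With a graph like this, a query-specific summary could be assembled from fragments that contain the query keywords together with the fragments that connect them, rather than from a fixed, query-independent selection.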
Publications
The summarization demo operates on a large set of science and technology news articles from www.cnn.com. The user inputs a set of keywords. A thread is generated for every query entered by the user. These threads execute in parallel and output results as they arrive. Threads for smaller queries finish sooner because they correspond to simpler queries, so the smaller results, which are intuitively the most relevant, are output first. Each result is a document that contains all keywords in the query, displayed with its summary, its rank, and a hyperlink to the corresponding CNN document on www.cnn.com.
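The "one thread per query, results output as they arrive" behavior can be sketched with the standard `java.util.concurrent` classes. This is a minimal illustration, not the demo's actual code: `summarize` is a hypothetical placeholder for the real summarization step, and the thread pool sized to the number of queries is an assumption that mirrors the one-thread-per-query description.

```java
import java.util.*;
import java.util.concurrent.*;

class ParallelQuerySketch {
    // Hypothetical stand-in for the real summarization step.
    static String summarize(String query) {
        return "summary for [" + query + "]";
    }

    // Runs one task per query in parallel and returns the results in the
    // order in which the tasks finish, not the order in which they were submitted.
    static List<String> runAll(List<String> queries) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queries.size()));
        CompletionService<String> finished = new ExecutorCompletionService<>(pool);
        try {
            for (String q : queries) {
                finished.submit(() -> summarize(q));
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < queries.size(); i++) {
                // take() blocks until the next task completes, whichever it is.
                results.add(finished.take().get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        for (String r : runAll(Arrays.asList("mars", "mars rover", "mars rover landing"))) {
            System.out.println(r);
        }
    }
}
```

`ExecutorCompletionService` hands back tasks in completion order, so a faster query is reported before a slower one even if it was submitted later. The output order of this sketch is therefore not fixed from run to run.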