|
|
|
|
ObjectRank Homepage (FIU Mirror)
|
The ObjectRank system applies the random
walk model, the effectiveness of which is proven by Google's PageRank, to keyword search in
databases modeled as labeled graphs. The system ranks the database objects with
respect to the user-provided keywords.
The PageRank technique assigns to each
page p a score based on the score of the pages pointing to p.
Hence, pages pointed by many high-score pages receive a high score as well.
Alternatively, the score of p is equal to the probability that a random surfer,
starting from a random page, will be at p at a specific time. For more
information on PageRank see the original PageRank paper.
We apply this idea to databases, which
require modifications of the original algorithms to exploit the semantic
information of every application, which is represented by annotating the
database schema. Another novelty of ObjectRank is that we perform
keyword-specific ObjectRank and not global as Google. Hence, for each
<keyword, object> we compute an ObjectRank value. In particular, random
walks start from the objects containing the keywords. Each object is ranked
with respect to the particular keywords, based on the stationary probability
that random walks are found at the object at a given time.
ObjectRank is adjustable to the semantics
of each database and the varying requirements of object ranking. In particular,
we adjust the weight of global importance, the weight of each keyword of the
query, the importance of a result actually containing the keywords versus being
referenced by nodes with high ObjectRank, and the volume of authority flow via
each type of semantic connection.
Novel performance challenges and opportunities are addressed. First, schemas
impose constraints on the graph, which are exploited for performance purposes.
Second, we employ aggressive precomputation and experiment on storage space
versus execution time tradeoffs. Finally, the keyword specific ObjectRanks are efficiently combined in order to produce
the top-k objects.
Example: The
following figure illustrates a small subset of the DBLP database in the form of
a labeled graph (author, conference and year nodes except for ``R. Agrawal'', ``ICDE'' and ``ICDE 1997'' respectively are
omitted to simplify the figure). The weights on edges
represent the authority flow through them. For example, more authority
is transferred from a paper to a cited paper than from a paper to its author.
Also notice that no authority is transferred from the cited papers to the
citing.

Given a keyword query, e.g. the single
keyword query "OLAP", ObjectRank sorts the database objects by their
importance with respect to the user-provided keywords. The papers of the above
figure are ranked as folows:
|
Data Cube |
|
Modeling Multidimensional Databases |
|
Range Queries in OLAP |
|
Index Selection for OLAP |
Notice that the "Data
Cube" and the "Modeling Multidimensional Databases" papers do
not contain the keyword "OLAP", but they clearly constitute important
papers in the OLAP area, since they are referenced by other papers of the OLAP
area, or have been written by authors who have written other important
"OLAP" papers, or appear in conferences important to
"OLAP".
|
|
Yannis Papakonstantinou, Associate
Professor, UCSD |
|
|
Vagelis Hristidis, Assistant Professor, FIU |
|
|
Andrey Balmin, IBM Research |
|
|
Heasoo Hwang, PhD student, UCSD |
|
|
Ramakrishna Varadarajan, PhD
student, FIU |
1.
Heasoo Hwang, Vagelis Hristidis, Yannis
Papakonstantinou. ObjectRank:
A System for Authority-based Search on Databases. Demo paper, ACM
SIGMOD 2006
2. Andrey Balmin, Vagelis Hristidis,
Yannis Papakonstantinou:
Authority-Based
Keyword Queries in Databases using ObjectRank. VLDB 2004
3. Vagelis Hristidis, Heasoo Hwang, Yannis Papakonstantinou. Authority-Based Keyword Search in Databases. ACM Trans. Database Syst
(TODS) 2008
4. Ramakrishna Varadarajan, Vagelis
Hristidis, Louiqa Raschid. Explaining and Reformulating Authority Flow Queries. IEEE ICDE 2008
|
|
Random Walk in Stuctured Databases, at UCSD Research Seminar, 2003 |
|
|
Explaining and
Reformulating Authority Flow Queries, at IEEE ICDE, 2008 |
The DBLP subset of the demo consists of
the available publications in 12 major database conferences, including SIGMOD, VLDB,
PODS, ICDE, ICDT and EDBT. The schema of the database is the following.
For AND (OR) semantics, the score of a node u is the product
(sum) of the keyword-specific ObjectRanks of u with
respect to each of the user-specified keywords. The demo also provides two
other adjusting parameters: The "containment of actual keywords"
parameter determines how important it is for a result to contain the
user-specified keywords, as opposed to being pointed by other nodes with high
score. The "global ObjectRank" parameter specifies if the
global (keyword-independent) ObjectRank is included in the calculation of the
scores.
On the demo page, we describe some typical
search profiles for various selections of paramaters.
Inquiries
for commercial use of this software should be directed to invent@ucsd.edu.