GENIE
GENIE is a Generic Inverted Index on the GPU. It builds a database (an inverted index) from high-dimensional data, commonly preprocessed by either Locality-Sensitive Hashing or Shotgun and Assembly schemes, and provides a simple way to perform top-k similarity queries on top of that index. The user may define queries as dimension and value pairs, optionally with value ranges and weights. GENIE processes all queries in parallel on the GPU using a Match Count similarity model, in which an object's score is the number of query dimensions whose values it matches. For each query, the top-k similar results and their corresponding counts are returned. GENIE is much faster than CPU-based search algorithms because it exploits parallelism on two levels: each query is processed in parallel, and multiple queries are processed in parallel.
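To make the Match Count model concrete, here is a minimal single-threaded CPU sketch in C++. It is not GENIE's implementation or API: Object, QueryItem, InvertedIndex, build_index and top_k are hypothetical names, values are assumed to be already discretized integers (for example, LSH bucket ids), and a query item is assumed to match when the object's value lies in [low, high], adding the item's weight to the object's count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A data object after preprocessing: object[d] is its discrete value in dimension d.
using Object = std::vector<int>;

// Hypothetical query item: an object matches when its value in `dim` lies in
// [low, high]; each match adds `weight` to the object's count. With
// low == high and weight == 1 this reduces to the plain Match Count model.
struct QueryItem {
    std::size_t dim;
    int low;
    int high;
    float weight;
};

// Inverted index: (dimension, value) -> ids of the objects holding that value.
using InvertedIndex = std::map<std::pair<std::size_t, int>, std::vector<std::size_t>>;
using Hit = std::pair<std::size_t, float>;  // (object id, count)

InvertedIndex build_index(const std::vector<Object>& objects) {
    InvertedIndex index;
    for (std::size_t id = 0; id < objects.size(); ++id)
        for (std::size_t d = 0; d < objects[id].size(); ++d)
            index[{d, objects[id][d]}].push_back(id);
    return index;
}

// Returns up to k hits, highest count first (ties broken by smaller id).
std::vector<Hit> top_k(const InvertedIndex& index, std::size_t num_objects,
                       const std::vector<QueryItem>& query, std::size_t k) {
    std::vector<float> count(num_objects, 0.0f);
    for (const QueryItem& q : query) {
        if (q.low > q.high) continue;  // empty range matches nothing
        // Keys are ordered by (dim, value), so this visits exactly the
        // posting lists of dimension q.dim with value in [q.low, q.high].
        auto it = index.lower_bound({q.dim, q.low});
        auto end = index.upper_bound({q.dim, q.high});
        for (; it != end; ++it)
            for (std::size_t id : it->second) {
                assert(id < num_objects && "index built from a different object set");
                count[id] += q.weight;
            }
    }
    std::vector<Hit> hits;
    for (std::size_t id = 0; id < num_objects; ++id)
        if (count[id] > 0.0f) hits.push_back({id, count[id]});
    k = std::min(k, hits.size());
    std::partial_sort(hits.begin(), hits.begin() + k, hits.end(),
                      [](const Hit& a, const Hit& b) {
                          return a.second > b.second ||
                                 (a.second == b.second && a.first < b.first);
                      });
    hits.resize(k);
    return hits;
}
```

For the objects {1, 2, 3}, {1, 5, 3} and {4, 2, 3} and the exact-match query (dimension 0 = 1, dimension 1 = 2, dimension 2 = 3), top_k with k = 2 returns object 0 with count 3 followed by object 1 with count 2; objects 1 and 2 tie, and this sketch breaks ties by the smaller id. The ordered map keeps each range lookup to one contiguous scan, whereas GENIE performs the counting in parallel on the GPU.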
Please refer to the following technical report:
Generic Inverted Index on the GPU, Technical Report (TR 11/15), School of Computing, NUS.
CoRR arXiv:1603.08390, also available at www.comp.nus.edu.sg/~atung/publication/gpugenie.pdf
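You are required to install G++, CMake, CUDA, OpenMPI and Boost. Some of these dependencies are needed only when the optional GENIE_DISTRIBUTED or GENIE_COMPR modules are enabled.
To create an "out-of-source" build of GENIE containing the GENIE library, tests and tools, you can use the standard CMake procedure: create a separate build directory, run cmake from inside it against the source directory, and then run make (for example, $ mkdir build && cd build && cmake .. && make).
Then use the following make targets:
- $ make test to run the GENIE tests
- $ make doc to build the HTML code documentation
- $ make install to install GENIE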
CMake build parameters can be further configured using the following options (each is passed to cmake as -D<option>=<value>, e.g. -DGENIE_DISTRIBUTED=ON):
- CMAKE_BUILD_TYPE:STRING – build type, one of Release, Debug (default Release)
- CMAKE_INSTALL_PREFIX:PATH – CMake's option for the installation prefix (default ${CMAKE_BINARY_DIR}/install)
- BOOST_ROOT:PATH – root directory of the Boost libraries (default from system paths)
- DOXYGEN_EXECUTABLE:PATH – Doxygen executable (default from system paths)
- MPI_HOME:PATH – root directory of the OpenMPI installation (default from system paths)
- GENIE_DISTRIBUTED:BOOL – enable the distributed GENIE module (default OFF)
- GENIE_COMPR:BOOL – enable the compression GENIE module (default OFF)
- GENIE_SIMDCAI:BOOL – enable compilation of the SIMDCAI library (default OFF)
- GENIE_EXAMPLES:BOOL – enable compilation of the GENIE examples (default ON)
An example use of the cmake command may look like this:
The GENIE interface consists of 4 important classes:
- genie::Config for configuring GENIE
- genie::ExecutionPolicy for providing the actual implementation of table and query building and matching
- genie::table::inv_table for constructed tables (inverted indexes)
- genie::query::Query for queries
The interface also has several functions. The genie::Search() function is the 1st-level interface function for using GENIE. It accepts file paths to table and query CSV files and returns the matching result. There are also several 2nd-level interface functions, which are used internally by genie::Search() and are meant to provide finer control of GENIE for advanced usage. These functions are:
- genie::BuildTable() for building the inverted index
- genie::BuildQuery() for building the queries
- genie::Match() for matching
To use GENIE, first configure it with the genie::Config class. Based on the configuration, genie::MakePolicy() generates a corresponding execution policy. Then use the 1st- or 2nd-level interface functions to perform the search.
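The split between the 1st-level genie::Search() and the 2nd-level functions can be pictured with the C++ sketch below. Everything in it is hypothetical: the sketch namespace, Settings, Policy, CpuPolicy, InvTable, make_policy, build_query and search only mirror the roles of genie::Config, genie::ExecutionPolicy, genie::table::inv_table, genie::MakePolicy(), genie::BuildQuery() and genie::Search(). They are not GENIE's real signatures, and the sketch takes in-memory rows instead of CSV file paths.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins that only mirror the roles of GENIE's classes and
// functions; these are NOT GENIE's real types or signatures.
namespace sketch {

struct Settings {                  // role of genie::Config
    std::size_t k = 10;            // results kept per query
    std::size_t num_queries = 0;   // 0 = use every query row
};

using Rows = std::vector<std::vector<int>>;             // one value per dimension
using Query = std::vector<int>;                         // exact-match value per dimension
using Hits = std::vector<std::pair<std::size_t, int>>;  // (row id, match count)

struct InvTable {                  // role of genie::table::inv_table
    std::map<std::pair<std::size_t, int>, std::vector<std::size_t>> postings;
    std::size_t num_rows = 0;
};

// Role of genie::ExecutionPolicy: supplies the actual building and matching.
class Policy {
public:
    explicit Policy(const Settings& s) : settings(s) {}
    virtual ~Policy() = default;
    virtual InvTable build_table(const Rows& rows) const = 0;
    virtual std::vector<Hits> match(const InvTable& t, const std::vector<Query>& qs) const = 0;
    Settings settings;
};

// Single-threaded CPU policy; GENIE's own policies run on the GPU.
class CpuPolicy : public Policy {
public:
    using Policy::Policy;
    InvTable build_table(const Rows& rows) const override {
        InvTable t;
        t.num_rows = rows.size();
        for (std::size_t id = 0; id < rows.size(); ++id)
            for (std::size_t d = 0; d < rows[id].size(); ++d)
                t.postings[{d, rows[id][d]}].push_back(id);
        return t;
    }
    std::vector<Hits> match(const InvTable& t, const std::vector<Query>& qs) const override {
        std::vector<Hits> all;
        for (const Query& q : qs) {
            std::vector<int> count(t.num_rows, 0);  // match count per row
            for (std::size_t d = 0; d < q.size(); ++d) {
                auto it = t.postings.find({d, q[d]});
                if (it == t.postings.end()) continue;
                for (std::size_t id : it->second) {
                    assert(id < t.num_rows);
                    ++count[id];
                }
            }
            Hits hits;
            for (std::size_t id = 0; id < count.size(); ++id)
                if (count[id] > 0) hits.push_back({id, count[id]});
            std::size_t k = std::min(settings.k, hits.size());
            std::partial_sort(hits.begin(), hits.begin() + k, hits.end(),
                              [](const Hits::value_type& a, const Hits::value_type& b) {
                                  return a.second > b.second ||
                                         (a.second == b.second && a.first < b.first);
                              });
            hits.resize(k);
            all.push_back(hits);
        }
        return all;
    }
};

// Role of genie::MakePolicy(): choose an implementation from the settings.
std::unique_ptr<Policy> make_policy(const Settings& s) { return std::make_unique<CpuPolicy>(s); }

// 2nd-level step: keep the first num_queries rows as queries.
std::vector<Query> build_query(const Settings& s, const Rows& rows) {
    std::size_t n = s.num_queries == 0 ? rows.size() : std::min(s.num_queries, rows.size());
    return std::vector<Query>(rows.begin(), rows.begin() + n);
}

// 1st-level entry point (role of genie::Search()), composed of the 2nd-level steps.
std::vector<Hits> search(const Settings& s, const Rows& table_rows, const Rows& query_rows) {
    std::unique_ptr<Policy> policy = make_policy(s);
    InvTable table = policy->build_table(table_rows);
    std::vector<Query> queries = build_query(s, query_rows);
    return policy->match(table, queries);
}

}  // namespace sketch
```

Calling sketch::search with k = 2 and num_queries = 1 on the table rows {1, 2, 3}, {1, 5, 3}, {4, 2, 3} and the query row {1, 2, 3} returns rows 0 and 1 with counts 3 and 2. In this sketch, keeping building and matching behind a policy object lets the 1st-level entry point stay unchanged while a different implementation is swapped in through the policy factory.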
In general, everything in the genie namespace is a stable, public interface, while subnamespaces such as genie::utility refer to internal components of GENIE, which may change or be removed over time.
Below is an example program demonstrating the usage of GENIE.
For convenience, we provide an executable, genie-cli (compiled to the /bin directory), for interfacing with GENIE. Running ./genie-cli --help lists all allowed options.
For example, running ./genie-cli -k 10 -n 5 --gpu 2 -t ../static/sift_20.csv -q ../static/sift_20.csv matches 5 queries with k set to 10 on GPU 2, loading both the table and the queries from ../static/sift_20.csv.