GENIE
GENIE is a Generic Inverted Index on the GPU. It builds a database (an inverted index) from high-dimensional data, commonly preprocessed by either Locality Sensitive Hashing or Shotgun and Assembly schemes. GENIE provides a simple way to perform top-k similarity queries on top of such an inverted index. The user may define queries as dimension and value pairs, and optionally value ranges and weights. GENIE processes all queries in parallel on the GPU using a Match Count similarity model, in which the similarity of an object to a query is the number of dimensions whose values match the query. For each query, the top-k most similar results and their corresponding counts are returned. GENIE is much faster than CPU-based search algorithms due to extensive parallelism on two levels: the processing of each individual query is parallelized, and multiple queries are processed in parallel.
Please refer to the following technical report:
Generic Inverted Index on the GPU, Technical Report (TR 11/15), School of Computing, NUS.
CoRR arXiv:1603.08390 at www.comp.nus.edu.sg/~atung/publication/gpugenie.pdf
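To make the Match Count model described above concrete, here is a minimal CPU-only sketch in C++. It is not GENIE's implementation and does not use the GENIE API: the names `Item`, `InvertedIndex`, `BuildIndex` and `TopK` are hypothetical, value ranges and weights are ignored, and each object is assumed to hold at most one value per dimension.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types for illustration only (not part of GENIE).
// One (dimension, value) pair; objects and queries are lists of these.
using Item = std::pair<int, int>;
// Inverted index: (dimension, value) -> ids of the objects containing that pair.
using InvertedIndex = std::map<Item, std::vector<int>>;

// Build the inverted index; an object's id is its position in `objects`.
InvertedIndex BuildIndex(const std::vector<std::vector<Item>>& objects) {
    InvertedIndex index;
    for (std::size_t id = 0; id < objects.size(); ++id) {
        for (const Item& item : objects[id]) {
            index[item].push_back(static_cast<int>(id));
        }
    }
    return index;
}

// Return up to k (object id, match count) pairs for one query,
// highest count first; ties are broken by the lower object id.
std::vector<std::pair<int, int>> TopK(const InvertedIndex& index,
                                      std::size_t num_objects,
                                      const std::vector<Item>& query,
                                      std::size_t k) {
    std::vector<int> counts(num_objects, 0);
    for (const Item& item : query) {
        InvertedIndex::const_iterator it = index.find(item);
        if (it == index.end()) continue;  // no object has this pair
        for (int id : it->second) {
            assert(static_cast<std::size_t>(id) < num_objects);
            ++counts[id];  // one more matching dimension for this object
        }
    }
    std::vector<std::pair<int, int>> result;
    for (std::size_t id = 0; id < num_objects; ++id) {
        if (counts[id] > 0) {
            result.emplace_back(static_cast<int>(id), counts[id]);
        }
    }
    std::sort(result.begin(), result.end(),
              [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                  return a.second != b.second ? a.second > b.second
                                              : a.first < b.first;
              });
    if (result.size() > k) result.resize(k);
    return result;
}
```

For example, given the objects {(0,1), (1,5), (2,7)}, {(0,1), (1,6), (2,7)} and {(0,2), (1,6), (2,8)}, the query {(0,1), (1,6), (2,7)} with k = 2 yields object 1 with count 3, followed by object 0 with count 2. The sketch walks the posting lists sequentially on the CPU, whereas GENIE performs this work in parallel on the GPU.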
You are required to install G++, CMake, CUDA, OpenMPI and Boost. The minimum required versions are:
… (GENIE_DISTRIBUTED only)
… (GENIE_COMPR only)
To create an "out-of-source" build of GENIE containing the GENIE library, tests and tools, you can use the standard CMake procedure:
$ mkdir build
$ cd build
$ cmake ..
$ make
Use the targets $ make test to run the GENIE tests, $ make doc to build the HTML code documentation, and $ make install to install GENIE.
CMake build parameters can be further configured using the following options:
CMAKE_BUILD_TYPE:STRING – build type, one of Release, Debug (default Release)
CMAKE_INSTALL_PREFIX:PATH – cmake's option for installation prefix (default ${CMAKE_BINARY_DIR}/install)
BOOST_ROOT:PATH – root dir of Boost libraries (default from system paths)
DOXYGEN_EXECUTABLE:PATH – doxygen executable (default from system paths)
MPI_HOME:PATH – root dir of OpenMPI installation (default from system paths)
GENIE_DISTRIBUTED:BOOL – enable distributed GENIE module (default OFF)
GENIE_COMPR:BOOL – enable compression GENIE module (default OFF)
GENIE_SIMDCAI:BOOL – enable compilation of SIMDCAI library (default OFF)
GENIE_EXAMPLES:BOOL – enable compilation of GENIE examples (default ON)
An example use of the cmake command, here selecting a Release build with the distributed module enabled, may look like this:
$ cmake -DCMAKE_BUILD_TYPE=Release -DGENIE_DISTRIBUTED=ON ..
The GENIE interface consists of 4 important classes. They are
genie::Config for configuring GENIE
genie::ExecutionPolicy for providing the actual implementation of table and query building and matching
genie::table::inv_table for constructed tables (inverted index)
genie::query::Query for queries
The interface also has several functions. The genie::Search() function is the 1st-level interface function for using GENIE. It accepts file paths to table and query CSV files and returns the matching result. There are also several 2nd-level interface functions, which are used internally by the genie::Search() function. These 2nd-level functions are meant to provide finer control of GENIE for advanced usage.
These functions are
genie::BuildTable() for building the inverted index
genie::BuildQuery() for building the queries
genie::Match() for matching
To use GENIE, first configure it with the genie::Config class. According to the configuration, a corresponding execution policy is generated using genie::MakePolicy(). Then use the 1st- or 2nd-level interface functions to perform the search.
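The relationship between the two interface levels can be illustrated with a small, self-contained C++ sketch. Everything in it is a hypothetical stand-in placed in a namespace called `sketch`: the types, signatures and in-memory inputs are assumptions made for illustration and are not GENIE's actual declarations (the real functions work with CSV files, an execution policy and the GPU). It only shows how a single 1st-level call can be composed from the three 2nd-level steps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical stand-ins, NOT the real genie:: declarations

struct Config { std::size_t k = 1; };     // stand-in for a configuration object
using Row = std::vector<int>;             // one value per dimension
struct Table { std::vector<Row> rows; };  // stand-in for a built table
struct Query { Row values; };             // stand-in for a built query
using Result = std::vector<std::pair<std::size_t, int>>;  // (row id, match count)

// 2nd-level step: build the table from raw rows.
Table BuildTable(const std::vector<Row>& data) { return Table{data}; }

// 2nd-level step: build one query per raw query row.
std::vector<Query> BuildQuery(const std::vector<Row>& data) {
    std::vector<Query> queries;
    for (const Row& r : data) queries.push_back(Query{r});
    return queries;
}

// 2nd-level step: score every row against every query by match count
// (brute force on the CPU, for brevity) and keep the top k per query.
std::vector<Result> Match(const Config& cfg, const Table& table,
                          const std::vector<Query>& queries) {
    assert(cfg.k > 0);
    std::vector<Result> all;
    for (const Query& q : queries) {
        Result scored;
        for (std::size_t id = 0; id < table.rows.size(); ++id) {
            const Row& row = table.rows[id];
            const std::size_t dims = std::min(row.size(), q.values.size());
            int count = 0;
            for (std::size_t d = 0; d < dims; ++d) {
                if (row[d] == q.values[d]) ++count;
            }
            scored.emplace_back(id, count);
        }
        // Stable sort keeps lower row ids first among equal counts.
        std::stable_sort(scored.begin(), scored.end(),
                         [](const std::pair<std::size_t, int>& a,
                            const std::pair<std::size_t, int>& b) {
                             return a.second > b.second;
                         });
        if (scored.size() > cfg.k) scored.resize(cfg.k);
        all.push_back(scored);
    }
    return all;
}

// 1st-level convenience call: simply chains the three 2nd-level steps.
std::vector<Result> Search(const Config& cfg,
                           const std::vector<Row>& table_data,
                           const std::vector<Row>& query_data) {
    return Match(cfg, BuildTable(table_data), BuildQuery(query_data));
}

}  // namespace sketch
```

Calling `sketch::Search()` gives the same result as calling the three steps by hand. The step-by-step form is useful when, for instance, one table is built once and then matched against several batches of queries.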
In general, everything in namespace genie is a stable, public interface, while subnamespaces, such as genie::utility, refer to internal components of GENIE, which may change or be removed over time.
Below is an example program demonstrating the usage of GENIE.
For convenience, we have provided an executable genie-cli (compiled to the /bin directory) for interfacing with GENIE. ./genie-cli --help shows you all the allowed options.
For example, you could run ./genie-cli -k 10 -n 5 --gpu 2 -t ../static/sift_20.csv -q ../static/sift_20.csv to match 5 queries with k set to 10 using GPU 2. Both the table and the queries are loaded from ../static/sift_20.csv.