GIN stands for Generalized Inverted Index. GIN is designed for handling cases where the items to be indexed are composite values, and the queries to be handled by the index need to search for element values that appear within the composite items. For example, the items could be documents, and the queries could be searches for documents containing specific words.
We use the word "item" to refer to a composite value that is to be indexed, and the word "key" to refer to an element value. GIN always stores and searches for keys, not item values per se.
A GIN index stores a set of (key, posting list) pairs, where a "posting list" is a set of row IDs in which the key occurs. The same row ID can appear in multiple posting lists, since an item can contain more than one key. Each key value is stored only once, so a GIN index is very compact for cases where the same key appears many times.
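
The (key, posting list) structure described above can be illustrated with a small in-memory sketch. This is not how PostgreSQL stores a GIN index on disk; it is only a toy model in Python, and the names `build_inverted_index`, `rows_containing`, and the sample rows are hypothetical, chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each key to the sorted list of row IDs whose item contains it."""
    postings = defaultdict(set)
    for row_id, item in rows.items():
        # An item may repeat a key; each row ID is recorded once per key.
        for key in set(item):
            postings[key].add(row_id)
    # Each key appears exactly once, however many rows contain it.
    return {key: sorted(ids) for key, ids in postings.items()}


def rows_containing(index, key):
    """Return the posting list for a key, or an empty list if it is absent."""
    return index.get(key, [])


# Hypothetical table: row ID -> item (a document already split into words).
rows = {
    1: ["fat", "cat", "sat"],
    2: ["fat", "rat"],
    3: ["cat", "ate", "rat"],
}

index = build_inverted_index(rows)
```

In this sketch, row 1 appears in the posting lists for "fat", "cat", and "sat", while the key "fat" is stored once even though two rows contain it.
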
GIN is generalized in the sense that the GIN access method code does not need to know the specific operations that it accelerates. Instead, it uses custom strategies defined for particular data types. The strategy defines how keys are extracted from indexed items and query conditions, and how to determine whether a row that contains some of the key values in a query actually satisfies the query.
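
The separation between generic index code and a data-type-specific strategy can be sketched as follows. This is a simplified Python analogy under stated assumptions, not the actual PostgreSQL interface: the classes `WordStrategy` and `ToyInvertedIndex` are hypothetical, and the method names `extract_value`, `extract_query`, and `consistent` only loosely echo the roles the paragraph describes (extracting keys from items, extracting keys from queries, and deciding whether a candidate row satisfies the query).

```python
from collections import defaultdict


class WordStrategy:
    """Hypothetical strategy for text items; keys are lowercase words."""

    def extract_value(self, item):
        # Keys contained in an indexed item.
        return set(item.lower().split())

    def extract_query(self, query):
        # Keys to look up for a query condition.
        return set(query.lower().split())

    def consistent(self, query_keys, present_keys):
        # "Contains all words" semantics: every query key must be present.
        return query_keys <= present_keys


class ToyInvertedIndex:
    """Generic index code: it only calls the strategy's three methods."""

    def __init__(self, strategy):
        self.strategy = strategy
        self.postings = defaultdict(set)

    def insert(self, row_id, item):
        for key in self.strategy.extract_value(item):
            self.postings[key].add(row_id)

    def search(self, query):
        query_keys = self.strategy.extract_query(query)
        # For each candidate row, record which query keys it contains.
        found = defaultdict(set)
        for key in query_keys:
            for row_id in self.postings.get(key, ()):
                found[row_id].add(key)
        # A row holding only some of the keys is checked by the strategy.
        return sorted(
            row_id
            for row_id, present in found.items()
            if self.strategy.consistent(query_keys, present)
        )


index = ToyInvertedIndex(WordStrategy())
index.insert(1, "The fat cat sat")
index.insert(2, "A fat rat")
index.insert(3, "The cat ate the rat")
```

Here `ToyInvertedIndex` never inspects the items itself, so swapping in a different strategy (for example, one with "contains any word" semantics in `consistent`) would change query behavior without touching the index code.
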
One advantage of GIN is that it allows the development of custom data types with the appropriate access methods, by an expert in the domain of the data type, rather than a database expert. This is much the same advantage as using GiST.
The GIN implementation in PostgreSQL is primarily maintained by Teodor Sigaev and Oleg Bartunov. There is more information about GIN on their website.