| PUBLIC-INBOX-INDEX(1) | public-inbox user manual | PUBLIC-INBOX-INDEX(1) |
public-inbox-index - create and update search indices
public-inbox-index [OPTIONS] INBOX_DIR...
public-inbox-index [OPTIONS] --all
public-inbox-index creates and updates the search, overview and NNTP article number database used by the read-only public-inbox HTTP and NNTP interfaces. Currently, this requires DBD::SQLite and DBI Perl modules. Search::Xapian is optional, only to support the PSGI search interface.
Once the initial indices are created by public-inbox-index, public-inbox-mda(1) and public-inbox-watch(1) will automatically maintain them.
Running this manually to update indices is only required when relying on git-fetch(1) to mirror an existing public-inbox, or when upgrading to a new version of public-inbox, in which case the "--reindex" option is used.
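The mirroring case can be sketched roughly as follows. This assumes a v1 inbox whose git directory is fetched directly; the helper name `mirror_update_commands` and the example path are hypothetical and not part of public-inbox. The sketch only builds the command lines and executes nothing; each list could be handed to `subprocess.run` by the caller.

```python
import shlex


def mirror_update_commands(inbox_dir, reindex=False):
    """Return the commands to refresh a git-mirrored inbox, in order.

    Hypothetical helper: assumes inbox_dir is a v1 inbox, i.e. a git
    directory that can be fetched into directly.
    """
    # Step 1: pull new messages from the upstream mirror.
    fetch = ["git", "--git-dir=" + inbox_dir, "fetch"]

    # Step 2: update the indices for the newly fetched messages.
    index = ["public-inbox-index"]
    if reindex:
        # Only wanted when upgrading public-inbox, not for routine updates.
        index.append("--reindex")
    index.append(inbox_dir)

    return [fetch, index]


if __name__ == "__main__":
    for argv in mirror_update_commands("/path/to/inbox.git"):
        print(shlex.join(argv))
```

Keeping the fetch and the index update as separate steps makes it easy to skip indexing when the fetch fails.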
Having the overview and article number database is essential to running the NNTP interface, and strongly recommended for the HTTP interface as it provides thread grouping in addition to normal search functionality.
See "--jobs" in public-inbox-init(1) for a full description of sharding.
"--jobs=0" is accepted as of public-inbox 1.6.0 to disable parallel indexing regardless of the number of pre-existing shards.
If the inbox has not been indexed or initialized, "JOBS - 1" shards will be created (one job is always needed for indexing the overview and article number mapping).
Default: the number of existing Xapian shards
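The shard arithmetic above can be illustrated with a small sketch. The function name `planned_shards` is hypothetical, and the sketch assumes that an inbox which already has shards keeps its existing shard count. It does not model what happens when an uninitialized inbox is indexed with fewer than two jobs.

```python
def planned_shards(jobs=None, existing_shards=0):
    """Estimate how many Xapian shards an inbox ends up with.

    Hypothetical helper illustrating the "JOBS - 1" rule; it is not
    part of public-inbox.
    """
    if existing_shards > 0:
        # Assumption: an indexed inbox keeps the shards it already has.
        return existing_shards
    if jobs is None or jobs < 2:
        raise ValueError("case not covered by this sketch")
    # One job is reserved for the overview and article number mapping.
    return jobs - 1


if __name__ == "__main__":
    print(planned_shards(jobs=4))
    print(planned_shards(jobs=4, existing_shards=2))
```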
While this option takes a negligible amount of time compared to "--reindex", it requires temporarily duplicating the entire contents of the Xapian DB.
This switch may be specified twice, in which case compaction happens both before and after indexing to minimize the temporal footprint of the (re)indexing operation.
Available since public-inbox 1.4.0.
public-inbox protects writes to various indices with flock(2), so it is safe to reindex (and rethread) while public-inbox-watch(1), public-inbox-mda(1) or public-inbox-learn(1) run.
This does not touch the NNTP article number database. It does not affect threading unless "--rethread" is used.
This fixes some bugs in older versions of public-inbox. While it is possible to use this without "--reindex", it makes little sense to do so.
Available in public-inbox 1.6.0+.
Available since public-inbox 1.2.0.
Available since public-inbox 1.5.0.
When using rotational storage but abundant RAM, using a large value (e.g. "500m") with "--sequential-shard" can significantly speed up and reduce fragmentation during the initial index and full "--reindex" invocations (but not incremental updates).
Available in public-inbox 1.6.0+.
Available in public-inbox 1.6.0+.
Available in public-inbox 1.8.0+.
Available in public-inbox 1.6.0+.
See "--skip-docdata" in public-inbox-init(1) for description and caveats.
Available in public-inbox 1.6.0+.
Defaults to "all" if "[extindex "all"]" is configured, otherwise no external indices are updated.
May be specified multiple times in rare cases where multiple external indices are configured.
For v1 (ssoma) repositories described in public-inbox-v1-format(5). All public-inbox-specific files are contained within the "$GIT_DIR/public-inbox/" directory.
v2 inboxes are described in public-inbox-v2-format(5).
This is useful for avoiding memory exhaustion in mirrors via git. It does not prevent public-inbox-mda(1) or public-inbox-watch(1) from importing (and indexing) a message.
This option is only available in public-inbox 1.5 or later.
Default: none
Increase this value on powerful systems to improve throughput at the expense of memory use. The reduction of lock granularity may not be noticeable on fast systems. With SSDs, values above "4m" have little benefit.
For public-inbox-v2-format(5) inboxes, this value is multiplied by the number of Xapian shards. Thus a typical v2 inbox with 3 shards will flush every 3 megabytes by default unless parallelism is disabled via "--sequential-shard" or "--jobs=0".
This influences memory usage of Xapian, but it is not exact. The actual memory used by Xapian and Perl has been observed in excess of 10x this value.
This option is available in public-inbox 1.6 or later. public-inbox 1.5 and earlier used the current default, "1m".
Default: 1m (one megabyte)
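The multiplication by shard count can be checked with a short sketch. The helper names `parse_megabytes` and `effective_flush_megabytes` are hypothetical, and the parser only understands the "m" suffix used in the examples here; it is not the parser public-inbox uses. Results stay in megabytes so that no assumption about the exact byte count of a megabyte is needed.

```python
def parse_megabytes(size):
    """Parse a size such as "1m" or "500m" into whole megabytes.

    Hypothetical helper: only the "m" suffix is handled.
    """
    text = size.strip().lower()
    if not text.endswith("m") or not text[:-1].isdigit():
        raise ValueError("unsupported size: %r" % size)
    return int(text[:-1])


def effective_flush_megabytes(batch_size="1m", shards=1, parallel=True):
    """Megabytes indexed between flushes for a v2 inbox."""
    per_shard = parse_megabytes(batch_size)
    # With parallel shards the batch size applies to each shard, so the
    # inbox as a whole flushes every per_shard * shards megabytes.
    return per_shard * shards if parallel else per_shard


if __name__ == "__main__":
    # The typical v2 inbox from the text: 3 shards, default batch size.
    print(effective_flush_megabytes("1m", shards=3))
    # Same inbox with parallelism disabled.
    print(effective_flush_megabytes("1m", shards=3, parallel=False))
```

Since actual memory use has been observed at more than ten times the configured value, the result is better read as a relative tuning knob than as a memory budget.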
Using a higher-than-normal number of "--jobs" with public-inbox-init(1) may be required to ensure individual shards are small enough to fit into cache.
Warning: interrupting "public-inbox-index(1)" while this option is in use may leave the search indices out-of-date with respect to SQLite databases. WWW and IMAP users may notice incomplete search results, but it is otherwise non-fatal. Using "--reindex" will bring everything back up-to-date.
Available in public-inbox 1.6.0+.
This is ignored on public-inbox-v1-format(5) inboxes.
Default: false, shards are indexed in parallel
For public-inbox 1.6 and later, use "publicinbox.indexBatchSize" instead.
Setting "XAPIAN_FLUSH_THRESHOLD" or "publicinbox.indexBatchSize" for a large "--reindex" may cause public-inbox-mda(1), public-inbox-learn(1) and public-inbox-watch(1) tasks to wait for long and unpredictable periods of time during "--reindex".
Default: none, uses "publicinbox.indexBatchSize"
Occasionally, public-inbox will update its schema version and require a full index by running this command.
Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>
The mail archives are hosted at <https://public-inbox.org/meta/> and <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
Copyright all contributors <mailto:meta@public-inbox.org>
License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
Search::Xapian, DBD::SQLite, public-inbox-extindex-format(5)
| 1993-10-02 | public-inbox.git |