From bcccb7d79eeccac9e6655f3923974c0ef38281aa Mon Sep 17 00:00:00 2001
From: Earl Warren
Date: Fri, 23 Jun 2023 16:32:03 +0200
Subject: [PATCH] docs: document robots.txt

# Conflicts:
#	admin/search-engines-indexation.md
---
 admin/index.md                     |  1 +
 admin/search-engines-indexation.md | 69 ++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+)
 create mode 100644 admin/search-engines-indexation.md

diff --git a/admin/index.md b/admin/index.md
index d7cc502a..625185a2 100644
--- a/admin/index.md
+++ b/admin/index.md
@@ -16,3 +16,4 @@ These documents are targeted to people who run Forgejo on their machines.
 - [Incoming Email](incoming-email)
 - [Logging Configuration](logging-documentation)
 - [Actions](actions)
+- [Search Engines and robots.txt](search-engines-indexation)
diff --git a/admin/search-engines-indexation.md b/admin/search-engines-indexation.md
new file mode 100644
index 00000000..8f07f61a
--- /dev/null
+++ b/admin/search-engines-indexation.md
@@ -0,0 +1,69 @@
+---
+layout: '~/layouts/Markdown.astro'
+title: 'Search Engines Indexation'
+license: 'Apache-2.0'
+origin_url: 'https://github.com/go-gitea/gitea/blob/62ac3251fa545d32bdfc9ff824106b97ec63edbb/docs/content/doc/administration/search-engines-indexation.en-us.md'
+---
+
+# Search engine indexation of your Forgejo installation
+
+By default, your Forgejo installation will be indexed by search engines.
+If you don't want your repositories to be visible to search engines, read on.
+
+## Block search engine indexation using robots.txt
+
+To make Forgejo serve a custom `robots.txt` (by default, an empty 404 is served) for top-level installations,
+create a file called `robots.txt` at the root of the `CustomPath` displayed on the `/admin` page.
+
+Examples of how to configure the `robots.txt` can be found at [https://moz.com/learn/seo/robotstxt](https://moz.com/learn/seo/robotstxt).
+
+```txt
+User-agent: *
+Disallow: /
+```
+
+If you installed Forgejo in a subdirectory, you will need to create or edit the `robots.txt` in the top-level directory.
+
+```txt
+User-agent: *
+Disallow: /forgejo/
+```
+
+## Disallow crawling archives to save disk space
+
+When crawlers request archive files, the archives are generated on the
+fly and kept around afterwards, which can consume a lot of disk space.
+To prevent that from happening, add the following to the `robots.txt` file:
+
+```txt
+User-agent: *
+Disallow: /*/*/archive/
+```
+
+See also a more complete example [at Codeberg](https://codeberg.org/robots.txt).
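+
+## Example: creating the robots.txt file
+
+As a sketch, assuming the `/admin` page shows a `CustomPath` of
+`/var/lib/forgejo/custom` (the path on your installation may differ),
+the file from the first example above can be created like this:
+
+```shell
+# Write a robots.txt that blocks all crawlers from the whole site
+# into the root of the (assumed) CustomPath; adjust the rules as needed.
+cat > /var/lib/forgejo/custom/robots.txt <<'EOF'
+User-agent: *
+Disallow: /
+EOF
+```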
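+
+## Verify that the robots.txt is served
+
+To confirm that Forgejo picked up your file, fetch it directly,
+replacing `forgejo.example.org` with the domain of your installation:
+
+```shell
+# Expect a 200 status and the contents of your robots.txt;
+# a 404 means no custom robots.txt was found.
+curl -i https://forgejo.example.org/robots.txt
+```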