# this one, despite being a search engine, ignores 429 rate limit responses
# at least according to wikipedia
# http://mj12bot.com/
User-agent: MJ12bot
Disallow: /

# ai training, analytics, etc.
# we don't want this
User-agent: openai
User-agent: Pandalytics
User-agent: CensysInspect
User-agent: SemrushBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Pinterestbot
User-agent: Botify
User-agent: Bytespider #used by tiktok
User-agent: FreshBot
User-agent: JamesBOT
User-agent: Omgili
User-agent: omgili
User-agent: PerplexityBot
User-agent: Youbot
User-agent: YouBot
User-agent: diffbot
User-agent: img2dataset
User-agent: magpie-crawler
User-agent: anthropic-ai
User-agent: Claude-Web
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Diffbot
Disallow: /

# we want to show up in search results,
# since we're selling a service and customers might find us through search engines
# NOTE: this doesn't necessarily mean that we like or endorse these search companies
# it just means we want users to be able to use their services to find our site
User-agent: Googlebot
User-agent: Google-Extended
User-agent: Google-InspectionTool
User-agent: GoogleOther
User-agent: Googlebot-Image
User-agent: Googlebot-News
User-agent: Googlebot-Video
User-agent: bingbot
User-agent: Bingbot
User-agent: BingPreview
User-agent: Ahrefsbot
User-agent: DuckDuckBot
User-agent: Amazonbot
User-agent: Applebot
User-agent: Applebot-Extended
User-agent: Baiduspider
User-agent: MicrosoftPreview
User-agent: Yandex
User-agent: YandexBot
User-agent: CocCocBot #search engine based in Vietnam
User-agent: Exabot #search engine based in France, exalead
User-agent: PetalBot #petalsearch
User-agent: SeznamBot #search engine based in Czech Republic
User-agent: Slurp #yahoo
User-agent: Sogou #search engine based in China
User-agent: Yeti #search engine based in South Korea
User-agent: archive.org_bot
Disallow:
Allow: /
# there's no reason these crawlers need to update more often than once every 30 seconds
Crawl-delay: 30

# search-related bots that don't directly correspond to search results
User-agent: AdIdxBot #used by bing
User-agent: APIs-Google
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
User-agent: Mediapartners-Google
User-agent: Storebot-Google
User-agent: SemrushBot
User-agent: deepcrawl
User-agent: Twitterbot
User-agent: FacebookBot
User-agent: CCBot
User-agent: DotBot #used by Moz
User-agent: RogerBot #used by Moz
User-agent: ImagesiftBot #used by Hive
User-agent: OnCrawl
User-agent: Screaming Frog SEO Spider
Disallow:
Allow: /
Crawl-delay: 30

# disallow anything not explicitly allowed above
User-agent: *
User-agent: Other
Disallow: /