# robots.txt for CrowCrowCrow.com
# ================================================
# Deploy to: /public/robots.txt (served at https://crowcrowcrow.com/robots.txt)
# Last updated: 2026-04-22
# Maintained by: Ravi P (CEO)
# ================================================

# Global content signals (IETF draft / agent preferences; see contentsignals.org)
Content-Signal: ai-train=no, search=yes, ai-input=no

# NLWeb Schema Feeds: map of structured data / feeds (see schema-map.xml in repo)
schemamap: https://crowcrowcrow.com/schema-map.xml

# ─── Default rules for all search engines ──────
User-agent: *
Allow: /
Allow: /p/
Allow: /c/
# Block non-indexable areas
Disallow: /admin/
Disallow: /api/
Disallow: /_next/
Disallow: /account/
Disallow: /cart
Disallow: /checkout
Disallow: /login
Disallow: /signup
Disallow: /reset-password
Disallow: /wishlist
# Block URL-parameter variants that cause duplicate-content issues
Disallow: /search?
Disallow: /*?page=
Disallow: /*?brand=
Disallow: /*?price=
Disallow: /*?sort=
Disallow: /*?utm_
Disallow: /*?gclid=
Disallow: /*?fbclid=
Disallow: /*?ref=
Disallow: /*?source=
# Block admin and internal tooling
Disallow: /admin
Disallow: /staging
Disallow: /debug
Disallow: /*.json$

# ─── Crawl-delay for aggressive scraper bots ──
# These bots can hammer the site and don't bring traffic.
# Google, Bing, Yandex are NOT rate-limited.
# Note: a crawler that matches a named group follows only that group, so the
# Disallow rules under "User-agent: *" do not apply to the bots named in this file.
User-agent: AhrefsBot
Crawl-delay: 10

User-agent: MJ12bot
Crawl-delay: 10

User-agent: SemrushBot
Crawl-delay: 10

User-agent: DotBot
Crawl-delay: 10

User-agent: PetalBot
Crawl-delay: 10

# ─── AI / LLM crawler policy ──────────────────
# Decision (updated 2026-04-18): ALLOW ALL AI bots.
# Rationale:
#   1. AI-powered search (ChatGPT, Perplexity, Claude, Google AI Overviews,
#      Bing Copilot) is a growing share of product discovery, so being indexed
#      here is free visibility.
#   2. CCC is a cross-border-imports business where every AI citation
#      that mentions "imported USA product at Indian price with duties
#      handled" is essentially free brand marketing.
#   3. The content being trained on is product descriptions, which we
#      WANT circulating: they describe our USP (authentic USA imports
#      to India) that no Indian competitor can replicate.
#
# Explicitly ALLOW all major AI crawlers:
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: CCBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: DuckAssistBot
Allow: /

User-agent: MistralAI-User
Allow: /

User-agent: YouBot
Allow: /

User-agent: meta-externalagent
Allow: /

# ─── Sitemap location ─────────────────────────
Sitemap: https://crowcrowcrow.com/sitemap.xml