# ── Search engine crawlers ────────────────────────────────────────
User-agent: Googlebot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

# ── AI / LLM crawlers ─────────────────────────────────────────────
# Anthropic – Claude
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# OpenAI – ChatGPT / GPT training / SearchGPT
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Perplexity AI
User-agent: PerplexityBot
Allow: /

# DeepSeek
User-agent: DeepSeekBot
Allow: /

# Meta AI
User-agent: Meta-ExternalAgent
Allow: /

User-agent: Meta-ExternalFetcher
Allow: /

# You.com
User-agent: YouBot
Allow: /

# Cohere
User-agent: cohere-ai
Allow: /

# Apple Intelligence
User-agent: Applebot
Allow: /

User-agent: Applebot-Extended
Allow: /

# Common Crawl (used by many LLM training datasets)
User-agent: CCBot
Allow: /

# Bytespider / Bytedance (used for AI training)
User-agent: Bytespider
Allow: /

# ── Social / rich-preview crawlers ───────────────────────────────
User-agent: Twitterbot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: WhatsApp
Allow: /

# ── Catch-all: allow everything not listed above ──────────────────
User-agent: *
Allow: /

Sitemap: https://vanakkamdsa.com/sitemap.xml

# llms.txt for AI and LLM crawlers
# https://www.vanakkamdsa.com/llms.txt