# Meta Media Crawler Blocked (Instagram / Threads)

> How to fix error code 440 (and related Instagram 138 / Threads 379) caused by robots.txt or bot rules blocking Meta's media crawler.

Error code **440** (and the related Instagram code **138** / Threads code **379**) is returned when Meta's publishing crawler cannot download your media URL — most commonly because `robots.txt` or a bot-blocking rule on your server is denying the crawler.

<Note>
  This page covers failures whose error message or details mention that the social network could not download the media, typically referencing `facebookexternalhit`, `robots.txt`, `"Restricted by robots.txt"`, `"HTTP error code 403"`, or Meta error `2207052`. For aspect-ratio or format errors on Instagram code 138, see [Instagram Media Guidelines](/media-guidelines/instagram) or [Threads Media Guidelines](/media-guidelines/threads) instead.
</Note>

## Symptom

When the crawler is blocked, you'll see errors like these:

```json Error 440 (primary) theme={"system"}
{
  "status": "error",
  "errors": [{
    "action": "post",
    "code": 440,
    "message": "The social network could not download media from this URL (for example Instagram/Meta error 2207052). Ensure the file is publicly reachable by the platform's crawlers (e.g. facebookexternalhit), via media bucket's robots.txt file, not only in a browser.",
    "details": "Media download has failed.: The media could not be fetched from the provided URI...",
    "platform": "instagram",
    "status": "error"
  }],
  "postIds": [],
  "id": "..."
}
```

```json Instagram Error 138 (fallback — less specific upstream response) theme={"system"}
{
  "status": "error",
  "errors": [{
    "retryAvailable": true,
    "status": "error",
    "code": 138,
    "details": "Media download has failed.: The media could not be fetched from the provided URI. Video download failed with: HTTP error code 403. Restricted by robots.txt",
    "action": "post",
    "platform": "instagram",
    "message": "Instagram Error: Instagram cannot process your post at this time. Please try your post again."
  }],
  "postIds": [],
  "id": "..."
}
```

```json Threads Error 379 theme={"system"}
{
  "status": "error",
  "errors": [{
    "status": "error",
    "code": 379,
    "message": "Error posting to Threads.",
    "action": "post",
    "platform": "threads"
  }],
  "postIds": [],
  "id": "..."
}
```

<Note>
  Code **440** is the dedicated Ayrshare code for this failure and its message explicitly names `facebookexternalhit` and `robots.txt` — if you see 440, you're on the right page. Code **138** is emitted for the same root cause when the upstream response is less specific; 138 is also used for aspect-ratio / format issues, so the media-fetch variant is identifiable by `"Restricted by robots.txt"` or `"HTTP error code 403"` in `details`. Code **379** does not include a `details` field — if Threads fails alongside an Instagram 440 or 138, the root cause is typically the same.
</Note>
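If you process these responses in code, the rules in the note above can be expressed as a small check. The following is a minimal Python sketch, not part of any Ayrshare SDK: the function names `is_crawler_block` and `crawler_block_platforms` are hypothetical, and it assumes the response has the same shape as the examples above, with errors for several platforms arriving in one `errors` array.

```python
from typing import Any

# Substrings that mark the media-fetch variant of code 138 in `details`.
FETCH_MARKERS = ("Restricted by robots.txt", "HTTP error code 403")


def is_crawler_block(error: dict[str, Any]) -> bool:
    """Return True if one entry from `errors` looks like a blocked media fetch."""
    code = error.get("code")
    if code == 440:
        return True
    if code == 138:
        # 138 is shared with aspect-ratio/format failures, so inspect `details`.
        details = error.get("details") or ""
        return any(marker in details for marker in FETCH_MARKERS)
    return False


def crawler_block_platforms(response: dict[str, Any]) -> list[str]:
    """List platforms that likely failed because the crawler was blocked."""
    errors = response.get("errors") or []
    blocked = [e.get("platform", "unknown") for e in errors if is_crawler_block(e)]
    # Threads 379 has no `details`; count it only when a 440/138 media-fetch
    # error appears in the same response (an assumption taken from the note).
    if blocked:
        blocked += [e.get("platform", "threads") for e in errors if e.get("code") == 379]
    return blocked
```

A 379 on its own is deliberately not classified, because nothing in that response identifies the cause.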

## Why This Happens

When you publish to Instagram or Threads via Ayrshare, Meta's servers fetch your media from the URL you provide. This server-side fetch uses the `facebookexternalhit` User-Agent. If your server's `robots.txt` disallows this crawler — or a WAF/bot-protection rule blocks it — Meta cannot download the file and the publish fails.

Facebook Page publishing uses a different ingestion path, which is why the same URL in `mediaUrls` may work for Facebook but fail for Instagram and Threads.

## Fix: Update Your robots.txt

### Recommended: Allow Meta explicitly, keep others open

Add these rules to your `robots.txt` file:

```txt robots.txt theme={"system"}
User-agent: facebookexternalhit
Allow: /

User-agent: *
Allow: /
```

This explicitly allows Meta's crawler while keeping your site open to other crawlers (Google, Bing, etc.).
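To sanity-check a rule set before deploying it, you can evaluate it locally. This sketch uses Python's standard-library `urllib.robotparser`. The helper name `crawler_allowed` and the `cdn.example.com` URL are illustrative, and Python's parser only approximates how Meta's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

META_CRAWLER = "facebookexternalhit"


def crawler_allowed(robots_txt: str, media_url: str, user_agent: str = META_CRAWLER) -> bool:
    """Evaluate robots.txt text against one URL for the given crawler token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, media_url)


# A rule set that blocks every crawler, including Meta's.
BLOCKING = """\
User-agent: *
Disallow: /
"""

# The recommended rule set from this page.
RECOMMENDED = """\
User-agent: facebookexternalhit
Allow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    url = "https://cdn.example.com/media/video.mp4"  # placeholder URL
    print(crawler_allowed(BLOCKING, url))     # False
    print(crawler_allowed(RECOMMENDED, url))  # True
```

To test a live site, pass in the text of the `robots.txt` served by the host of your media URL rather than a hard-coded string.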

### Advanced: Lock down to social publishers only

If you want to block most crawlers but allow social media platforms:

```txt robots.txt theme={"system"}
User-agent: facebookexternalhit
Allow: /

User-agent: Twitterbot
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: Pinterest
Allow: /

User-agent: *
Disallow: /
```

<Warning>
  Use a single `User-agent: *` block, placed at the end of the file. RFC 9309-compliant crawlers merge multiple wildcard groups into one, but not every parser in the wild is RFC-compliant — duplicate wildcard groups are a common source of rules being dropped or applied inconsistently.
</Warning>
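The layout advice in the warning above can be checked mechanically. This is a minimal Python sketch with a hypothetical helper name, `wildcard_group_issues`. It groups consecutive `User-agent` lines the way RFC 9309 describes and ignores everything else, so treat it as a lint aid rather than a full parser.

```python
def wildcard_group_issues(robots_txt: str) -> list[str]:
    """Report wildcard-group layouts that the warning advises against."""
    groups: list[list[str]] = []  # one list of lower-cased agent tokens per group
    in_agent_run = False          # True while reading consecutive User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            if not in_agent_run:
                groups.append([])  # a new group starts here
                in_agent_run = True
            groups[-1].append(value.lower())
        else:
            in_agent_run = False  # a rule line ends the run of User-agent lines

    wildcard_positions = [i for i, agents in enumerate(groups) if "*" in agents]
    issues = []
    if len(wildcard_positions) > 1:
        issues.append(f"{len(wildcard_positions)} wildcard groups found; use one")
    if wildcard_positions and wildcard_positions[-1] != len(groups) - 1:
        issues.append("wildcard group is not the last group")
    return issues
```

An empty list means the file has at most one `User-agent: *` group and, if present, it comes last.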

## Verify Meta Can Fetch Your URL

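Before retrying your post, check that a request sent with Meta's crawler User-Agent can download your media. `curl` does not read `robots.txt`, so this test detects server-side blocking (WAF, bot rules, hotlink protection) rather than the `robots.txt` rules themselves. Run this command, replacing `$URL` with your full media URL: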

```bash theme={"system"}
curl -v --compressed -H "Range: bytes=0-524288" -H "Connection: close" \
  -A "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)" \
  "$URL"
```

<ul class="custom-bullets">
  <li>**Healthy response:** HTTP 200 or 206 with binary data in the body.</li>
  <li>**Blocked response:** HTTP 403 or an empty/HTML error page.</li>
</ul>
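If you want to run the same check from a script, for example before scheduling a post, the request can be reproduced with Python's standard library. This is a sketch under assumptions: `probe` and `classify` are hypothetical names, the HTML check is a heuristic for "error page", and the request still comes from your IP address rather than Meta's, so IP-based blocking will not show up.

```python
import urllib.error
import urllib.request

META_UA = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"


def classify(status: int, content_type: str, body_length: int) -> str:
    """Map a response onto the healthy/blocked outcomes listed above."""
    if status not in (200, 206):
        return "blocked"
    # An empty body or an HTML page where media was expected counts as blocked.
    if body_length == 0 or content_type.lower().startswith("text/html"):
        return "blocked"
    return "healthy"


def probe(url: str, timeout: float = 15.0) -> str:
    """Request the start of `url` using Meta's crawler User-Agent."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": META_UA, "Range": "bytes=0-524288"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            sample = response.read(1024)  # a small sample is enough to detect an empty body
            content_type = response.headers.get("Content-Type", "")
            return classify(response.status, content_type, len(sample))
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, such as a 403 from a WAF, are raised as exceptions.
        return classify(err.code, "", 0)
```

Network-level failures such as DNS errors raise `urllib.error.URLError` and are left to the caller, since they point to a different problem than crawler blocking.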

<Note>
  Per Meta's documentation, `robots.txt` changes may take up to 24 hours to propagate through Meta's crawler cache. If verification succeeds but your post still fails, wait and retry later.
</Note>

## If This Doesn't Fix It

If you've updated `robots.txt` and verified with the `curl` command but still see failures:

<ul class="custom-bullets">
  <li>**24-hour propagation delay** — Meta caches `robots.txt`. Wait up to 24 hours after making changes before retrying.</li>
  <li>**WAF or bot-fight rules** — Cloudflare Bot Fight Mode, AWS WAF managed bot rule groups, and similar services may block Meta's crawler IP ranges even if `robots.txt` allows it. Check your WAF logs and add an exception for `facebookexternalhit`.</li>
  <li>**Hotlink protection / Referer checks** — Some CDNs block requests from data-center IPs or without a valid `Referer` header. Whitelist Meta's crawler or disable hotlink protection for media paths.</li>
  <li>**Signed-URL / presigned-URL expiry** — If your media URL has an expiration timestamp (common with S3 presigned URLs), ensure it doesn't expire before Meta's crawler can fetch it. For scheduled posts, generate URLs that remain valid until well after the scheduled time.</li>
  <li>**Managed media hosting** — If you use a service like Cloudinary, Imgix, or similar where you cannot edit `robots.txt`, check their documentation for a Meta/Facebook crawler allow-list setting.</li>
</ul>
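For the signed-URL case in the list above, the expiry can be checked before a post is scheduled. This sketch assumes an AWS Signature Version 4 presigned URL, which carries its signing time in `X-Amz-Date` and its lifetime in seconds in `X-Amz-Expires`. The helper names and the one-hour safety margin are assumptions, and other signing schemes use different parameters.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url: str) -> Optional[datetime]:
    """Return the expiry of a SigV4 presigned URL, or None if it is not one."""
    query = parse_qs(urlparse(url).query)
    try:
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    except (KeyError, ValueError):
        return None
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


def outlives_schedule(
    url: str,
    scheduled_for: datetime,  # must be timezone-aware
    margin: timedelta = timedelta(hours=1),  # assumed buffer for Meta's fetch
) -> bool:
    """True if the URL stays valid until `margin` after the scheduled time."""
    expiry = presigned_expiry(url)
    # URLs without SigV4 query parameters are treated as non-expiring here.
    return expiry is None or expiry >= scheduled_for + margin
```

If `outlives_schedule` returns `False`, generate a new URL with a longer lifetime before scheduling the post.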

If none of these resolve the issue, [contact Ayrshare support](https://www.ayrshare.com/contact) and include:

* The `id` of the failing post from the error response
* The output of the `curl` verification command above
* Your `robots.txt` contents

## Retry a Failed Post

Once you've resolved the crawler access issue, retry your failed post using the [Retry Post endpoint](/apis/post/retry-post).

## See Also

<ul class="custom-bullets">
  <li>[Instagram API](/apis/post/social-networks/instagram)</li>
  <li>[Threads API](/apis/post/social-networks/threads)</li>
  <li>[Instagram Media Guidelines](/media-guidelines/instagram)</li>
  <li>[Threads Media Guidelines](/media-guidelines/threads)</li>
  <li>[Ayrshare Error Codes](/errors/errors-ayrshare#media-fetch--crawler-access-errors)</li>
  <li>[Video Publishing Fails](/help-center/technical-support/video_publishing_fails)</li>
</ul>
