Automatically Generate Image Caption Text Using Google Cloud Vision API with JavaScript and Node.js

Automating image captioning is useful for many applications, from improving accessibility and SEO to streamlining content management. Modern AI technologies have made automatic captions both easier to generate and more accurate. This article walks you through using the Google Cloud Vision API to generate image captions automatically, with code examples in JavaScript and Node.js. We also show you how to generate alt text using the Ayrshare API.

Prerequisites

  • Basic understanding of JavaScript and Node.js
  • Familiarity with RESTful APIs
  • A Google Cloud Platform (GCP) account

Technologies Used

  1. JavaScript (Node.js)
  2. Axios for HTTP requests
  3. JSON

Steps to Automatically Generate Image Caption Text

Step 1: Set Up Google Cloud Vision API

  1. Log in to your Google Cloud Console.
  2. Create a new project or select an existing one.
  3. Navigate to “APIs & Services” -> “Library”, then search for “Cloud Vision API” and enable it.
  4. Create credentials (API key or service account key JSON file) for your project.
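
If you go with an API key, one common approach is to keep it out of source code and read it from an environment variable instead. The sketch below assumes a variable named GOOGLE_VISION_API_KEY; that name and the getApiKey helper are hypothetical choices for this example, not something Google requires.

```javascript
// Minimal sketch: load the Vision API key from the environment.
// GOOGLE_VISION_API_KEY and getApiKey are hypothetical names used for illustration.
function getApiKey(env = process.env) {
  const key = env.GOOGLE_VISION_API_KEY;
  if (!key || key.trim() === '') {
    throw new Error('Set the GOOGLE_VISION_API_KEY environment variable first.');
  }
  // Trim stray whitespace that can sneak in when exporting the variable
  return key.trim();
}
```

In a script, you could then write const API_KEY = getApiKey(); in place of a hardcoded string.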

Step 2: Install Required Packages

To make HTTP requests, we’ll use the Axios library. Install it using npm if you haven’t already:

npm install axios

Step 3: Write the Code

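Here’s a Node.js script that calls the Google Cloud Vision API. It uses label detection, which returns short descriptive labels (each with a confidence score) rather than full sentences, and takes the top label as the caption: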

const axios = require('axios');
const fs = require('fs');

// Initialize API endpoint and API key
const API_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate";
const API_KEY = "<your_api_key>";

// Define headers for the API request
const headers = {
  'Content-Type': 'application/json'
};

// Base64 encode your image
const image = fs.readFileSync('path/to/your/image.jpg', { encoding: 'base64' });

// Prepare the payload
const payload = JSON.stringify({
  requests: [
    {
      image: {
        content: image
      },
      features: [
        {
          type: "LABEL_DETECTION",
          maxResults: 1
        }
      ]
    }
  ]
});

// Make the API request; the key is passed as a query parameter
axios.post(`${API_ENDPOINT}?key=${API_KEY}`, payload, { headers })
  .then(response => {
    // Each entry in `responses` corresponds to one entry in `requests`
    const labels = response.data.responses?.[0]?.labelAnnotations;

    // Use the top label's description as the caption
    if (labels && labels.length > 0) {
      console.log(`Generated caption: ${labels[0].description}`);
    } else {
      console.log("Caption not generated.");
    }
  })
  .catch(error => {
    // Prefer the API's error body when present; fall back to the message
    console.error("An error occurred:", error.response?.data ?? error.message);
  });

Replace <your_api_key> with the API key you obtained from the Google Cloud Console. Modify the path to the image file accordingly.
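
A single label is often too terse to serve as a caption. As a rough sketch, you could raise maxResults in the request and combine the highest-scoring labels into one string. The helper name labelsToCaption and its minScore and maxLabels options are hypothetical, and the sample values are made up; only the description and score fields come from the label detection response.

```javascript
// Sketch: combine several Vision labels into one caption-like string.
// labelsToCaption, minScore and maxLabels are hypothetical names for this example.
function labelsToCaption(labelAnnotations, { minScore = 0.7, maxLabels = 3 } = {}) {
  if (!Array.isArray(labelAnnotations)) return '';
  return labelAnnotations
    // Keep only labels that have a description and a high enough score
    .filter((label) => typeof label.description === 'string' && (label.score ?? 0) >= minScore)
    // Highest confidence first
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, maxLabels)
    .map((label) => label.description)
    .join(', ');
}

// Example input shaped like responses[0].labelAnnotations (values are made up).
const sampleLabels = [
  { description: 'Dog', score: 0.98 },
  { description: 'Grass', score: 0.91 },
  { description: 'Toy', score: 0.55 },
];
console.log(labelsToCaption(sampleLabels)); // prints "Dog, Grass"
```

The result is still a list of keywords rather than a sentence, so treat it as a starting point for a caption.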

Step 4: Run the Code

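Save your script and run it with Node.js, for example node caption.js if you saved the file as caption.js (a hypothetical filename). If everything is set up correctly, you should see the generated caption for your image printed to the console.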

Understanding Google Cloud Vision API Costs

Before implementing any cloud-based service in your application, it’s crucial to understand its cost implications. The Google Cloud Vision API is no exception. While the API does offer a free tier, costs can scale with usage, and it’s essential to consider this when planning your application’s architecture.

Free Tier

Google Cloud Vision API offers a free tier that includes:

  • 1,000 units per feature per month for the first 12 months
  • 1,000 units per month for Cloud Vision API Web Detection feature

Check Google Cloud’s Pricing Page for the most up-to-date information on free tier limits.

Paid Tier

Once you exceed the free tier’s usage limits, you’ll move to the paid tier, and different types of image recognition features come with varying costs. The pricing generally falls under two categories:

  1. Feature-based Pricing: The cost depends on the specific feature you’re using, like label detection, face detection, etc.
  2. Volume-based Pricing: The more you use, the cheaper it gets. Google offers lower prices for higher volumes of API requests.

Cost Considerations

Here are some factors to keep in mind:

  • Number of Features: Each feature applied to an image is billed as a separate unit, so running both label detection and face detection on the same image counts as two units.
  • Rate Limits: Stay within the API’s request quotas. Requests that exceed them are rejected, which could disrupt your application’s functionality.
  • Batch Requests: Google Cloud Vision API allows you to send several images in one batch request. This cuts down on HTTP round trips, although each image and feature is still billed as its own unit.
  • Data Transfer Costs: While generally minimal, there may be additional costs for data transfer, especially if you’re transferring large volumes of data in and out of Google Cloud.
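
To illustrate the batch request point, the requests array in the annotate payload can hold more than one image. The sketch below groups base64-encoded images into payloads of a fixed size. The helper name buildBatchPayloads is hypothetical, and the default batchSize of 16 is an assumption about the per-request image limit, so check the current Vision API quotas before relying on it.

```javascript
// Sketch: group several base64-encoded images into batched annotate payloads.
// buildBatchPayloads and batchSize are hypothetical names; the default of 16
// is an assumed per-request limit that should be verified against the quotas page.
function buildBatchPayloads(base64Images, { batchSize = 16, maxResults = 1 } = {}) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const payloads = [];
  for (let i = 0; i < base64Images.length; i += batchSize) {
    // One request entry per image, all asking for label detection
    const requests = base64Images.slice(i, i + batchSize).map((content) => ({
      image: { content },
      features: [{ type: 'LABEL_DETECTION', maxResults }],
    }));
    payloads.push({ requests });
  }
  return payloads;
}

// Placeholder strings stand in for real base64 image data.
const payloads = buildBatchPayloads(['imgA', 'imgB', 'imgC'], { batchSize: 2 });
console.log(payloads.length); // prints 2
```

Each payload can then be sent with the same axios.post call used in the script above.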

To avoid unexpected costs, set up budget alerts in your Google Cloud Console, so you’re notified when your spending hits a certain threshold.
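
To get a feel for how units add up before setting a budget, a back-of-the-envelope estimate can help. The sketch below assumes the 1,000 free units per feature per month mentioned earlier, one flat price for every feature, and no volume discounts. The estimateMonthlyCost helper is hypothetical, and the price passed in is a placeholder, not a real Google price; take actual figures from the pricing page.

```javascript
// Sketch: rough monthly estimate for running N features on each image.
// estimateMonthlyCost is a hypothetical helper. It assumes a flat price per
// 1,000 units for every feature and ignores volume-based discounts.
function estimateMonthlyCost({ images, features, freeUnitsPerFeature = 1000, pricePerThousandUnits }) {
  // Every feature applied to an image counts as one unit
  const totalUnits = images * features;
  // The free allowance applies separately to each feature
  const billablePerFeature = Math.max(0, images - freeUnitsPerFeature);
  const billableUnits = billablePerFeature * features;
  const cost = (billableUnits / 1000) * pricePerThousandUnits;
  return { totalUnits, billableUnits, cost };
}

// Placeholder price of 1.0 per 1,000 units; NOT a real Google price.
const estimate = estimateMonthlyCost({ images: 5000, features: 2, pricePerThousandUnits: 1.0 });
console.log(estimate); // totalUnits 10000, billableUnits 8000, cost 8
```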

Steps Using the Ayrshare API

A simpler alternative is to leverage the Ayrshare API. With the Max Pack, you get many useful utilities, including the Alt Text Generator.

Step 1: Set Up Ayrshare Account

  1. Sign up for Ayrshare and add the Max Pack to your account.
  2. Get your API key on the Ayrshare dashboard.

Step 2: Write the Code

Here is sample code for calling the API in Node.js. It uses fetch, which is built in as of Node.js 18.

const API_KEY = "API_KEY";

fetch("https://api.ayrshare.com/api/generate/altText", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`
  },
  body: JSON.stringify({
    url: "https://img.ayrshare.com/012/gb.jpg", // required
  }),
})
  .then((res) => res.json())
  .then((json) => console.log(json))
  .catch(console.error);

Replace the "API_KEY" placeholder with your API key from the dashboard, and the url value with your image URL.

Step 3: Run the Code

Save the script and run it. The response contains the generated alt text, similar to this sample:

{
    "status": "success",
    "altText": "A ghostbusters vehicle driving through a field.",
    "url": "https://img.ayrshare.com/012/gb.jpg"
}
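
If you only need the text itself, a small helper can pull it out and fail loudly when the call did not succeed. This is a minimal sketch based on the field names in the sample response above. The extractAltText helper is hypothetical, and treating any status other than "success" as a failure is an assumption about how errors are reported.

```javascript
// Sketch: pull the alt text out of a response shaped like the sample above.
// extractAltText is a hypothetical helper; the field names come from the sample.
function extractAltText(json) {
  // Assumption: anything other than status "success" with a string altText is a failure
  if (!json || json.status !== 'success' || typeof json.altText !== 'string') {
    throw new Error(`Alt text not generated: ${JSON.stringify(json)}`);
  }
  return json.altText.trim();
}

const sample = {
  status: 'success',
  altText: 'A ghostbusters vehicle driving through a field.',
  url: 'https://img.ayrshare.com/012/gb.jpg',
};
console.log(extractAltText(sample)); // prints "A ghostbusters vehicle driving through a field."
```

In the fetch chain, this would replace the logging step, for example .then((json) => console.log(extractAltText(json))).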

Don’t Skip the Alt Text

Automatically generating image captions has numerous applications, including enhancing website accessibility, SEO, and content discovery. With modern AI tools, it’s easy to automate this process, benefiting both your users and your business.