Metascraping is the process of extracting a webpage’s metadata from its meta tags. A webpage’s metadata is information about the page rather than the page’s content. For example, the metadata may include the page’s title, its description, and the author’s name. It helps users and search engines understand what the page is about. Scraping metadata lets you quickly collect information about many web pages. Several approaches can be used to scrape webpages for their metadata, including scraping manually, using a library, or using an API such as the Geekflare Metascraping API.
Many Ways to Kill a Cat
To scrape manually, you can open a webpage, launch Chrome DevTools, and read the metadata from the Elements tab. However, this is manual, repetitive, and tedious when you are dealing with multiple pages. We can automate the task using one of several approaches.
The first approach is to write the code from scratch. In this approach, you make an HTTP request to the website whose metadata you want to extract. Afterward, you parse the response HTML and extract the data from the meta tags using regular expressions or pattern matching. However, this approach is reinventing the wheel, as you will spend time rewriting code that already exists.
The second approach is to use a library in whatever programming language you prefer. This lets you abstract over the implementation details and keeps things simple. However, if your programming language does not have a suitable library, or the particular runtime you are using does not support the library, then you cannot use it.
The third approach is to use an API like the Geekflare Metascraping API. This approach is ideal because it gives you a uniform interface regardless of your programming language. It is usable from any language that supports making HTTP requests.
This article will demonstrate how to use the Geekflare Metascraping API with cURL, PHP, and JavaScript (NodeJS).
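To make the first, from-scratch approach concrete, here is a minimal sketch of the parsing step in JavaScript. The function name extractMetaTags and the sample HTML are hypothetical and chosen only for illustration. It assumes each meta tag carries a quoted name or property attribute and a quoted content attribute. Regular expressions are fragile on real-world HTML, which is part of why this approach tends to cost more time than it saves.

```javascript
// Minimal sketch (hypothetical helper): collect name/property -> content
// pairs from the <meta> tags of an HTML string using regular expressions.
function extractMetaTags(html) {
  const meta = {};
  // Find every <meta ...> tag first, then read its attributes separately,
  // so the order of the attributes inside the tag does not matter.
  const tagPattern = /<meta\b[^>]*>/gi;
  const attrPattern = /([a-zA-Z:-]+)\s*=\s*("([^"]*)"|'([^']*)')/g;

  for (const [tag] of html.matchAll(tagPattern)) {
    const attrs = {};
    for (const m of tag.matchAll(attrPattern)) {
      // m[3] holds a double-quoted value, m[4] a single-quoted one.
      attrs[m[1].toLowerCase()] = m[3] ?? m[4];
    }
    // Standard tags use "name"; Open Graph tags use "property".
    const key = attrs.name ?? attrs.property;
    if (key && attrs.content !== undefined) {
      meta[key] = attrs.content;
    }
  }
  return meta;
}

// Usage with a small, made-up HTML snippet.
const sampleHtml = `
  <head>
    <meta charset="utf-8">
    <meta name="description" content="An example page">
    <meta property="og:title" content="Example">
  </head>`;
console.log(extractMetaTags(sampleHtml));
```

In a complete script, the HTML string would come from the body of an HTTP response rather than from a hard-coded sample.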
Why Should You Use the Geekflare Metascraping API?
Given the drawbacks of the other approaches, the Geekflare API offers the following advantages:
It is language and runtime environment agnostic.
You avoid reinventing the wheel and spend less time writing code.
You can scrape multiple websites efficiently (in a matter of seconds).
It is incredibly easy to use.
You can use it for free.
Getting Started Using the Geekflare API
To use the Geekflare API, you will need an API key. To obtain one, go to the Geekflare website and create a free account. After creating your account, log in to the dashboard, where your API key is displayed.
Geekflare Metascraping API Overview
The API endpoint is located at https://api.geekflare.com/metascraping. Every request must include your API key in a request header named x-api-key. You will also need to pass additional parameters in the request body. These are url, device, and proxyCountry.
url specifies the URL of the webpage whose metadata you want to scrape.
device specifies the device used to visit the site when scraping metadata. Your options are mobile or desktop.
proxyCountry specifies the country from which the request should be made before the data is scraped. It is a premium feature, however, and can only be used under the Geekflare paid plans.
Because the parameters are passed in the request body, the request has to be a POST request; GET requests are not meant to carry a request body.
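As a rough illustration of the parameters described above, the following JavaScript sketch assembles the JSON request body. The helper name buildRequestBody is hypothetical. Treating url as mandatory and checking device on the client side are assumptions made for this example, not documented behavior of the API.

```javascript
// Hypothetical helper: builds the JSON request body from the three
// parameters described above. The validation rules are assumptions.
const ALLOWED_DEVICES = ["desktop", "mobile"];

function buildRequestBody({ url, device, proxyCountry } = {}) {
  if (!url) {
    throw new Error("url is required");
  }
  const body = { url };

  if (device !== undefined) {
    if (!ALLOWED_DEVICES.includes(device)) {
      throw new Error(`device must be one of: ${ALLOWED_DEVICES.join(", ")}`);
    }
    body.device = device;
  }

  // proxyCountry is a premium feature; its accepted values are not covered
  // in this article, so the value is passed through unchanged.
  if (proxyCountry !== undefined) {
    body.proxyCountry = proxyCountry;
  }

  return JSON.stringify(body);
}

// Usage: a body asking for the mobile version of a page.
console.log(buildRequestBody({ url: "https://tesla.com", device: "mobile" }));
```

Leaving out the optional parameters keeps the body down to the single url property, which is all the demonstrations in this article send.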
Using the Geekflare Metascraping API in cURL
In the first demonstration, we will use the cURL utility from the command line to make a request to the Metascraping API. To use cURL, you will need to install it first. I am going to be using a Bash terminal. This should be the default terminal on macOS and Linux. For Windows, you will have to install Git Bash.
After cURL is installed, we can use the curl command to make the request. We will pass options to the command to specify the request parameters: the request method, the endpoint, the request body, and the request headers.
Note: The backslash after the first three lines allows you to break the command input into multiple lines.
This command specifies the HTTP method as POST and the endpoint as the Geekflare API metascraping endpoint. We also send the request body as a JSON object with a url property set to https://tesla.com. Lastly, we add the headers that specify the body content type as JSON and provide the API key using the x-api-key header.
When we run this command, we get the following output:
That is the correct output.
Using the Geekflare Metascraping API with JavaScript
For this project, we will create a NodeJS script to fetch data from the API. This means you will need NodeJS installed. You will also need NPM or any other package manager for Node to manage the project’s dependencies. I am also going to be using the Bash terminal to run commands.
To use the API in JavaScript, we first create an empty project folder and open it in a terminal.
After this, we can create the file where we are going to write the script:
Then we can instantiate the project as a Node project:
To use ESModule syntax inside our file, add the line "type": "module" to the root of the package.json file such that it looks like this:
Next, we will install the node-fetch package. This package provides a fetch function in NodeJS that is similar to the browser’s fetch function. This makes it easier to make HTTP requests in NodeJS than using the built-in http module to make requests.
When the package is correctly installed, we can start editing the script. Open the index.js file using a text editor of your choice. In my case, I am going to be using the terminal-based nano text editor.
Editing the index.js file, we start by importing the fetch function, which is the default export of the node-fetch module.
Then, we will define the body of our request. This is going to be a JSON string with a url property. The url property’s value is the webpage whose metadata we want to get.
Next, we define the request options that we will pass to the fetch function when we eventually call it.
We have defined our request method as being a POST request. We also defined two headers. One specifies that the body contains JSON data, and the other provides the API key.
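Put together, the body, the options, and the eventual fetch call might look like the sketch below. It is an illustration, not the article's original listing. It assumes Node 18 or later, where fetch is available globally; with the node-fetch package you would add import fetch from "node-fetch" at the top instead. The function names, the GEEKFLARE_API_KEY environment variable, and the assumption that the response body is JSON are all hypothetical.

```javascript
// Sketch only. Assumes Node 18+ (global fetch). With node-fetch, add
// `import fetch from "node-fetch";` at the top of the file instead.
const API_URL = "https://api.geekflare.com/metascraping";

// Builds the options object passed to fetch: a POST request with a JSON
// body and the two headers described above.
function buildOptions(apiKey, targetUrl) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
    },
    body: JSON.stringify({ url: targetUrl }),
  };
}

// Sends the request and parses the response, assuming the API answers
// with a JSON body.
async function scrapeMetadata(apiKey, targetUrl) {
  const response = await fetch(API_URL, buildOptions(apiKey, targetUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (makes a real network request, so it is left commented out).
// GEEKFLARE_API_KEY is a hypothetical environment variable name.
// scrapeMetadata(process.env.GEEKFLARE_API_KEY, "https://tesla.com")
//   .then(console.log)
//   .catch(console.error);
```

Reading the key from an environment variable rather than hard-coding it keeps the key out of the source file, which matters if the script is ever committed to a repository.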
You may replace
Using the Geekflare API with PHP
To use the Geekflare Metascraping API with PHP, first ensure you have PHP and Composer installed on your local machine.
To begin, create and open the project folder.
Next, install GuzzleHTTP. Guzzle is one of the many PHP clients you can use with the Geekflare API.
Once Guzzle is installed, we can create a script with
Then we can start writing the code. Using a text editor of your choice, open the script.php file. In my case, I am going to use nano which is a terminal-based text editor.
Inside the script, we insert boiler-plate PHP
Now, to load the extensions, import the Request and Client classes from Guzzle. This code should be written between the opening and closing PHP tags we wrote before.
Next, we can create a client by instantiating the GuzzleHttp\Client class
Afterward, we can define headers for the request. For this particular request, we will provide two headers, one that specifies that the content type of the body is JSON and the other containing our API key.
Replace
Final Words
This guide went through different ways to consume the Geekflare Metascraping API. The API also accepts parameters beyond url. One such parameter is proxyCountry, which is only available under the Geekflare API premium plan. Regardless, the free plan remains powerful enough for many uses. Check out the official documentation of the Geekflare API for more information.