How To Find Broken Links In Selenium

Overview

Broken links are a serious issue for website owners because they degrade the user experience and may drive visitors away. They also harm a page's SEO ranking. A broken link is a link that directs users to an outdated or nonexistent page. Locating broken links manually is tedious, but Selenium WebDriver makes the process easy to automate. This article covers what broken links are, why they need to be checked, their typical causes, and how to find them with Selenium.

Introduction

Selenium is a powerful and popular tool for web automation. Through automated scripts, it lets users interact with web pages, carry out operations, and retrieve data, which makes it a foundation for web automation testing. Broken links are a major source of frustration for users and website owners alike: they lead to errors, missing pages, broken functionality, and a poor user experience. It is therefore important to find and fix broken links on a website as soon as possible. Selenium has no dedicated broken-link checker, but it can collect every link on a page, and each collected URL can then be verified with an HTTP request. Automating the check this way saves time and resources.

Broken links are links on a website that lead to non-existent or outdated pages. They can occur when a page is deleted, a URL is changed, or a page is moved. Broken links can negatively impact user experience, increase bounce rates, and result in lost traffic.

HTTP Status Code | Additional Notes | Definition
--- | --- | ---
400 | Bad Request | The server cannot process the request due to an incorrect URL.
400 | Bad Request - Bad Host | The server cannot process the request due to an invalid hostname.
400 | Bad Request - Bad URL | The server cannot process the request as the URL is in an incorrect format, missing characters such as brackets, slashes, etc.
400 | Bad Request - Empty | The response returned by the server is empty with no content and no response code.
400 | Bad Request - Timeout | HTTP requests have timed out.
400 | Bad Request - Reset | The server cannot process the request as it is busy processing other requests or has been misconfigured by the site owner.
404 | Page Not Found | The requested page is not available on the server.
403 | Forbidden | The server refuses to fulfil the request as authorization is required.
410 | Gone | The requested page is no longer available, and this code is more permanent than 404.
408 | Request Time Out | The server has timed out waiting for the request.
503 | Service Unavailable | The server is temporarily overloaded and cannot process the request.
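
As a rough illustration of how the codes in this table can be handled in a script, the sketch below turns a numeric status code into a short label using Python's standard `http.HTTPStatus` enum. The function name `describe_status` and the rule "400 or above counts as broken" are assumptions made for this example, not part of Selenium.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a label such as '404 Not Found (broken)' for a status code."""
    try:
        # HTTPStatus knows the standard reason phrase for registered codes.
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Non-standard codes are not in the enum.
        phrase = "Unknown"
    # Assumption for this sketch: client and server errors mark a broken link.
    verdict = "broken" if code >= 400 else "ok"
    return f"{code} {phrase} ({verdict})"


for status_code in (200, 403, 404, 410, 503):
    print(describe_status(status_code))
```

The reason phrases come from the standard library, so they may differ slightly from the wording used in the table.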

Broken links are a real problem for website owners because they ruin the user experience and lower a site's search engine rankings, so a website should be checked for broken links regularly and any that are found should be fixed. This is where Selenium and other automation tools help. Selenium offers many features for testing web applications effectively and efficiently, and these can be used to look for broken links. By automating the process of identifying broken links, website owners save time and make sure their site keeps working properly.

Links break for various reasons. The most common ones are:

  • 404 Page Not Found

This error occurs when the web server cannot locate the requested page. It typically happens when a page is deleted or moved to a different URL without a proper redirect. Broken internal links on a website also produce 404 errors. Websites can use customized 404 pages to improve the user experience and direct visitors to other useful pages on the site.

  • 400 Bad Request

This error occurs when the user submits a request that the web server cannot handle because of improper request syntax or an unsupported request method. It may happen if the URL is incomplete or contains errors, such as missing brackets, slashes, or parameters. To help the user understand the issue, the server may respond with a custom error message.

  • User’s Firewall Settings

Firewalls are network security systems that monitor and control incoming and outgoing traffic according to predetermined rules. A user's firewall may prevent them from accessing particular websites or resources if it is configured to block specific traffic or domains. To resolve this kind of broken link, users can adjust their firewall settings or temporarily disable the firewall.

  • The Link is Misspelled

Broken links can also be caused by errors in the URL itself. This may happen when a URL is typed manually or copied and pasted incorrectly from another location. Webmasters can find and correct misspelled links on their websites using tools like Google Search Console or third-party crawlers.

The first step in finding broken links with Selenium is to collect all the links on the web page. In Selenium 4 this is done with driver.find_elements(By.TAG_NAME, "a"); the older find_elements_by_tag_name method was removed in Selenium 4.3. The call returns a list of all elements on the page with the specified tag name.

  • Identify and Validate URLs: Once all the links are collected, the next step is to read the URL of each link and validate it using Python's urllib modules. urllib.parse splits a URL into its components, and urllib.request provides functions for opening URLs and sending HTTP requests.
  • Send HTTP Request: The next step is to send an HTTP request to the server using the urllib.request.urlopen() method. On success, this method returns an HTTP response object that contains information about the server's response. For 4xx and 5xx responses it raises urllib.error.HTTPError, which carries the status code.
  • Validate Links: The final step is to validate the links based on the HTTP status code. A status code of 200 means the link is valid. A status code of 400 or above, or no response at all, means the link is broken. Redirects (3xx) are followed automatically by urlopen(), so they normally resolve to the status of the final page.

Here is an example of how to find broken links in Selenium using Python:

Import Packages

First, we need to import the necessary packages:
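
A minimal sketch of those imports, assuming Selenium 4 installed with `pip install selenium`. The `try`/`except` guard and the `SELENIUM_INSTALLED` flag are additions for illustration only: they print an install hint instead of a traceback when Selenium is missing.

```python
# Standard library: URL parsing, HTTP requests, and their error types.
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Third-party: Selenium 4 drives the browser.
try:
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    SELENIUM_INSTALLED = True  # hypothetical flag, used only in this sketch
except ImportError:
    SELENIUM_INSTALLED = False
    print("Selenium is not installed; run: pip install selenium")
```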

Explanation

These imports are typically combined with other code to find and check the links on a web page. The webdriver module opens the web page in a browser and locates the links on it, while the urllib modules parse the collected URLs and check their status codes.

Next, we will collect all the links on the web page using Selenium:
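
One way this step might look, assuming Selenium 4 with Chrome and a matching driver available on the machine. The function names `collect_links` and `hrefs_from_anchors` are hypothetical. Selenium is imported inside the function so that the helper can be reused without a browser.

```python
def hrefs_from_anchors(anchors):
    """Return the non-empty href value of each anchor element."""
    hrefs = []
    for anchor in anchors:
        href = anchor.get_attribute("href")
        if href:  # anchors without an href attribute yield None
            hrefs.append(href)
    return hrefs


def collect_links(page_url):
    """Open page_url in Chrome and return the URL of every link on it."""
    # Imported lazily: only this function needs Selenium and a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        # links holds every <a> element found on the page.
        links = driver.find_elements(By.TAG_NAME, "a")
        # Read the hrefs before quitting: elements are unusable afterwards.
        return hrefs_from_anchors(links)
    finally:
        driver.quit()
```

Calling `collect_links("https://www.example.com")` (a placeholder URL) would return a list of URL strings for the later steps.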

Explanation

This code uses the Selenium Python library to automate a web browser, in this case, Google Chrome. The links variable will contain a list of all the anchor elements on the page. This list can be iterated over to extract information such as the URL of each link, the text content of each link, and so on.

Identify and Validate URLs

Next, we will identify and validate URLs using Python's urllib.request module:
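
The following is a sketch rather than a definitive implementation. It assumes the links have already been collected as a list of URL strings. The names `is_checkable`, `fetch_status`, and `report_statuses`, the 10-second timeout, and the browser-like User-Agent header are all choices made for this example.

```python
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_checkable(url):
    """True for absolute http(s) URLs; skips mailto:, javascript:, etc."""
    if not url:
        return False
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    # Some servers reject urllib's default User-Agent, so send a generic one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # 4xx and 5xx responses are raised as HTTPError
    except OSError:
        return None  # URLError, timeouts, and resets are OSError subclasses


def report_statuses(urls, fetch=fetch_status):
    """Print the path and status of each checkable URL; return url -> status."""
    results = {}
    for url in urls:
        if not is_checkable(url):
            continue
        status = fetch(url)
        print(urlparse(url).path or "/", status)
        results[url] = status
    return results
```

The `fetch` parameter exists so the reporting logic can be exercised with a stand-in function instead of real network requests.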

Explanation

This code extracts all the links on a webpage and checks each link for HTTP errors by attempting to open them. It then prints the path of each URL and its HTTP response code or error code.

Finally, we will validate the links based on the HTTP status code:
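
A hedged sketch of this final step: it relies on the fact that `urlopen()` raises `HTTPError` for 4xx and 5xx responses. The function name `check_link` and its `opener` parameter are assumptions for this example. Treating network failures (no response at all) as broken is also a choice made here.

```python
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import urlopen


def check_link(url, opener=urlopen, timeout=10):
    """Print whether url is valid or broken; return True when it is valid."""
    path = urlparse(url).path or "/"
    try:
        with opener(url, timeout=timeout) as response:
            print(f"{path} is valid ({response.status})")
            return True
    except HTTPError as error:
        # The server answered with an error status such as 404 or 503.
        print(f"{path} is broken ({error.code})")
    except OSError as error:
        # No usable response: DNS failure, refused connection, timeout.
        print(f"{path} is broken ({error})")
    return False
```

Running `check_link` over every collected URL and keeping those that return False gives the list of broken links to fix.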

Explanation

If an HTTPError occurs while trying to open the URL, the link is broken, and the code prints the path of the URL followed by the message "is broken". By running this code, we can identify all the broken links on a web page and take the necessary actions to fix them.

Conclusion

  • Broken links can lead to a poor user experience and frustrate website visitors, potentially driving them away from your site and reducing engagement.
  • Broken links can also lower the search engine optimization (SEO) ranking of your website.
  • Checking for broken links regularly helps ensure that your website is fully functional and available to all users, regardless of device or location.
  • Common reasons for broken links include 404 page not found errors, 400 bad requests, user firewall settings, and misspelled links.
  • Selenium automates the collection of links from a page, and combined with HTTP status checks it helps you identify and resolve broken links quickly, saving time and effort.