How to Find Broken Links in Selenium?

Overview

Broken links are a serious problem for website owners: they degrade the user experience, may drive visitors away, and can hurt a page's SEO ranking. A broken link is a link that directs users to an outdated or nonexistent page. Locating broken links manually is tedious, but Selenium WebDriver makes it easy to automate the process. In this article, we will go over what broken links are, why they need to be checked, their typical causes, and how to find broken links with Selenium.

Broken links are links on a website that lead to non-existent or outdated pages. They can occur when a page is deleted, a URL is changed, or a page is moved. Broken links can negatively impact user experience, increase bounce rates, and result in lost traffic.

The table below lists the HTTP status codes commonly returned for broken links, along with the name and meaning of each code.

HTTP Status Code | Name | Definition
400 | Bad Request | The server is unable to process the request due to an incorrect URL, an invalid hostname, or an incorrect URL format.
400 | Bad Request – Bad Host | The server is unable to process the request because the hostname provided is invalid.
400 | Bad Request – Timeout | The HTTP requests submitted have timed out.
404 | Page Not Found | The requested page does not exist on the server.
403 | Forbidden | The server refuses to fulfill the request due to lack of authorization.
410 | Gone | The requested page is no longer available; this status code indicates a more permanent absence than 404.
408 | Request Timeout | The server has timed out waiting for the request to complete.
503 | Service Unavailable | The server is temporarily overloaded and cannot process the request.
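
To make the table concrete, here is a small sketch of how a status code could be classified in Python. It relies on the standard library's `http.HTTPStatus` for the reason phrases; the helper name `describe_status` and the rule "400 and above counts as broken" are illustrative assumptions, not part of Selenium.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short label such as '404 Not Found (broken)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not a registered HTTP status.
        phrase = "Unknown"
    # Assumption: any 4xx or 5xx response marks the link as broken.
    verdict = "broken" if code >= 400 else "ok"
    return f"{code} {phrase} ({verdict})"
```

For example, `describe_status(410)` would label the link as `410 Gone (broken)`, while a 200 response is labelled `ok`.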

Broken links can ruin the user experience and lower a site's search engine rankings, so a website should be checked for them regularly and any that are found should be fixed. This is where Selenium and other automation tools come in. Selenium offers many features for testing web applications effectively and efficiently, including the ability to look for broken links. By automating the process of identifying broken links, website owners can save time and make sure their site is working properly. That is why it is important to know how to find broken links in Selenium.

Broken links have several causes. The most common ones include:

  • 404 Page Not Found: This error occurs when the web server cannot locate the requested page. It can happen when a page is deleted or moved to a different URL without a proper redirect. Broken internal links on a website also produce 404 errors. Websites can use custom 404 pages to improve the user experience and direct visitors to other useful pages on the site.

  • 400 Bad Request:

    This error happens when the user submits a request that the web server cannot handle because of improper request syntax or an unsupported request method. It may occur if the URL is incomplete or contains errors, such as missing brackets, slashes, or parameters. To help the user understand the issue, the server may respond with a custom error message.

  • User’s Firewall Settings:

    Firewalls are network security systems that monitor and control incoming and outgoing traffic according to predetermined rules. A user's firewall may prevent them from accessing particular websites or resources if it is configured to block certain traffic or certain domains. To fix broken links caused this way, users can adjust their firewall settings or temporarily turn the firewall off.

  • The link is misspelled:

    Broken links can be caused by typos in the URL. This may happen when the URL is typed manually or copied and pasted from another location. Webmasters can find and correct misspelled links on their websites using tools like Google Search Console or third-party crawlers.
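
These causes surface differently when a link is requested from Python. As a rough sketch: `urllib.error.HTTPError` carries a status code that the server returned (404, 400, and so on), while a plain `urllib.error.URLError` usually means no HTTP response arrived at all, which is what a misspelled hostname or a blocking firewall tends to produce. The function name `explain_failure` and its messages are hypothetical.

```python
import urllib.error


def explain_failure(exc: Exception) -> str:
    """Map an exception raised by urllib.request.urlopen to a likely cause."""
    # HTTPError is a subclass of URLError, so it must be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "404: page deleted or moved without a redirect"
        if exc.code == 400:
            return "400: malformed URL or request"
        return f"{exc.code}: server returned an error status"
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response: DNS failure, refused connection, or blocked traffic.
        return f"no response ({exc.reason}): check the hostname spelling or firewall"
    return f"unexpected error: {exc!r}"
```

The mapping is only a heuristic: the same exception can have several underlying causes, so the returned text should be read as a hint rather than a diagnosis.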

Here is an example of how to find broken links in Selenium using Python:

Code Snippet
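
What follows is a minimal sketch of such a script, assuming Selenium 4 and a locally installed Chrome browser. The helper names `check_link` and `find_broken_links`, the generic `User-Agent` header, and the 10-second timeout are illustrative choices, not a fixed recipe.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def check_link(url, timeout=10):
    """Return (status, is_broken) for one URL; status is None if no response."""
    try:
        # Some servers reject requests without a User-Agent, so send a generic one.
        request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(request, timeout=timeout) as response:
            return response.status, False
    except HTTPError as error:
        # The server answered with a 4xx or 5xx status code.
        return error.code, True
    except (URLError, ValueError, OSError):
        # No usable response: bad hostname, refused connection, malformed URL.
        return None, True


def find_broken_links(page_url):
    """Open page_url in Chrome and check every <a href> found on it."""
    # Imported here so check_link stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(page_url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        hrefs = [anchor.get_attribute("href") for anchor in anchors]
    finally:
        driver.quit()

    broken = []
    for href in hrefs:
        if not href or urlparse(href).scheme not in ("http", "https"):
            continue  # skip empty, mailto:, javascript: and similar links
        status, is_broken = check_link(href)
        print(urlparse(href).path or "/", status, "is broken" if is_broken else "ok")
        if is_broken:
            broken.append(href)
    return broken
```

Calling `find_broken_links("https://example.com")` (a placeholder URL) would print one line per link and return the list of URLs that failed.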

Explaining the Code

  • Import Packages:
    First, we need to import the necessary packages. The webdriver module is used to open a web page in a browser, while the urllib modules are used to parse each link's URL and to request it so that its status code can be checked.

  • Collect all links on the web page:
    Next, we collect all the links on the web page. The code uses the Selenium Python library to drive a web browser, in this case Google Chrome, and stores every anchor (<a>) element found on the page in the links variable. This list can then be iterated over to extract information such as the URL and text content of each link.

  • Identify and Validate URLs:
    Next, we identify and validate URLs using Python's urllib.request module. The code reads the URL of each collected link and checks it for HTTP errors by attempting to open it. It then prints the path of each URL together with its HTTP response code or error code.

  • Validate Links:
    Finally, we validate the links based on the HTTP status code. If an HTTPError occurs while trying to open the URL, the link is broken, and the code prints the path of the URL followed by the message "is broken". By running this code, we can identify all the broken links on a web page and take the necessary actions to fix them.
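
Before the validation step, it can help to clean the collected `href` values, since a page often contains `mailto:` or `javascript:` links, relative paths, and duplicates. The following is a minimal sketch using only `urllib.parse`; the function name `normalize_links` and the sample URLs in the usage note are illustrative assumptions.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Return unique absolute http(s) URLs, in first-seen order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # anchors without an href attribute yield None
        # Resolve relative paths against the page URL and drop the #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar schemes
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

For instance, given the base URL `https://example.com/docs/` and the values `page.html`, `page.html#top`, and `mailto:a@b.c`, only `https://example.com/docs/page.html` would remain. Browsers usually hand Selenium already-resolved URLs, so the `urljoin` call mainly matters when the values come from raw HTML.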

Conclusion

  • Broken links can lead to a poor user experience and frustrate website visitors, potentially driving them away from your site and reducing engagement.
  • Broken links can also lower the search engine optimization (SEO) ranking of your website.
  • By checking for broken links regularly, you can make sure that the pages of your website remain reachable for your users.
  • Common reasons for broken links include 404 page not found errors, 400 bad requests, users' firewall settings, and misspelled links.
  • Selenium automates the process of inspecting links and helps you identify issues quickly, so using it to find broken links can save you time and effort.