Are you a security professional or analyst who needs to quickly categorize a large list of websites? Or maybe you simply want to analyze a bunch of domains for your own research. Manually checking each website’s category on Symantec’s Site Review tool can be extremely tedious and time-consuming.
To help with this, we’ve created a handy Python script that can automate the process of categorizing websites in bulk using Symantec’s database. In this post, we’ll explain what the Site Review tool is, how our script works, what’s required to run it, and step-by-step directions to start categorizing websites in bulk.
A Short Note About the Symantec Site Review Tool
Symantec’s free Site Review tool allows you to manually check the category of any website URL through their database. It can identify over 60 content types like gambling, hate speech, botnets, malware, and more. This helps security teams classify and filter websites during investigations.
The tool is very useful but has some capacity limitations. It is designed for manual one-off checks rather than bulk automated submissions. Submitting over 500 URLs in quick succession can trigger CAPTCHAs which block further automated requests.
We strongly recommend against submitting thousands of URLs automatically in a short time span, since that pushes Site Review well beyond its intended use. Instead, use our script to categorize websites in batches of fewer than 100 URLs, and let sufficient time pass between batches before running larger lists. With reasonable use, the tool can benefit your website categorization workflow, but be mindful of its limitations and don’t aggressively overuse the free resource.
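The batching advice above can be sketched in a few lines of Python. This is only an illustration, not part of domain_categorization.py: the helper names `batches` and `run_in_batches`, the batch size of 50, and the 10-minute pause are all assumptions chosen to stay under the 100-URL guideline.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(urls: Sequence[str], size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_in_batches(
    urls: Sequence[str],
    check_batch: Callable[[List[str]], None],
    size: int = 50,                 # assumption: comfortably under 100 URLs
    pause_seconds: float = 600.0,   # assumption: "sufficient time" between batches
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `check_batch` on each batch, pausing between batches.

    Returns the number of batches processed. No pause follows the last batch.
    """
    count = 0
    for index, batch in enumerate(batches(urls, size)):
        if index > 0:
            sleep(pause_seconds)
        check_batch(batch)
        count += 1
    return count
```

Here `check_batch` stands in for whatever actually submits the URLs, for example a function that writes the batch to domains.txt and runs the script once.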
About The domain_categorization.py Bulk Website Categorization Script
To help automate Symantec’s Site Review tool for bulk checks, we built a Python script called domain_categorization.py. It submits multiple URLs to the Site Review website, parses the category results, and compiles everything into an easy-to-analyze text file.
Here is a high-level overview of how it works:
The script starts by opening the list of URLs you want to check from a file called domains.txt. It then launches a Chrome browser using Selenium, programmatically navigating to the Site Review tool webpage.
It takes the first URL in the list, inputs it into the search box on the Site Review website, and hits enter to submit the site for categorization. As it processes each URL, it checks for any CAPTCHA pop-ups and handles them accordingly.
Once a URL is categorized, the script extracts the identified categories and domain name into variables. It logs everything, along with the original URL, into a results.txt output file. It iterates through this extraction process for every URL in the input list.
Any URLs that hit a CAPTCHA are logged separately into a captcha.txt file for manual rechecking later.
Overall this automated interaction with Site Review’s website allows you to feed in a bulk list of URLs and efficiently get website categories parsed into a text file. This saves huge amounts of manual analysis time.
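To make the browser steps described above more concrete, here is a minimal Selenium sketch of the submission part. It is not the actual script. The Site Review address and the CSS selector are assumptions that you should verify against the live page, and the CAPTCHA check is a crude text heuristic. Extracting the categories is left out because the page structure is not described in this post.

```python
# Assumption: believed address of the Site Review page; confirm before use.
SITE_REVIEW_URL = "https://sitereview.bluecoat.com/"
# Hypothetical selector for the search box; inspect the page to find the real one.
SEARCH_BOX_SELECTOR = "input[type='text']"


def page_has_captcha(page_source: str) -> bool:
    """Heuristic (an assumption): treat any mention of 'captcha' as a challenge."""
    return "captcha" in page_source.lower()


def open_browser():
    """Start Chrome through Selenium (imported lazily so this file loads without it)."""
    from selenium import webdriver

    return webdriver.Chrome()


def submit_url(driver, url: str, timeout: float = 15.0) -> bool:
    """Type `url` into the search box and press Enter.

    Returns False when the page source looks like a CAPTCHA challenge.
    A real script would also wait for the result panel before reading the page.
    """
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get(SITE_REVIEW_URL)
    box = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, SEARCH_BOX_SELECTOR))
    )
    box.clear()
    box.send_keys(url, Keys.ENTER)
    return not page_has_captcha(driver.page_source)
```

A typical call would be `driver = open_browser()` followed by `submit_url(driver, "example.com")`, with `driver.quit()` once the list is finished.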
Here are its key features:
- Bulk URL Processing: Parses a text file of URLs and checks categories for each through the Site Review tool website.
- CAPTCHA Handling: Automatically detects and handles CAPTCHAs if they appear, logging any blocked URLs.
- Result Logging: Stores categorized URL data in a results.txt file for easy analysis. Also logs any CAPTCHA occurrences.
Overall, it makes checking long lists of websites far quicker and less painful than manual lookups, as long as you work through them in the small batches recommended above.
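The result logging and CAPTCHA logging features can be pictured with the following sketch. It is an assumption-laden illustration rather than the script’s real code: `process_urls`, `CaptchaEncountered`, and the tab-separated line format are hypothetical, and the `lookup` argument stands in for the Selenium step that returns the categories for one URL.

```python
from typing import Callable, Iterable, List


class CaptchaEncountered(Exception):
    """Hypothetical signal raised by `lookup` when a CAPTCHA blocks a request."""


def process_urls(
    urls: Iterable[str],
    lookup: Callable[[str], List[str]],
    results_path: str = "results.txt",
    captcha_path: str = "captcha.txt",
) -> int:
    """Look up each URL and log it to one of two files.

    Categorized URLs are appended to `results_path` as "url<TAB>categories"
    (an assumed format). URLs that hit a CAPTCHA are appended to `captcha_path`.
    Returns the number of URLs blocked by a CAPTCHA.
    """
    blocked = 0
    with open(results_path, "a", encoding="utf-8") as results, \
            open(captcha_path, "a", encoding="utf-8") as captchas:
        for url in urls:
            try:
                categories = lookup(url)
            except CaptchaEncountered:
                captchas.write(url + "\n")
                blocked += 1
                continue
            results.write(f"{url}\t{', '.join(categories)}\n")
    return blocked
```

Passing the lookup in as a function keeps the logging logic independent of the browser, so it can be tried out with a stub before any real requests are made.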
Prerequisites to Run the Bulk Website Checker Script
Before running the domain_categorization.py script to categorize websites, you need to set up:
Python Environment: Having Python 3.x installed on your computer is necessary for executing the .py script. Download the latest 3.x version if you don’t already have it from https://www.python.org/downloads/.
For detailed installation steps, see: Step-by-Step Procedure to Install Python on Windows
Selenium Module: The script imports Selenium to automate interaction with a browser. Install it via pip by running:
pip install selenium
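ChromeDriver: ChromeDriver allows Selenium to interface with Google Chrome. Download the driver from https://chromedriver.chromium.org/downloads and add its executable to your system PATH. Make sure the ChromeDriver version matches your installed Chrome browser version. Note that Selenium 4.6 and later also bundle Selenium Manager, which can locate or download a matching driver automatically when none is found on your PATH.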
Input File: Have your list of URLs ready in a plain text file called domains.txt. Put one URL per line in this file for the script to iterate through.
Once Python, Selenium, ChromeDriver, and the input URL list are ready, you can move on to running the website category checker script! Let us know if any of the prerequisites are unclear.
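If you want to confirm the prerequisites before the first run, a small check along these lines may help. It is a sketch, not part of the script: the function name `check_prerequisites` is hypothetical, and the driver check only looks for an executable named chromedriver on your PATH without comparing its version to Chrome.

```python
import importlib.util
import shutil
import sys
from pathlib import Path
from typing import Dict


def check_prerequisites(input_file: str = "domains.txt") -> Dict[str, bool]:
    """Report which of the four prerequisites appear to be in place."""
    return {
        # Running under Python 3.x
        "python3": sys.version_info.major >= 3,
        # The selenium package can be imported
        "selenium": importlib.util.find_spec("selenium") is not None,
        # A chromedriver executable is reachable through PATH
        "chromedriver_on_path": shutil.which("chromedriver") is not None,
        # The input list exists in the current directory
        "input_file": Path(input_file).is_file(),
    }
```

Printing the returned dictionary shows at a glance which item still needs attention. A missing chromedriver is not necessarily fatal on recent Selenium releases, which can fetch a driver themselves.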
Step-by-Step Guide to Checking Websites in Bulk
Once the prerequisites are set up, you are ready to utilize the script to categorize website lists in bulk.
Follow this streamlined process:
Step 1: Populate Input File
Add the full URLs you want to check, one on each line, into a text file called domains.txt. This serves as the input list that the script will iterate through.
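As an illustration of how such an input file can be read, the sketch below loads one URL per line. The helper name `load_urls` is hypothetical, and skipping blank lines and duplicates is an assumption about sensible behavior, not a description of what domain_categorization.py does.

```python
from pathlib import Path
from typing import List


def load_urls(path: str = "domains.txt") -> List[str]:
    """Return the URLs in `path`, one per line, in their original order.

    Blank lines are skipped and repeated URLs are kept only once, so the
    same site is not submitted twice.
    """
    seen = set()
    urls: List[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

Removing duplicates up front also keeps the number of requests sent to Site Review as low as possible.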
Step 2: Execute the Script
Open a terminal or command prompt, navigate to the script’s directory, and run:
python domain_categorization.py
This launches the Selenium browser automation to start checking each URL through Symantec’s Site Review tool.
Step 3: Review Outputs
As the script runs, your categorized URL results will be compiled line-by-line in the results.txt file. Any CAPTCHAs encountered mid-process will be logged in captcha.txt for retry later.
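After a run, it can be useful to confirm that every input URL ended up in one of the two output files. The sketch below does this with a simple substring match, relying only on the fact that both files record the original URL. The function name `unaccounted_urls` is hypothetical, and the exact line format of the output files is not assumed.

```python
from pathlib import Path
from typing import List


def _read_lines(path: str) -> List[str]:
    """Return the lines of `path`, or an empty list if the file does not exist."""
    p = Path(path)
    return p.read_text(encoding="utf-8").splitlines() if p.is_file() else []


def unaccounted_urls(
    input_path: str = "domains.txt",
    results_path: str = "results.txt",
    captcha_path: str = "captcha.txt",
) -> List[str]:
    """List input URLs that appear in neither results.txt nor captcha.txt."""
    logged = _read_lines(results_path) + _read_lines(captcha_path)
    missing: List[str] = []
    for line in _read_lines(input_path):
        url = line.strip()
        # Substring match: a URL counts as logged if any output line contains it.
        if url and not any(url in entry for entry in logged):
            missing.append(url)
    return missing
```

Because the match is a plain substring test, a short URL that is contained in a longer one (for example a.com inside aa.com) would be counted as logged, so treat the result as a quick sanity check rather than an exact audit.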
And that’s it! Sit back and let the tool scrape through your website list automatically. The heavy lifting of submissions and parsing is handled programmatically to save you headaches.
Let us know if you have any questions about getting set up! We are happy to help you create Python scripts like this for other uses. Feel free to comment here.
We hope this article helped you understand how to check website categories in bulk with Symantec’s Site Review tool using our bulk domain, IP, and URL categorization script. Thanks for reading this post. Please share it and help secure the digital world. Visit our website, thesecmaster.com, for more posts like this.