
How can AI be used in web scraping?


Everything changed with the release of ChatGPT at the end of 2022.
You can ask it to write code that scrapes data from a website.

Let's start with a simple example.

Step 1.
Download and install Python from here:

Download Python | Python.org

Open a terminal on macOS/Linux (or the Command Prompt on Windows) and type python (on macOS and Linux the command may be python3):

If Python starts and shows its version number followed by the >>> prompt, you are OK.
Type exit() to leave Python.
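If you prefer to check from a script, here is a minimal sketch (our own example, not something Python requires) that prints the version of the running interpreter and where it is installed. This helps when more than one Python is installed on the same machine:

```python
import sys


def python_version():
    """Return the (major, minor, micro) version of the running interpreter."""
    return sys.version_info[:3]


if __name__ == "__main__":
    major, minor, micro = python_version()
    print(f"Python {major}.{minor}.{micro}")
    # Full path of the interpreter that is running this script
    print(sys.executable)
```

Save it as, say, check_python.py and run it with python check_python.py; the path it prints is the interpreter your scripts will use.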

Step 2.

You need an editor to type or paste the code.
Any plain-text editor will do, such as TextEdit on Mac (in plain-text mode) or Notepad++ on Windows, but we strongly suggest using the free VS Code, available here:
Download Visual Studio Code – Mac, Linux, Windows

VS Code will guide you through the setup and suggest installing the Python extension, which helps you write, check, and run your code.

Step 3.

You will also need the Google Chrome browser and ChromeDriver. You can download ChromeDriver from here:

Chrome for Testing availability (googlechromelabs.github.io)

Then find the chromedriver download for your platform (for example win64), copy its URL and paste it into a new browser tab to download the zip file. (Oddly, simply clicking the link did not work for us.)

I unzipped the file here: C:\temp\chromedriver-win64\
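To confirm that the driver really is where your code will look for it, you can run a small check. This is a sketch using only the standard library; DRIVER_PATH is the folder from this post and the function name driver_exists is our own, so adjust both to your setup. ChromeDriver accepts a --version flag, and the version it prints should match the major version of your installed Chrome.

```python
import subprocess
from pathlib import Path

# The location used in this post; change it to match your own machine.
DRIVER_PATH = Path(r"C:\temp\chromedriver-win64\chromedriver.exe")


def driver_exists(path: Path) -> bool:
    """Return True if a file exists at the given path."""
    return path.is_file()


if __name__ == "__main__":
    if driver_exists(DRIVER_PATH):
        # Ask ChromeDriver to print its own version number
        result = subprocess.run(
            [str(DRIVER_PATH), "--version"], capture_output=True, text=True
        )
        print(result.stdout.strip())
    else:
        print(f"ChromeDriver not found at {DRIVER_PATH}")
```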

Step 4.

Log in to ChatGPT and ask it something simple to start with:
using python and chrome driver and selenium, i want you to download the source of a web page

Prerequisites

  1. Download the ChromeDriver executable that matches your Chrome version from the official website.
  2. Install Selenium by running pip install selenium in your terminal.

Type pip install selenium in your terminal to install this library.
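To confirm the install worked, and to see which Selenium version you got, here is a short sketch using the standard library's importlib.metadata. The helper name installed_version is our own, not part of Selenium:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("selenium")
    if version is None:
        print("Selenium is not installed; run: pip install selenium")
    else:
        print(f"Selenium {version} is installed")
```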

ChatGPT came back with some code that did not work 🙂
To save you the time of pasting the error back into ChatGPT, we fixed the code ourselves.
Paste the code into VS Code and save it as Sample1.py, for example in the C:\temp folder.
(Make sure the double quotes are pasted as plain straight quotes; you might need to retype them in VS Code.)

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Initialize the Chrome driver.
# Make sure you've downloaded ChromeDriver and either put its location in the PATH
# or specify its path directly, as below.
# THIS IS WHERE I UNZIPPED THE CHROME DRIVER ON MY WINDOWS MACHINE
service = Service("C:\\temp\\chromedriver-win64\\chromedriver.exe")

# Create the WebDriver that controls Chrome through ChromeDriver
driver = webdriver.Chrome(service=service)

try:
    # Navigate to the desired website
    driver.get("https://www.example.com")
    # Get the source code of the webpage
    source_code = driver.page_source
    # Print the source code (optional)
    print(source_code)
    # Save the source code to a file
    with open("webpage_source.html", "w", encoding="utf-8") as f:
        f.write(source_code)
finally:
    # Close the browser, even if an error occurred above
    driver.quit()

Click the Run button (the ▶ icon at the top right of the editor) to run the code.

The terminal panel is helpful: it shows what is going on while the program runs, including any error messages.
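If the terminal shows a SyntaxError pointing at a quote character, the paste most likely turned the straight quotes into curly ones (the problem mentioned before the code). Rather than retyping every quote, a small helper can replace them. This is a sketch: the path C:\temp\Sample1.py is the one suggested above, the character list covers only the common cases, and fix_quotes is our own name.

```python
from pathlib import Path

# Typographic characters that often sneak in when code is copied from a web page,
# mapped to the plain ASCII quotes Python expects.
QUOTE_TABLE = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2033": '"',  # double prime
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def fix_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(QUOTE_TABLE)


if __name__ == "__main__":
    # Assumed location of the script from this post; change as needed.
    script = Path(r"C:\temp\Sample1.py")
    if script.is_file():
        fixed = fix_quotes(script.read_text(encoding="utf-8"))
        script.write_text(fixed, encoding="utf-8")
        print(f"Fixed quotes in {script}")
    else:
        print(f"{script} not found")
```

It rewrites the file in place and also changes curly quotes inside comments, which is harmless for this example.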
The result is the page's HTML source, saved as webpage_source.html in the current working directory.
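To double-check the saved file without opening it, here is a short sketch that uses Python's built-in html.parser to read webpage_source.html and print its size and <title>. The class and function names are our own; this only confirms that the download worked.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped <title> text of an HTML document, or "" if absent."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


if __name__ == "__main__":
    # File written by Sample1.py, relative to the current working directory
    saved = Path("webpage_source.html")
    if saved.is_file():
        html = saved.read_text(encoding="utf-8")
        print(f"{len(html)} characters saved, title: {page_title(html)!r}")
    else:
        print("webpage_source.html not found; run Sample1.py first")
```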
Stay tuned: we will return with a second example showing how to extract specific data from a web page.