
How can AI be used in web scraping?

How can AI be used in web scraping with ChatGPT?

Everything changed in 2023 after the introduction of ChatGPT.
You can ask it to write code for you that scrapes data from a web site.

Let's start with a simple example.

Step 1.
Download Python and install it from here:

Download Python | Python.org

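Open your terminal on macOS/Linux or the Command Prompt on Windows and type python (on some macOS/Linux systems the command is python3).

If Python starts and shows its version followed by the >>> prompt, you are OK.
Type exit() to exit from Python.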
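
The same check can also be done from a small script, which is handy once you start running files instead of typing at the prompt. This is only a sketch: the function name python_is_recent_enough and the minimum version (3, 8) are assumptions chosen for illustration, so check the Selenium release notes for the version it actually requires.

```python
import sys


def python_is_recent_enough(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`.

    The default (3, 8) is an assumed minimum, used here for illustration only.
    """
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    # Print just the version number, e.g. the first word of sys.version.
    print("Python version:", sys.version.split()[0])
    if python_is_recent_enough():
        print("OK")
    else:
        print("Please install a newer Python")
```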

Step 2.

You need an editor to type or paste the code.
You can use anything, such as TextEdit on macOS or Notepad++ on Windows, but we strongly suggest the free VS Code, available here:
Download Visual Studio Code – Mac, Linux, Windows

VS Code will guide you through the setup and will also suggest installing a Python extension that will assist you.


Step 3.

You will also need the Chrome browser and the matching ChromeDriver. You can get ChromeDriver from here:

Chrome for Testing availability (googlechromelabs.github.io)

Then copy the download URL of the chromedriver build for your platform (for example win64) and paste it into a new browser tab to download the zip file (just clicking on it does not work, weird!).

I unzipped the file here: C:\temp\chromedriver-win64\
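
Before moving on, it can help to confirm that the driver really is where you think it is. The sketch below is one way to do that; find_chromedriver is a hypothetical helper name, and the folder in the example is simply the one used in this post, so replace it with your own.

```python
from pathlib import Path


def find_chromedriver(folder):
    """Return the path of the ChromeDriver executable in `folder`, or None.

    Looks for chromedriver.exe (Windows) and chromedriver (macOS/Linux).
    """
    folder = Path(folder)
    for name in ("chromedriver.exe", "chromedriver"):
        candidate = folder / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Folder used in this post; change it to wherever you unzipped the file.
    found = find_chromedriver("C:\\temp\\chromedriver-win64")
    print(found if found else "ChromeDriver not found in that folder")
```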

Step 4.

Log in to ChatGPT and ask something simple to start:
using python and chrome driver and selenium, i want you to download the source of a web page

Prerequisites

  1. Download the ChromeDriver executable from the official website that matches your Chrome version.
  2. Install Selenium by running pip install selenium in your terminal.

Type pip install selenium in your terminal to install this library.
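
If you are not sure whether the install worked, a quick check from Python can tell you without starting a browser. This is a minimal sketch using the standard library; is_installed is a hypothetical helper name introduced here for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("selenium"):
        print("selenium is installed")
    else:
        print("selenium is missing, run: pip install selenium")
```

If this reports that selenium is missing right after a successful pip install, pip may have installed it for a different Python than the one you are running.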

GPT came back with some code that did not work 🙂
To save you the time of pasting the error back to GPT, we modified the code ourselves.
Paste the code below into VS Code and save it as Sample1.py, for example in the c:\temp folder.
(Make sure the double quotes are pasted as plain straight quotes; you might need to retype them in VS Code.)

 

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Initialize the Chrome driver.
# Make sure you've downloaded ChromeDriver and put its location in the PATH,
# or specify its path directly in the argument.
# Create the WebDriver with the executable path of ChromeDriver.
# This is where I unzipped ChromeDriver on my Windows machine.
service = Service("C:\\temp\\chromedriver-win64\\chromedriver.exe")
driver = webdriver.Chrome(service=service)

# Navigate to the desired website
driver.get("https://www.example.com")

# Get the source code of the webpage
source_code = driver.page_source

# Print the source code (optional)
print(source_code)

# Save the source code to a file
with open("webpage_source.html", "w", encoding="utf-8") as f:
    f.write(source_code)

# Close the browser
driver.quit()
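
If Python reports a SyntaxError about an invalid character when you run the pasted script, curly quotes copied from a web page are a likely cause. Instead of retyping them by hand, a small helper kept separate from the scraping script can straighten them. This is a sketch only: straighten_quotes and straighten_file are hypothetical names, and the helper replaces every typographic quote in the text, including those inside comments.

```python
from pathlib import Path

# Map typographic quotes to their plain ASCII equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2033": '"',  # double prime, sometimes pasted instead of a closing quote
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def straighten_quotes(text):
    """Return `text` with curly quotes replaced by straight ones."""
    return text.translate(str.maketrans(SMART_QUOTES))


def straighten_file(path):
    """Rewrite the file at `path` in place with straight quotes."""
    p = Path(path)
    p.write_text(straighten_quotes(p.read_text(encoding="utf-8")), encoding="utf-8")
```

For example, calling straighten_file("Sample1.py") from the folder where you saved the script would clean it up in place.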
 
 
Click the Run button in VS Code to run Sample1.py.

The terminal is helpful for showing what is going on while the program runs.

The result is the HTML source of the web site, saved to webpage_source.html.
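
One weakness of a script like Sample1.py is that if loading the page fails, the browser window stays open because driver.quit() is never reached. A possible variation, sketched below, wraps the work in try/finally and turns it into a reusable function. The names save_page_source and write_html are hypothetical, and leaving driver_path empty assumes that ChromeDriver is on your PATH or that your Selenium version can locate the driver by itself.

```python
from pathlib import Path


def write_html(html, output_file):
    """Save `html` to `output_file` as UTF-8 and return the file's path."""
    target = Path(output_file)
    target.write_text(html, encoding="utf-8")
    return target


def save_page_source(url, output_file, driver_path=None):
    """Open `url` in Chrome, save its HTML source, and always close the browser."""
    # Imported here so this file can be loaded even before Selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Assumption: with no explicit path, ChromeDriver is on the PATH
    # or Selenium is recent enough to find it on its own.
    service = Service(driver_path) if driver_path else Service()
    driver = webdriver.Chrome(service=service)
    try:
        driver.get(url)
        html = driver.page_source
    finally:
        # Runs even if driver.get() raises, so no browser is left behind.
        driver.quit()
    return write_html(html, output_file)


# Example call, using the driver location from this post:
# save_page_source("https://www.example.com", "webpage_source.html",
#                  driver_path="C:\\temp\\chromedriver-win64\\chromedriver.exe")
```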
 
 
 
Stay tuned: we will return with a second sample showing how to extract specific data from a web page.
 
