This Python Selenium tutorial explains how to write Selenium scripts in Python, with code examples. Selenium provides official Python bindings, so with Selenium and Python you can easily create automation tools that:
- Scrape data from the web
- Test sites or pages against multiple predefined test cases
- Get any repetitive online task done at scale
In all our scripts we will be using Selenium WebDriver. WebDriver controls the browser on behalf of a user: the scripts that we write tell it what to do. In this Python Selenium tutorial we are going to learn with the help of 2 simple but wacky examples.
Installing Selenium for Python
Installing Selenium for Python is straightforward. In case you haven't installed Selenium yet, you could check this installation guide.
Also remember to install the driver for your preferred browser. In the examples below we will be using Chrome.
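Before running the examples, it can help to confirm which Selenium version is installed, because the element-finding API changed between releases. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `has_legacy_finders` are made up for this illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_legacy_finders(version):
    """True if this Selenium version still ships the find_element_by_* helpers.

    Those helpers were removed in Selenium 4.3.0. Assumes a version string
    with at least two numeric components, such as "4.10.0".
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (4, 3)


version = installed_version("selenium")
if version is None:
    print("selenium is not installed; try: pip install selenium")
else:
    print(f"selenium {version}, legacy finders: {has_legacy_finders(version)}")
```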
Web Scraping using Selenium in Python
Using Selenium, you can scrape information from almost any site. In fact, you can log in to your social media profiles and pull out information from there as well. In this example we will learn how to pull out product and price details from an e-commerce web page.
Importing Selenium and accessing a website using Chrome
Like any Python code, we start by importing the required modules:
import selenium
from selenium import webdriver
Here selenium is the Python library and webdriver is a module within it. We then open a browser window using the webdriver module. The following code will open Chrome in a new window. The browser will display the message: “Chrome is being controlled by automated test software”.
Do note that we are also creating a new object called “driver”, which we will use to control the browser:
driver = webdriver.Chrome()
Now, to open the URL that we want to scrape, we will use the following code. The get method takes the URL of the page that we want to open:
URL = "https://www.insaraf.com/collections/desks-home-office"
driver.get(URL)
That's it! We have successfully opened Chrome in a new window and loaded our target page.
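One thing the snippet above leaves out is closing the browser afterwards. A minimal sketch of one way to guarantee cleanup follows; `visit` is a hypothetical helper, and `driver_factory` is assumed to be any callable that returns a WebDriver, such as `webdriver.Chrome`.

```python
def visit(driver_factory, url):
    """Open `url` in a fresh browser, return the page title, always quit."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.title
    finally:
        # quit() closes every window and ends the driver session,
        # even if get() raised an exception.
        driver.quit()


# Usage with a real browser (assumes Selenium and a Chrome driver are installed):
#     from selenium import webdriver
#     print(visit(webdriver.Chrome, "https://www.insaraf.com/collections/desks-home-office"))
```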
Finding items to be scraped using identifiers
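We now want to pull out the product names and prices. There are 8 different identifiers available to locate an element. For each identifier there are 2 corresponding methods:
- find_element_by_identifier (if we want to capture the first matching instance)
- find_elements_by_identifier (if we want to capture all the matching instances)
Note that these helper methods are available in Selenium 3 and in Selenium 4 releases before 4.3. From Selenium 4.3 onwards they have been removed, and the equivalent calls are find_element(By.CLASS_NAME, value) and find_elements(By.CLASS_NAME, value) (and likewise for the other identifiers), with By imported from selenium.webdriver.common.by.
The list below shows all the identifiers available to fetch an element, with the corresponding methods in parentheses: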
- Name (find_elements_by_name)
- ID (find_elements_by_id)
- Link text (find_elements_by_link_text)
- Partial Link text (find_elements_by_partial_link_text)
- XPath (find_elements_by_xpath)
- CSS selector (find_elements_by_css_selector)
- Tag Name (find_elements_by_tag_name)
- Class Name (find_elements_by_class_name)
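In Selenium 4.3 and later, the eight methods above are replaced by a single `find_elements(by, value)` call, where `by` is one of the `By` constants. Those constants are plain strings, so the sketch below spells them out in order to stay importable without Selenium. The `LOCATORS` table and the `find_all` helper are illustrative names, not part of the library.

```python
# String values of selenium.webdriver.common.by.By, keyed by the
# identifier names used in this tutorial.
LOCATORS = {
    "name": "name",                            # By.NAME
    "id": "id",                                # By.ID
    "link_text": "link text",                  # By.LINK_TEXT
    "partial_link_text": "partial link text",  # By.PARTIAL_LINK_TEXT
    "xpath": "xpath",                          # By.XPATH
    "css_selector": "css selector",            # By.CSS_SELECTOR
    "tag_name": "tag name",                    # By.TAG_NAME
    "class_name": "class name",                # By.CLASS_NAME
}


def find_all(driver, identifier, value):
    """Return every element matching `value` for the given identifier."""
    return driver.find_elements(LOCATORS[identifier], value)


# Roughly equivalent to the older driver.find_elements_by_class_name("price"):
#     prices = find_all(driver, "class_name", "price")
```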
On e-commerce sites, it is highly probable that each listing shares some common HTML identifiers. We need to inspect the target element in the browser to find them.
For the site that we intend to scrape:
- The product names have the common class name: product-block-title
- The price values have the common class name: price
We need to pull these out and store them in a list. If we use find_elements_by we get a list as output. If we use find_element_by we get a single element as output.
# List of product name elements
x = driver.find_elements_by_class_name("product-block-title")
# List of the corresponding price elements
y = driver.find_elements_by_class_name("price")
# The text attribute gives us the visible text of each element
x1 = [item.text for item in x]
y1 = [item.text for item in y]
# Loop through both lists in tandem to build a combined list
tally = []
for name, price in zip(x1, y1):
    tally.append([name, price])
print(tally)
The output would be a list of lists containing the product names and prices:
[['Solid Wood Max Executive Office Desk', 'Rs. 43,999.00'], ['Solid Wood Port Writing / Office Desk', 'Rs. 28,999.00'], ['Solid Wood Brew Study Table with 2 Drawers', 'Rs. 13,999.00'], ['Solid Wood Brew Study Table with Drawer', 'Rs. 13,999.00'], ['Solid Wood Million Study Table with Storage', 'Rs. 18,999.00'], ['Solid Wood Ellen Study Table with 2 Drawers', 'Rs. 15,999.00'], ['Solid Wood Ellen Study Table with 2 Drawers', 'Rs. 15,999.00'], ['Solid Wood Eva Stripe Study Table with 2 Drawers', 'Rs. 16,999.00'], ... ['Solid Wood Cube Office Table', 'Rs. 17,999.00'], ['Solid Wood Cube Office Desk', 'Rs. 29,899.00'], ['Solid Wood Cube Office Unit', 'Rs. 17,999.00']]
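The prices come back as display strings such as 'Rs. 43,999.00'. If numeric values are needed for sorting or analysis, one possible approach is sketched below. `parse_price` and `pair_up` are hypothetical helpers, and the 'Rs.' prefix with comma separators is assumed from the output above.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a string like 'Rs. 43,999.00' to Decimal('43999.00')."""
    cleaned = text.replace("Rs.", "").replace(",", "").strip()
    return Decimal(cleaned)


def pair_up(names, prices):
    """Pair names with parsed prices, refusing mismatched lengths.

    zip() silently stops at the shorter list, which would hide a
    scraping problem, so the lengths are checked first.
    """
    if len(names) != len(prices):
        raise ValueError(f"{len(names)} names but {len(prices)} prices")
    return [[name, parse_price(price)] for name, price in zip(names, prices)]


rows = pair_up(
    ["Solid Wood Cube Office Table", "Solid Wood Cube Office Desk"],
    ["Rs. 17,999.00", "Rs. 29,899.00"],
)
print(rows)
```

Checking the lengths matters here because the names and prices are collected by two separate queries, so an item without a price would otherwise shift every later pair.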
Exporting the Scraped items to Excel
The last step would be to export the scraped items to Excel for storage and analysis. We will use the pandas module and store the above list of lists in a pandas DataFrame:
from pandas import DataFrame
df = DataFrame(tally, columns=["item", "price"])
df  # in a notebook, this displays the DataFrame
The above code converted the clumsy list of lists to a beautiful data-frame:
| | item | price |
|---|---|---|
| 0 | Solid Wood Max Executive Office Desk | Rs. 43,999.00 |
| 1 | Solid Wood Port Writing / Office Desk | Rs. 28,999.00 |
| 2 | Solid Wood Brew Study Table with 2 Drawers | Rs. 13,999.00 |
| 3 | Solid Wood Brew Study Table with Drawer | Rs. 13,999.00 |
| 4 | Solid Wood Million Study Table with Storage | Rs. 18,999.00 |
| 5 | Solid Wood Ellen Study Table with 2 Drawers | Rs. 15,999.00 |
| 6 | Solid Wood Ellen Study Table with 2 Drawers | Rs. 15,999.00 |
| 7 | Solid Wood Eva Stripe Study Table with 2 Drawers | Rs. 16,999.00 |
| 8 | Solid Wood Eva Study Table with Drawer | Rs. 13,999.00 |
| 9 | Solid Wood Curved Writing / Study Table | Rs. 13,999.00 |
| 10 | Solid Wood Cubex Writing / Study Table | Rs. 13,999.00 |
| 11 | Solid Wood Crossia Writing / Study Table | Rs. 13,999.00 |
| 12 | Solid Wood Voted Wood Writing / Office Desk | Rs. 22,999.00 |
| 13 | Solid Wood Eva Study Table with 2 Drawers | Rs. 13,999.00 |
| 14 | Solid Wood Slant Writing / Study Table | Rs. 13,999.00 |
| 15 | Solid Wood Turner Writing / Office Desk / Stud… | Rs. 14,999.00 |
| 16 | Solid Wood Jaipur Writing / Study Table | Rs. 13,999.00 |
| 17 | Solid Wood Charlie Study Table with 2 Drawers | Rs. 11,999.00 |
| 18 | Solid Wood Jali Kids Writing / Study Table | Rs. 7,999.00 |
| 19 | Solid Wood Kuber Writing / Study Table | Rs. 13,999.00 |
| 20 | Solid Wood Jodhpur Writing Table | Rs. 17,999.00 |
| 21 | Durban Solid Sheehsam Wood Writing / Office Desk | Rs. 21,999.00 |
| 22 | Silver Solid Sheesham Wood Office Desk | Rs. 29,999.00 |
| 23 | Solid Wood Cube Office Table | Rs. 17,999.00 |
| 24 | Solid Wood Cube Office Desk | Rs. 29,899.00 |
| 25 | Solid Wood Cube Office Unit | Rs. 17,999.00 |
Exporting the pandas DataFrame to Excel can be done with a single line of code:
df.to_excel("file_name.xlsx")
The above code would create the Excel file in the current working directory, which is usually the folder the Python code is run from. To create it in a different folder, we need to specify the full path.
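As a rough sketch of writing to a different folder, the target path can be built with `pathlib` before saving. To keep the example dependency-free, it writes CSV with the standard `csv` module rather than Excel; with pandas, the same path could be passed to `df.to_excel(...)` instead. The folder name `exports` and the helper `save_rows` are assumptions for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, folder, file_name="file_name.csv"):
    """Write [item, price] rows to folder/file_name and return the path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = target_dir / file_name
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["item", "price"])  # header row
        writer.writerows(rows)
    return target


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = save_rows(
        [["Solid Wood Cube Office Table", "Rs. 17,999.00"]],
        Path(tmp) / "exports",
    )
    written = path.read_text(encoding="utf-8")
    print(written)
```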
Putting it all together, the complete script is:
import selenium
from selenium import webdriver
from pandas import DataFrame

driver = webdriver.Chrome()
URL = "https://www.insaraf.com/collections/desks-home-office"
driver.get(URL)

# Collect the product name and price elements, then extract their text
x = driver.find_elements_by_class_name("product-block-title")
y = driver.find_elements_by_class_name("price")
x1 = [item.text for item in x]
y1 = [item.text for item in y]

# Combine names and prices into a list of [item, price] pairs
tally = []
for name, price in zip(x1, y1):
    tally.append([name, price])

df = DataFrame(tally, columns=["item", "price"])
df.to_excel("file_name.xlsx")
“Googling” using Selenium in Python
In this example, we will use Selenium to google “credit cards”.
As a user, you would do this in 3 steps:
- Open google.com
- Type the search term in the search box
- Click the button called “Google search”
We will get Selenium to do these on our behalf.
Log on to google.com using Selenium in Python
This part is the same as in the example above. The code below will open google.com in a new browser window.
import selenium
from selenium import webdriver

driver = webdriver.Chrome()
URL = "https://www.google.com"
driver.get(URL)
Typing online using Selenium in Python
To type text, Selenium provides the send_keys method on an element. To send special keys such as Enter, we also import the Keys class:
from selenium.webdriver.common.keys import Keys
The search box element is named “q”. We will use find_element_by_name and pass the value “q”.
inputElement = driver.find_element_by_name("q")
inputElement.send_keys("Credit Cards")
Now, the last step is to submit the search. We can do that with a single line of code:
inputElement.submit()
That's it! Our automated Google search is complete. The full script:
import selenium
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
URL = "https://www.google.com"
driver.get(URL)

# Find the search box, type the search term and submit the form
inputElement = driver.find_element_by_name("q")
inputElement.send_keys("Credit Cards")
inputElement.submit()
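The same three steps can be wrapped in a reusable function. This is only a sketch: `google_search` is a hypothetical helper, and it uses the `find_element(by, value)` form with the string "name" (the value of `By.NAME`), which is also available on Selenium 4.3 and later.

```python
def google_search(driver, term):
    """Open Google, type `term` into the search box, and submit the form."""
    driver.get("https://www.google.com")
    box = driver.find_element("name", "q")  # "name" is the value of By.NAME
    box.send_keys(term)
    box.submit()


# Usage (assumes Selenium and a Chrome driver are installed):
#     from selenium import webdriver
#     driver = webdriver.Chrome()
#     google_search(driver, "Credit Cards")
#     driver.quit()
```

Passing the driver in as an argument keeps the function independent of which browser is used.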
FAQs : Python Selenium Tutorial
The identifiers available for locating elements are name, ID, link text, partial link text, XPath, CSS selector, tag name and class name.
We can add keyboard actions to our scripts with the send_keys method. For special keys such as Enter, we need to import the Keys class from selenium.webdriver.common.keys.
Hope you enjoyed this brief Python Selenium tutorial. If you want to read more on Python you may find some of my articles useful.