I’d never used Splinter or Selenium before, but I saw this question on Stack Overflow and thought it was a chance to learn something new.
The user who asked was trying to retrieve the properties of a login textbox using Splinter and Selenium WebDriver.
Since he didn’t provide the source URL, debugging the problem was a little frustrating. Anyway, I took his snippet and tried to run it in my environment. After a couple of minutes of debugging I worked out which tools the script needed and installed them:
    # Export my corporate proxy to avoid problems when using python-pip
    export http_proxy=http://stupid_corporate_proxy
    export https_proxy=http://stupid_corporate_proxy

    # Install python-pip
    sudo apt-get install python-pip python3-pip

    # Install Splinter
    sudo pip install splinter

    # Install Selenium
    sudo pip install selenium

    # Export the path of chrome driver
    export PATH=$PATH:/srv/selenium/chromedriver
I downloaded ChromeDriver and placed it in /srv/selenium/chromedriver.
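Before going further, it can help to confirm that the driver is actually reachable through PATH. Here is a minimal sketch using only the standard library. The helper name `find_driver` is hypothetical, and it assumes the binary is called `chromedriver`:

```python
import os
import shutil


def find_driver(name="chromedriver", search_path=None):
    """Return the full path of the driver binary, or None if it is not found.

    search_path is an os.pathsep-separated string of directories; when it is
    None, shutil.which falls back to the PATH environment variable.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_driver()
    if location is None:
        print("chromedriver is not on PATH:", os.environ.get("PATH", ""))
    else:
        print("chromedriver found at", location)
```

If this reports that the driver is missing, the PATH export is the first thing to double-check.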
Anyway, after the environment was ready I was able to test a website:
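    #!/usr/bin/python
    from splinter import Browser

    # Open Chrome through chromedriver, which must be reachable via PATH
    browser = Browser('chrome')
    browser.visit('https://migueleonardortiz.com.ar')

    # Find every element whose name attribute is "generator"
    results = browser.find_by_name('generator')
    for objectx in results:
        # Read the content attribute from the underlying Selenium element
        print(objectx._element.get_attribute('content'))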
And the output of it:
    mortiz@florida:~/Documents/projects/python/splinter$ python web_browser_splinter.py
    Divi v.2.5.6
    WordPress 4.9.6
If you want to pass the driver path through **kwargs instead, you’ll need a Python dictionary:
    from splinter import Browser

    executable_path = {'executable_path': '</path/to/chrome>'}
    browser = Browser('chrome', **executable_path)
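The `**` unpacking is plain Python and not specific to Splinter. This small sketch uses a stand-in function: `open_browser` is hypothetical and only echoes its arguments, it is not Splinter's `Browser`. It shows that unpacking a dictionary is equivalent to writing the keyword argument by hand:

```python
def open_browser(driver_name, **kwargs):
    """Stand-in for a constructor that accepts extra keyword arguments."""
    return {"driver": driver_name, "options": kwargs}


# Placeholder path, kept exactly as in the snippet above.
options = {"executable_path": "</path/to/chrome>"}

# Both calls hand the function the same keyword argument.
unpacked = open_browser("chrome", **options)
explicit = open_browser("chrome", executable_path="</path/to/chrome>")

print(unpacked == explicit)  # True
```

The dictionary form is handy when the options are built at runtime, for example from a config file.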
So the problem the user was experiencing wasn’t in his Python snippet but in the source URL he was trying to use.
By tracking down the form ID he used in his example, I located a website that uses that property: HDFC Bank, an Indian bank. Although the HTML property exists, it is rendered by JavaScript, so if you tell Splinter to retrieve it, nothing comes back because the element doesn’t exist in the current DOM.
This is presumably a way to protect the bank’s website against bots or unwanted scripts that could overload its systems or attack them by brute force. If you look at the page source you won’t find much more than a couple of scripts, but if you inspect the elements currently displayed you’ll notice there’s a lot of HTML embedded.
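A quick way to tell whether an element is part of the static source or only appears after scripts run is to parse the raw HTML and look for it. The sketch below uses only the standard library. The page content, the `login` field name, and the helper names `NameFinder` and `find_in_static_html` are made up for illustration and are not taken from the bank's site:

```python
from html.parser import HTMLParser


class NameFinder(HTMLParser):
    """Collect the attributes of every tag whose name attribute matches."""

    def __init__(self, wanted_name):
        super().__init__()
        self.wanted_name = wanted_name
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("name") == self.wanted_name:
            self.matches.append((tag, attributes))


def find_in_static_html(html, wanted_name):
    """Return (tag, attributes) pairs found in the HTML as downloaded."""
    finder = NameFinder(wanted_name)
    finder.feed(html)
    return finder.matches


# Hypothetical page: the meta tag is in the source,
# while the input only exists once the script has run in a browser.
STATIC_PAGE = """
<html>
  <head><meta name="generator" content="WordPress 4.9.6"></head>
  <body>
    <script>
      var box = document.createElement("input");
      box.name = "login";
      document.body.appendChild(box);
    </script>
  </body>
</html>
"""

print(find_in_static_html(STATIC_PAGE, "generator"))
print(find_in_static_html(STATIC_PAGE, "login"))
```

The first lookup finds the meta tag. The second returns an empty list, because the input is only described inside a script and never appears as markup in the source.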
The same thing happens when you use curl on that URL: it downloads that initial HTML but doesn’t run the JavaScript. Although I have several ideas for getting around that protection to retrieve the elements, I guess it’s not a good idea to post them publicly.