Downloading Files with Selenium

This article was first published on Python - datawookie, and kindly contributed to python-bloggers.

If you use Selenium for browser automation then at some stage you are likely to need to download a file by clicking a button or link on a website. Sometimes this just works. Other times it doesn’t.

When I encounter a stubborn download I have found that adding some specific preferences when I launch Selenium can help.

These are the preferences I apply:

import os

prefs = {
  "download.default_directory": os.getcwd(),
  "download.prompt_for_download": False,
  "directory_upgrade": True,
  "safebrowsing.enabled": True,
  "profile.default_content_settings.popups": 0,
  "profile.content_settings.exceptions.automatic_downloads.*.setting": 1,
  "profile.default_content_setting_values.automatic_downloads": 1,
  "profile.default_content_settings.mimetype_overrides": {
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }
}

What does each of those do?

  • download.default_directory: Sets the download directory. Not strictly necessary, but useful to have control over this. Defaults to ~/Downloads.
  • download.prompt_for_download: Prevents the browser from asking where to save the file.
  • directory_upgrade: Allows the browser to change the download directory.
  • safebrowsing.enabled: Enables the Safe Browsing feature, which protects against phishing, malware, and other malicious content. Again, not strictly necessary, but good to have.
  • profile.default_content_settings.popups: Blocks popups. This refers to browser popups, not in-page dialogs or popups.
  • profile.content_settings.exceptions.automatic_downloads.*.setting: Allows multiple automatic downloads without requiring user intervention.
  • profile.default_content_setting_values.automatic_downloads: Allows automatic downloads.
  • profile.default_content_settings.mimetype_overrides: Overrides MIME type handling for specific file types.
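
For context, here is one way these preferences might be handed to the browser. This is a minimal sketch rather than the script referred to in this post: it assumes Chrome driven by Selenium 4's Python bindings, where preferences are passed through the "prefs" experimental option. The helper names build_prefs and launch_chrome are hypothetical, introduced only for illustration.

```python
import os


def build_prefs(download_dir=None):
    """Return download preferences matching the ones listed above.

    `download_dir` is a hypothetical parameter; it falls back to the
    current working directory, as in the original snippet.
    """
    xlsx = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
    return {
        "download.default_directory": download_dir or os.getcwd(),
        "download.prompt_for_download": False,
        "directory_upgrade": True,
        "safebrowsing.enabled": True,
        "profile.default_content_settings.popups": 0,
        "profile.content_settings.exceptions.automatic_downloads.*.setting": 1,
        "profile.default_content_setting_values.automatic_downloads": 1,
        "profile.default_content_settings.mimetype_overrides": {xlsx: xlsx},
    }


def launch_chrome(prefs):
    """Start Chrome with the given preferences.

    Assumption: Selenium 4 and a local Chrome installation are available.
    """
    # Imported lazily so build_prefs() can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Chrome picks up profile preferences from the "prefs" experimental option.
    options.add_experimental_option("prefs", prefs)
    return webdriver.Chrome(options=options)


# Usage (not executed here, since it opens a real browser):
#   driver = launch_chrome(build_prefs())
#   ...click the download link...
#   driver.quit()
```

Keeping the dictionary in its own function makes it easy to inspect or adjust the preferences before any browser is started.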

Of these, the final preference, which specifies how the XLSX MIME type should be handled, is probably the most important. Where does the MIME type come from? It should be found in the server headers for the download (so crack open Developer Tools to find it). Without this setting it’s possible that the browser might apply a generic MIME type (like application/octet-stream), and this might cause the browser to prompt the user for how to handle the downloaded file.
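
If you would rather check the MIME type from Python than from Developer Tools, something like the following might work. It is a sketch using only the standard library, and it assumes the server answers a plain HEAD request without needing cookies or a login, which will not be true for every site. The function names are hypothetical.

```python
import urllib.request


def mime_type_of(content_type_header):
    """Drop parameters such as '; charset=utf-8' and normalise the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_mime_type(url, timeout=10):
    """Send a HEAD request and return the MIME type reported by the server.

    Assumption: the server accepts unauthenticated HEAD requests.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return mime_type_of(response.headers.get("Content-Type", ""))
```

The value returned by fetch_mime_type is what would go into the mimetype_overrides preference.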

Take a look at a complete Python script that downloads an XLS file from here. In the interests of full disclosure, this script will work fine without those extra preferences, but it does illustrate what needs to be done for a more stubborn site. The server headers for this download are included below.

HTTP/2 200 
last-modified: Tue, 22 Mar 2022 12:47:49 GMT
content-length: 8704
content-type: application/vnd.ms-excel
date: Sat, 05 Oct 2024 04:15:52 GMT
cache-control: max-age=0
expires: Sat, 05 Oct 2024 04:15:52 GMT
server: Apache

Clearly the browser already knows to save the application/vnd.ms-excel MIME type specified in the content-type header. For comparison, here are the server headers for a download from here:

HTTP/2 200 
last-modified: Thu, 27 Jan 2022 17:47:57 GMT
content-length: 9487759
content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
date: Sat, 05 Oct 2024 04:19:11 GMT
cache-control: max-age=86400
expires: Sun, 06 Oct 2024 04:19:11 GMT
server: nginx/1.25.5

Note that this uses a different MIME type (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) to download an XLSX file.
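
To tie the two sets of headers back to the preferences, here is a small sketch that pulls the content-type out of a raw header dump and builds a mimetype_overrides entry for each type found. It mirrors the identity mapping (type mapped to itself) used in the preferences above. The helper names are hypothetical and the header text is abbreviated from the listings in this post.

```python
def content_type_from_headers(raw_headers):
    """Return the content-type value from a raw header dump, or None."""
    for line in raw_headers.splitlines():
        name, sep, value = line.partition(":")
        # Status lines such as "HTTP/2 200" have no colon and are skipped.
        if sep and name.strip().lower() == "content-type":
            return value.split(";", 1)[0].strip()
    return None


def overrides_for(mime_types):
    """Map each MIME type to itself, skipping any missing values."""
    return {mime: mime for mime in mime_types if mime}


xls_headers = """HTTP/2 200
content-length: 8704
content-type: application/vnd.ms-excel
server: Apache"""

xlsx_headers = """HTTP/2 200
content-length: 9487759
content-type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
server: nginx/1.25.5"""

overrides = overrides_for(
    content_type_from_headers(h) for h in (xls_headers, xlsx_headers)
)
```

The resulting dictionary could be assigned to the profile.default_content_settings.mimetype_overrides key if a site serves both formats.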
