Python: Downloading data from the web

joushe info
2022-04-12

Are you tired of going to the browser, downloading the data you want, and then saving it to your desired folder? Well, here is your solution! You can download the data from the web using Python! Let everything be automated!

The data used for this tutorial was downloaded from the following source: https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv.

Once you have found your file on the web (it can be any file from any website), the first thing to do is to right-click on the file and copy its link address, as shown in the figure below.

Now, go to your Python file and paste the link address so that the file can be read and downloaded. Since we are working with a link address, we have to import the request module from the urllib package, which is part of Python's standard library.

#Importing library
from urllib import request


#Reading the file from the link
file_url = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'

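The letter r before the opening quotation mark makes the text a raw string, which means that backslashes are kept as they are instead of being treated as escape characters. It does not stand for a reading mode, and since this link contains no backslashes, the prefix is optional here. Note that the link address should be inside the quotation marks (' ').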
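To see what the r prefix actually changes, here is a small sketch comparing a raw string with a regular one. The Windows-style path is a made-up example that does not belong to the tutorial; the link is the one used above.

```python
# A regular string interprets backslash escapes; a raw string keeps them as typed.
# The path below is a hypothetical example chosen because it contains \n and \t.
regular = 'C:\new_folder\table.csv'   # \n becomes a newline, \t becomes a tab
raw = r'C:\new_folder\table.csv'      # every backslash stays a backslash

print(repr(regular))
print(repr(raw))

# A link has no backslashes, so both spellings give exactly the same string.
url_raw = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'
url_plain = 'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'
print(url_raw == url_plain)
```

The raw form matters mostly for Windows paths and regular expressions; for a web address it is harmless but not required.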

Next, we will read the file and save it line by line to a text file (which does not exist yet). To do so, we will define a function and then, at the end, call this function in order to get the data.

#Defining a function to download the file


def file_info(url):
   #Opening the url file
   file_open = request.urlopen(url)
   #Reading the file
   file_content = file_open.read()
   #Converting the bytes into a string
   content = file_content.decode('utf-8')
   #Splitting the lines
   lines = content.splitlines()

Notice that the function's name is file_info and its input is called url. You can choose different names if you prefer. However, if you do so, do not forget to change the corresponding names in the code lines that follow!

Once the function is defined, the first thing we should do is to open the file from the web. For this, the function request.urlopen is needed. Then, in order for Python to go through the whole file and read its content, the read method of the opened file is called.

The content read by Python is a bytes object, not text, which is awkward to work with. It therefore has to be converted into a string, and decoding it as UTF-8 is the reliable way to do so. After that, we must split the string into lines; otherwise, the whole content of the file would stay in one long string.
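The difference between bytes and text can be checked without any download. The sketch below uses a made-up two-line sample in place of the real file content and compares calling str() on the bytes with decoding them.

```python
# Bytes of the kind that read() returns (a made-up two-line sample).
file_content = b'location,date\nPeru,2021-02-09\n'

# str() gives the printable representation of the bytes object:
# it keeps the b'...' wrapper and shows line breaks as backslash + n.
as_repr = str(file_content)

# decode() turns the bytes into real text using the given encoding.
as_text = file_content.decode('utf-8')

# splitlines() cuts the text at the line breaks and drops them.
lines = as_text.splitlines()

print(as_repr)
print(lines)
```

Decoding gives clean lines, while the str() route leaves a stray b' at the start of the first line and a quotation mark at the end.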

Now that Python is able to read the file from the web, we will save it as a new file in the same directory as our Python script file. For this purpose, we just need 4 lines of code!

   #Saving data into a text file
   with open('vaccinations.txt', 'w', encoding='utf-8') as output_file:
      for line in lines:
         save_data = output_file.write(line + '\n')
         print(save_data)

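Python can 'open' a file that does not exist yet when write mode is used. The write mode 'w' creates the text file (or empties it if it already exists) and makes it ready to be written. In the first line, output_file is the name of the variable that refers to the opened file. It is similar to the line below, except that the with statement also closes the file automatically when its block ends: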

output_file = open('vaccinations.txt', 'w')

Then, the second line of the code loops over the lines variable, which contains the content of our web file. Each line is written to the text file with the write method. Because splitting the content removed the line breaks, the newline character '\n' is added to the end of every line so that each one lands on its own line in the text file. When the with block ends, Python closes the file and the data is saved.
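The writing step can be tried on its own with a few sample lines. The sketch below is only an illustration: the sample lines and the file name sample.txt are made up, and it writes into a temporary folder so that nothing is left on disk.

```python
import os
import tempfile

lines = ['location,date', 'Peru,2021-02-09']  # made-up sample lines

# A temporary folder keeps this experiment away from your real files.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.txt')

    # 'w' creates the file, or empties it if it already exists.
    with open(path, 'w', encoding='utf-8') as output_file:
        for line in lines:
            # write() returns the number of characters it wrote.
            written = output_file.write(line + '\n')

    # Leaving the with block has already closed the file.
    is_closed = output_file.closed

    # Read the file back to check what was saved.
    with open(path, encoding='utf-8') as input_file:
        saved = input_file.read()

print(is_closed)
print(saved)
```

The value returned by write is what the print(save_data) line in the tutorial shows: one number per line, equal to the length of that line plus the newline character.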

With the saving step written, nothing has been downloaded yet, because the code only runs when the function is called. So we just need to call our previously defined function file_info, passing it the link stored in file_url.

#Calling the function
file_info(file_url)

If we run this code, the text file created by Python will be found in the current working directory, which is the folder of your Python script when you run the script from there.

The final code will look like this:

#Importing library
from urllib import request


#Reading the file from the link
file_url = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'


#Defining a function to download the file
def file_info(url):
   #Opening the url file
   file_open = request.urlopen(url)
   #Reading the file
   file_content = file_open.read()
   #Converting the bytes into a string
   content = file_content.decode('utf-8')
   #Splitting the lines
   lines = content.splitlines()
   #Saving data into a text file
   with open('vaccinations.txt', 'w', encoding='utf-8') as output_file:
      for line in lines:
         save_data = output_file.write(line + '\n')
         print(save_data)


#Calling the function
file_info(file_url)
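As a variation on the complete script, the sketch below lets you choose where the file is saved instead of relying on the folder the script is run from. The function name download_file and its destination parameter are hypothetical names introduced here, not part of the tutorial. It also stores the downloaded bytes unchanged rather than converting them to text first, which is a different technique from the line-by-line approach above.

```python
from pathlib import Path
from urllib import request


def download_file(url, destination):
    """Save the content found at url to destination (hypothetical helper)."""
    destination = Path(destination)

    # Create the target folder if it does not exist yet.
    destination.parent.mkdir(parents=True, exist_ok=True)

    # The with statement closes the connection once the data has been read.
    with request.urlopen(url) as response:
        data = response.read()

    # Write the bytes exactly as they were received.
    destination.write_bytes(data)
    return destination
```

Calling download_file(file_url, 'data/vaccinations.csv') with the file_url from the tutorial would place the file in a data folder next to where the script is run. Keeping the bytes as they are avoids any decoding step, at the cost of not being able to process the lines along the way.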

Congratulations! You have just taken your first step toward working with huge amounts of data! Keep coding! To download the complete code, please click here.
