Are you tired of going to the browser, downloading the data you want, and then saving it to your desired folder? Well, here is your solution! You can download the data from the web using Python! Let everything be automated!
The data used for this tutorial was downloaded from the following source: https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv.
After you have found your file on the web (it can be any file from any website), the first thing you should do is right-click on the file and copy its link address, as shown in the figure below.
Now, go to your Python file and paste the link address there so that Python can read and download the file. Since we are working with a link address (a URL), we have to import the request module from Python's built-in urllib package.
```python
# Importing the request module from the urllib package
from urllib import request

# Link to the file (?raw=true asks GitHub for the raw CSV, not the web page)
file_url = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'
```
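The letter r before the quotation marks makes this a raw string: backslashes inside it are kept as ordinary characters instead of being treated as escape sequences (it does not stand for "reading mode"). Since this URL contains no backslashes, the r is optional here. Note that the link address must be inside quotation marks (' '), and that the ?raw=true at the end makes GitHub send the CSV file itself rather than the page that displays it.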
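To see what the r prefix actually changes, here is a small comparison. The Windows-style path is only an illustrative value; a URL like file_url has no backslashes and behaves the same with or without r.

```python
# In a normal string literal, '\n' is an escape: one newline character.
normal = 'C:\new_folder'
# With the r prefix, the backslash and the 'n' are kept as two characters.
raw = r'C:\new_folder'

print(len(normal))  # 12
print(len(raw))     # 13
print(raw)          # C:\new_folder
```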
Now, we will download the content of the file, split it into lines, and save it in a text file (which does not exist yet). To do this, we will define a function, and then, at the end, call this function to actually get the data.
```python
# Defining a function to download the file
def file_info(url):
    # Opening the URL
    file_open = request.urlopen(url)
    # Reading the whole file (this returns bytes)
    file_content = file_open.read()
    # Converting the bytes into a string (assuming the file is UTF-8 text)
    content = file_content.decode('utf-8')
    # Splitting the content into a list of lines
    lines = content.splitlines()
```
Notice that the function is named file_info and its input is called url; you can name them differently if you prefer. However, if you do so, do not forget to change the corresponding names in the following lines of code!
Inside the function, the first step is to open the file from the web, which is what request.urlopen does. Then, to make Python go through the whole file and read it, we call read on the opened file.
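The open-and-read step can be sketched as a small function. The name fetch_bytes and its 30-second timeout are illustrative choices, not part of the tutorial. It uses urlopen as a context manager so the connection is closed once reading is done. So that it runs without internet access, the usage line passes a data: URL (which urlopen also accepts) containing a tiny made-up CSV; passing file_url instead would fetch the real file.

```python
from urllib import request


def fetch_bytes(url, timeout=30):
    """Hypothetical helper: open url and return its whole body as bytes."""
    # The with block closes the connection after read() finishes.
    with request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Made-up two-line CSV in a data: URL (%0A is an encoded newline).
body = fetch_bytes('data:text/plain;charset=utf-8,date,doses%0A2021-01-01,100')
print(type(body))  # <class 'bytes'>
print(body)        # b'date,doses\n2021-01-01,100'
```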
The content returned by read is in the bytes format (raw binary data), which is awkward to work with as text. Thus, we need to convert it into a string. After doing so, we must split the content into separate lines; otherwise, we would only have one long string holding the whole file.
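The sample below contrasts two ways of turning bytes into a string, using a short made-up piece of CSV rather than the real file. str() returns the printed representation of the bytes object (with a leading b', a trailing quote and escaped \n sequences), while decode() returns the actual text, which splitlines() can then break into lines.

```python
# Made-up bytes, similar to what read() returns.
file_content = b'location,date\nAfrica,2021-01-01\n'

# str() gives the representation of the bytes, not the text itself.
as_repr = str(file_content)
print(as_repr)  # b'location,date\nAfrica,2021-01-01\n'
# Splitting it on the two characters backslash + n leaves debris behind.
print(as_repr.split('\\n'))  # ["b'location,date", 'Africa,2021-01-01', "'"]

# decode() turns the bytes into real text (assuming UTF-8 encoding).
content = file_content.decode('utf-8')
lines = content.splitlines()
print(lines)  # ['location,date', 'Africa,2021-01-01']
```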
Now that Python is able to read the file from the web, we will save its content as a new file. For this purpose, we just need 4 more lines of code, which go inside the file_info function!
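```python
    # Saving data into a text file (still inside file_info)
    with open('vaccinations.txt', 'w', encoding='utf-8') as output_file:
        for line in lines:
            # write() returns the number of characters written
            save_data = output_file.write(line + '\n')
            print(save_data)
```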
Python can open a file that does not exist yet in write mode: 'w' creates the file (or overwrites it if it already exists) so that we can write to it. In the first line, output_file is the name of the variable that holds the opened file. It is roughly equivalent to the line below, except that the with statement also closes the file automatically when the block ends:
```python
output_file = open('vaccinations.txt', 'w')
```
Then, the for loop goes through the lines variable, which contains the content of our web file, and writes each line into the new text file using the write function. Because splitting removed the line breaks, we add the newline character '\n' back at the end of each line. When the with block ends, the file is closed and its content is saved.
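Here is a minimal sketch of the writing step on its own, using two made-up lines and a temporary folder (an assumption, so the example leaves nothing behind). It shows what write() returns and that the file is already closed once the with block ends.

```python
import os
import tempfile

lines = ['location,date', 'Africa,2021-01-01']  # sample lines, not the real data

# Hypothetical output location inside a fresh temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'vaccinations.txt')

with open(path, 'w', encoding='utf-8') as output_file:
    for line in lines:
        # write() returns how many characters were written, '\n' included.
        count = output_file.write(line + '\n')
        print(count)  # 14, then 18

print(output_file.closed)  # True: the with block closed the file

with open(path, encoding='utf-8') as f:
    print(f.read().splitlines())  # ['location,date', 'Africa,2021-01-01']
```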
With the function defined, we just need to call file_info and pass it the link stored in file_url. Only then is the text file actually created.
```python
# Calling the function
file_info(file_url)
```
If we run this code, Python creates vaccinations.txt in the current working directory, which is usually the folder you run the script from (often the same folder as your Python script).
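If you are unsure where the file ended up, the snippet below prints the full path that a relative name like 'vaccinations.txt' resolves to. The helper script_folder_path is hypothetical: it builds a path next to the running script from __file__, which is only defined when the code runs from a .py file (not, for example, in a notebook).

```python
import os

# A relative file name is resolved against the current working directory.
print(os.getcwd())
print(os.path.abspath('vaccinations.txt'))


def script_folder_path(filename='vaccinations.txt'):
    """Hypothetical helper: full path to filename next to the running script."""
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)
```

Passing script_folder_path() to open() instead of 'vaccinations.txt' would then save the file beside the script regardless of where it is run from.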
The final code will look like this:
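```python
# Importing the request module from the urllib package
from urllib import request

# Link to the file (?raw=true asks GitHub for the raw CSV, not the web page)
file_url = r'https://github.com/owid/covid-19-data/blob/master/public/data/vaccinations/vaccinations.csv?raw=true'


# Defining a function to download the file
def file_info(url):
    # Opening the URL and reading the whole file (bytes);
    # the with block closes the connection afterwards
    with request.urlopen(url) as file_open:
        file_content = file_open.read()
    # Converting the bytes into a string (assuming UTF-8 text)
    content = file_content.decode('utf-8')
    # Splitting the content into a list of lines
    lines = content.splitlines()
    # Saving data into a text file
    with open('vaccinations.txt', 'w', encoding='utf-8') as output_file:
        for line in lines:
            # write() returns the number of characters written
            save_data = output_file.write(line + '\n')
            print(save_data)


# Calling the function
file_info(file_url)
```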
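If the goal is only to save the file unchanged, a possible alternative is to skip decoding and splitting and copy the downloaded bytes straight to disk. The sketch below does this with shutil.copyfileobj, which copies in chunks instead of holding the whole file in memory, and catches URLError (HTTPError is a subclass of it) so a bad link prints a message instead of crashing. The name download_file and its return values are illustrative assumptions. The standard library also offers urllib.request.urlretrieve(url, filename), which its documentation lists under the legacy interface.

```python
import shutil
from urllib import request
from urllib.error import URLError


def download_file(url, filename, timeout=30):
    """Hypothetical helper: save the response body to filename as-is.

    Returns True on success, False if the URL could not be opened.
    """
    try:
        with request.urlopen(url, timeout=timeout) as response, \
                open(filename, 'wb') as output_file:
            # Copy the body to disk chunk by chunk.
            shutil.copyfileobj(response, output_file)
    except URLError as error:  # HTTPError is a subclass of URLError
        print('Download failed:', error.reason)
        return False
    return True
```

With the tutorial's link, the call would be download_file(file_url, 'vaccinations.csv'), keeping the original .csv extension.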
Congratulations! You have just taken your first step into working with huge amounts of data! Keep coding! To download the complete code, please click here.