Hannes Datta
We're about to start with today's tutorial (“API 101”).
Deep-dive on APIs
for and while loops
Plus: as usual, focus on modularization (writing functions)!
DO: Browse the documentation of music-to-scrape's API.
Then, run the snippet below.
import requests
url = "https://api.music-to-scrape.org/users"
response = requests.get(url)
request = response.json()
print(request)
Add a parameter to the URL (e.g., ?limit=20) and run again. What happens?
JSON data is structured in a hierarchy
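To make the hierarchy concrete, here is a minimal sketch that parses a small, made-up JSON string shaped like the response above. The keys limit, data, username and age appear in this tutorial; all values are invented for illustration and are not actual API output.

```python
import json

# Hypothetical sample, shaped like the /users response (values are made up)
raw = '{"limit": 2, "data": [{"username": "user_a", "age": 30}, {"username": "user_b", "age": 25}]}'

sample = json.loads(raw)                   # JSON text -> Python dictionary

top_level_keys = list(sample.keys())       # first level of the hierarchy
first_user = sample["data"][0]             # second level: one item of the list
first_username = first_user["username"]    # third level: one attribute of that item

print(top_level_keys)
print(first_username)
```

Each level is reached by chaining one more key or index onto the previous one.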
DO:
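Save the response to a .json file and view it in VS Code (install a JSON plugin first)
import json
# write the parsed response to disk as JSON text; the file is closed automatically
with open('users.json', 'w', encoding='utf-8') as f:
    f.write(json.dumps(request))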
Access attributes with ['name_of_attribute'] or .get('name_of_attribute')
request['limit']
# or:
request.get('limit')
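The two access styles behave the same for keys that exist, but differ for keys that are missing. A minimal sketch on a made-up dictionary (the key total is hypothetical and only used to show a missing key):

```python
# Hypothetical sample, shaped like the parsed response (values are made up)
sample = {"limit": 10, "data": []}

limit_a = sample["limit"]            # square brackets: value of an existing key
limit_b = sample.get("limit")        # .get(): same value for an existing key

missing = sample.get("total")        # .get() returns None if the key is absent
fallback = sample.get("total", 0)    # ...or a default value that you supply

try:
    sample["total"]                  # square brackets raise a KeyError instead
    raised = False
except KeyError:
    raised = True
```

In a scraper, .get() can be handy when some records lack an attribute, while square brackets make missing data fail loudly.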
DO
request['limit'] # this one worked for extracting the value for limit...
#Q1:
request['data']
#Q2:
user_names = [] # initialize empty list
for user in request['data']: # start loop
    user_names.append(user['username']) # append user name to the list of user_names (initialized above)
user_names # inspect the result
DO: Suppose we do not want to extract the ages and country names for the users…
Can you come up with a way to anonymize them (e.g., overwrite them with NA), while keeping the rest of the dictionary intact?
new_dic = []
for user in request['data']:
    obj = user.copy() # work on a copy, so the original response stays unchanged
    obj['username'] = 'NA'
    obj['age'] = 'NA'
    new_dic.append(obj)
new_dic
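When overwriting values like this, it helps to know that assigning a dictionary to a new name does not copy it. A minimal sketch with a made-up user record:

```python
# Hypothetical user record (values are made up)
user = {"username": "user_a", "age": 30}

alias = user                  # no copy: both names point to the same dictionary
alias["age"] = "NA"
original_changed = user["age"] == "NA"    # the original was overwritten too

user = {"username": "user_a", "age": 30}  # reset the record
clone = user.copy()           # shallow copy: a new dictionary with the same keys
clone["age"] = "NA"
original_kept = user["age"] == 30         # the original is untouched
```

Whether you want the original changed depends on the task; if the raw response should be kept for later, copying is the safer choice.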
.append() adds one item at a time to an existing list
users = [] # empty list
users.append('another user')
users.append('yet another user')
users
.extend() adds multiple items to an existing list
users = []
new_users = ['another user','yet another user']
users.extend(new_users)
users
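A common mix-up is passing a list to .append(). A minimal sketch of the difference, using made-up list items:

```python
users = ["first user"]
users.append(["a", "b"])      # the whole list becomes ONE nested item
after_append = len(users)

users = ["first user"]
users.extend(["a", "b"])      # each element is added as a separate item
after_extend = len(users)

print(after_append, after_extend)
```

When collecting several pages of API results into one flat list, .extend() is usually what you want.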
As part of the URL path: myapi.com/search/cats_and_dogs [not the case for music to scrape]
As a query string: myapi.com/search/?query=cats_and_dogs
The API documentation will tell you what is required!
Example
params must be a dictionary with the parameter names and corresponding values
import requests
url = "https://api.music-to-scrape.org/users"
response = requests.get(url, params = {'limit': 15})
request = response.json()
print(request)
Can you speculate about the benefits of passing parameters as a dictionary (params) rather than typing them into the URL yourself?
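When given params, requests encodes the dictionary into the query string of the URL. The sketch below shows the resulting URL without sending a request; it uses urlencode from Python's standard library as a stand-in, and the parameter values are arbitrary.

```python
from urllib.parse import urlencode

base_url = "https://api.music-to-scrape.org/users"
params = {"limit": 15, "offset": 30}

# Build the query string from the dictionary and attach it to the base URL
full_url = f"{base_url}?{urlencode(params)}"
print(full_url)
```

Keeping parameters in a dictionary makes them easy to change in a loop or to pass into a function.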
DO
Add the offset parameter to the snippet below. What happens when you set it to 1? What happens when you set it to 5?
Tip: Remember iterating through pages on a website to “view” data? APIs know the same concept!
# start with this code
import requests
url = "https://api.music-to-scrape.org/users?limit=10"
response = requests.get(url)
request = response.json()
print(request)
# q1: setting the offset parameter
import requests
requests.get("https://api.music-to-scrape.org/users?limit=10").json()
requests.get("https://api.music-to-scrape.org/users?limit=10&offset=1").json()
requests.get("https://api.music-to-scrape.org/users?limit=10&offset=5").json()
# q2:
requests.get("https://api.music-to-scrape.org/users?limit=10&offset=10").json()
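The idea behind limit and offset can be tried out offline. The sketch below assumes that offset skips that many records and limit caps how many are returned; a local list stands in for the server's data, and get_page() is a hypothetical helper, not part of the API.

```python
# Stand-in for the server's full collection (made-up names)
all_users = [f"user_{i}" for i in range(25)]

def get_page(limit, offset):
    """Return at most `limit` items, skipping the first `offset` items."""
    return all_users[offset:offset + limit]

page_a = get_page(limit=10, offset=0)    # items 0-9
page_b = get_page(limit=10, offset=1)    # items 1-10: shifted by one, mostly overlapping
page_c = get_page(limit=10, offset=10)   # items 10-19: the "next page", no overlap
```

Under this assumption, moving to the next page means increasing the offset by the value of limit.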
DO
Write a function get_users(), with parameters limit and offset, returning the list of users from the API endpoint /users.
import requests
def get_users(limit, offset):
    # request one page of users and return only the 'data' part of the response
    obj = requests.get(f"https://api.music-to-scrape.org/users?limit={limit}&offset={offset}").json()
    return obj['data']
get_users(10,1)
A basic for loop:
for x in range(6):
    print(x)
DO: Modify the snippet below so that it calls get_users() 10 times, incrementing the offset by 10 at each iteration.
offset=0
for x in range(10):
    print(get_users(limit=10, offset=offset))
    offset=offset+10
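As an alternative to updating a counter by hand, range() accepts a start, a stop and a step. A minimal sketch that only collects the offsets, so it runs without calling the API:

```python
offsets = []
for offset in range(0, 100, 10):   # start at 0, stop before 100, step by 10
    offsets.append(offset)

print(offsets)
```

Each value could be passed straight to a paging function as its offset argument.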
There are for loops (you usually know beforehand when to stop), and while loops (the ending point can change, say when “there is no new data coming in”)
# for loop
for x in range(6):
    print(x)
# while loop
cntr = 0
while cntr < 6:
    print(cntr)
    cntr = cntr+1
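The "no new data coming in" case can be sketched with a while loop that stops on the first empty page. A local list stands in for the API here, and get_page() is a hypothetical helper that assumes an offset beyond the last record yields an empty result.

```python
# Stand-in for the server's full collection (made-up names)
all_users = [f"user_{i}" for i in range(25)]

def get_page(limit, offset):
    """Return at most `limit` items, skipping the first `offset` items."""
    return all_users[offset:offset + limit]

collected = []
offset = 0
while True:
    page = get_page(limit=10, offset=offset)
    if not page:              # empty page: no new data coming in, so stop
        break
    collected.extend(page)    # add all items of this page to the flat list
    offset += 10              # move on to the next page
```

The loop does not need to know the total number of records in advance, which is the main reason to prefer while over for in this situation.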
We can now combine what we have learned to build a function that extracts 100 user names and metadata to a new-line separated JSON file.
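Save the code in a .py file and test whether it runs from command prompt/terminal
# will develop in class
import requests
import json
def get_users(limit, offset):
    # request one page of users and return only the 'data' part of the response
    obj = requests.get(f"https://api.music-to-scrape.org/users?limit={limit}&offset={offset}").json()
    return obj['data']
cntr = 0
with open('output.json', 'w', encoding='utf-8') as f:
    while cntr < 100: # offsets 0, 10, ..., 90: 10 pages of 10 users = 100 users
        f.write(json.dumps(get_users(limit=10, offset=cntr)))
        f.write('\n')
        cntr = cntr+10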
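A file with one JSON document per line can be read back line by line. The sketch below writes two made-up pages to a temporary file and then flattens them into one list; the page contents are invented stand-ins for what get_users() would return.

```python
import json
import os
import tempfile

# Made-up pages standing in for what get_users() would return
pages = [[{"username": "user_a"}], [{"username": "user_b"}, {"username": "user_c"}]]

path = os.path.join(tempfile.mkdtemp(), "output.json")

# Write one JSON document (here: one page) per line
with open(path, "w", encoding="utf-8") as f:
    for page in pages:
        f.write(json.dumps(page) + "\n")

# Read the file back line by line and flatten the pages into one list
users = []
with open(path, encoding="utf-8") as f:
    for line in f:
        users.extend(json.loads(line))

usernames = [user["username"] for user in users]
print(usernames)
```

Because every line is valid JSON on its own, the file can be processed without loading everything into memory at once.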
DO
The tutorial proceeds by introducing a series of additional endpoints.
user/plays - get a user's total number of plays
charts/top-artists - see a list of top-performing artists for this week (and previous weeks)
Make a request to user/plays, following the guidelines in the documentation. Do you succeed?
Make a request to charts/top-artists. Do you get some output?
# code in class