Review

  • Last week I talked about lists.
  • Lists are like arrays in other languages.
    • They are quick to access but may be slow to extend.
  • Literal lists are represented with brackets [ and ]

Here are examples of list literals:

[1, 2, 3, 4]                  # A list of ints
["hello", "list", "world"]    # A list of strings 
[1, 'hello', 2, 'list', True] # A mixed type list 
[[1, 'Hello'], [2, 'World']]   # A list of lists
  • Lists are held in variables just like ordinary types.
  • Some common math operations are defined on lists.

You can add lists together with the plus + operator:

>>> foo = [1, 2, 3, 4]
>>> bar = ["hello", "list", "world"]
>>> foo + bar 
[1, 2, 3, 4, 'hello', 'list', 'world']

You can multiply lists (causing them to repeat) with the times * operator:

>>> foo * 3 
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
  • List elements are accessed using brackets [ and ]
  • List indexes start at 0
>>> foo[0]
1
>>> foo[2]
3
>>> foo[-1]
4

Simply looping over a list:

>>> for item in foo : 
...   print (item) 
... 
1
2
3
4

Looping over a list with indexes and items using the enumerate() function:

>>> for index, item in enumerate(foo) : 
...   print (index, item) 
... 
0 1
1 2
2 3
3 4

Looping over an arbitrary range of integers using a range:

>>> for i in range(0,5) : 
...   print (i) 
... 
0
1
2
3
4
  • We looked at while loops.
  • Use a while loop when you don't know how many times the loop should run.
>>> while True : 
...   num = int(input('Enter a number from 1 to 10: '))
...   if num >= 1 and num <= 10 : 
...     break 
... 
Enter a number from 1 to 10: -1
Enter a number from 1 to 10: 0
Enter a number from 1 to 10: 5
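
The loop above stops with a ValueError if someone types something that isn't a number. Here's a minimal sketch of one way to keep asking instead. The names read_number() and fake_input() are made up for this example, and a list of canned answers stands in for a person at the keyboard so the code runs unattended:

```python
def read_number(ask=input, low=1, high=10):
    """Keep asking until the answer is a whole number from low to high."""
    while True:
        text = ask(f'Enter a number from {low} to {high}: ')
        try:
            num = int(text)
        except ValueError:
            continue              # Not a number at all, so ask again.
        if low <= num <= high:
            return num


# Canned answers stand in for real keyboard input (an assumption for this demo).
canned = ['-1', 'zero', '5']

def fake_input(prompt):
    return canned.pop(0)          # Hand back the next canned answer.

result = read_number(ask=fake_input)
print(result)
```

Calling read_number() with no arguments uses the real input() function.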

Dictionaries

  • Dictionaries are like lists, but instead of an integer index you can use almost any value (strings and numbers are the most common).
  • The index of a dictionary is known as a key
    • Each key holds a value
  • Dictionary literals are declared with curly braces
    • The key is on the left side of the colon
    • The value is on the right side of the colon

Here's a dictionary literal:

{} # An empty dictionary
{'key1': 'value1', 'key2': 'value2', 3 : 3 }

You can assign literals to a variable like this:

foo = {'key1': 'value1', 'key2': 'value2', 3 : 3 }
  • Dictionaries are accessed similarly to lists.
  • Watch out! Make sure you put quotes around strings.
>>> foo['key1']
'value1'
>>> foo['key2']
'value2'
>>> foo[3]
3
  • You can assign new keys to a dictionary using the index operator
  • You can change existing keys too
    • A dictionary can only have one value per key
    • Reassigning a key replaces the old value with the new value.
>>> foo['newval'] = 'blah'
>>> foo 
{'key1': 'value1', 'key2': 'value2', 3: 3, 'newval': 'blah'}
>>> foo['newval'] = 10 
>>> foo 
{'key1': 'value1', 'key2': 'value2', 3: 3, 'newval': 10}
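
The quotes matter because the string '3' and the integer 3 are two different keys. Here's a short sketch with made-up values that shows the difference between creating a key and replacing one:

```python
foo = {'key1': 'value1', 3: 3}

foo['3'] = 'three'    # The string '3' is not in foo yet, so this adds a new key.
foo[3] = 'changed'    # The integer 3 is already a key, so this replaces its value.

print(foo)            # prints {'key1': 'value1', 3: 'changed', '3': 'three'}
```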
  • You can loop over dictionaries easily with a for loop

This simple loop iterates over keys:

>>> for key in foo : 
...   print (key) 
... 
key1
key2
3
newval

You can loop over values too:

>>> for value in foo.values() : 
...   print (value) 
... 
value1
value2
3
10

You can also loop over both keys and values:

>>> for key, value in foo.items() : 
...   print (key, value) 
... 
key1 value1
key2 value2
3 3
newval 10

Dictionary Operations

  • It's often useful to test if a dictionary contains a key.
  • The in operator does that.

Test if an environment variable is set:

>>> import os 
>>> if 'USER' in os.environ : 
...   print (f"$USER is set to {os.environ['USER']}")
... 
$USER is set to maximus
  • You can ask the opposite question with not in

Check if a key is not in a dictionary:

>>> foo = {'one' : 1, 'two': 2}
>>> if 'three' not in foo : 
...   print ('Better add three') 
... 
Better add three
  • The in operator works on lists too!
    • It tests if a value exists in the list.
>>> with open ('/usr/share/dict/words') as w :
...   dict_words = [word.strip().lower() for word in w.readlines()]
... 
>>> if 'kazoo' in dict_words :
...   print ('Buzz buzz')
... 
Buzz buzz
  • The get() function works like the brackets except for what happens when the key doesn't exist.
    • get() returns None
    • The index operator raises a KeyError

Here's an example of how get() and index operator differ:

>>> foo.get('badkey') 
>>> foo['badkey'] 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'badkey'
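
get() also takes an optional second argument, which it returns instead of None when the key is missing. Here's a sketch that uses that to count words; the sentence and the variable names are made up for illustration:

```python
sentence = 'the quick brown fox jumps over the lazy dog'

counts = {}
for word in sentence.split():
    # get() returns 0 the first time a word is seen instead of raising KeyError.
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])          # prints 2
print(counts.get('cat', 0))   # prints 0 because 'cat' is not a key
```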
  • The clear() function removes all key/values from the dictionary
  • The len() function works on dictionaries (just like lists)

Here's an example of len() with dictionaries.

>>> foo = {'one' : 1, 'two': 2}
>>> len(foo)
2
  • Notice that the length is the number of key/value pairs.
  • You can merge dictionaries using the update() function
>>> foo = {'one' : 1, 'two': 2}
>>> bar = {'three': 3, 'four': 4}
>>> foo.update(bar)
>>> foo 
{'one': 1, 'two': 2, 'three': 3, 'four': 4}

Be careful! You can overwrite keys this way.

>>> foo.update({'one': 100}) 
>>> foo
{'one': 100, 'two': 2, 'three': 3, 'four': 4}
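
If you want to keep the original dictionaries untouched, one option is to update a copy. Here's a short sketch; the name merged is an assumption for this example. It also shows clear(), which was mentioned above without an example:

```python
foo = {'one': 1, 'two': 2}
bar = {'two': 22, 'three': 3}

merged = foo.copy()   # A new dictionary with the same keys and values as foo.
merged.update(bar)    # When a key is in both, the value from bar wins.

print(merged)         # prints {'one': 1, 'two': 22, 'three': 3}
print(foo)            # prints {'one': 1, 'two': 2} because foo was not changed

merged.clear()        # Remove every key/value pair from merged.
print(len(merged))    # prints 0
```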

Dictionary Data Structures

  • Python allows you to mix and match lists and dictionaries
  • You can make interesting data structures this way
  • Consider making a phone book
    • Each entry has some data about a person.
      • Email Address
      • Mobile Number
      • Home Number
      • Work Number
  • You can represent each entry as a dictionary.
>>> contacts = {}
>>> contacts['Bob Smith'] = { 'email' : 'bob@company.com', 
...   'mobile' : '555-1212', 
...   'home' : '555-3434', 
...   'work' : '555-6767'
... }
  • Now we can access Bob's information
>>> contacts['Bob Smith']['email'] 
'bob@company.com'
  • You could also make a dictionary with users of your blog website
  • Users have the standard attributes:
    • Real Name
    • Email Address
    • Posts
  • A post contains some attributes:
    • Title
    • Text

Now let's create a user and some posts:

>>> post1 = {'title': 'First post!',  
...   'text': "This is my first post to my new blog."
... }
>>> post2 = {'title': 'Ate Cereal for Breakfast.', 
...   'text': "I ate cereal today they were Heritage O's. High in fiber."
... }
>>> bloggers = {
...   'mike' : {
...     'name' : 'Mike Matera', 
...     'email' : 'matera@matera.com',
...     'posts' : [post1, post2]
...   }
... }

Now you can access a post like this:

>>> bloggers['mike']['posts'][0]['title']
'First post!'
>>> bloggers['mike']['posts'][0]['text']
'This is my first post to my new blog.'
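
You don't have to know the username or the post number ahead of time. Here's a sketch that walks the whole structure with nested loops, using items() and enumerate() from earlier. The data is a shortened copy of the example above, and the titles list is only there to collect what the loops found:

```python
bloggers = {
    'mike': {
        'name': 'Mike Matera',
        'email': 'matera@matera.com',
        'posts': [
            {'title': 'First post!', 'text': 'This is my first post to my new blog.'},
            {'title': 'Ate Cereal for Breakfast.', 'text': 'High in fiber.'},
        ],
    },
}

titles = []
for username, user in bloggers.items():              # Outer loop: one pass per user.
    for number, post in enumerate(user['posts']):    # Inner loop: one pass per post.
        print(username, number, post['title'])
        titles.append(post['title'])
```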
  • When you have complex data structures it helps to have functions to perform common activities.
  • Functions help you by naming common operations.
  • Functions help keep your structure consistent.
  • Let's add functions to manipulate our data.

Here's a function that creates a user.

def create_user(data, username, realname, email) : 
    ''' Create a user in a blog data structure 
 
    Args: 
        data - The data structure to use 
        username - The user's username 
        realname - The user's real name 
        email - The user's email address. 
    '''
    data[username] = {}
    data[username]['name'] = realname 
    data[username]['email'] = email 
    data[username]['posts'] = []
  • Important: When the user is created they get an empty list of posts.
  • I pass the data structure in to avoid using global data

Here's a function that adds a post:

def add_post(data, username, title, text) : 
    ''' Append the post to the user's list of blog posts. 
 
    Args:
        data - The blog data structure to use. 
        username - The user who wrote the post. 
        title - The title of the new post. 
        text - The text of the post. 
    '''
    data[username]['posts'].append({'title' : title, 'text' : text})    
  • The previous functions change our data.
  • It's very useful to have functions that access data.

Here's a function that prints all the posts from a particular user:

def print_blogs(data, username) : 
    ''' Print all of the blog entries for a user.
 
    Args:
        data - The blog data structure to use. 
        username - The user to print. 
    '''
    for blog in data[username]['posts'] : 
        print ('Title:', blog['title'])
        print ('Text:', blog['text'])
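
Here's a sketch of how the three functions might be used together. The functions are repeated in shortened form (without docstrings) so the example runs on its own. The user 'ann' and her posts are made up:

```python
def create_user(data, username, realname, email):
    data[username] = {'name': realname, 'email': email, 'posts': []}

def add_post(data, username, title, text):
    data[username]['posts'].append({'title': title, 'text': text})

def print_blogs(data, username):
    for blog in data[username]['posts']:
        print('Title:', blog['title'])
        print('Text:', blog['text'])

blog_data = {}    # Start with an empty blog database.
create_user(blog_data, 'ann', 'Ann Example', 'ann@example.com')
add_post(blog_data, 'ann', 'Hello', 'My first post.')
add_post(blog_data, 'ann', 'Again', 'My second post.')
print_blogs(blog_data, 'ann')
```

Notice that no code outside the functions has to know how the dictionary is laid out.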

Data Structure Representations

  • When your blog entries are in a variable they're in the computer's memory
  • If your program exits your data is lost.
  • It's important to be able to save your program data
    • To do that you must pick a representation.
  • JavaScript Object Notation (JSON) is a popular data format.
  • JSON is well supported by Python
  • You can save most Python data using JSON easily

Here's how to use JSON in a program:

>>> import json 
>>> json.dumps(bloggers) 
'{"mike": {"name": "Mike Matera", "email": "matera@matera.com", "posts": [{"title": "First post!", "text": "This is my first post to my new blog."}, {"title": "Ate Cereal for Breakfast.", "text": "I ate cereal today they were Heritage O\'s. High in fiber."}]}}'
  • JSON data is very similar to how data is represented using Python literals
  • The dumps() function converts Python data into a JSON string
  • The loads() function does the opposite.
>>> json_string = json.dumps(bloggers) 
>>> data = json.loads(json_string)
>>> data 
{'mike': {'name': 'Mike Matera', 'email': 'matera@matera.com', 'posts': [{'title': 'First post!', 'text': 'This is my first post to my new blog.'}, {'title': 'Ate Cereal for Breakfast.', 'text': "I ate cereal today they were Heritage O's. High in fiber."}]}}
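
One thing to watch for: JSON object keys are always strings. A dictionary with a non-string key, like the 3 : 3 entry from earlier, comes back slightly different after a round trip. Here's a small sketch; the variable names are made up:

```python
import json

original = {'key1': 'value1', 3: 3}
restored = json.loads(json.dumps(original))

print(restored)               # prints {'key1': 'value1', '3': 3}
print(original == restored)   # prints False because 3 became the string '3'
```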
  • We can add functions to our blogging program that allow users to load and store the blog database.

Here's a function that loads the blog database:

def load_blogs(filename) : 
    with open(filename) as f : 
        return json.loads(f.read())

And a corresponding function that saves blogs:

def save_blogs(data, filename) :
    with open(filename, 'w') as f : 
        f.write(json.dumps(data))
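
Here's a sketch of saving and then loading the blog database. The two functions are repeated so the example runs on its own. The file name blogs.json and the use of a temporary directory are assumptions made so that the example doesn't leave files behind:

```python
import json
import os
import tempfile

def load_blogs(filename):
    with open(filename) as f:
        return json.loads(f.read())

def save_blogs(data, filename):
    with open(filename, 'w') as f:
        f.write(json.dumps(data))

bloggers = {'mike': {'name': 'Mike Matera', 'email': 'matera@matera.com', 'posts': []}}

# The temporary directory is deleted automatically when the with block ends.
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'blogs.json')   # Hypothetical file name.
    save_blogs(bloggers, filename)
    restored = load_blogs(filename)

print(restored == bloggers)   # prints True
```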

JSON and The Web

  • JSON is used by many websites as a part of their official Application Programming Interface (API)
  • An API is a way for a program to access a website
  • APIs make it much easier for your program to get useful data.
  • APIs have a special URL called an endpoint

See what happens when you browse to Wikipedia's endpoint:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Programming

Hard for humans to read but easy for Python! Here's code that makes the data available to a Python program:

>>> import requests 
>>> import json 
>>> response = requests.get('https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Programming')
>>> response.status_code 
200
>>> data = json.loads(response.text) 
>>> data 
{'batchcomplete': '', 'query': {'pages': {'6271327': {'pageid': 6271327, 'ns': 0, 'title': 'Programming', 'extract': 'Programming may refer to:\nBroadcast programming, scheduling content for television\nComputer programming, the act of instructing computers to perform tasks\nProgramming language, an artificial language designed to communicate instructions to a machine\nGame programming, the software development of video games\n\nDramatic programming, fictional television content\nMathematical programming, or optimization, is the selection of a best element\nNeuro-linguistic programming, a pseudoscientific method aimed at modifying human behavior\nProgramming (music), generating music electronically\nRadio programming, scheduling content for radio'}}}}
  • The structure of the response can sometimes be a bit complicated.
  • JSON APIs have to be flexible enough to handle huge responses
    • When the response is large you need to do paging (fetch only a few results at a time)

Let's take a look at the contents of the response above:

>>> for key in data : 
...   print (key) 
... 
batchcomplete
query
>>> 
>>> for key in data['query'] : 
...   print (key) 
... 
pages
>>> 
>>> for key in data['query']['pages'] : 
...   print (key) 
... 
6271327
>>>
>>> for key in data['query']['pages']['6271327'] : 
...   print (key) 
... 
pageid
ns
title
extract
>>>
>>> data['query']['pages']['6271327']['pageid'] 
6271327
>>> data['query']['pages']['6271327']['title']
'Programming'
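
The page ID is a key you usually don't know ahead of time. Here's a sketch that loops over the pages with values() instead of typing the ID. The data below is a trimmed copy of the response shown above (the long extract is shortened) so the example runs without a network connection:

```python
data = {
    'batchcomplete': '',
    'query': {
        'pages': {
            '6271327': {
                'pageid': 6271327,
                'ns': 0,
                'title': 'Programming',
                'extract': 'Programming may refer to:',   # Shortened for the example.
            }
        }
    },
}

titles = []
for page in data['query']['pages'].values():   # No need to know the page ID.
    print(page['pageid'], page['title'])
    titles.append(page['title'])
```

Notice that the key '6271327' is a string while the pageid value is an integer.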

APIs in Practice

  • Many popular sites have an API
  • Some require authentication and some don't
  • APIs are often self-describing
    • They tell you what you can ask for

Here's an example of using the GitHub API:

>>> import requests 
>>> import json 
>>> response = requests.get('https://api.github.com/') 
>>> data = json.loads(response.text) 
>>> for endpoint in data : 
...   print (f'name: {endpoint} url: {data[endpoint]}')
... 
name: current_user_url url: https://api.github.com/user
name: current_user_authorizations_html_url url: https://github.com/settings/connections/applications{/client_id}
name: authorizations_url url: https://api.github.com/authorizations
name: code_search_url url: https://api.github.com/search/code?q={query}{&page,per_page,sort,order}
name: commit_search_url url: https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}
name: emails_url url: https://api.github.com/user/emails
name: emojis_url url: https://api.github.com/emojis
name: events_url url: https://api.github.com/events
name: feeds_url url: https://api.github.com/feeds
name: followers_url url: https://api.github.com/user/followers
name: following_url url: https://api.github.com/user/following{/target}
name: gists_url url: https://api.github.com/gists{/gist_id}
name: hub_url url: https://api.github.com/hub
name: issue_search_url url: https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}
name: issues_url url: https://api.github.com/issues
name: keys_url url: https://api.github.com/user/keys
name: notifications_url url: https://api.github.com/notifications
name: organization_repositories_url url: https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}
name: organization_url url: https://api.github.com/orgs/{org}
name: public_gists_url url: https://api.github.com/gists/public
name: rate_limit_url url: https://api.github.com/rate_limit
name: repository_url url: https://api.github.com/repos/{owner}/{repo}
name: repository_search_url url: https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}
name: current_user_repositories_url url: https://api.github.com/user/repos{?type,page,per_page,sort}
name: starred_url url: https://api.github.com/user/starred{/owner}{/repo}
name: starred_gists_url url: https://api.github.com/gists/starred
name: team_url url: https://api.github.com/teams
name: user_url url: https://api.github.com/users/{user}
name: user_organizations_url url: https://api.github.com/user/orgs
name: user_repositories_url url: https://api.github.com/users/{user}/repos{?type,page,per_page,sort}
name: user_search_url url: https://api.github.com/search/users?q={query}{&page,per_page,sort,order}
>>> user_response = requests.get(data['user_url'].format(user='mike-matera'))
>>> user_data = json.loads(user_response.text) 
>>> for key in user_data : 
...   print (f'key: {key} value: {user_data[key]}')
... 
key: login value: mike-matera
key: id value: 1709049
key: avatar_url value: https://avatars2.githubusercontent.com/u/1709049?v=4
key: gravatar_id value: 
key: url value: https://api.github.com/users/mike-matera
key: html_url value: https://github.com/mike-matera
key: followers_url value: https://api.github.com/users/mike-matera/followers
key: following_url value: https://api.github.com/users/mike-matera/following{/other_user}
key: gists_url value: https://api.github.com/users/mike-matera/gists{/gist_id}
key: starred_url value: https://api.github.com/users/mike-matera/starred{/owner}{/repo}
key: subscriptions_url value: https://api.github.com/users/mike-matera/subscriptions
key: organizations_url value: https://api.github.com/users/mike-matera/orgs
key: repos_url value: https://api.github.com/users/mike-matera/repos
key: events_url value: https://api.github.com/users/mike-matera/events{/privacy}
key: received_events_url value: https://api.github.com/users/mike-matera/received_events
key: type value: User
key: site_admin value: False
key: name value: Mike Matera
key: company value: Cabrillo College
key: blog value: http://blog.lifealgorithmic.com
key: location value: United States
key: email value: None
key: hireable value: None
key: bio value: None
key: public_repos value: 28
key: public_gists value: 1
key: followers value: 14
key: following value: 6
key: created_at value: 2012-05-05T17:22:12Z
key: updated_at value: 2018-02-26T19:03:27Z
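
Calling format() works for user_url because {user} is the only thing in braces. Many of the other URLs in the list have optional parts such as {/gist_id} or {?type,page,per_page,sort}, and format() treats every pair of braces as something to fill in. Here's a rough sketch of one way to deal with that by dropping the optional parts first. The helper name fill_template() is made up, it uses the re module (regular expressions), and it is not a complete implementation of URL templates:

```python
import re

def fill_template(template, **values):
    """Fill in the plain {name} parts of a URL and drop the optional parts."""
    # Optional parts start with /, ? or & right after the opening brace.
    required_only = re.sub(r'\{[/?&][^}]*\}', '', template)
    return required_only.format(**values)

user_url = fill_template('https://api.github.com/users/{user}', user='mike-matera')
repos_url = fill_template(
    'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}',
    user='mike-matera')

print(user_url)    # prints https://api.github.com/users/mike-matera
print(repos_url)   # prints https://api.github.com/users/mike-matera/repos
```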

APIs That Require Authentication

  • Many sites require you to use the API as a registered user.
  • You should NEVER give your password over an API
  • As a registered user you can retrieve an access key
    • Access keys are secrets that let the site know who you are

Check out what happens when you search the Twitter API without an access key:

https://api.twitter.com/1.1/search/tweets.json?q=%40python

Here are instructions for getting a key from Twitter:

https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens

  • When you create an application on Twitter you receive a:
    • Consumer Key (API Key)
    • Consumer Secret (API Secret)
  • You use those keys to get an Access Token
  • An access token is like a password that expires soon
  • Every request you make must have an access token
    • The keys won't work!

Here's a program that gets an access token from keys:

access_token.py
''' Use the Twitter API to get an access token. ''' 
 
import base64 
import requests 
import json
 
API_KEY = 'your-key-here'
API_SECRET = 'your-secret-here'
 
endpoint = 'https://api.twitter.com/oauth2/token'
 
auth_key = f'{API_KEY}:{API_SECRET}' 
auth_encoded = base64.b64encode(auth_key.encode('utf-8')).decode('utf-8')
postdata = { 'grant_type' : 'client_credentials' }
headers = {'Authorization' : f'Basic {auth_encoded}', 'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'}
 
auth_response = requests.post(endpoint, postdata, headers=headers) 
auth_data = json.loads(auth_response.text)
 
for key in auth_data : 
    print (f'{key}: {auth_data[key]}')

Executing the code will retrieve and print an access token. You need the access token to make requests.
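
Here's a sketch of how the access token might be attached to a later request. It assumes the token endpoint returned a dictionary with an access_token key and that the API expects the token in an Authorization: Bearer header. The helper name bearer_headers() and the sample token are made up:

```python
def bearer_headers(auth_data):
    """Build request headers from the dictionary returned by the token endpoint."""
    return {'Authorization': f"Bearer {auth_data['access_token']}"}

# A made-up response. A real token is a long, random-looking string.
sample_auth_data = {'token_type': 'bearer', 'access_token': 'EXAMPLE-TOKEN'}

headers = bearer_headers(sample_auth_data)
print(headers)   # prints {'Authorization': 'Bearer EXAMPLE-TOKEN'}

# With a real token, the search from earlier might then be requested like this:
# requests.get('https://api.twitter.com/1.1/search/tweets.json?q=%40python', headers=headers)
```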

Do it the Python (Easy) Way

  • Calling web APIs directly using requests is cumbersome
  • When you add authentication it gets really hard.
  • Most popular websites have Python modules that automate most of the hard work.
  • You can use the Twitter API with ease once you install the module:
$ python3 -m pip install --upgrade --user python-twitter

With the Twitter API installed you can easily get tweets:

get_tweets.py
'''Search for tweets''' 
 
import sys 
import twitter
 
prog, user = sys.argv
 
api = twitter.Api(consumer_key='your-key-here',
                  consumer_secret='your-secret-here',
                  access_token_key='your-access-token',
                  access_token_secret='your-token-secret')
 
tweets = api.GetUserTimeline(screen_name=user)
for tweet in tweets : 
    print (tweet.text)