Data Structures
Contents
Data Structures¶
This tutorial was inspired by and adapted from Shawn A. Rhoads’ PSYC 347 Course [CC BY-SA 4.0 License] and Jeremy Manning’s PSYC 32 Course [MIT License].
Learning objectives¶
This notebook is intended to teach you basic python syntax for:
Dictionaries
Classes
Everything is an object¶
Python is an object-oriented programming language. This means that each construct and datatype in Python (variables, operators, functions, etc.) is an instance of a computational object: an entity that can store data, have attributes, carry out functions, execute code, etc. Conceptually, Python code defines a set of computational objects and the rules that govern how they interact. This way of thinking and programming is inspired by ideas about how the physical world operates– e.g., where the universe defines the set of things that exist and how they interact. In the physical world, some of those objects (e.g. observable matter) can be directly measured and manipulated, whereas other objects can only be observed via their influence on observable objects (e.g. gravity). Because object-based thinking has direct analogues to how people often conceptualize the physical world, object-oriented programming can be intuitive to learn and implement.
As an aside, the most popular “alternative” to object-oriented programming is called functional programming. The basic units of computation in functional programming languages are mathematical functions. The idea of functional programming originated from lambda calculus. Functional programming techniques and approaches are beyond the scope of this course.
Lists¶
Dictionaries¶
In the previous module, we touched briefly on dictionaries in our discussion of datatypes. Specifically, we described dictionary (dict
) objects as an unordered set of named properties that each has an assigned value. Let’s pick up on the the example from that notebook:
spice_dict = {'name': 'Programming Basics for Psychological and Neural Science',
'year': 2023,
'department': 'Center for Computational Psychiatry',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice'}
The variable spice_dict
is a dict
object that contains several properties. We can list the properties using spice_dict.keys()
, and we can list the corresponding values (in the same order) using spice_dict.values()
. We can also access any indidual property (x
) using the syntax spice_dict['x']
:
print('spice_dict is an object of type', type(spice_dict))
print('spice_dict properties:', spice_dict.keys())
print('spice_dict values:', spice_dict.values())
print('spice_dict[\'year\']:', spice_dict['year'])
spice_dict is an object of type <class 'dict'>
spice_dict properties: dict_keys(['name', 'year', 'department', 'url'])
spice_dict values: dict_values(['Programming Basics for Psychological and Neural Science', 2023, 'Center for Computational Psychiatry', 'https://github.com/Center-for-Computational-Psychiatry/course_spice'])
spice_dict['year']: 2023
You can also dynamically add, change, and/or remove properties (keys) from dictionary objects:
spice_dict['instructor'] = 'Shawn Rhoads'
#spice_dict.pop('url')
spice_dict['name'] = 'SPICE Crash Course'
from pprint import pprint #prints dictionaries in a nicely formatted way
pprint(spice_dict)
{'department': 'Center for Computational Psychiatry',
'instructor': 'Shawn Rhoads',
'name': 'SPICE Crash Course',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice',
'year': 2023}
The values stored in a dictionary can be of any datatype, including other dictionaries! This can be a powerful way to represent complex concepts:
spice2023a = {'name': 'Programming Basics for Psychological and Neural Science',
'year': 2023,
'institution': 'Icahn School of Medicine at Mount Sinai',
'department': 'Center for Computational Psychiatry',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice'}
pprint(spice2023a)
spice2023b = {'name': 'Fundamentals of Neuroscience and Psychiatry',
'year': 2023,
'institution': 'Icahn School of Medicine at Mount Sinai',
'department': 'Center for Computational Psychiatry',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice'}
pprint(spice2023b)
{'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount Sinai',
'name': 'Programming Basics for Psychological and Neural Science',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice',
'year': 2023}
{'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount Sinai',
'name': 'Fundamentals of Neuroscience and Psychiatry',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice',
'year': 2023}
spice2022 = {'name': 'Programming Basics for Psychological and Neural Science',
'year': 2022,
'institution': 'Icahn School of Medicine at Mount Sinai',
'department': 'Center for Computational Psychiatry',
'url': None}
pprint(spice2022)
{'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount Sinai',
'name': 'Programming Basics for Psychological and Neural Science',
'url': None,
'year': 2022}
instructors = {'Shawn' : {'name': 'Shawn Rhoads',
'labs': ['Gu'],
'institution': 'Icahn School of Medicine at Mount Sinai',
'department': 'Center for Computational Psychiatry',
'url': 'https://shawnrhoads.github.io/',
'courses': [spice2023a]},
'Sarah' : {'name': 'Sarah Banker',
'labs': ['Gu','Schiller','Fossfeig'],
'institution': 'Icahn School of Medicine at Mount Sinai',
'department': 'Center for Computational Psychiatry',
'url': 'https://icahn.mssm.edu/profiles/sarah-banker',
'courses': [spice2023b]},
'Kaustubh': {'name': 'Kaustubh Kulkarni',
'labs': ['Gu','Schiller'],
'institution': 'Icahn School of Medicine at Mount Sinai',
'department': 'Center for Computational Psychiatry',
'url': 'https://krkulkarni.com/',
'courses': [spice2022]}
}
pprint(instructors)
{'Kaustubh': {'courses': [{'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount '
'Sinai',
'name': 'Programming Basics for Psychological and '
'Neural Science',
'url': None,
'year': 2022}],
'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount Sinai',
'labs': ['Gu', 'Schiller'],
'name': 'Kaustubh Kulkarni',
'url': 'https://krkulkarni.com/'},
'Sarah': {'courses': [{'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount '
'Sinai',
'name': 'Fundamentals of Neuroscience and Psychiatry',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice',
'year': 2023}],
'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount Sinai',
'labs': ['Gu', 'Schiller', 'Fossfeig'],
'name': 'Sarah Banker',
'url': 'https://icahn.mssm.edu/profiles/sarah-banker'},
'Shawn': {'courses': [{'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount '
'Sinai',
'name': 'Programming Basics for Psychological and '
'Neural Science',
'url': 'https://github.com/Center-for-Computational-Psychiatry/course_spice',
'year': 2023}],
'department': 'Center for Computational Psychiatry',
'institution': 'Icahn School of Medicine at Mount Sinai',
'labs': ['Gu'],
'name': 'Shawn Rhoads',
'url': 'https://shawnrhoads.github.io/'}}
Of note, the instructor
object (a dict
) contains (in the courses
key) a list
of dict
objects, reflecting the courses (and their respective properties) taught by the instructor. This formulation can come in handy when we want to interrogate the instructor
object. For example, we could write a function to print out the courses that a given instructor is teaching, and call this function for different instructors:
shawn = instructors['Shawn'] #provide a new (distinctive) name for existing instructors dict
sarah = instructors['Sarah'] #provide a new (distinctive) name for existing instructors dict
list_of_instructors = [shawn, sarah] #create a list of instructor dicts
#define a function that prints out the given properties for each course an
#instructor is teaching
def print_course_info(instructor, properties):
'''
inputs:
instructor: a dict with name, degree, institution, url, and courses keys
properties: a list of strings containing all or some of: department, number, url
outputs: prints out the desired information to the console
'''
print(instructor['name'],
' is in ', len(instructor['labs']), ' lab(s):',
' ', instructor['labs'], ' ',
'and is teaching ', len(instructor['courses']), ' course(s):', sep='')
for i, c in enumerate(instructor['courses']): #can you figure out what the enumerate function is doing?
print('\nCourse ', i + 1, ': ', c['name'], sep='')
for p in properties: #can you figure out what's happening here?
print('\t', p.capitalize(), ': ', c[p], sep='')
print('\nFor more information see', instructor['url'])
for f in list_of_instructors: #this is similar to the "for p in properties" line above...
print_course_info(f, ['department', 'institution']) #adjust this list to experiment
print('\n')
Shawn Rhoads is in 1 lab(s): ['Gu'] and is teaching 1 course(s):
Course 1: Programming Basics for Psychological and Neural Science
Department: Center for Computational Psychiatry
Institution: Icahn School of Medicine at Mount Sinai
For more information see https://shawnrhoads.github.io/
Sarah Banker is in 3 lab(s): ['Gu', 'Schiller', 'Fossfeig'] and is teaching 1 course(s):
Course 1: Fundamentals of Neuroscience and Psychiatry
Department: Center for Computational Psychiatry
Institution: Icahn School of Medicine at Mount Sinai
For more information see https://icahn.mssm.edu/profiles/sarah-banker
Take some time to understand the code in the preceeding cell. Also think about how you might apply these ideas to other types of representations:
Contact information in a directory or address book
Locations on a map
Response data from an experiment
Email messages
Web forms
In particular:
Which fields might you want in each circumstance?
How could you make use of a list of
dict
objects, all with the same fields, to do something useful in each of the above scenariosWhat are some tradeoffs between including many versus few fields?
Should every value be a
dict
object with its own set of properties? Or would that be overkill? What are some tradeoffs you might expect using different datatypes for each field (versus, for example, setting all values to customdict
objects)?Can you think of any potential use cases for having a
function
as a value of some field? Can you figure out how to write a short demonstration of how such a function might be called?
Classes¶
(from: https://realpython.com/python3-object-oriented-programming/)
"The primitive data structures available in Python, like numbers, strings, and lists are designed to represent simple things like the cost of something, the name of a poem, and your favorite colors, respectively.What if you wanted to represent something much more complicated?
For example, let’s say you wanted to track a number of different animals. If you used a list, the first element could be the animal’s name while the second element could represent its age.
How would you know which element is supposed to be which? What if you had 100 different animals? Are you certain each animal has both a name and an age, and so forth? What if you wanted to add other properties to these animals? This lacks organization, and it’s the exact need for classes.
Classes are used to create new user-defined data structures that contain arbitrary information about something. In the case of an animal, we could create an Animal() class to track properties about the Animal like the name and age.
It’s important to note that a class just provides structure—it’s a blueprint for how something should be defined, but it doesn’t actually provide any real content itself. The Animal() class may specify that the name and age are necessary for defining an animal, but it will not actually state what a specific animal’s name or age is.
It may help to think of a class as an idea for how something should be defined.”
One way of thinking about a Python class
is kind of like a fancier version of a dict
. Like dictionaries, classes provide a way to group together named properties, data, and functions into a single object. What makes classes special is that they can:
provide a template structure that can be applied to other Python objects
have methods (functions) that enable you to directly modify their own properties
Soon we’ll explore how these properties are implemented and what they’re useful for. But before we dig into those details, it can be useful to consider that everything in Python– every datatype, operator, etc. is an instance of a class. Knowing this fact at this point may not specifically enhance your understanding of classes (or Python more broadly). However, perhaps it gives you a sense of how encompassing and powerful classes are. They are the fundamental construction that makes Python an object-oriented language. In particular, classes allow you to define new datatypes.
Classes as templates¶
Consider the dictionaries we defined above. Each course dictionary contained the same fields (‘department’, ‘name’, ‘number’, and ‘url’), and each instructor dictionary contained the same fields (‘name’, ‘degree’, ‘institution’, ‘url’, ‘courses’). Further, the values in each dictionary’s fields were similar across instances of courses and instructors. For example, all of the courses’ ‘name’ fields contained string objects; all of the instructors’ ‘courses’ fields contained lists of course dictionary objects; and so on. In other words, the general “formats” of different course dictionaries were similar to each other, and the formats of different instructor dictionaries were similar to each other. This replicated structure can be accomplished more succinctly, and with less potential for errors or inconsistencies, using classes.
Defining a new class¶
A new class
object may be defined using the class
keyword:
class NewClassName:
<body of class definition>
The body of a class
defines the set of properties and abilities each class instance will have. An instance is a parameterized version of the class. For example, 'hello'
and 'goodbye'
are two instances of str
objects.
The body of a class
can contain any Python code. In addition, classes support a keyword called self
. Inside of the class definition’s body, self
refers to the specific instance of the class. If you haven’t encountered the notion of classes before, this probably seems quite abstract. Soon we’ll get to some concrete examples that will hopefully make things clearer.
__init__
¶
The __init__
method is a special function, defined for each type of class, that allows the user to create a new class instance. Usually the __init__
method sets properties of the object and/or carries out setup tasks. The first argument of __init__
is always self
.
Example: counter object¶
To illustrate how classes work, let’s define a new type of object: a Counter
. Counter
objects will contain an internal ‘count’ property that starts at 0 and can be incremented, decremented (to a minimum of 0), or reset:
class Counter:
def __init__(self):
self.count = 0
def increment(self):
self.count += 1
def decrement(self):
if self.count > 0:
self.count -= 1
def reset(self):
self.count = 0
Now let’s play around with some Counter
objects and see how they work:
a = Counter()
b = Counter()
print('a is an instance of', type(a))
print('b is an instance of', type(b))
a.increment() #increase a's count to 1
a.increment() #increase a's count to 2
b.decrement() #should do nothing
print('a\'s count is', a.count)
print('b\'s count is', b.count)
a is an instance of <class '__main__.Counter'>
b is an instance of <class '__main__.Counter'>
a's count is 2
b's count is 0
Notice that even though a
and b
are instances of the same type of class (Counter
), the internal counters of those two objects can change independently and be referenced independently. Try playing around with these objects:
Make sure you understand what each function does and how it works
Add a function that prints out the current value of the counter (e.g. ‘My count is 7’)
Also note the difference between accessing a property of an instance of Counter
(e.g. a.count
) versus calling one of the instance’s methods (e.g. a.reset()
).
Fancy stuff¶
As described above, the __init__
method is “special” in that Python knows to call the __init__
function in order to create a new instance of the class. The user doesn’t need to explicitly call __init__
; Python just calls it “under the hood”.
Python allows you to define dozens of other special methods that can enhance how you interact with instances of the class. For example, the __add__
method allows you to support applying addition (+
) operator to instances of the class (e.g. a + b
, where a
and b
are instances of Counter
objects). It’s worth reading through the full set of special methods in case they might come in handy later on; here is a nice summary. Note that you don’t need to memorize this list. If you find yourself wanting to support some operation on a new class that you’ve defined, you can always consult the above list or do a web search to remind yourself.
Inheritance¶
The biological taxonomy defines a hierarchy of categories for all living things. For example, humans and whales are both mammals, which are in turn both animals. This organization is useful for biologists because it provides a succinct way of noting that all “instances” of animals satisfy a particular set of properties, all instances of mammals satisfy an additional set of properties (along with also satisfying all properties of animals more generally), and so on. As one moves down the phylogenetic tree from general to specific, the more specific instances “inherit” the properties of the less-specific classes that they belong to.
In Python we can define a given class (e.g. Mammal
) to be a specific “kind” of another broader class (e.g. Animal
) using a technique called inheritance. The syntax is (source):
class DerivedClassName(BaseClassName):
<statement-1>
.
.
.
<statement-N>
The class DerivedClassName
inherits all of the properties and methods of BaseClassName
unless they are specifically re-defined. For example, let’s create a sub-class of Counter
that keeps track of how many people are enrolled in a given course:
class Registrar(Counter):
def __init__(self, course_name, max_enrollment):
self.name = course_name
self.limit = max_enrollment
self.count = 0
self.students = [] #an empty list
def enroll(self, student):
if self.count < self.limit:
self.students.append(student)
self.increment() #uses Counter's increment method
else:
raise Exception('enrollment cap reached, cannot enroll student')
def drop(self, student):
if student in self.students:
self.students.remove(student)
self.decrement() #uses Counter's decrement method
else:
raise Exception('student not found: ' + student)
def roster(self):
print('\n')
if self.count == 0:
print('No students are enrolled.')
else:
print('Course roster (',
self.count,
' students are enrolled):\n',
'\n'.join(self.students), sep='') #can you figure out what this line does?
Let’s experiment with this new type of object:
spice23 = Registrar('SPICE', 10)
spice23.roster() #empty roster
spice23.enroll('Macey')
spice23.enroll('Tvisha')
spice23.enroll('John')
spice23.enroll('Olivia')
spice23.roster() #four students enrolled
spice23.drop('John')
spice23.roster() #three students enrolled
spice23.enroll('Chris')
spice23.enroll('Nikash')
spice23.enroll('Advait')
spice23.enroll('Rithvik')
spice23.enroll('Julia')
spice23.enroll('Fatiha')
spice23.enroll('Leila') #max enrollment reached
spice23.roster()
spice23.enroll('Qixiu') #should give an error
No students are enrolled.
Course roster (4 students are enrolled):
Macey
Tvisha
John
Olivia
Course roster (3 students are enrolled):
Macey
Tvisha
Olivia
Course roster (10 students are enrolled):
Macey
Tvisha
Olivia
Chris
Nikash
Advait
Rithvik
Julia
Fatiha
Leila
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_2239/1372793491.py in <module>
21 spice23.roster()
22
---> 23 spice23.enroll('Qixiu') #should give an error
/tmp/ipykernel_2239/3931153931.py in enroll(self, student)
11 self.increment() #uses Counter's increment method
12 else:
---> 13 raise Exception('enrollment cap reached, cannot enroll student')
14
15 def drop(self, student):
Exception: enrollment cap reached, cannot enroll student
Let’s also attempt to drop a student who hasn’t enrolled in the class (this should also give an error):
spice2023.drop('Heidi')
Next steps¶
If you’re new to the idea of classes and inheritence, it’s worth taking some time to practice. Some ideas:
Modify the
Counter
class defined above to support addition and subtraction between different counters. For example, ifa
andb
are bothCounter
objects,a
-b
should yield a new instance ofCounter
object whose count is the either the difference betweena.count
andb.count
or 0 (whichever is greater).
# Insert your code here