1. Basics of Series and DataFrames¶
pandas documentation¶
First, I want to give you a quick tour of the pandas documentation website. This documentation is essential to using pandas and I refer back to it literally all the time.
pandas User Guide: The User Guide covers all of pandas by topic area. Each of the subsections introduces a topic (such as “working with missing data”), and discusses how pandas approaches the problem, with many examples throughout.
pandas API reference: This page gives an overview of all public pandas objects, functions and methods.
What is a DataFrame?¶
There are two main concepts that are essential to using pandas.
The first important concept is a DataFrame. In pandas, a DataFrame is a table of data organized and accessed by rows and columns. In many ways, it is equivalent to a basic spreadsheet or CSV file. Here’s a visual of what our sampleData.csv looks like as a DataFrame.
Files
You can download sampleData.csv here.
Indexes:
By default, pandas assigns an index to every row and every column in a DataFrame.
The index (position) is an integer that starts at 0 and counts up.
50 rows → row index from 0 to 49
10 columns → column index from 0 to 9
Hint
When index is used generically, it refers to the row index rather than the column index.
For rows, the index is typically identical to the row label, but sometimes people change the index or row labels to be strings or letters.
10 rows → row index from A to J
Labels:
Column labels correspond to the name of each column. They are typically strings, but can be numbers too.
Most of the time, the column labels of a DataFrame are equivalent to the “column headers” (or first row) seen in most spreadsheet editors. For the 3rd column in our DataFrame, the column index is 2 while the column label is “creator”.
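The difference between positions and labels can be sketched on a tiny table. This is a minimal illustration, not the tutorial's data: the `toy` DataFrame and its values are made up, and only the column names `advisor`, `creator`, and `title` are borrowed from sampleData.csv.

```python
import pandas as pd

# Hypothetical three-row table; the cell values are placeholders.
toy = pd.DataFrame({
    "advisor": ["adv_1", "adv_2", "adv_3"],
    "creator": ["cre_1", "cre_2", "cre_3"],
    "title": ["t_1", "t_2", "t_3"],
})

# Default row index: integers that start at 0 and count up.
default_index = list(toy.index)

# Column labels, and the integer position of one label.
column_labels = list(toy.columns)
creator_position = toy.columns.get_loc("creator")

# Swap the default row labels for letters; row positions do not change.
lettered = toy.copy()
lettered.index = ["A", "B", "C"]
letter_index = list(lettered.index)

print(default_index)     # [0, 1, 2]
print(column_labels)     # ['advisor', 'creator', 'title']
print(creator_position)  # 1
print(letter_index)      # ['A', 'B', 'C']
```

In this toy table "creator" sits at position 1 because there are only three columns; in sampleData.csv it is the 3rd column, so its position is 2.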
What is a Series?¶
The second important concept is a Series. In pandas, a Series is equivalent to a column in a CSV or spreadsheet. A DataFrame is a collection of Series (one per column) that share the same row index.
A Series also has a row index or label. Here is an example of a Series (or column) named “title” from sampleData.csv.
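As a rough sketch of how a Series relates to its DataFrame, the snippet below pulls one column out of a small made-up table and also builds a Series directly. The `toy` table, the `standalone` Series, and all values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-row table with a "title" column, as in sampleData.csv.
toy = pd.DataFrame({
    "title": ["First thesis", "Second thesis"],
    "year": [2015, 2016],
})

# Selecting one column gives back a Series.
title_series = toy["title"]

print(type(title_series).__name__)           # Series
print(title_series.name)                     # title
print(title_series.index.equals(toy.index))  # True: same row index as the DataFrame

# A Series can also be created on its own, with its own index and name.
standalone = pd.Series(["a", "b"], index=[10, 20], name="letters")
print(list(standalone.index))                # [10, 20]
```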
Make a DataFrame¶
To use pandas, we need to import the library. Typically, pandas is abbreviated as pd.
import pandas as pd
Now, let’s use the pandas function read_csv() to read sampleData.csv into a DataFrame.
New function
pd.read_csv(): Reads a CSV file into a DataFrame.
Let’s give our resulting DataFrame the commonly used variable name df, although we could call it whatever we want. Let’s also print df to see what it looks like in the terminal.
filename = 'sampleData.csv'
df = pd.read_csv(filename)
print(df)
item_identifier advisor creator \
0 1.0 Wolberger, Cynthia Daniels, Casey M.
1 2.0 Fallin, Margaret Daniele Collado Torres, Leonardo
2 3.0 Neelon, Sara E. Benjamin Caswell, Bess L. L.
3 4.0 Roter, Debra L. Jamal, Leila
4 5.0 Sears, Cynthia L Rouhani, Saba
.. ... ... ...
85 86.0 Spall, James C Chen, Tianyi
86 87.0 Dredze, Mark Benton, Adrian
87 87.0 Dredze, Mark Benton, Adrian
88 88.0 Andreou, Andreas G Fischl, Kate Danielle
89 89.0 Shpitser, Ilya Nabi Abdolyousefi, Razieh
date_issued title \
0 2015-07-31 Characterization of the ADP-ribosylated proteo...
1 2016-07-25 Annotation-Agnostic Differential Expression an...
2 2016-09-07 Child diet over three seasons in rural Zambia:...
3 2017-02-02 Exploring Parental Involvement in Rare Disease...
4 2018-05-23 Gut Microbes, Enteropathy and Child Growth: Th...
.. ... ...
85 2018-08-20 A Fast Reduced-Space Algorithmic Framework for...
86 2018-10-25 Learning Representations of Social Media Users
87 2018-10-25 Learning Representations of Social Media Users
88 2019-05-13 Neuromorphic Models of the Amygdala with Appli...
89 2021-03-29 Causal Inference Methods For Bias Correction I...
degree_discipline \
0 Biochemistry
1 Biostatistics
2 Human Nutrition
3 Bioethics
4 Global Disease Epidemiology & Control
.. ...
85 Mathematics
86 Computer Science
87 Computer Science
88 Computer Engineering
89 Computer Science
degree_grantor \
0 Johns Hopkins University. Bloomberg School of ...
1 Johns Hopkins University. Bloomberg School of ...
2 Johns Hopkins University. Bloomberg School of ...
3 Johns Hopkins University. Bloomberg School of ...
4 Johns Hopkins University. Bloomberg School of ...
.. ...
85 Johns Hopkins University. Whiting School of En...
86 Johns Hopkins University. Whiting School of En...
87 Johns Hopkins University. Whiting School of En...
88 Johns Hopkins University. Whiting School of En...
89 Johns Hopkins University. Whiting School of En...
degree_department \
0 Biochemistry and Molecular Biology
1 Biostatistics
2 International Health
3 Health Policy and Management
4 International Health
.. ...
85 Applied Mathematics and Statistics
86 Computer Science
87 Computer Science
88 Electrical and Computer Engineering
89 Computer Science
committee_member contributor_author
0 Leung, Anthony K. L.|Matunis, Michael J.|Dingl... NaN
1 Leek, Jeffrey T.|Hansen, Kasper D.|Battle, Ale... NaN
2 West, Keith P., Jr.|Talegawkar, Sameera|Fanzo,... NaN
3 Kass, Nancy E.|Saloner, Brendan|Bodurtha, Joan... NaN
4 Kosek, Margaret N|Dowdy, David W|Sack, David ... NaN
.. ... ...
85 Basu, Amitabh|Curtis, Frank E|Robinson, Daniel P NaN
86 Arora, Raman|Yarowsky, David|Hovy, Dirk NaN
87 Arora, Raman|Yarowsky, David|Hovy, Dirk NaN
88 Etienne-Cummings, Ralph|Sarma, Sridevi|Pouliqu... NaN
89 Scharfstein, Daniel|Tchetgen Tchetgen, Eric|Og... NaN
[90 rows x 10 columns]
While this doesn’t print out every row or column of df, it helps us see the overall structure of the DataFrame. pandas also has many useful functions to help get a closer look at your data. Let’s try some!
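pandas can also build a DataFrame from other kinds of sources:
New function
pd.read_json(): Converts a JSON string to a DataFrame or Series.
New function
pd.read_excel(): Reads an Excel file into a DataFrame.
New function
pd.read_sql(): Reads a SQL query or database table into a DataFrame.
New function
pd.DataFrame.from_dict(): Creates a DataFrame from a dictionary of array-likes or a dictionary of dictionaries. (A list of dictionaries can be passed straight to pd.DataFrame().)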
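Here is a small sketch of two of these alternatives, using in-memory data so it runs without any files. The records are invented, and wrapping the JSON text in `io.StringIO` is a choice made here because recent pandas versions discourage passing a raw JSON string directly.

```python
import io

import pandas as pd

# Dictionary of columns: each key becomes a column label.
by_column = pd.DataFrame.from_dict({"title": ["A", "B"], "year": [2015, 2016]})

# Dictionary of rows: with orient="index", each outer key becomes a row label.
by_row = pd.DataFrame.from_dict(
    {"r1": {"title": "A", "year": 2015}, "r2": {"title": "B", "year": 2016}},
    orient="index",
)

# JSON text holding a list of records, read through a file-like wrapper.
json_text = '[{"title": "A", "year": 2015}, {"title": "B", "year": 2016}]'
from_json = pd.read_json(io.StringIO(json_text))

print(by_column.shape)          # (2, 2)
print(list(by_row.index))       # ['r1', 'r2']
print(list(from_json.columns))  # ['title', 'year']
```

All three results hold the same two records; they differ only in how the input was laid out.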
View your DataFrame¶
Sometimes it’s nice to get an overview of your data without having to scroll through a bunch of rows. head() and tail() are really great for this.
Use head() on your DataFrame to see the first 5 rows with column labels, and use tail() to see the last 5 rows with column labels. You can change the number of rows by putting a different value in the parentheses.
New function
df.head(): Returns the first n rows of the DataFrame.
New function
df.tail(): Returns the last n rows of the DataFrame.
print(df.head())
item_identifier advisor creator \
0 1.0 Wolberger, Cynthia Daniels, Casey M.
1 2.0 Fallin, Margaret Daniele Collado Torres, Leonardo
2 3.0 Neelon, Sara E. Benjamin Caswell, Bess L. L.
3 4.0 Roter, Debra L. Jamal, Leila
4 5.0 Sears, Cynthia L Rouhani, Saba
date_issued title \
0 2015-07-31 Characterization of the ADP-ribosylated proteo...
1 2016-07-25 Annotation-Agnostic Differential Expression an...
2 2016-09-07 Child diet over three seasons in rural Zambia:...
3 2017-02-02 Exploring Parental Involvement in Rare Disease...
4 2018-05-23 Gut Microbes, Enteropathy and Child Growth: Th...
degree_discipline \
0 Biochemistry
1 Biostatistics
2 Human Nutrition
3 Bioethics
4 Global Disease Epidemiology & Control
degree_grantor \
0 Johns Hopkins University. Bloomberg School of ...
1 Johns Hopkins University. Bloomberg School of ...
2 Johns Hopkins University. Bloomberg School of ...
3 Johns Hopkins University. Bloomberg School of ...
4 Johns Hopkins University. Bloomberg School of ...
degree_department \
0 Biochemistry and Molecular Biology
1 Biostatistics
2 International Health
3 Health Policy and Management
4 International Health
committee_member contributor_author
0 Leung, Anthony K. L.|Matunis, Michael J.|Dingl... NaN
1 Leek, Jeffrey T.|Hansen, Kasper D.|Battle, Ale... NaN
2 West, Keith P., Jr.|Talegawkar, Sameera|Fanzo,... NaN
3 Kass, Nancy E.|Saloner, Brendan|Bodurtha, Joan... NaN
4 Kosek, Margaret N|Dowdy, David W|Sack, David ... NaN
print(df.tail(12))
item_identifier advisor creator \
78 79.0 Chirikjian, Gregory S. Ackerman, Martin Kendal
79 80.0 Prince, Jerry L. Uneri, Ali
80 81.0 Whitcomb, Louis L. Bohren, Jonathan
81 82.0 NaN Tao, Lingling
82 83.0 Braverman, Vladimir Yang, Lin
83 84.0 Kazanzides, Peter Chen, Zihan
84 85.0 Gayme, Dennice F. Hameduddin, Ismail
85 86.0 Spall, James C Chen, Tianyi
86 87.0 Dredze, Mark Benton, Adrian
87 87.0 Dredze, Mark Benton, Adrian
88 88.0 Andreou, Andreas G Fischl, Kate Danielle
89 89.0 Shpitser, Ilya Nabi Abdolyousefi, Razieh
date_issued title \
78 2016-02-18 Design and Calibration of Robotic Systems with...
79 2017-01-12 Imaging and registration for surgical guidance...
80 2017-01-26 Intent-Recognition-Based Traded Control for Te...
81 2017-02-26 Learning Discriminative Feature Representation...
82 2017-10-09 Taming Big Data By Streaming
83 2017-10-27 A Scalable, High-Performance, Real-Time Contro...
84 2018-02-26 Tackling viscoelastic turbulence
85 2018-08-20 A Fast Reduced-Space Algorithmic Framework for...
86 2018-10-25 Learning Representations of Social Media Users
87 2018-10-25 Learning Representations of Social Media Users
88 2019-05-13 Neuromorphic Models of the Amygdala with Appli...
89 2021-03-29 Causal Inference Methods For Bias Correction I...
degree_discipline degree_grantor \
78 Robotics Johns Hopkins University. Whiting School of En...
79 Computer Science Johns Hopkins University. Whiting School of En...
80 Robotics Johns Hopkins University. Whiting School of En...
81 Computer Science Johns Hopkins University. Whiting School of En...
82 Computer Science Johns Hopkins University. Whiting School of En...
83 Computer Science Johns Hopkins University. Whiting School of En...
84 Mechanical Engineering Johns Hopkins University. Whiting School of En...
85 Mathematics Johns Hopkins University. Whiting School of En...
86 Computer Science Johns Hopkins University. Whiting School of En...
87 Computer Science Johns Hopkins University. Whiting School of En...
88 Computer Engineering Johns Hopkins University. Whiting School of En...
89 Computer Science Johns Hopkins University. Whiting School of En...
degree_department \
78 Mechanical Engineering
79 Computer Science
80 Mechanical Engineering
81 Electrical and Computer Engineering
82 Computer Science
83 Computer Science
84 Mechanical Engineering
85 Applied Mathematics and Statistics
86 Computer Science
87 Computer Science
88 Electrical and Computer Engineering
89 Computer Science
committee_member contributor_author
78 Boctor, Emad M.|Shiffman, Bernard|Whitcomb, Lo... NaN
79 Siewerdsen, Jeffrey H.|Taylor, Russell H.|Woli... NaN
80 Kazanzides, Peter|Leonard, Simon NaN
81 Vidal, Rene|Khudanpur, Sanjeev P.|Tran, Trac D... NaN
82 Szalay, Alexander S.|Priebe, Carey E.|Basu, Am... NaN
83 Taylor, Russell H.|Whitcomb, Louis L. NaN
84 Meneveau, Charles V.|Zaki, Tamer A. NaN
85 Basu, Amitabh|Curtis, Frank E|Robinson, Daniel P NaN
86 Arora, Raman|Yarowsky, David|Hovy, Dirk NaN
87 Arora, Raman|Yarowsky, David|Hovy, Dirk NaN
88 Etienne-Cummings, Ralph|Sarma, Sridevi|Pouliqu... NaN
89 Scharfstein, Daniel|Tchetgen Tchetgen, Eric|Og... NaN
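To see how the number in the parentheses changes the result, here is a short sketch on a made-up ten-row table (the `toy` DataFrame and its single column `n` are assumptions, not part of sampleData.csv). It also shows that a negative value means "everything except the last rows" for head().

```python
import pandas as pd

# Hypothetical table with ten rows holding the numbers 0 through 9.
toy = pd.DataFrame({"n": range(10)})

first_three = toy.head(3)        # rows at positions 0, 1, 2
last_two = toy.tail(2)           # rows at positions 8, 9
all_but_last_two = toy.head(-2)  # negative n: drop the last 2 rows

print(list(first_three["n"]))    # [0, 1, 2]
print(list(last_two["n"]))       # [8, 9]
print(len(all_but_last_two))     # 8
```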
pandas also offers many properties to understand your DataFrame. Let’s try them out.
New property
df.columns: Returns the column labels of the DataFrame.
New property
df.shape: Returns a tuple (rows, columns) giving the dimensions of the DataFrame.
New property
df.empty: Indicates if the DataFrame is empty.
print(df.columns)
print(df.shape)
print(df.empty)
Index(['item_identifier', 'advisor', 'creator', 'date_issued', 'title',
'degree_discipline', 'degree_grantor', 'degree_department',
'committee_member', 'contributor_author'],
dtype='object')
(90, 10)
False
When we printed df.columns, we got an Index object holding the 10 column labels.
When we printed df.shape, we got the result (90, 10). This means our DataFrame has 90 rows and 10 columns.
When we printed df.empty, we got the result False. This means our DataFrame is NOT empty.
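These properties are often used as ordinary Python values. The sketch below, on a made-up three-row table, turns the column labels into a list, unpacks the shape tuple, and builds a DataFrame with no rows to show when `empty` becomes True. The `toy` table is an assumption for illustration.

```python
import pandas as pd

# Hypothetical toy table; the values are made up.
toy = pd.DataFrame({"title": ["A", "B", "C"], "year": [2015, 2016, 2017]})

# df.columns is an Index object; tolist() gives a plain Python list.
labels = toy.columns.tolist()

# df.shape is a (rows, columns) tuple, so it can be unpacked.
n_rows, n_cols = toy.shape

# Slicing away every row keeps the columns but leaves no data.
no_rows = toy.iloc[0:0]

print(labels)          # ['title', 'year']
print(n_rows, n_cols)  # 3 2
print(toy.empty)       # False
print(no_rows.empty)   # True
print(no_rows.shape)   # (0, 2)
```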
Access a row¶
There are two main ways to access a row. The first is by using .loc[], which grabs a row by its label.
Typically, the index label is the same as the row integer (but not always; remember, you can change the index labels to be anything you want). The default row index starts at 0 and counts up by 1. Let’s access the 11th row.
New function
df.loc[]: Access a group of rows and columns by label(s) or a boolean array.
row_11 = df.loc[10]
print(row_11)
item_identifier 11.0
advisor Passey, Benjamin H.
creator Henkes, Gregory Arthur
date_issued 2014-09-22
title Carbonate clumped isotope geochemistry of mari...
degree_discipline Earth & Planetary Sciences
degree_grantor Johns Hopkins University. Krieger School of Ar...
degree_department Earth and Planetary Sciences
committee_member Ferry, John M.
contributor_author NaN
Name: 10, dtype: object
The second way to access a row is by using .iloc[], which grabs a row by its integer position.
The row integer will remain the same regardless of your index labels. Row integers always start at 0 and count up by 1. Let’s access the 11th row.
New function
df.iloc[]: Purely integer-location based indexing for selection by position.
row_11 = df.iloc[10]
print(row_11)
item_identifier 11.0
advisor Passey, Benjamin H.
creator Henkes, Gregory Arthur
date_issued 2014-09-22
title Carbonate clumped isotope geochemistry of mari...
degree_discipline Earth & Planetary Sciences
degree_grantor Johns Hopkins University. Krieger School of Ar...
degree_department Earth and Planetary Sciences
committee_member Ferry, John M.
contributor_author NaN
Name: 10, dtype: object
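With the default index, .loc[10] and .iloc[10] return the same row, which hides the difference between them. A minimal sketch with a made-up table whose row labels are 10, 20, 30 (an assumption chosen to make labels and positions disagree) shows where they part ways.

```python
import pandas as pd

# Hypothetical table whose row labels do NOT match the row positions.
toy = pd.DataFrame({"title": ["A", "B", "C"]}, index=[10, 20, 30])

by_label = toy.loc[10]     # the row whose label is 10 (the first row)
by_position = toy.iloc[0]  # the row at position 0 (also the first row)

print(by_label["title"])     # A
print(by_position["title"])  # A

# There is no row labelled 0, so .loc[0] fails with a KeyError.
try:
    toy.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

# There are only 3 rows, so position 10 is out of bounds for .iloc[].
try:
    toy.iloc[10]
    position_ten_found = True
except IndexError:
    position_ten_found = False

print(label_zero_found, position_ten_found)  # False False
```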
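You can also grab sections of rows using both of these functions. Guess what rows each function will bring up.
Hint
These work a lot like list indexing! One difference: a .loc[] slice includes its end label, while an .iloc[] slice stops before its end position.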
print(df.iloc[30:41])
item_identifier advisor creator \
30 31.0 Schleif, Robert F Martens, Andrew Ted
31 32.0 NaN You, Can
32 33.0 Poole, Deborah Reyna, Zachary E.
33 34.0 Swartz, Morris L. Feng, Lei
34 35.0 Eyink, Gregory Wang, Shengwen
35 36.0 NaN O'Briain, Katarina Louisa
36 37.0 Bowen, Kit Marquez, Sara Anne
37 38.0 Kaplan, Jared Chen, Hongbin
38 39.0 Riess, Adam G. Huang, Caroline
39 40.0 Williams, Michael Wilk, Thomas Michael
40 41.0 Nealon, Christopher Huttner, Tobias Reed
date_issued title \
30 2017-07-19 Synonymous codons affect polysome spacing, pro...
31 2017-07-23 Higgs Boson Properties and Search for Addition...
32 2017-08-24 The Matter of Law: Reconsidering the Natural L...
33 2018-01-30 Measurement of t-tbar Forward-backward Asymmet...
34 2018-05-07 Some properties of closed hypersurfaces of sma...
35 2018-07-10 Trade Secrets: Georgic Poetry and the Rise of ...
36 2018-09-17 Experimental and Theoretical Explorations of A...
37 2019-07-05 On Black Hole Information Paradox in AdS3/CFT2
38 2019-09-25 The Mira Distance Scale
39 2019-10-21 Metaethics for Neo-Pragmatists: A Pragmatic Ac...
40 2020-06-22 On Occasion: American Poetry at the Margins of...
degree_discipline degree_grantor \
30 Biology Johns Hopkins University. Krieger School of Ar...
31 Physics Johns Hopkins University. Krieger School of Ar...
32 Political Science Johns Hopkins University. Krieger School of Ar...
33 Physics Johns Hopkins University. Krieger School of Ar...
34 Mathematics Johns Hopkins University. Krieger School of Ar...
35 English Johns Hopkins University. Krieger School of Ar...
36 Chemistry Johns Hopkins University. Krieger School of Ar...
37 Physics Johns Hopkins University. Krieger School of Ar...
38 Physics Johns Hopkins University. Krieger School of Ar...
39 Philosophy Johns Hopkins University. Krieger School of Ar...
40 English Johns Hopkins University. Krieger School of Ar...
degree_department committee_member \
30 Biology Hilser, Vincent J|Green, Rachel|Roberts, Elijah
31 Physics and Astronomy Gritsan, Andrei|Blumenfeld, Barry J.|Schlaufma...
32 Political Science Culbert, Jennifer|Bennett, Jane|Connolly, Will...
33 Physics and Astronomy Gritsan, Andrei|Schlaufman, Kevin C.|Lu, Fei|S...
34 Mathematics Bernstein, Jacob|Kazhdan, Misha|Sogge, Christo...
35 English Kramnick, Jonathan|Favret, Mary|Achinstein, Sh...
36 Chemistry Toscano, John|Townsend, Craig
37 Physics and Astronomy Bah, Ibrahima|Basu, Amitabh|Kitchloo, Nitu|Li, Yi
38 Physics and Astronomy Kamionkowski, Marc|Schlaufman, Kevin|Sabbi, El...
39 Philosophy Bok, Hilary|Moyar, Dean|Lance, Mark|Egginton, ...
40 English Hickman, Jared|Walters, Ronald G|Chambers, Sam...
contributor_author
30 NaN
31 NaN
32 NaN
33 NaN
34 NaN
35 NaN
36 NaN
37 NaN
38 NaN
39 NaN
40 NaN
print(df.iloc[[0, 30, 60]])
item_identifier advisor creator date_issued \
0 1.0 Wolberger, Cynthia Daniels, Casey M. 2015-07-31
30 31.0 Schleif, Robert F Martens, Andrew Ted 2017-07-19
60 61.0 Tung, Leslie Gorospe, Giann 2017-10-27
title degree_discipline \
0 Characterization of the ADP-ribosylated proteo... Biochemistry
30 Synonymous codons affect polysome spacing, pro... Biology
60 Shape Theoretic and Machine Learning Based Met... Biomedical Engineering
degree_grantor \
0 Johns Hopkins University. Bloomberg School of ...
30 Johns Hopkins University. Krieger School of Ar...
60 Johns Hopkins University. School of Medicine
degree_department \
0 Biochemistry and Molecular Biology
30 Biology
60 Biomedical Engineering
committee_member contributor_author
0 Leung, Anthony K. L.|Matunis, Michael J.|Dingl... NaN
30 Hilser, Vincent J|Green, Rachel|Roberts, Elijah NaN
60 Vidal, Rene|Younes, Laurent NaN
❓ What’s up with the extra brackets in df.iloc[[0, 30, 60]] ❓
The outer brackets belong to .iloc[], and the inner brackets make a Python list of row positions. Without the inner brackets, pandas would read the three values as separate indexers (row, column, and so on) instead of one list of rows, and raise an error. The example below gives the same result, and hopefully helps to show how we get double brackets when putting a list in .iloc[].
rows_to_grab = [0, 30, 60]
print(df.iloc[rows_to_grab])
item_identifier advisor creator date_issued \
0 1.0 Wolberger, Cynthia Daniels, Casey M. 2015-07-31
30 31.0 Schleif, Robert F Martens, Andrew Ted 2017-07-19
60 61.0 Tung, Leslie Gorospe, Giann 2017-10-27
title degree_discipline \
0 Characterization of the ADP-ribosylated proteo... Biochemistry
30 Synonymous codons affect polysome spacing, pro... Biology
60 Shape Theoretic and Machine Learning Based Met... Biomedical Engineering
degree_grantor \
0 Johns Hopkins University. Bloomberg School of ...
30 Johns Hopkins University. Krieger School of Ar...
60 Johns Hopkins University. School of Medicine
degree_department \
0 Biochemistry and Molecular Biology
30 Biology
60 Biomedical Engineering
committee_member contributor_author
0 Leung, Anthony K. L.|Matunis, Michael J.|Dingl... NaN
30 Hilser, Vincent J|Green, Rachel|Roberts, Elijah NaN
60 Vidal, Rene|Younes, Laurent NaN
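The slicing and list selection above can be compared side by side on a small made-up table. The `toy` DataFrame below is an assumption for illustration; it keeps the default index so that labels and positions use the same numbers and only the slicing rules differ.

```python
import pandas as pd

# Hypothetical ten-row table with the default index 0..9.
toy = pd.DataFrame({"n": range(100, 110)})

# Position slice: works like list slicing, so the end (5) is excluded.
pos_slice = toy.iloc[2:5]

# Label slice: the end label (5) is included.
label_slice = toy.loc[2:5]

# A list of positions picks out exactly those rows.
picked = toy.iloc[[0, 3, 6]]

print(list(pos_slice.index))    # [2, 3, 4]
print(list(label_slice.index))  # [2, 3, 4, 5]
print(list(picked["n"]))        # [100, 103, 106]
```

This is why df.iloc[30:41] above returns rows 30 through 40, while df.loc[30:41] on the same data would also include row 41.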
Access a Series¶
There are two main ways to access a specific column (also known as a Series) by its label in a DataFrame.
Dot notation
Uses the column label as a property of the DataFrame. Example: df.column_name
Bracket notation
Places the column label string in brackets to select part of the DataFrame. Example: df['column_name']
Let’s try both options.
degree_discipline = df.degree_discipline
print(degree_discipline)
0 Biochemistry
1 Biostatistics
2 Human Nutrition
3 Bioethics
4 Global Disease Epidemiology & Control
...
85 Mathematics
86 Computer Science
87 Computer Science
88 Computer Engineering
89 Computer Science
Name: degree_discipline, Length: 90, dtype: object
degree_discipline = df['degree_discipline']
print(degree_discipline)
0 Biochemistry
1 Biostatistics
2 Human Nutrition
3 Bioethics
4 Global Disease Epidemiology & Control
...
85 Mathematics
86 Computer Science
87 Computer Science
88 Computer Engineering
89 Computer Science
Name: degree_discipline, Length: 90, dtype: object
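In most cases, both notations work equally well! Dot notation fails, though, when the column label is not a valid Python name (for example, it contains a space) or when it matches an existing DataFrame attribute or method such as count or shape. It also cannot be used to create a new column. If that happens, just switch to bracket notation.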
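A minimal sketch of two situations where dot notation does not give you the column: a label with a space, and a label that collides with a DataFrame method. The `toy` table and its column names are hypothetical and chosen only to trigger those cases.

```python
import pandas as pd

# Hypothetical table with two awkward column labels.
toy = pd.DataFrame({
    "degree discipline": ["Biology", "Physics"],  # label contains a space
    "count": [3, 7],                              # same name as DataFrame.count()
})

# Bracket notation works for any label.
spaced = toy["degree discipline"]
counts = toy["count"]

print(list(spaced))         # ['Biology', 'Physics']
print(list(counts))         # [3, 7]

# toy.count is the DataFrame method, not the "count" column.
print(callable(toy.count))  # True

# toy.degree discipline is not even valid Python, so it cannot be written.
```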
Access a specific cell¶
There are also a bunch of ways to look at a specific cell. Here are some!
We can add a column label (2nd parameter value) to .loc[] to grab a cell value by its index label and column name.
print(df.loc[9, 'degree_discipline'])
Spanish
We can also add a column integer (second parameter value) to .iloc[] to grab a cell value by its row integer position and its column integer position.
Hint
To figure out the integer of your column, start counting at 0 at the leftmost column.
print(df.iloc[9, 5])
Spanish
We can also use iat[] and at[] to view a single cell.
New function
df.iat[]: Access a single value by integer position (row, column).
New function
df.at[]: Access a single value by row label and column label.
print(df.at[9, 'degree_discipline'])
print(df.iat[9,5])
Spanish
Spanish
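The four cell accessors can be lined up on a small made-up table with letter row labels, so that the label-based and position-based calls look different. Everything in `toy` is an assumption for illustration; the last lines also show that at[] can assign a value, not just read one.

```python
import pandas as pd

# Hypothetical table with string row labels.
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "discipline": ["Biology", "Physics", "Spanish"],
    },
    index=["x", "y", "z"],
)

by_label = toy.at["z", "discipline"]       # row label, column label
by_position = toy.iat[2, 1]                # row position, column position
same_via_loc = toy.loc["z", "discipline"]  # .loc[] with two labels
same_via_iloc = toy.iloc[2, 1]             # .iloc[] with two positions

print(by_label, by_position, same_via_loc, same_via_iloc)
# Spanish Spanish Spanish Spanish

# at[] (and iat[]) can also set a single cell.
toy.at["x", "discipline"] = "Chemistry"
print(toy.at["x", "discipline"])           # Chemistry
```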
Get descriptive stats¶
There are also many functions in pandas to get computational or descriptive stats about your Series or DataFrames. Here’s a quick look at my two favorites (check out the full lists for Series and DataFrames here).
New function
df['column_name'].unique(): Returns the unique values of a Series, including missing values (nan).
New function
df['column_name'].value_counts(): Returns a Series containing the count of each unique value in the column, most frequent first. Missing values are left out by default.
department_unique = df['degree_department'].unique()
print(department_unique)
unique_list = list(department_unique)
print(unique_list)
['Biochemistry and Molecular Biology' 'Biostatistics'
'International Health' 'Health Policy and Management'
'German and Romance Languages and Literatures' 'Anthropology'
'Political Science' 'Earth and Planetary Sciences' 'Sociology'
'Chemistry' 'Physics and Astronomy' 'Mathematics' 'Chemical Biology'
'Biology' 'History' 'English' 'Philosophy'
'Psychological and Brain Sciences' 'Molecular Biology and Genetics'
'Neuroscience' nan 'Oncology' 'Biomedical Engineering'
'Functional Anatomy and Evolution'
'Biochemistry, Cellular and Molecular Biology'
'McKusick-Nathans Institute of Genetic Medicine' 'Biological Chemistry'
'Cellular and Molecular Medicine' 'Physiology' 'Cell Biology'
'Community-Public Health' 'Applied Mathematics and Statistics'
'Electrical and Computer Engineering' 'Mechanical Engineering'
'Computer Science' 'Chemical and Biomolecular Engineering'
'Geography and Environmental Engineering']
['Biochemistry and Molecular Biology', 'Biostatistics', 'International Health', 'Health Policy and Management', 'German and Romance Languages and Literatures', 'Anthropology', 'Political Science', 'Earth and Planetary Sciences', 'Sociology', 'Chemistry', 'Physics and Astronomy', 'Mathematics', 'Chemical Biology', 'Biology', 'History', 'English', 'Philosophy', 'Psychological and Brain Sciences', 'Molecular Biology and Genetics', 'Neuroscience', nan, 'Oncology', 'Biomedical Engineering', 'Functional Anatomy and Evolution', 'Biochemistry, Cellular and Molecular Biology', 'McKusick-Nathans Institute of Genetic Medicine', 'Biological Chemistry', 'Cellular and Molecular Medicine', 'Physiology', 'Cell Biology', 'Community-Public Health', 'Applied Mathematics and Statistics', 'Electrical and Computer Engineering', 'Mechanical Engineering', 'Computer Science', 'Chemical and Biomolecular Engineering', 'Geography and Environmental Engineering']
department_counts = df['degree_department'].value_counts()
print(department_counts)
Computer Science 8
Physics and Astronomy 7
German and Romance Languages and Literatures 5
Biology 5
Chemistry 5
Mathematics 4
McKusick-Nathans Institute of Genetic Medicine 4
Mechanical Engineering 4
Neuroscience 4
Political Science 4
Earth and Planetary Sciences 3
Electrical and Computer Engineering 3
English 3
Biomedical Engineering 2
Anthropology 2
Psychological and Brain Sciences 2
Applied Mathematics and Statistics 2
International Health 2
Biochemistry and Molecular Biology 2
Community-Public Health 2
Sociology 1
Biostatistics 1
Molecular Biology and Genetics 1
Geography and Environmental Engineering 1
Philosophy 1
Biological Chemistry 1
Functional Anatomy and Evolution 1
Health Policy and Management 1
Cellular and Molecular Medicine 1
Biochemistry, Cellular and Molecular Biology 1
Oncology 1
Physiology 1
Chemical and Biomolecular Engineering 1
Cell Biology 1
Chemical Biology 1
History 1
Name: degree_department, dtype: int64
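Notice that nan shows up in the unique() output above but has no row in the value_counts() output. A short sketch on a made-up Series (the `dept` values are assumptions, not taken from sampleData.csv) shows how missing values are treated and how to include them in the counts.

```python
import pandas as pd

# Hypothetical department column with one missing value.
dept = pd.Series(["CS", "Physics", "CS", None, "CS"], name="degree_department")

uniques = dept.unique()                # the missing value counts as one entry
counts = dept.value_counts()           # missing values are skipped by default
counts_with_missing = dept.value_counts(dropna=False)  # keep them instead
n_distinct = dept.nunique()            # number of distinct non-missing values

print(len(uniques))               # 3
print(counts["CS"])               # 3
print(len(counts))                # 2
print(len(counts_with_missing))   # 3
print(n_distinct)                 # 2
```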