1. Basics of Series and DataFrames

pandas documentation

First, I want to give you a quick tour of the pandas documentation website. This documentation is essential to using pandas and I refer back to it literally all the time.

pandas User Guide: The User Guide covers all of pandas by topic area. Each of the subsections introduces a topic (such as “working with missing data”), and discusses how pandas approaches the problem, with many examples throughout.

pandas API reference: This page gives an overview of all public pandas objects, functions and methods.

What is a DataFrame?

There are two main concepts that are essential to using pandas.

The first important concept is a DataFrame. In pandas, a DataFrame is a table of data organized and accessed by rows and columns. In many ways, it is equivalent to a basic spreadsheet or CSV file. Here’s a visual of what our sampleData.csv looks like as a DataFrame.

Visual of DataFrame

Indexes: By default, pandas assigns an index to every row and every column in a DataFrame.

The index (position) is an integer that starts at 0 and counts up.

  • 50 rows → row index from 0 to 49

  • 10 columns → column index from 0 to 9

Hint

When index is used generically, it refers to the row index rather than the column index.

For rows, the index (position) is typically identical to the row label, but you can change the row labels to strings, such as letters:

  • 10 rows → row index from A to J
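To see the default row index and a relabeled one side by side, here is a minimal sketch on a tiny made-up DataFrame. The names df_small and df_lettered are illustrative and not part of sampleData.csv. set_axis() is one way to swap in new row labels; assigning a list to df.index also works.

```python
import pandas as pd

# A small hypothetical DataFrame with 3 rows and 2 columns
df_small = pd.DataFrame(
    {"title": ["Thesis one", "Thesis two", "Thesis three"],
     "year": [2015, 2016, 2017]}
)

# By default, the row labels are the integers 0, 1, 2, ...
print(list(df_small.index))  # [0, 1, 2]

# Replace the row labels with letters; the positions are still 0, 1, 2
df_lettered = df_small.set_axis(["A", "B", "C"], axis=0)
print(list(df_lettered.index))  # ['A', 'B', 'C']
```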

Labels:

  • Column labels are the names of the columns. They are typically strings, but can be numbers too.

  • Most of the time, the column labels of a DataFrame are the same as the “column headers” (the first row) seen in most spreadsheet editors.

  • For the 3rd column in our DataFrame, the column index (position) is 2, while the column label is “creator”.
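You can check the label/position distinction for columns directly on the columns Index. Here is a minimal sketch, assuming a hypothetical empty DataFrame that reuses the first three column labels of sampleData.csv:

```python
import pandas as pd

# Hypothetical frame with the first three column labels of sampleData.csv
df_cols = pd.DataFrame(columns=["item_identifier", "advisor", "creator"])

# Position -> label: the column at position 2 is labeled "creator"
print(df_cols.columns[2])  # creator

# Label -> position: Index.get_loc() returns the integer position of a label
print(df_cols.columns.get_loc("creator"))  # 2
```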

What is a Series?

The second important concept is a Series. In pandas, a Series is equivalent to a single column in a CSV or spreadsheet. A DataFrame is a collection of Series (one per column) that share the same row index.

Like a DataFrame, a Series has a row index (its row labels). Here is an example of a Series (or column) named “title” from sampleData.csv.

Visual of DataFrame
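Here is a small sketch of how Series fit together into a DataFrame, using made-up values (the titles and years are hypothetical). pd.concat() with axis=1 places named Series side by side as columns, and selecting one column hands you a Series again.

```python
import pandas as pd

# Each Series is one labeled column of values (hypothetical data)
titles = pd.Series(["Thesis one", "Thesis two"], name="title")
years = pd.Series([2015, 2016], name="year")

# Series that share the same row index combine into a DataFrame;
# each Series's name becomes a column label
df_from_series = pd.concat([titles, years], axis=1)
print(df_from_series.shape)  # (2, 2)

# Selecting one column gives back a Series
print(type(df_from_series["title"]).__name__)  # Series
```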

Make a DataFrame

To use pandas, we need to import the library. Typically, pandas is abbreviated as pd.

import pandas as pd

Now, let’s use the pandas function read_csv() to read sampleData.csv into a DataFrame.

New function

pd.read_csv(): Reads a CSV file into a DataFrame.

Let’s give our resulting DataFrame the commonly used variable name df, although we could call it whatever we want. Let’s also print df to see what it looks like in the terminal.

filename = 'sampleData.csv'
df = pd.read_csv(filename)
print(df)
    item_identifier                   advisor                    creator  \
0               1.0        Wolberger, Cynthia          Daniels, Casey M.   
1               2.0  Fallin, Margaret Daniele   Collado Torres, Leonardo   
2               3.0  Neelon, Sara E. Benjamin        Caswell, Bess L. L.   
3               4.0           Roter, Debra L.               Jamal, Leila   
4               5.0          Sears, Cynthia L              Rouhani, Saba   
..              ...                       ...                        ...   
85             86.0           Spall, James  C               Chen, Tianyi   
86             87.0              Dredze, Mark             Benton, Adrian   
87             87.0              Dredze, Mark             Benton, Adrian   
88             88.0       Andreou, Andreas  G      Fischl, Kate Danielle   
89             89.0            Shpitser, Ilya  Nabi Abdolyousefi, Razieh   

   date_issued                                              title  \
0   2015-07-31  Characterization of the ADP-ribosylated proteo...   
1   2016-07-25  Annotation-Agnostic Differential Expression an...   
2   2016-09-07  Child diet over three seasons in rural Zambia:...   
3   2017-02-02  Exploring Parental Involvement in Rare Disease...   
4   2018-05-23  Gut Microbes, Enteropathy and Child Growth: Th...   
..         ...                                                ...   
85  2018-08-20  A Fast Reduced-Space Algorithmic Framework for...   
86  2018-10-25     Learning Representations of Social Media Users   
87  2018-10-25     Learning Representations of Social Media Users   
88  2019-05-13  Neuromorphic Models of the Amygdala with Appli...   
89  2021-03-29  Causal Inference Methods For Bias Correction I...   

                        degree_discipline  \
0                            Biochemistry   
1                           Biostatistics   
2                         Human Nutrition   
3                               Bioethics   
4   Global Disease Epidemiology & Control   
..                                    ...   
85                            Mathematics   
86                       Computer Science   
87                       Computer Science   
88                   Computer Engineering   
89                       Computer Science   

                                       degree_grantor  \
0   Johns Hopkins University. Bloomberg School of ...   
1   Johns Hopkins University. Bloomberg School of ...   
2   Johns Hopkins University. Bloomberg School of ...   
3   Johns Hopkins University. Bloomberg School of ...   
4   Johns Hopkins University. Bloomberg School of ...   
..                                                ...   
85  Johns Hopkins University. Whiting School of En...   
86  Johns Hopkins University. Whiting School of En...   
87  Johns Hopkins University. Whiting School of En...   
88  Johns Hopkins University. Whiting School of En...   
89  Johns Hopkins University. Whiting School of En...   

                      degree_department  \
0    Biochemistry and Molecular Biology   
1                         Biostatistics   
2                  International Health   
3          Health Policy and Management   
4                  International Health   
..                                  ...   
85   Applied Mathematics and Statistics   
86                     Computer Science   
87                     Computer Science   
88  Electrical and Computer Engineering   
89                     Computer Science   

                                     committee_member  contributor_author  
0   Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
1   Leek, Jeffrey T.|Hansen, Kasper D.|Battle, Ale...                 NaN  
2   West, Keith P., Jr.|Talegawkar, Sameera|Fanzo,...                 NaN  
3   Kass, Nancy E.|Saloner, Brendan|Bodurtha, Joan...                 NaN  
4   Kosek, Margaret  N|Dowdy, David W|Sack, David ...                 NaN  
..                                                ...                 ...  
85   Basu, Amitabh|Curtis, Frank E|Robinson, Daniel P                 NaN  
86            Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
87            Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
88  Etienne-Cummings, Ralph|Sarma, Sridevi|Pouliqu...                 NaN  
89  Scharfstein, Daniel|Tchetgen Tchetgen, Eric|Og...                 NaN  

[90 rows x 10 columns]
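If you don't have sampleData.csv handy, read_csv() also accepts any file-like object, not only a filename. This sketch feeds it a short three-column excerpt modeled on the first two rows above through io.StringIO, so it runs without a file on disk. Note that values containing commas (like the names) must be quoted in the CSV text.

```python
import io

import pandas as pd

# A hypothetical three-column excerpt in CSV form
csv_text = """item_identifier,creator,date_issued
1.0,"Daniels, Casey M.",2015-07-31
2.0,"Collado Torres, Leonardo",2016-07-25
"""

# read_csv accepts a file-like object in place of a filename
df_demo = pd.read_csv(io.StringIO(csv_text))
print(df_demo.shape)  # (2, 3)
print(df_demo["creator"].tolist())
```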

While this doesn’t print out every row or column of df, it helps us see the overall structure of the DataFrame.

Besides read_csv(), pandas can build DataFrames from many other sources. Here are a few you may run into:

New function

pd.read_json(): Reads a JSON string or file into a DataFrame or Series.

New function

pd.read_excel(): Reads an Excel file into a DataFrame.

New function

pd.read_sql(): Reads a SQL query or database table into a DataFrame.

New function

pd.DataFrame.from_dict(): Creates a DataFrame from a dictionary of array-likes or dictionaries.
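Here is a quick sketch of two of these in action, with made-up values (year and pages are hypothetical columns). from_dict() is called on the DataFrame class, and its orient parameter decides whether the outer dictionary keys become columns (the default) or row labels ("index"). The read_json() call wraps its JSON text in io.StringIO because pandas 2.1 and later deprecate passing a literal JSON string.

```python
import io

import pandas as pd

# Dictionary of lists: each key becomes a column label (default orient)
df_by_col = pd.DataFrame.from_dict({"year": [2015, 2016], "pages": [120, 95]})

# Dictionary of dictionaries with orient="index": each outer key becomes a row label
df_by_row = pd.DataFrame.from_dict(
    {"A": {"year": 2015, "pages": 120}, "B": {"year": 2016, "pages": 95}},
    orient="index",
)

# A JSON array of records: one object per row
json_text = '[{"year": 2015, "pages": 120}, {"year": 2016, "pages": 95}]'
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_by_col.shape, df_by_row.shape, df_json.shape)  # (2, 2) (2, 2) (2, 2)
print(df_by_row.loc["B", "pages"])  # 95
```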

View your DataFrame

Sometimes it’s nice to get an overview of your data without having to scroll through a bunch of rows. head() and tail() are really great for this.

Use head() on your DataFrame to see the first 5 rows with column labels, and use tail() to see the last 5 rows with column labels. You can change the number of rows by putting a different value in the parentheses.

New function

df.head(): Returns the first n rows of the DataFrame.

New function

df.tail(): Returns the last n rows of the DataFrame.

print(df.head())
   item_identifier                   advisor                   creator  \
0              1.0        Wolberger, Cynthia         Daniels, Casey M.   
1              2.0  Fallin, Margaret Daniele  Collado Torres, Leonardo   
2              3.0  Neelon, Sara E. Benjamin       Caswell, Bess L. L.   
3              4.0           Roter, Debra L.              Jamal, Leila   
4              5.0          Sears, Cynthia L             Rouhani, Saba   

  date_issued                                              title  \
0  2015-07-31  Characterization of the ADP-ribosylated proteo...   
1  2016-07-25  Annotation-Agnostic Differential Expression an...   
2  2016-09-07  Child diet over three seasons in rural Zambia:...   
3  2017-02-02  Exploring Parental Involvement in Rare Disease...   
4  2018-05-23  Gut Microbes, Enteropathy and Child Growth: Th...   

                       degree_discipline  \
0                           Biochemistry   
1                          Biostatistics   
2                        Human Nutrition   
3                              Bioethics   
4  Global Disease Epidemiology & Control   

                                      degree_grantor  \
0  Johns Hopkins University. Bloomberg School of ...   
1  Johns Hopkins University. Bloomberg School of ...   
2  Johns Hopkins University. Bloomberg School of ...   
3  Johns Hopkins University. Bloomberg School of ...   
4  Johns Hopkins University. Bloomberg School of ...   

                    degree_department  \
0  Biochemistry and Molecular Biology   
1                       Biostatistics   
2                International Health   
3        Health Policy and Management   
4                International Health   

                                    committee_member  contributor_author  
0  Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
1  Leek, Jeffrey T.|Hansen, Kasper D.|Battle, Ale...                 NaN  
2  West, Keith P., Jr.|Talegawkar, Sameera|Fanzo,...                 NaN  
3  Kass, Nancy E.|Saloner, Brendan|Bodurtha, Joan...                 NaN  
4  Kosek, Margaret  N|Dowdy, David W|Sack, David ...                 NaN  
print(df.tail(12))
    item_identifier                 advisor                    creator  \
78             79.0  Chirikjian, Gregory S.    Ackerman, Martin Kendal   
79             80.0        Prince, Jerry L.                 Uneri, Ali   
80             81.0      Whitcomb, Louis L.           Bohren, Jonathan   
81             82.0                     NaN              Tao, Lingling   
82             83.0     Braverman, Vladimir                  Yang, Lin   
83             84.0       Kazanzides, Peter                Chen, Zihan   
84             85.0       Gayme, Dennice F.         Hameduddin, Ismail   
85             86.0         Spall, James  C               Chen, Tianyi   
86             87.0            Dredze, Mark             Benton, Adrian   
87             87.0            Dredze, Mark             Benton, Adrian   
88             88.0     Andreou, Andreas  G      Fischl, Kate Danielle   
89             89.0          Shpitser, Ilya  Nabi Abdolyousefi, Razieh   

   date_issued                                              title  \
78  2016-02-18  Design and Calibration of Robotic Systems with...   
79  2017-01-12  Imaging and registration for surgical guidance...   
80  2017-01-26  Intent-Recognition-Based Traded Control for Te...   
81  2017-02-26  Learning Discriminative Feature Representation...   
82  2017-10-09                       Taming Big Data By Streaming   
83  2017-10-27  A Scalable, High-Performance, Real-Time Contro...   
84  2018-02-26                   Tackling viscoelastic turbulence   
85  2018-08-20  A Fast Reduced-Space Algorithmic Framework for...   
86  2018-10-25     Learning Representations of Social Media Users   
87  2018-10-25     Learning Representations of Social Media Users   
88  2019-05-13  Neuromorphic Models of the Amygdala with Appli...   
89  2021-03-29  Causal Inference Methods For Bias Correction I...   

         degree_discipline                                     degree_grantor  \
78                Robotics  Johns Hopkins University. Whiting School of En...   
79        Computer Science  Johns Hopkins University. Whiting School of En...   
80                Robotics  Johns Hopkins University. Whiting School of En...   
81        Computer Science  Johns Hopkins University. Whiting School of En...   
82        Computer Science  Johns Hopkins University. Whiting School of En...   
83        Computer Science  Johns Hopkins University. Whiting School of En...   
84  Mechanical Engineering  Johns Hopkins University. Whiting School of En...   
85             Mathematics  Johns Hopkins University. Whiting School of En...   
86        Computer Science  Johns Hopkins University. Whiting School of En...   
87        Computer Science  Johns Hopkins University. Whiting School of En...   
88    Computer Engineering  Johns Hopkins University. Whiting School of En...   
89        Computer Science  Johns Hopkins University. Whiting School of En...   

                      degree_department  \
78               Mechanical Engineering   
79                     Computer Science   
80               Mechanical Engineering   
81  Electrical and Computer Engineering   
82                     Computer Science   
83                     Computer Science   
84               Mechanical Engineering   
85   Applied Mathematics and Statistics   
86                     Computer Science   
87                     Computer Science   
88  Electrical and Computer Engineering   
89                     Computer Science   

                                     committee_member  contributor_author  
78  Boctor, Emad M.|Shiffman, Bernard|Whitcomb, Lo...                 NaN  
79  Siewerdsen, Jeffrey H.|Taylor, Russell H.|Woli...                 NaN  
80                   Kazanzides, Peter|Leonard, Simon                 NaN  
81  Vidal, Rene|Khudanpur, Sanjeev P.|Tran, Trac D...                 NaN  
82  Szalay, Alexander S.|Priebe, Carey E.|Basu, Am...                 NaN  
83              Taylor, Russell H.|Whitcomb, Louis L.                 NaN  
84                Meneveau, Charles V.|Zaki, Tamer A.                 NaN  
85   Basu, Amitabh|Curtis, Frank E|Robinson, Daniel P                 NaN  
86            Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
87            Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
88  Etienne-Cummings, Ralph|Sarma, Sridevi|Pouliqu...                 NaN  
89  Scharfstein, Daniel|Tchetgen Tchetgen, Eric|Og...                 NaN  
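head() and tail() take the number of rows as their argument n (default 5). Here is a small sketch on a hypothetical 10-row DataFrame. A negative n works from the other end, so head(-7) returns every row except the last 7.

```python
import pandas as pd

df_nums = pd.DataFrame({"n": range(10)})  # hypothetical 10-row frame

print(df_nums.head().shape)  # (5, 1): default n=5
print(df_nums.head(3)["n"].tolist())  # [0, 1, 2]
print(df_nums.tail(2)["n"].tolist())  # [8, 9]

# Negative n: all rows except the last 7
print(df_nums.head(-7)["n"].tolist())  # [0, 1, 2]
```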

pandas also offers many properties that describe your DataFrame. Let’s try them out.

New property

df.columns: Returns the column labels of the DataFrame.

New property

df.shape: Returns a tuple (rows, columns) representing the DataFrame.

New property

df.empty: Indicates if the DataFrame is empty.

print(df.columns)
print(df.shape)
print(df.empty)
Index(['item_identifier', 'advisor', 'creator', 'date_issued', 'title',
       'degree_discipline', 'degree_grantor', 'degree_department',
       'committee_member', 'contributor_author'],
      dtype='object')
(90, 10)
False

When we printed df.columns, we got an Index object holding the 10 column labels.

When we printed df.shape, we got the result (90, 10). This means our DataFrame has 90 rows and 10 columns.

When we printed df.empty, we got the result False. This means our DataFrame is NOT empty.
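To see empty turn True, here is a sketch on a tiny made-up DataFrame. A DataFrame counts as empty when either axis has length 0, which is what happens when a filter keeps no rows.

```python
import pandas as pd

df_small = pd.DataFrame({"title": ["Thesis one", "Thesis two"], "year": [2015, 2016]})

rows, cols = df_small.shape  # shape is a (rows, columns) tuple
print(rows, cols)  # 2 2
print(list(df_small.columns))  # ['title', 'year']
print(df_small.empty)  # False

# Keep only rows with a year after 3000: none match, so 0 rows remain
no_rows = df_small[df_small["year"] > 3000]
print(no_rows.empty, no_rows.shape)  # True (0, 2)
```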

Access a row

There are two main ways to access a row. The first is .loc[], which grabs a row by its label.

Typically, the row label is the same as the row’s integer position, but not always: remember, you can change the index labels to be anything you want. The default row index starts at 0 and counts up by 1, so the 11th row has the label 10. Let’s access the 11th row.

New function

df.loc[]: Access a group of rows and columns by label(s) or a boolean array.

row_11 = df.loc[10]
print(row_11)
item_identifier                                                    11.0
advisor                                             Passey, Benjamin H.
creator                                          Henkes, Gregory Arthur
date_issued                                                  2014-09-22
title                 Carbonate clumped isotope geochemistry of mari...
degree_discipline                            Earth & Planetary Sciences
degree_grantor        Johns Hopkins University. Krieger School of Ar...
degree_department                          Earth and Planetary Sciences
committee_member                                         Ferry, John M.
contributor_author                                                  NaN
Name: 10, dtype: object

The second way to access a row is .iloc[], which grabs a row by its integer position.

Positions ignore your index labels entirely. They always start at 0 and count up by 1, so the 11th row is at position 10. Let’s access the 11th row again.

New function

df.iloc[]: Purely integer-location based indexing for selection by position.

row_11 = df.iloc[10]
print(row_11)
item_identifier                                                    11.0
advisor                                             Passey, Benjamin H.
creator                                          Henkes, Gregory Arthur
date_issued                                                  2014-09-22
title                 Carbonate clumped isotope geochemistry of mari...
degree_discipline                            Earth & Planetary Sciences
degree_grantor        Johns Hopkins University. Krieger School of Ar...
degree_department                          Earth and Planetary Sciences
committee_member                                         Ferry, John M.
contributor_author                                                  NaN
Name: 10, dtype: object
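Both calls returned the same row here because the default labels match the positions. The difference shows up once they stop matching. In this sketch (hypothetical data), sorting moves each row together with its label, so label 0 and position 0 end up pointing at different rows.

```python
import pandas as pd

df_years = pd.DataFrame({"year": [2015, 2016, 2017, 2018]})

# Sort newest first: each row keeps its original label
by_newest = df_years.sort_values("year", ascending=False)
print(list(by_newest.index))  # [3, 2, 1, 0]

# .loc[0] finds the row LABELED 0 (now in the last position)
print(by_newest.loc[0, "year"])  # 2015
# .iloc[0] finds the row in the FIRST position, whatever its label
print(by_newest.iloc[0]["year"])  # 2018
```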

You can also grab several rows at once with either function, using a slice or a list. Guess which rows each example will bring up.

Hint

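These work a lot like Python list indexing! One difference: an .iloc[] slice stops just before the end position, like a list, while a .loc[] slice includes the end label.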

print(df.iloc[30:41])
    item_identifier              advisor                    creator  \
30             31.0    Schleif, Robert F        Martens, Andrew Ted   
31             32.0                  NaN                   You, Can   
32             33.0       Poole, Deborah          Reyna, Zachary E.   
33             34.0    Swartz, Morris L.                  Feng, Lei   
34             35.0       Eyink, Gregory             Wang, Shengwen   
35             36.0                  NaN  O'Briain, Katarina Louisa   
36             37.0           Bowen, Kit         Marquez, Sara Anne   
37             38.0        Kaplan, Jared              Chen, Hongbin   
38             39.0      Riess, Adam  G.            Huang, Caroline   
39             40.0    Williams, Michael       Wilk, Thomas Michael   
40             41.0  Nealon, Christopher       Huttner, Tobias Reed   

   date_issued                                              title  \
30  2017-07-19  Synonymous codons affect polysome spacing, pro...   
31  2017-07-23  Higgs Boson Properties and Search for Addition...   
32  2017-08-24  The Matter of Law: Reconsidering the Natural L...   
33  2018-01-30  Measurement of t-tbar Forward-backward Asymmet...   
34  2018-05-07  Some properties of closed hypersurfaces of sma...   
35  2018-07-10  Trade Secrets: Georgic Poetry and the Rise of ...   
36  2018-09-17  Experimental and Theoretical Explorations of A...   
37  2019-07-05     On Black Hole Information Paradox in AdS3/CFT2   
38  2019-09-25                            The Mira Distance Scale   
39  2019-10-21  Metaethics for Neo-Pragmatists: A Pragmatic Ac...   
40  2020-06-22  On Occasion: American Poetry at the Margins of...   

    degree_discipline                                     degree_grantor  \
30            Biology  Johns Hopkins University. Krieger School of Ar...   
31            Physics  Johns Hopkins University. Krieger School of Ar...   
32  Political Science  Johns Hopkins University. Krieger School of Ar...   
33            Physics  Johns Hopkins University. Krieger School of Ar...   
34        Mathematics  Johns Hopkins University. Krieger School of Ar...   
35            English  Johns Hopkins University. Krieger School of Ar...   
36          Chemistry  Johns Hopkins University. Krieger School of Ar...   
37            Physics  Johns Hopkins University. Krieger School of Ar...   
38            Physics  Johns Hopkins University. Krieger School of Ar...   
39         Philosophy  Johns Hopkins University. Krieger School of Ar...   
40            English  Johns Hopkins University. Krieger School of Ar...   

        degree_department                                   committee_member  \
30                Biology    Hilser, Vincent J|Green, Rachel|Roberts, Elijah   
31  Physics and Astronomy  Gritsan, Andrei|Blumenfeld, Barry J.|Schlaufma...   
32      Political Science  Culbert, Jennifer|Bennett, Jane|Connolly, Will...   
33  Physics and Astronomy  Gritsan, Andrei|Schlaufman, Kevin C.|Lu, Fei|S...   
34            Mathematics  Bernstein, Jacob|Kazhdan, Misha|Sogge, Christo...   
35                English  Kramnick, Jonathan|Favret, Mary|Achinstein, Sh...   
36              Chemistry                      Toscano, John|Townsend, Craig   
37  Physics and Astronomy  Bah, Ibrahima|Basu, Amitabh|Kitchloo, Nitu|Li, Yi   
38  Physics and Astronomy  Kamionkowski, Marc|Schlaufman, Kevin|Sabbi, El...   
39             Philosophy  Bok, Hilary|Moyar, Dean|Lance, Mark|Egginton, ...   
40                English  Hickman, Jared|Walters, Ronald G|Chambers, Sam...   

    contributor_author  
30                 NaN  
31                 NaN  
32                 NaN  
33                 NaN  
34                 NaN  
35                 NaN  
36                 NaN  
37                 NaN  
38                 NaN  
39                 NaN  
40                 NaN  
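The list-indexing comparison in the Hint holds exactly for .iloc[], but .loc[] slices include their end label. Here is a sketch on a hypothetical 50-row DataFrame with the default labels:

```python
import pandas as pd

df_rows = pd.DataFrame({"n": range(50)})  # default labels 0..49

# .iloc slices exclude the stop position, like Python lists
print(df_rows.iloc[30:41].index[-1])  # 40
# .loc slices INCLUDE the stop label
print(df_rows.loc[30:41].index[-1])  # 41
print(len(df_rows.iloc[30:41]), len(df_rows.loc[30:41]))  # 11 12
```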
print(df.iloc[[0, 30, 60]])
    item_identifier             advisor              creator date_issued  \
0               1.0  Wolberger, Cynthia    Daniels, Casey M.  2015-07-31   
30             31.0   Schleif, Robert F  Martens, Andrew Ted  2017-07-19   
60             61.0        Tung, Leslie       Gorospe, Giann  2017-10-27   

                                                title       degree_discipline  \
0   Characterization of the ADP-ribosylated proteo...            Biochemistry   
30  Synonymous codons affect polysome spacing, pro...                 Biology   
60  Shape Theoretic and Machine Learning Based Met...  Biomedical Engineering   

                                       degree_grantor  \
0   Johns Hopkins University. Bloomberg School of ...   
30  Johns Hopkins University. Krieger School of Ar...   
60       Johns Hopkins University. School of Medicine   

                     degree_department  \
0   Biochemistry and Molecular Biology   
30                             Biology   
60              Biomedical Engineering   

                                     committee_member  contributor_author  
0   Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
30    Hilser, Vincent J|Green, Rachel|Roberts, Elijah                 NaN  
60                        Vidal, Rene|Younes, Laurent                 NaN  

What’s up with the extra brackets in df.iloc[[0, 30, 60]]?

Inside .iloc[], a comma separates the row selection from the column selection, so df.iloc[0, 30, 60] is not read as three rows (pandas raises an error about too many indexers). To select several rows, you put them in a list, and that list goes inside the .iloc[] brackets. The example below gives the same result and shows where the double brackets come from.

rowsToGrab = [0, 30, 60]
print(df.iloc[rowsToGrab])
    item_identifier             advisor              creator date_issued  \
0               1.0  Wolberger, Cynthia    Daniels, Casey M.  2015-07-31   
30             31.0   Schleif, Robert F  Martens, Andrew Ted  2017-07-19   
60             61.0        Tung, Leslie       Gorospe, Giann  2017-10-27   

                                                title       degree_discipline  \
0   Characterization of the ADP-ribosylated proteo...            Biochemistry   
30  Synonymous codons affect polysome spacing, pro...                 Biology   
60  Shape Theoretic and Machine Learning Based Met...  Biomedical Engineering   

                                       degree_grantor  \
0   Johns Hopkins University. Bloomberg School of ...   
30  Johns Hopkins University. Krieger School of Ar...   
60       Johns Hopkins University. School of Medicine   

                     degree_department  \
0   Biochemistry and Molecular Biology   
30                             Biology   
60              Biomedical Engineering   

                                     committee_member  contributor_author  
0   Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
30    Hilser, Vincent J|Green, Rachel|Roberts, Elijah                 NaN  
60                        Vidal, Rene|Younes, Laurent                 NaN  
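The number of brackets also changes what you get back. Here is a sketch with made-up data: a single integer returns one row as a Series, while a list (even a one-item list) returns a DataFrame.

```python
import pandas as pd

df_small = pd.DataFrame({"title": ["Thesis one", "Thesis two"], "year": [2015, 2016]})

one_row = df_small.iloc[0]  # a single integer -> a Series
as_frame = df_small.iloc[[0]]  # a list with one integer -> a one-row DataFrame
print(type(one_row).__name__, type(as_frame).__name__)  # Series DataFrame

# With a comma, .iloc[] reads (row position, column position): row 0, column 1
print(df_small.iloc[0, 1])  # 2015
```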

Access a Series

There are two main ways to access a specific column (also known as a Series) by its label in a DataFrame.

  1. Dot notation

    • Uses the column label as a property of the DataFrame.

    • Example: df.column_name

  2. Bracket notation

    • Places column label string in brackets to select part of the DataFrame.

    • Example: df['column_name']

Let’s try both options.

degree_discipline = df.degree_discipline
print(degree_discipline)
0                              Biochemistry
1                             Biostatistics
2                           Human Nutrition
3                                 Bioethics
4     Global Disease Epidemiology & Control
                      ...                  
85                              Mathematics
86                         Computer Science
87                         Computer Science
88                     Computer Engineering
89                         Computer Science
Name: degree_discipline, Length: 90, dtype: object
degree_discipline = df['degree_discipline']
print(degree_discipline)
0                              Biochemistry
1                             Biostatistics
2                           Human Nutrition
3                                 Bioethics
4     Global Disease Epidemiology & Control
                      ...                  
85                              Mathematics
86                         Computer Science
87                         Computer Science
88                     Computer Engineering
89                         Computer Science
Name: degree_discipline, Length: 90, dtype: object

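In most cases, both notations work equally well! Dot notation fails, though, when the column label isn’t a valid Python name (for example, it contains a space or starts with a digit) or when it matches an existing DataFrame attribute or method (such as count or shape). Bracket notation works for every label, so if dot notation gives you an error or something unexpected, switch to brackets.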
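Here is a sketch of two common cases where dot notation breaks down, using a hypothetical DataFrame: a label containing a space, and a label (count) that is also the name of a DataFrame method.

```python
import pandas as pd

# Hypothetical labels chosen to trip up dot notation
df_tricky = pd.DataFrame({"degree discipline": ["Biology"], "count": [5]})

# A label with a space can't be written after a dot; brackets work
print(df_tricky["degree discipline"].tolist())  # ['Biology']

# df_tricky.count is the DataFrame.count() method, not the column
print(callable(df_tricky.count))  # True
print(df_tricky["count"].tolist())  # [5]
```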

Access a specific cell

There are also several ways to look at a specific cell. Here are some!

We can add a column label (2nd parameter value) to .loc[] to grab a cell value by its index label and column name.

print(df.loc[9, 'degree_discipline'])
Spanish

We can also add a column integer (second parameter value) to .iloc[] to grab a cell value by its row integer position and its column integer position.

Hint

To figure out the integer of your column, start counting at 0 at the leftmost column.

print(df.iloc[9, 5])
Spanish

We can also use iat[] and at[] to view a single cell.

New function

df.iat[]: Access a single value for a row/column pair by integer position.

New function

df.at[]: Access a single value for a row/column pair by label.

print(df.at[9, 'degree_discipline'])
print(df.iat[9, 5])
Spanish
Spanish
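Here is a small sketch with hypothetical letter labels, so that labels and positions differ. at[] and iat[] can also assign a value to a single cell.

```python
import pandas as pd

# Hypothetical letter labels so that labels and positions differ
df_cells = pd.DataFrame(
    {"title": ["Thesis one", "Thesis two"], "year": [2015, 2016]},
    index=["A", "B"],
)

# at[] takes a row label and a column label
print(df_cells.at["B", "year"])  # 2016
# iat[] takes a row position and a column position
print(df_cells.iat[1, 1])  # 2016

# Both can also set a single cell in place
df_cells.at["A", "year"] = 2014
print(df_cells.loc["A", "year"])  # 2014
```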

Get descriptive stats

There are also many functions in pandas that compute descriptive statistics about your Series or DataFrames. Here’s a quick look at my two favorites (the full lists for Series and DataFrames are in the pandas API reference).

New function

df['column'].unique(): Returns the unique values of a Series in order of appearance, including NaN.

New function

df['column'].value_counts(): Returns a Series containing the count of each unique value, sorted from most to least frequent. NaN is excluded by default.

department_unique = df['degree_department'].unique()
print(department_unique)

unique_list = list(department_unique)
print(unique_list)
['Biochemistry and Molecular Biology' 'Biostatistics'
 'International Health' 'Health Policy and Management'
 'German and Romance Languages and Literatures' 'Anthropology'
 'Political Science' 'Earth and Planetary Sciences' 'Sociology'
 'Chemistry' 'Physics and Astronomy' 'Mathematics' 'Chemical Biology'
 'Biology' 'History' 'English' 'Philosophy'
 'Psychological and Brain Sciences' 'Molecular Biology and Genetics'
 'Neuroscience' nan 'Oncology' 'Biomedical Engineering'
 'Functional Anatomy and Evolution'
 'Biochemistry, Cellular and Molecular Biology'
 'McKusick-Nathans Institute of Genetic Medicine' 'Biological Chemistry'
 'Cellular and Molecular Medicine' 'Physiology' 'Cell Biology'
 'Community-Public Health' 'Applied Mathematics and Statistics'
 'Electrical and Computer Engineering' 'Mechanical Engineering'
 'Computer Science' 'Chemical and Biomolecular Engineering'
 'Geography and Environmental Engineering']
['Biochemistry and Molecular Biology', 'Biostatistics', 'International Health', 'Health Policy and Management', 'German and Romance Languages and Literatures', 'Anthropology', 'Political Science', 'Earth and Planetary Sciences', 'Sociology', 'Chemistry', 'Physics and Astronomy', 'Mathematics', 'Chemical Biology', 'Biology', 'History', 'English', 'Philosophy', 'Psychological and Brain Sciences', 'Molecular Biology and Genetics', 'Neuroscience', nan, 'Oncology', 'Biomedical Engineering', 'Functional Anatomy and Evolution', 'Biochemistry, Cellular and Molecular Biology', 'McKusick-Nathans Institute of Genetic Medicine', 'Biological Chemistry', 'Cellular and Molecular Medicine', 'Physiology', 'Cell Biology', 'Community-Public Health', 'Applied Mathematics and Statistics', 'Electrical and Computer Engineering', 'Mechanical Engineering', 'Computer Science', 'Chemical and Biomolecular Engineering', 'Geography and Environmental Engineering']
department_counts = df['degree_department'].value_counts()
print(department_counts)
Computer Science                                  8
Physics and Astronomy                             7
German and Romance Languages and Literatures      5
Biology                                           5
Chemistry                                         5
Mathematics                                       4
McKusick-Nathans Institute of Genetic Medicine    4
Mechanical Engineering                            4
Neuroscience                                      4
Political Science                                 4
Earth and Planetary Sciences                      3
Electrical and Computer Engineering               3
English                                           3
Biomedical Engineering                            2
Anthropology                                      2
Psychological and Brain Sciences                  2
Applied Mathematics and Statistics                2
International Health                              2
Biochemistry and Molecular Biology                2
Community-Public Health                           2
Sociology                                         1
Biostatistics                                     1
Molecular Biology and Genetics                    1
Geography and Environmental Engineering           1
Philosophy                                        1
Biological Chemistry                              1
Functional Anatomy and Evolution                  1
Health Policy and Management                      1
Cellular and Molecular Medicine                   1
Biochemistry, Cellular and Molecular Biology      1
Oncology                                          1
Physiology                                        1
Chemical and Biomolecular Engineering             1
Cell Biology                                      1
Chemical Biology                                  1
History                                           1
Name: degree_department, dtype: int64
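If you compare the two outputs above, nan shows up in the unique() result but not in the value_counts() result. That’s because value_counts() drops missing values unless you pass dropna=False. Here is a sketch on a small made-up Series; nunique(), which counts the distinct non-missing values, is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical department values, one of them missing
dept = pd.Series(["Biology", "Physics", "Biology", np.nan], name="degree_department")

print(dept.unique())  # 3 values, nan included
print(dept.value_counts())  # Biology 2, Physics 1 (no NaN row)
print(dept.value_counts(dropna=False))  # adds a NaN row with count 1
print(dept.nunique(), dept.nunique(dropna=False))  # 2 3
```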