1. Basics of `Series` and `DataFrames`

pandas documentation

First, I want to give you a quick tour of the pandas documentation website. This documentation is essential to using pandas and I refer back to it literally all the time.

pandas User Guide: The User Guide covers all of pandas by topic area. Each of the subsections introduces a topic (such as “working with missing data”), and discusses how pandas approaches the problem, with many examples throughout.

pandas API reference: This page gives an overview of all public pandas objects, functions and methods.

What is a `DataFrame`?

There are two main concepts that are essential to using pandas.

The first important concept is a DataFrame. In pandas, a DataFrame is a table of data organized and accessed by rows and columns. In many ways, it is equivalent to a basic spreadsheet or CSV file. Here’s a visual of what our sampleData.csv looks like as a DataFrame.

Files

You can download sampleData.csv here.

Indexes: By default, pandas assigns an index to every row and every column in a DataFrame.

The index (position) is an integer that starts at 0 and counts up.

50 rows → row index from 0 to 49
10 columns → column index from 0 to 9

Hint

When index is used generically, it refers to the row index rather than the column index.

For rows, the index is typically identical to the row label, but sometimes people change the index or row labels to be strings or letters.

10 rows → row index from A to J

Labels:

Column labels correspond to the name of each column. They are typically a string, but can be numbers too.
Most of the time, column labels of a DataFrame are equivalent to the “column headers” (or first row) seen in most spreadsheet editors.
For the 3rd column in our DataFrame, the column index is 2 while the column label is “creator”.
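The difference between positions and labels can be sketched on a tiny table. This is only an illustration: `df_small` and its values are made up, and only the column names are borrowed from sampleData.csv.

```python
import pandas as pd

# Hypothetical three-row table; values are invented for illustration.
df_small = pd.DataFrame({
    "item_identifier": [1, 2, 3],
    "advisor": ["Advisor A", "Advisor B", "Advisor C"],
    "creator": ["Creator A", "Creator B", "Creator C"],
})

row_labels = list(df_small.index)       # default row index: 0, 1, 2
column_labels = list(df_small.columns)  # column labels (the headers)
third_column = df_small.columns[2]      # position 2 holds the label "creator"

# Swap the default row labels for letters; positions stay 0, 1, 2.
df_lettered = df_small.copy()
df_lettered.index = ["A", "B", "C"]
letter_labels = list(df_lettered.index)

print(row_labels)
print(column_labels)
print(third_column)
print(letter_labels)
```

Changing the row labels does not move any data; it only changes the names used to look rows up.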

What is a `Series`?

The second important concept is a Series. In pandas, a Series is equivalent to a column in a CSV or spreadsheet. DataFrames are composed of one or more Series.

Series also have a row index or label. Here is an example of a Series (or column) named “title” from sampleData.csv.
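As a rough sketch of the relationship, the snippet below pulls one column out of a small made-up DataFrame and checks that it is a Series carrying the same row index. The names `df_small` and `title` and the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-column table; values are invented.
df_small = pd.DataFrame({
    "title": ["Title one", "Title two", "Title three"],
    "creator": ["Creator A", "Creator B", "Creator C"],
})

title = df_small["title"]

is_series = isinstance(title, pd.Series)           # a single column is a Series
series_name = title.name                           # the Series keeps its column label
shares_index = title.index.equals(df_small.index)  # same row index as the DataFrame

print(is_series, series_name, shares_index)
```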

Make a `DataFrame`

To use pandas, we need to import the library. Typically, pandas is abbreviated as pd.

import pandas as pd

Now, let’s use the pandas function read_csv() to read sampleData.csv into a DataFrame.

New function

pd.read_csv(): Reads a CSV file into a DataFrame.

Let’s give our resulting DataFrame the commonly used variable name df, although we could call it whatever we want. Let’s also print df to see what it looks like in the terminal.

filename = 'sampleData.csv'
df = pd.read_csv(filename)
print(df)

    item_identifier                   advisor                    creator  \
             1.0        Wolberger, Cynthia          Daniels, Casey M.   
             2.0  Fallin, Margaret Daniele   Collado Torres, Leonardo   
             3.0  Neelon, Sara E. Benjamin        Caswell, Bess L. L.   
             4.0           Roter, Debra L.               Jamal, Leila   
             5.0          Sears, Cynthia L              Rouhani, Saba   
..              ...                       ...                        ...   
           86.0           Spall, James  C               Chen, Tianyi   
           87.0              Dredze, Mark             Benton, Adrian   
           87.0              Dredze, Mark             Benton, Adrian   
           88.0       Andreou, Andreas  G      Fischl, Kate Danielle   
           89.0            Shpitser, Ilya  Nabi Abdolyousefi, Razieh   

   date_issued                                              title  \
 2015-07-31  Characterization of the ADP-ribosylated proteo...   
 2016-07-25  Annotation-Agnostic Differential Expression an...   
 2016-09-07  Child diet over three seasons in rural Zambia:...   
 2017-02-02  Exploring Parental Involvement in Rare Disease...   
 2018-05-23  Gut Microbes, Enteropathy and Child Growth: Th...   
..         ...                                                ...   
2018-08-20  A Fast Reduced-Space Algorithmic Framework for...   
2018-10-25     Learning Representations of Social Media Users   
2018-10-25     Learning Representations of Social Media Users   
2019-05-13  Neuromorphic Models of the Amygdala with Appli...   
2021-03-29  Causal Inference Methods For Bias Correction I...   

                        degree_discipline  \
                          Biochemistry   
                         Biostatistics   
                       Human Nutrition   
                             Bioethics   
 Global Disease Epidemiology & Control   
..                                    ...   
                          Mathematics   
                     Computer Science   
                     Computer Science   
                 Computer Engineering   
                     Computer Science   

                                       degree_grantor  \
 Johns Hopkins University. Bloomberg School of ...   
 Johns Hopkins University. Bloomberg School of ...   
 Johns Hopkins University. Bloomberg School of ...   
 Johns Hopkins University. Bloomberg School of ...   
 Johns Hopkins University. Bloomberg School of ...   
..                                                ...   
Johns Hopkins University. Whiting School of En...   
Johns Hopkins University. Whiting School of En...   
Johns Hopkins University. Whiting School of En...   
Johns Hopkins University. Whiting School of En...   
Johns Hopkins University. Whiting School of En...   

                      degree_department  \
  Biochemistry and Molecular Biology   
                       Biostatistics   
                International Health   
        Health Policy and Management   
                International Health   
..                                  ...   
 Applied Mathematics and Statistics   
                   Computer Science   
                   Computer Science   
Electrical and Computer Engineering   
                   Computer Science   

                                     committee_member  contributor_author  
 Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
 Leek, Jeffrey T.|Hansen, Kasper D.|Battle, Ale...                 NaN  
 West, Keith P., Jr.|Talegawkar, Sameera|Fanzo,...                 NaN  
 Kass, Nancy E.|Saloner, Brendan|Bodurtha, Joan...                 NaN  
 Kosek, Margaret  N|Dowdy, David W|Sack, David ...                 NaN  
..                                                ...                 ...  
 Basu, Amitabh|Curtis, Frank E|Robinson, Daniel P                 NaN  
          Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
          Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
Etienne-Cummings, Ralph|Sarma, Sridevi|Pouliqu...                 NaN  
Scharfstein, Daniel|Tchetgen Tchetgen, Eric|Og...                 NaN  

[90 rows x 10 columns]

While this doesn’t print out every row or column of df, it helps us see the overall structure of the DataFrame. pandas also has many useful functions to help get a closer look at your data. Let’s try some!

New function

pd.read_json(): Converts a JSON string to a DataFrame or Series.

New function

pd.read_excel(): Reads an Excel file into a DataFrame.

New function

pd.read_sql(): Reads a SQL query or database table into a DataFrame.

New function

pd.DataFrame.from_dict(): Creates a DataFrame from a dictionary of array-like values or a dictionary of dictionaries.
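Here is a minimal sketch of a few ways to build the same small table without a file on disk. The data is made up, and `io.StringIO` stands in for a real CSV file so the example is self-contained.

```python
import io
import pandas as pd

# Dictionary of lists: each key becomes a column label.
records = {
    "creator": ["Creator A", "Creator B"],
    "degree_discipline": ["Physics", "History"],
}
df_from_dict = pd.DataFrame.from_dict(records)

# A list of dictionaries (one per row) goes to the plain constructor.
rows = [
    {"creator": "Creator A", "degree_discipline": "Physics"},
    {"creator": "Creator B", "degree_discipline": "History"},
]
df_from_rows = pd.DataFrame(rows)

# read_csv() accepts a file-like object as well as a filename.
csv_text = "creator,degree_discipline\nCreator A,Physics\nCreator B,History\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_dict.shape, df_from_rows.shape, df_from_csv.shape)
```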

View your `DataFrame`

Sometimes it’s nice to get an overview of your data without having to scroll through a bunch of rows. head() and tail() are really great for this.

Use head() on your DataFrame to see the first 5 rows with column labels, and use tail() to see the last 5 rows with column labels. You can change the number of rows by putting a different value in the parentheses.

New function

df.head(): Returns the first n rows of the DataFrame.

New function

df.tail(): Returns the last n rows of the DataFrame.

print(df.head())

   item_identifier                   advisor                   creator  \
            1.0        Wolberger, Cynthia         Daniels, Casey M.   
            2.0  Fallin, Margaret Daniele  Collado Torres, Leonardo   
            3.0  Neelon, Sara E. Benjamin       Caswell, Bess L. L.   
            4.0           Roter, Debra L.              Jamal, Leila   
            5.0          Sears, Cynthia L             Rouhani, Saba   

  date_issued                                              title  \
2015-07-31  Characterization of the ADP-ribosylated proteo...   
2016-07-25  Annotation-Agnostic Differential Expression an...   
2016-09-07  Child diet over three seasons in rural Zambia:...   
2017-02-02  Exploring Parental Involvement in Rare Disease...   
2018-05-23  Gut Microbes, Enteropathy and Child Growth: Th...   

                       degree_discipline  \
                         Biochemistry   
                        Biostatistics   
                      Human Nutrition   
                            Bioethics   
Global Disease Epidemiology & Control   

                                      degree_grantor  \
Johns Hopkins University. Bloomberg School of ...   
Johns Hopkins University. Bloomberg School of ...   
Johns Hopkins University. Bloomberg School of ...   
Johns Hopkins University. Bloomberg School of ...   
Johns Hopkins University. Bloomberg School of ...   

                    degree_department  \
Biochemistry and Molecular Biology   
                     Biostatistics   
              International Health   
      Health Policy and Management   
              International Health   

                                    committee_member  contributor_author  
Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
Leek, Jeffrey T.|Hansen, Kasper D.|Battle, Ale...                 NaN  
West, Keith P., Jr.|Talegawkar, Sameera|Fanzo,...                 NaN  
Kass, Nancy E.|Saloner, Brendan|Bodurtha, Joan...                 NaN  
Kosek, Margaret  N|Dowdy, David W|Sack, David ...                 NaN  

print(df.tail(12))

    item_identifier                 advisor                    creator  \
           79.0  Chirikjian, Gregory S.    Ackerman, Martin Kendal   
           80.0        Prince, Jerry L.                 Uneri, Ali   
           81.0      Whitcomb, Louis L.           Bohren, Jonathan   
           82.0                     NaN              Tao, Lingling   
           83.0     Braverman, Vladimir                  Yang, Lin   
           84.0       Kazanzides, Peter                Chen, Zihan   
           85.0       Gayme, Dennice F.         Hameduddin, Ismail   
           86.0         Spall, James  C               Chen, Tianyi   
           87.0            Dredze, Mark             Benton, Adrian   
           87.0            Dredze, Mark             Benton, Adrian   
           88.0     Andreou, Andreas  G      Fischl, Kate Danielle   
           89.0          Shpitser, Ilya  Nabi Abdolyousefi, Razieh   

   date_issued                                              title  \
2016-02-18  Design and Calibration of Robotic Systems with...   
2017-01-12  Imaging and registration for surgical guidance...   
2017-01-26  Intent-Recognition-Based Traded Control for Te...   
2017-02-26  Learning Discriminative Feature Representation...   
2017-10-09                       Taming Big Data By Streaming   
2017-10-27  A Scalable, High-Performance, Real-Time Contro...   
2018-02-26                   Tackling viscoelastic turbulence   
2018-08-20  A Fast Reduced-Space Algorithmic Framework for...   
2018-10-25     Learning Representations of Social Media Users   
2018-10-25     Learning Representations of Social Media Users   
2019-05-13  Neuromorphic Models of the Amygdala with Appli...   
2021-03-29  Causal Inference Methods For Bias Correction I...   

         degree_discipline                                     degree_grantor  \
              Robotics  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   
              Robotics  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   
Mechanical Engineering  Johns Hopkins University. Whiting School of En...   
           Mathematics  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   
  Computer Engineering  Johns Hopkins University. Whiting School of En...   
      Computer Science  Johns Hopkins University. Whiting School of En...   

                      degree_department  \
             Mechanical Engineering   
                   Computer Science   
             Mechanical Engineering   
Electrical and Computer Engineering   
                   Computer Science   
                   Computer Science   
             Mechanical Engineering   
 Applied Mathematics and Statistics   
                   Computer Science   
                   Computer Science   
Electrical and Computer Engineering   
                   Computer Science   

                                     committee_member  contributor_author  
Boctor, Emad M.|Shiffman, Bernard|Whitcomb, Lo...                 NaN  
Siewerdsen, Jeffrey H.|Taylor, Russell H.|Woli...                 NaN  
                 Kazanzides, Peter|Leonard, Simon                 NaN  
Vidal, Rene|Khudanpur, Sanjeev P.|Tran, Trac D...                 NaN  
Szalay, Alexander S.|Priebe, Carey E.|Basu, Am...                 NaN  
            Taylor, Russell H.|Whitcomb, Louis L.                 NaN  
              Meneveau, Charles V.|Zaki, Tamer A.                 NaN  
 Basu, Amitabh|Curtis, Frank E|Robinson, Daniel P                 NaN  
          Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
          Arora, Raman|Yarowsky, David|Hovy, Dirk                 NaN  
Etienne-Cummings, Ralph|Sarma, Sridevi|Pouliqu...                 NaN  
Scharfstein, Daniel|Tchetgen Tchetgen, Eric|Og...                 NaN  
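To see how the number in the parentheses changes the result, here is a small sketch on a made-up ten-row DataFrame (`df_small` is a hypothetical stand-in for `df`).

```python
import pandas as pd

# Hypothetical ten-row table with identifiers 1 through 10.
df_small = pd.DataFrame({"item_identifier": range(1, 11)})

first_three = df_small.head(3)   # first 3 rows
last_two = df_small.tail(2)      # last 2 rows
default_head = df_small.head()   # n defaults to 5

print(first_three)
print(last_two)
print(len(default_head))
```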

pandas also offers many properties to understand your DataFrame. Let’s try them out.

New property

df.columns: Returns the column labels of the DataFrame.

New property

df.shape: Returns a tuple (rows, columns) representing the DataFrame.

New property

df.empty: Indicates if the DataFrame is empty.

print(df.columns)
print(df.shape)
print(df.empty)

Index(['item_identifier', 'advisor', 'creator', 'date_issued', 'title',
       'degree_discipline', 'degree_grantor', 'degree_department',
       'committee_member', 'contributor_author'],
      dtype='object')
(90, 10)
False

When we printed df.columns, we got an Index object holding the 10 column labels.

When we printed df.shape, we got the result (90, 10). This means our DataFrame has 90 rows and 10 columns.

When we printed df.empty, we got the result False. This means our DataFrame is NOT empty.
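These properties can also be used in code rather than just printed. The sketch below uses made-up data and hypothetical variable names to unpack `shape`, convert `columns` to a plain list, and show what `empty` reports in a couple of edge cases.

```python
import pandas as pd

# Hypothetical two-row, two-column table.
df_small = pd.DataFrame({
    "creator": ["Creator A", "Creator B"],
    "title": ["Title one", "Title two"],
})

n_rows, n_cols = df_small.shape           # unpack the (rows, columns) tuple
column_list = df_small.columns.tolist()   # Index object -> plain Python list

df_none = pd.DataFrame()                            # no rows, no columns
df_only_nan = pd.DataFrame({"a": [float("nan")]})   # one row holding a missing value

print(n_rows, n_cols, column_list)
print(df_small.empty, df_none.empty, df_only_nan.empty)
```

Note that a DataFrame whose cells are all missing values still has rows, so `empty` is False for it.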

Access a row

There are two main ways to access a row. The first is by using .loc[], which grabs a row by its label.

Typically, the index label is the same as the row integer (but not always, remember you can change the index labels to be anything you want). The default row index starts at 0, and counts up by 1. Let’s access the 11th row.

New function

df.loc[]: Access a group of rows and columns by label(s) or a boolean array.

row_11 = df.loc[10]
print(row_11)

item_identifier                                                    11.0
advisor                                             Passey, Benjamin H.
creator                                          Henkes, Gregory Arthur
date_issued                                                  2014-09-22
title                 Carbonate clumped isotope geochemistry of mari...
degree_discipline                            Earth & Planetary Sciences
degree_grantor        Johns Hopkins University. Krieger School of Ar...
degree_department                          Earth and Planetary Sciences
committee_member                                         Ferry, John M.
contributor_author                                                  NaN
Name: 10, dtype: object

The second way to access a row is by using .iloc[], which grabs a row by its integer position.

The row integer will remain the same regardless of your index labels. Row integers always start at 0, and count up by 1. Let’s access the 11th row.

New function

df.iloc[]: Purely integer-location based indexing for selection by position.

row_11 = df.iloc[10]
print(row_11)

item_identifier                                                    11.0
advisor                                             Passey, Benjamin H.
creator                                          Henkes, Gregory Arthur
date_issued                                                  2014-09-22
title                 Carbonate clumped isotope geochemistry of mari...
degree_discipline                            Earth & Planetary Sciences
degree_grantor        Johns Hopkins University. Krieger School of Ar...
degree_department                          Earth and Planetary Sciences
committee_member                                         Ferry, John M.
contributor_author                                                  NaN
Name: 10, dtype: object
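With the default index, .loc[] and .iloc[] return the same row, which can hide the difference between them. A small sketch with string row labels makes it visible; the table and the labels "x", "y", "z" are made up for illustration.

```python
import pandas as pd

# Hypothetical table whose row labels are strings instead of 0, 1, 2.
df_small = pd.DataFrame(
    {"creator": ["Creator A", "Creator B", "Creator C"]},
    index=["x", "y", "z"],
)

by_label = df_small.loc["y"]     # look up the row labelled "y"
by_position = df_small.iloc[1]   # look up the second row by position
same_row = by_label.equals(by_position)

# The integer 1 is not a row label here, so .loc[] cannot find it.
try:
    df_small.loc[1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

print(same_row, label_lookup_failed)
```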

You can also grab sections of rows using both of these functions. Guess what rows each function will bring up.

Hint

These work a lot like list indexing!

print(df.iloc[30:41])

    item_identifier              advisor                    creator  \
           31.0    Schleif, Robert F        Martens, Andrew Ted   
           32.0                  NaN                   You, Can   
           33.0       Poole, Deborah          Reyna, Zachary E.   
           34.0    Swartz, Morris L.                  Feng, Lei   
           35.0       Eyink, Gregory             Wang, Shengwen   
           36.0                  NaN  O'Briain, Katarina Louisa   
           37.0           Bowen, Kit         Marquez, Sara Anne   
           38.0        Kaplan, Jared              Chen, Hongbin   
           39.0      Riess, Adam  G.            Huang, Caroline   
           40.0    Williams, Michael       Wilk, Thomas Michael   
           41.0  Nealon, Christopher       Huttner, Tobias Reed   

   date_issued                                              title  \
2017-07-19  Synonymous codons affect polysome spacing, pro...   
2017-07-23  Higgs Boson Properties and Search for Addition...   
2017-08-24  The Matter of Law: Reconsidering the Natural L...   
2018-01-30  Measurement of t-tbar Forward-backward Asymmet...   
2018-05-07  Some properties of closed hypersurfaces of sma...   
2018-07-10  Trade Secrets: Georgic Poetry and the Rise of ...   
2018-09-17  Experimental and Theoretical Explorations of A...   
2019-07-05     On Black Hole Information Paradox in AdS3/CFT2   
2019-09-25                            The Mira Distance Scale   
2019-10-21  Metaethics for Neo-Pragmatists: A Pragmatic Ac...   
2020-06-22  On Occasion: American Poetry at the Margins of...   

    degree_discipline                                     degree_grantor  \
          Biology  Johns Hopkins University. Krieger School of Ar...   
          Physics  Johns Hopkins University. Krieger School of Ar...   
Political Science  Johns Hopkins University. Krieger School of Ar...   
          Physics  Johns Hopkins University. Krieger School of Ar...   
      Mathematics  Johns Hopkins University. Krieger School of Ar...   
          English  Johns Hopkins University. Krieger School of Ar...   
        Chemistry  Johns Hopkins University. Krieger School of Ar...   
          Physics  Johns Hopkins University. Krieger School of Ar...   
          Physics  Johns Hopkins University. Krieger School of Ar...   
       Philosophy  Johns Hopkins University. Krieger School of Ar...   
          English  Johns Hopkins University. Krieger School of Ar...   

        degree_department                                   committee_member  \
              Biology    Hilser, Vincent J|Green, Rachel|Roberts, Elijah   
Physics and Astronomy  Gritsan, Andrei|Blumenfeld, Barry J.|Schlaufma...   
    Political Science  Culbert, Jennifer|Bennett, Jane|Connolly, Will...   
Physics and Astronomy  Gritsan, Andrei|Schlaufman, Kevin C.|Lu, Fei|S...   
          Mathematics  Bernstein, Jacob|Kazhdan, Misha|Sogge, Christo...   
              English  Kramnick, Jonathan|Favret, Mary|Achinstein, Sh...   
            Chemistry                      Toscano, John|Townsend, Craig   
Physics and Astronomy  Bah, Ibrahima|Basu, Amitabh|Kitchloo, Nitu|Li, Yi   
Physics and Astronomy  Kamionkowski, Marc|Schlaufman, Kevin|Sabbi, El...   
           Philosophy  Bok, Hilary|Moyar, Dean|Lance, Mark|Egginton, ...   
              English  Hickman, Jared|Walters, Ronald G|Chambers, Sam...   

    contributor_author  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  
               NaN  

print(df.iloc[[0, 30, 60]])

    item_identifier             advisor              creator date_issued  \
             1.0  Wolberger, Cynthia    Daniels, Casey M.  2015-07-31   
           31.0   Schleif, Robert F  Martens, Andrew Ted  2017-07-19   
           61.0        Tung, Leslie       Gorospe, Giann  2017-10-27   

                                                title       degree_discipline  \
 Characterization of the ADP-ribosylated proteo...            Biochemistry   
Synonymous codons affect polysome spacing, pro...                 Biology   
Shape Theoretic and Machine Learning Based Met...  Biomedical Engineering   

                                       degree_grantor  \
 Johns Hopkins University. Bloomberg School of ...   
Johns Hopkins University. Krieger School of Ar...   
     Johns Hopkins University. School of Medicine   

                     degree_department  \
 Biochemistry and Molecular Biology   
                           Biology   
            Biomedical Engineering   

                                     committee_member  contributor_author  
 Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
  Hilser, Vincent J|Green, Rachel|Roberts, Elijah                 NaN  
                      Vidal, Rene|Younes, Laurent                 NaN  

❓ What’s up with the extra brackets in df.iloc[[0, 30, 60]] ❓

The outer brackets belong to .iloc[], and the inner brackets create a Python list. Without the inner brackets, .iloc[] would read the three comma-separated values as separate indexers (a row, a column, and one too many) instead of as a list of rows, and it would raise an error. The example below gives the same result, and hopefully helps to show how we get double brackets when putting a list in .iloc[].

rowsToGrab = [0, 30, 60]
print(df.iloc[rowsToGrab])

    item_identifier             advisor              creator date_issued  \
             1.0  Wolberger, Cynthia    Daniels, Casey M.  2015-07-31   
           31.0   Schleif, Robert F  Martens, Andrew Ted  2017-07-19   
           61.0        Tung, Leslie       Gorospe, Giann  2017-10-27   

                                                title       degree_discipline  \
 Characterization of the ADP-ribosylated proteo...            Biochemistry   
Synonymous codons affect polysome spacing, pro...                 Biology   
Shape Theoretic and Machine Learning Based Met...  Biomedical Engineering   

                                       degree_grantor  \
 Johns Hopkins University. Bloomberg School of ...   
Johns Hopkins University. Krieger School of Ar...   
     Johns Hopkins University. School of Medicine   

                     degree_department  \
 Biochemistry and Molecular Biology   
                           Biology   
            Biomedical Engineering   

                                     committee_member  contributor_author  
 Leung, Anthony K. L.|Matunis, Michael J.|Dingl...                 NaN  
  Hilser, Vincent J|Green, Rachel|Roberts, Elijah                 NaN  
                      Vidal, Rene|Younes, Laurent                 NaN  
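One detail worth checking when slicing is where the slice stops. The sketch below, on a made-up ten-row table with the default index, compares the same slice in .iloc[] and .loc[]; `df_small` is a hypothetical stand-in for `df`.

```python
import pandas as pd

# Hypothetical ten-row table with the default 0-9 row index.
df_small = pd.DataFrame({"item_identifier": range(1, 11)})

iloc_slice = df_small.iloc[2:5]   # positions 2, 3, 4 (end is excluded, like a list)
loc_slice = df_small.loc[2:5]     # labels 2, 3, 4, 5 (end label is included)
picked = df_small.iloc[[0, 3, 6]] # a list of positions selects exactly those rows

print(len(iloc_slice), len(loc_slice))
print(list(picked["item_identifier"]))
```

So .iloc[] slices behave like list slices, while .loc[] slices include the final label.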

Access a Series

There are two main ways to access a specific column (also known as a Series) by its label in a DataFrame.

Dot notation
- Uses the column label as a property of the DataFrame.
- Example: df.column_name
Bracket notation
- Places column label string in brackets to select part of the DataFrame.
- Example: df['column_name']

Let’s try both options.

degree_discipline = df.degree_discipline
print(degree_discipline)

                            Biochemistry
                           Biostatistics
                         Human Nutrition
                               Bioethics
   Global Disease Epidemiology & Control
                      ...                  
                            Mathematics
                       Computer Science
                       Computer Science
                   Computer Engineering
                       Computer Science
Name: degree_discipline, Length: 90, dtype: object

degree_discipline = df['degree_discipline']
print(degree_discipline)

                            Biochemistry
                           Biostatistics
                         Human Nutrition
                               Bioethics
   Global Disease Epidemiology & Control
                      ...                  
                            Mathematics
                       Computer Science
                       Computer Science
                   Computer Engineering
                       Computer Science
Name: degree_discipline, Length: 90, dtype: object

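In most cases, both notations work equally well! Sometimes though, dot notation will fail: for example, when the column label contains a space or is not a valid Python name, or when it matches an existing DataFrame property or method (such as shape). If that happens, just switch to bracket notation.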
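Here is a small sketch of two situations where dot notation does not give you the column. The column labels "date issued" and "shape" are hypothetical, chosen only to trigger the problem.

```python
import pandas as pd

# Hypothetical table with one ordinary label and two awkward ones.
df_small = pd.DataFrame({
    "degree_discipline": ["Physics", "History"],
    "date issued": ["2019-01-01", "2020-01-01"],  # label contains a space
    "shape": ["round", "square"],                 # same name as the df.shape property
})

# Ordinary label: both notations return the same Series.
dot_ok = df_small.degree_discipline.equals(df_small["degree_discipline"])

# A label with a space cannot be written after a dot, so use brackets.
spaced = df_small["date issued"]

# Dot notation returns the DataFrame property, not the column.
dot_shape = df_small.shape            # the (rows, columns) tuple
bracket_shape = df_small["shape"]     # the column named "shape"

print(dot_ok)
print(dot_shape)
print(list(bracket_shape))
```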

Access a specific cell

There are also a bunch of ways to look at a specific cell – here are some!

We can add a column label (2nd parameter value) to .loc[] to grab a cell value by its index label and column name.

print(df.loc[9, 'degree_discipline'])

Spanish

We can also add a column integer (second parameter value) to .iloc[] to grab a cell value by its row integer position and its column integer position.

Hint

To figure out the integer of your column, start counting at 0 at the leftmost column.

print(df.iloc[9, 5])

Spanish

We can also use iat[] and at[] to view a single cell.

New function

df.iat[]: Access a single value by integer position (row integer, column integer).

New function

df.at[]: Access a single value by label (row label, column label).

print(df.at[9, 'degree_discipline'])
print(df.iat[9,5])

Spanish
Spanish
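The four accessors can be compared side by side on a small made-up table. In this sketch, `columns.get_loc()` looks up the integer position of a column so there is no need to count by hand; the data and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical three-row table.
df_small = pd.DataFrame({
    "creator": ["Creator A", "Creator B", "Creator C"],
    "degree_discipline": ["Physics", "History", "Spanish"],
})

# Integer position of the column, instead of counting from the left.
col_pos = df_small.columns.get_loc("degree_discipline")

via_loc = df_small.loc[2, "degree_discipline"]   # label-based
via_at = df_small.at[2, "degree_discipline"]     # label-based, single cell only
via_iloc = df_small.iloc[2, col_pos]             # position-based
via_iat = df_small.iat[2, col_pos]               # position-based, single cell only

print(col_pos, via_loc, via_at, via_iloc, via_iat)
```

.at[] and .iat[] only ever return one cell, while .loc[] and .iloc[] can also return whole rows, columns, or blocks.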

Get descriptive stats

There are also many functions in pandas to get computational or descriptive stats about your Series or DataFrames. Here’s a quick look at my two favorites (check out the full lists for Series and DataFrames here).

New function

df[].unique(): Returns unique values of Series.

New function

df[].value_counts(): Returns a Series containing counts of the unique values in the column.

department_unique = df['degree_department'].unique()
print(department_unique)

unique_list = list(department_unique)
print(unique_list)

['Biochemistry and Molecular Biology' 'Biostatistics'
 'International Health' 'Health Policy and Management'
 'German and Romance Languages and Literatures' 'Anthropology'
 'Political Science' 'Earth and Planetary Sciences' 'Sociology'
 'Chemistry' 'Physics and Astronomy' 'Mathematics' 'Chemical Biology'
 'Biology' 'History' 'English' 'Philosophy'
 'Psychological and Brain Sciences' 'Molecular Biology and Genetics'
 'Neuroscience' nan 'Oncology' 'Biomedical Engineering'
 'Functional Anatomy and Evolution'
 'Biochemistry, Cellular and Molecular Biology'
 'McKusick-Nathans Institute of Genetic Medicine' 'Biological Chemistry'
 'Cellular and Molecular Medicine' 'Physiology' 'Cell Biology'
 'Community-Public Health' 'Applied Mathematics and Statistics'
 'Electrical and Computer Engineering' 'Mechanical Engineering'
 'Computer Science' 'Chemical and Biomolecular Engineering'
 'Geography and Environmental Engineering']
['Biochemistry and Molecular Biology', 'Biostatistics', 'International Health', 'Health Policy and Management', 'German and Romance Languages and Literatures', 'Anthropology', 'Political Science', 'Earth and Planetary Sciences', 'Sociology', 'Chemistry', 'Physics and Astronomy', 'Mathematics', 'Chemical Biology', 'Biology', 'History', 'English', 'Philosophy', 'Psychological and Brain Sciences', 'Molecular Biology and Genetics', 'Neuroscience', nan, 'Oncology', 'Biomedical Engineering', 'Functional Anatomy and Evolution', 'Biochemistry, Cellular and Molecular Biology', 'McKusick-Nathans Institute of Genetic Medicine', 'Biological Chemistry', 'Cellular and Molecular Medicine', 'Physiology', 'Cell Biology', 'Community-Public Health', 'Applied Mathematics and Statistics', 'Electrical and Computer Engineering', 'Mechanical Engineering', 'Computer Science', 'Chemical and Biomolecular Engineering', 'Geography and Environmental Engineering']

department_counts = df['degree_department'].value_counts()
print(department_counts)

Computer Science                                  8
Physics and Astronomy                             7
German and Romance Languages and Literatures      5
Biology                                           5
Chemistry                                         5
Mathematics                                       4
McKusick-Nathans Institute of Genetic Medicine    4
Mechanical Engineering                            4
Neuroscience                                      4
Political Science                                 4
Earth and Planetary Sciences                      3
Electrical and Computer Engineering               3
English                                           3
Biomedical Engineering                            2
Anthropology                                      2
Psychological and Brain Sciences                  2
Applied Mathematics and Statistics                2
International Health                              2
Biochemistry and Molecular Biology                2
Community-Public Health                           2
Sociology                                         1
Biostatistics                                     1
Molecular Biology and Genetics                    1
Geography and Environmental Engineering           1
Philosophy                                        1
Biological Chemistry                              1
Functional Anatomy and Evolution                  1
Health Policy and Management                      1
Cellular and Molecular Medicine                   1
Biochemistry, Cellular and Molecular Biology      1
Oncology                                          1
Physiology                                        1
Chemical and Biomolecular Engineering             1
Cell Biology                                      1
Chemical Biology                                  1
History                                           1
Name: degree_department, dtype: int64
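Notice that nan shows up in the unique() output above but not in the value_counts() output. The sketch below, on a made-up column with one missing value, shows how the two treat missing data; the `dropna=False` argument asks value_counts() to count missing values too.

```python
import pandas as pd

# Hypothetical column with one repeated value and one missing value.
df_small = pd.DataFrame({
    "degree_department": [
        "Computer Science",
        "Physics and Astronomy",
        "Computer Science",
        None,
    ]
})
departments = df_small["degree_department"]

unique_values = departments.unique()                    # includes the missing value
counts = departments.value_counts()                     # skips missing values by default
counts_with_missing = departments.value_counts(dropna=False)
n_distinct = departments.nunique()                      # distinct non-missing values

print(len(unique_values), n_distinct)
print(counts)
print(counts_with_missing)
```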
