Simple Introduction to Pandas DataFrames with Examples

Pandas DataFrames are two-dimensional data tables that store and represent data in rows & columns. Like any other two-dimensional data table, a Pandas DataFrame uses each row to store one data instance (record) and each column to store one field of that instance. Let’s understand more with an example.


A firm wants to store the following details for all its clients: Client Name, Annual Revenue & Profit Percentage. The table below is an example of what it would look like:

Serial No.   Client Name   Revenue    Profit %
1            Client_A      $300,000   25%
2            Client_B      $250,000   32%
3            Client_C      $180,000   20%
4            Client_D      $320,000   33%

You may have noticed how the data table is structured:

  • The field names are stored as column names (e.g. Client Name, Revenue etc.).
  • The rows are identified by serial numbers (e.g. 1, 2, 3 etc.).
  • Each value is located by its serial number (row name) & field name (column name).

Pandas DataFrames allow you to store data in a similar way. Most importantly, you can store a whole DataFrame in a single variable. Technically speaking, a Pandas DataFrame is an object, and it comes with a large number of methods for manipulating the data it holds.

In other words, with single lines of code you can flexibly slice and dice the data for better views and analysis. For example, you can drop columns or rows from Pandas DataFrames. Before getting into more details about data handling using DataFrames, let’s understand how to create a DataFrame in Python. You can check some of the other articles in this blog to learn more about manipulating Pandas DataFrames using the available functions (methods).
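As a quick preview of what such a one-liner looks like, here is a minimal sketch that drops a column and a row with the drop method. The variable names are only illustrative, and the sample values are taken from the client table above.

```python
from pandas import DataFrame

# Sample data reusing the first three clients from the table above
df = DataFrame(
    [
        ["Client_A", "$300,000", "25%"],
        ["Client_B", "$250,000", "32%"],
        ["Client_C", "$180,000", "20%"],
    ],
    columns=["Client Name", "Revenue", "Profit %"],
)

# drop() returns a new DataFrame; the original df is left unchanged
without_profit = df.drop(columns=["Profit %"])  # remove a column by name
without_first = df.drop(index=[0])              # remove a row by its index label

print(list(without_profit.columns))
print(len(without_first))
```

Because drop returns a new DataFrame, you need to assign the result to a variable if you want to keep it.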

Creating a DataFrame using the Pandas Library

To use DataFrames in Python, you first need to install the Pandas library. Once the installation is complete, you need to import the DataFrame class from Pandas. Here’s the code for the import:

from pandas import DataFrame 

The DataFrame constructor accepts the following parameters: data, index, columns, dtype & copy. To get started, you really don’t need to understand all of them. The most frequently used ones are “data”, “columns” & “index”. The “data” parameter holds the data, the “columns” parameter holds the column names & the “index” parameter holds the row names. Do note that you can create a DataFrame without explicitly passing values for “columns” & “index”. If you don’t pass them, the columns & rows are labelled with default integer positions (i.e. 0, 1, 2, 3, …).

In fact, you can choose not to pass any argument at all (including data). If you create a DataFrame without any arguments, you are essentially creating an empty DataFrame.
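The two cases described above can be sketched as follows. This is only a small illustration: the variable names are made up, and the two-row data set is a shortened version of the client table.

```python
from pandas import DataFrame

# No arguments at all: an empty DataFrame
empty_df = DataFrame()
print(empty_df.empty)   # True
print(empty_df.shape)   # (0, 0)

# Only data passed: rows and columns get default integer labels
default_df = DataFrame([["Client_A", "$300,000"], ["Client_B", "$250,000"]])
print(list(default_df.index))    # [0, 1]
print(list(default_df.columns))  # [0, 1]
```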

Understanding data, columns & index attributes in Pandas DataFrames:

The data in each row of a DataFrame can be thought of as a list. Thus the complete data in a DataFrame can be represented as a “list of lists”. We can pass a list of lists as the value of the data parameter while creating a DataFrame. Let’s try to create a DataFrame using the data from the above example by first storing the data as a list of lists:

from pandas import DataFrame

#Storing each client info in a separate list

L1= ['Client A', '$300,000', '25%']
L2= ['Client B', '$250,000', '32%']
L3= ['Client C', '$180,000', '20%']
L4= ['Client D', '$320,000', '33%']

#Creating a list of lists

all_data=[L1,L2,L3,L4]

#Passing the list of lists as Data attribute to a DataFrame
df=DataFrame(all_data)

df

The output looks like:

          0         1    2
0  Client A  $300,000  25%
1  Client B  $250,000  32%
2  Client C  $180,000  20%
3  Client D  $320,000  33%

So we have the data in the format that we need but the rows and columns are yet to be named.

To name the columns, we need to use the “columns” parameter, which accepts the column names as a list. We can update the above code to add the column names as shown below:

from pandas import DataFrame

L1= ['Client A', '$300,000', '25%']
L2= ['Client B', '$250,000', '32%']
L3= ['Client C', '$180,000', '20%']
L4= ['Client D', '$320,000', '33%']

all_data=[L1,L2,L3,L4]

df=DataFrame(all_data,columns=['Client Name','Revenue','Profit %'])

df

The DataFrame that we now get will come with the column names:

  Client Name      Revenue     Profit %
0 Client A         $300,000    25%
1 Client B         $250,000    32%
2 Client C         $180,000    20%
3 Client D         $320,000    33%

Similarly, we can create a list of row names and pass it to the “index” parameter to add row names to the DataFrame.
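Here is a minimal sketch of that idea. It assumes we want the row names to be the serial numbers 1 to 4 from the original table; any list with one label per row would work the same way.

```python
from pandas import DataFrame

all_data = [
    ["Client A", "$300,000", "25%"],
    ["Client B", "$250,000", "32%"],
    ["Client C", "$180,000", "20%"],
    ["Client D", "$320,000", "33%"],
]

# The row labels 1 to 4 mirror the "Serial No." column of the original table
df = DataFrame(
    all_data,
    columns=["Client Name", "Revenue", "Profit %"],
    index=[1, 2, 3, 4],
)

# .loc looks rows up by label, so 3 refers to the row labelled 3
print(df.loc[3, "Client Name"])  # Client C
```

The list passed to index must have exactly one entry per row, otherwise Pandas raises an error.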

There are 2 other interesting ways to create Pandas DataFrames :

  1. You can create a Pandas DataFrame from a Python dictionary
  2. You can create a Pandas DataFrame from an external data source (e.g. an Excel file)

Converting a Python Dictionary to a Pandas DataFrame

A Python dictionary stores data in the key:value format. To create a Pandas DataFrame from a Python dictionary, we need to ensure that the dictionary has the column names as keys & the list of values for each column as the values.

The following dictionary can be converted to the DataFrame shown in the above example:

# Keys are listed in the same order as the columns of the earlier example
d = {'Client Name': ['Client_A', 'Client_B', 'Client_C', 'Client_D'],
     'Revenue': ['$300,000', '$250,000', '$180,000', '$320,000'],
     'Profit %': ['25%', '32%', '20%', '33%']}

To convert a dictionary to a DataFrame, we will use the from_dict method :

df1 = DataFrame.from_dict(d)
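For a dictionary shaped like this one, passing it straight to the DataFrame constructor should give the same result as from_dict with its default settings. The sketch below checks this on a shortened, two-client version of the dictionary; the variable names are only illustrative.

```python
from pandas import DataFrame

# Shortened version of the dictionary above: column names as keys,
# lists of column values as values
d = {
    "Client Name": ["Client_A", "Client_B"],
    "Revenue": ["$300,000", "$250,000"],
    "Profit %": ["25%", "32%"],
}

df_from_dict = DataFrame.from_dict(d)  # explicit conversion method
df_direct = DataFrame(d)               # constructor called with the dictionary

# Both routes produce the same table; columns follow the order of the keys
print(df_from_dict.equals(df_direct))
print(list(df_direct.columns))
```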

Creating Pandas DataFrames from external Excel or CSV files

You can easily convert an Excel or CSV file to a Pandas DataFrame. To learn more, do read my article on using pandas to read Excel or CSV. It’s also important to note that Pandas allows you to easily export DataFrames to Excel or CSV. Thus, Pandas can be used seamlessly to analyze Excel or CSV data.
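As a rough sketch of the CSV side, the example below reads CSV text with read_csv and writes it back out with to_csv. To stay self-contained it uses an in-memory string (via StringIO) in place of a real file; in practice you would pass a file path instead. The sample rows are a shortened version of the client table.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for a CSV file; the revenue values are quoted
# because they contain commas
csv_text = '''Client Name,Revenue,Profit %
Client_A,"$300,000",25%
Client_B,"$250,000",32%
'''

# Reading: read_csv accepts a file path or a file-like object
df = pd.read_csv(StringIO(csv_text))
print(df.shape)

# Writing: with no path given, to_csv returns the CSV text as a string;
# index=False leaves the row labels out of the output
csv_out = df.to_csv(index=False)
print(csv_out)
```

Excel files follow the same pattern with read_excel and to_excel, which typically need an additional engine package such as openpyxl to be installed.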

FAQs: Pandas DataFrames

How to create an empty DataFrame in Pandas?

You can create an empty DataFrame in Pandas using the following code:

from pandas import DataFrame
df = DataFrame()

What are the different ways to create a DataFrame in Pandas?

You can create DataFrames in Pandas from external data sources, from Python dictionaries & from Python lists of lists.

Publisher: Digital Marketing Chef
