A Tip A Day – Python Tip #5 – Pandas Concat & Append

In this article we will learn about pandas concat and append, and how the two functions compare.

[Figure: Concat Vs Append. Image by Author]

import pandas as pd

Let's take two dataframes of fruits.

fruit = {
    'orange' : [3,2,0,1],
    'apple' : [0,3,7,2],
    'grapes' : [7,14,6,15]
}
df1 = pd.DataFrame(fruit)
df1

Output:

   orange  apple  grapes
0       3      0       7
1       2      3      14
2       0      7       6
3       1      2      15

fruit = {
    'grapes' : [13,12,10,2,55,98],
    'mango' : [10,13,17,2,9,76],
    'banana' : [20,23,27,4,0,9],
    'pear' : [21,24,28,51,22,25],
    'pineapple' : [30,33,38,30,36,31]
}
df2 = pd.DataFrame(fruit)
df2

Output:

   grapes  mango  banana  pear  pineapple
0      13     10      20    21         30
1      12     13      23    24         33
2      10     17      27    28         38
3       2      2       4    51         30
4      55      9       0    22         36
5      98     76       9    25         31

df2 = df2.drop(df2.index[2]) 
df2

Output:

   grapes  mango  banana  pear  pineapple
0      13     10      20    21         30
1      12     13      23    24         33
3       2      2       4    51         30
4      55      9       0    22         36
5      98     76       9    25         31

Concat

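The concat function takes objs as its main argument: a sequence (such as a list or tuple) of the dataframes to combine.

Another key argument is axis.

If axis = 0, the given dataframes are concatenated row-wise, one below the other. Columns that exist in only one dataframe are filled with NaN for the rows of the other.

If both dataframes contain rows with the same index, both rows are retained and the indexes are neither changed nor reset.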

pd.concat((df1, df2), axis = 0)

Output:

   orange  apple  grapes  mango  banana  pear  pineapple
0     3.0    0.0       7    NaN     NaN   NaN        NaN
1     2.0    3.0      14    NaN     NaN   NaN        NaN
2     0.0    7.0       6    NaN     NaN   NaN        NaN
3     1.0    2.0      15    NaN     NaN   NaN        NaN
0     NaN    NaN      13   10.0    20.0  21.0       30.0
1     NaN    NaN      12   13.0    23.0  24.0       33.0
3     NaN    NaN       2    2.0     4.0  51.0       30.0
4     NaN    NaN      55    9.0     0.0  22.0       36.0
5     NaN    NaN      98   76.0     9.0  25.0       31.0
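If the repeated index labels are not wanted, concat can renumber the rows instead. The following is a minimal sketch that rebuilds the two fruit dataframes from above. The names stacked and renumbered are only illustrative; ignore_index is a standard pd.concat parameter.

```python
import pandas as pd

# Rebuild the two fruit dataframes used in this article.
df1 = pd.DataFrame({
    'orange': [3, 2, 0, 1],
    'apple': [0, 3, 7, 2],
    'grapes': [7, 14, 6, 15],
})
df2 = pd.DataFrame({
    'grapes': [13, 12, 10, 2, 55, 98],
    'mango': [10, 13, 17, 2, 9, 76],
    'banana': [20, 23, 27, 4, 0, 9],
    'pear': [21, 24, 28, 51, 22, 25],
    'pineapple': [30, 33, 38, 30, 36, 31],
})
df2 = df2.drop(df2.index[2])  # remove the row with index 2

# Default behaviour: each dataframe keeps its own index labels,
# so labels 0, 1 and 3 appear twice in the result.
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())

# ignore_index=True discards the old labels and numbers the rows from 0.
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())
```

Renumbering is handy when the original index carries no meaning. If the index identifies the records, keeping the default is usually safer.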

If axis = 1, the dataframes are concatenated column-wise, with rows aligned on their index. Where a dataframe has no row for a given index, its columns are filled with NaN.

For example, the second dataframe df2 has no record with index 2, but df1 does. So after concatenation, the df2 columns hold NaN in the row with index 2. Likewise, df1 has no records with index 4 and 5, so its columns hold NaN in those rows.

If both dataframes contain a column with the same name, both columns are retained and the column name is not changed. Here the grapes column appears twice.

pd.concat((df1, df2), axis = 1)

Output:

   orange  apple  grapes  grapes  mango  banana  pear  pineapple
0     3.0    0.0     7.0    13.0   10.0    20.0  21.0       30.0
1     2.0    3.0    14.0    12.0   13.0    23.0  24.0       33.0
2     0.0    7.0     6.0     NaN    NaN     NaN   NaN        NaN
3     1.0    2.0    15.0     2.0    2.0     4.0  51.0       30.0
4     NaN    NaN     NaN    55.0    9.0     0.0  22.0       36.0
5     NaN    NaN     NaN    98.0   76.0     9.0  25.0       31.0
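Two duplicate grapes columns can be awkward to work with, and the NaN rows are not always wanted. As a rough sketch, again using the fruit dataframes from above, the keys and join parameters of pd.concat can help with both. The names labelled and common are only illustrative.

```python
import pandas as pd

# Rebuild the two fruit dataframes used in this article.
df1 = pd.DataFrame({
    'orange': [3, 2, 0, 1],
    'apple': [0, 3, 7, 2],
    'grapes': [7, 14, 6, 15],
})
df2 = pd.DataFrame({
    'grapes': [13, 12, 10, 2, 55, 98],
    'mango': [10, 13, 17, 2, 9, 76],
    'banana': [20, 23, 27, 4, 0, 9],
    'pear': [21, 24, 28, 51, 22, 25],
    'pineapple': [30, 33, 38, 30, 36, 31],
})
df2 = df2.drop(df2.index[2])  # remove the row with index 2

# keys= adds an outer column level, so the two 'grapes' columns
# can be told apart by the dataframe they came from.
labelled = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
print(labelled[('df1', 'grapes')])
print(labelled[('df2', 'grapes')])

# join='inner' keeps only the index labels present in both dataframes,
# so no row has to be padded with NaN.
common = pd.concat([df1, df2], axis=1, join='inner')
print(common.index.tolist())
```

The default is join='outer', which is what produced the NaN-padded output shown above.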

Append

Append is a special case of concat: it adds the second dataframe’s records at the end of the first dataframe.

Append has no axis argument.

The syntax of append differs from concat. Append treats the calling dataframe as the main object and adds rows to it from the dataframes passed as arguments.

If any of the passed dataframes contains a column that does not exist in the calling dataframe, it is added as a new column.

Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append examples in this article need a pandas version below 2.0.

Syntax: DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=False)

df1

Output:

   orange  apple  grapes
0       3      0       7
1       2      3      14
2       0      7       6
3       1      2      15

df2

Output:

   grapes  mango  banana  pear  pineapple
0      13     10      20    21         30
1      12     13      23    24         33
3       2      2       4    51         30
4      55      9       0    22         36
5      98     76       9    25         31

df1.append(df2)

Output:

   orange  apple  grapes  mango  banana  pear  pineapple
0     3.0    0.0       7    NaN     NaN   NaN        NaN
1     2.0    3.0      14    NaN     NaN   NaN        NaN
2     0.0    7.0       6    NaN     NaN   NaN        NaN
3     1.0    2.0      15    NaN     NaN   NaN        NaN
0     NaN    NaN      13   10.0    20.0  21.0       30.0
1     NaN    NaN      12   13.0    23.0  24.0       33.0
3     NaN    NaN       2    2.0     4.0  51.0       30.0
4     NaN    NaN      55    9.0     0.0  22.0       36.0
5     NaN    NaN      98   76.0     9.0  25.0       31.0
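On pandas 2.0 and later, where DataFrame.append is no longer available, the same result can be obtained with pd.concat. Below is a minimal sketch using the fruit dataframes from above. The names combined, new_row and with_row are only illustrative, and the values in new_row are made up.

```python
import pandas as pd

# Rebuild the two fruit dataframes used in this article.
df1 = pd.DataFrame({
    'orange': [3, 2, 0, 1],
    'apple': [0, 3, 7, 2],
    'grapes': [7, 14, 6, 15],
})
df2 = pd.DataFrame({
    'grapes': [13, 12, 10, 2, 55, 98],
    'mango': [10, 13, 17, 2, 9, 76],
    'banana': [20, 23, 27, 4, 0, 9],
    'pear': [21, 24, 28, 51, 22, 25],
    'pineapple': [30, 33, 38, 30, 36, 31],
})
df2 = df2.drop(df2.index[2])  # remove the row with index 2

# Counterpart of df1.append(df2): stack df2's rows below df1's.
combined = pd.concat([df1, df2])
print(combined)

# Counterpart of df1.append({...}, ignore_index=True) for a single row:
# wrap the dict in a one-row dataframe first.
new_row = {'orange': 5, 'apple': 1, 'grapes': 9}  # made-up values
with_row = pd.concat([df1, pd.DataFrame([new_row])], ignore_index=True)
print(with_row)
```

Like append, concat returns a new dataframe and leaves df1 and df2 unchanged.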

Performance: Which is faster, pandas concat or append?

Well, for a single call both are almost equally fast.

However, the difference grows depending on how the data is added.

1. When rows are added one by one, append() is called once per row, each time on the whole dataframe built so far. The concat function does the job in a single operation, which makes it faster than repeated append() calls.

2. If the dataframe is very small, adding rows one by one with append is fine, since only a few append calls are needed.

%%time
df = pd.DataFrame(columns=['A'])
for i in range(30):
    df = df.append({'A': i*2}, ignore_index=True)

Wall time: 51.4 ms

%%time
df = pd.concat([pd.DataFrame([i*2], columns=['A']) for i in range(30)], ignore_index=True)

Wall time: 9.93 ms

3. Append creates a new resultant dataframe instead of modifying the existing one. Because of this copying, append performs worse than concat() when it is used repeatedly. Append() is fine if only a few append operations are needed. If many are needed, it is better to use concat().
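The points above suggest a common pattern: collect the pieces first and build the dataframe once. The sketch below shows two ways to do that for the same 30 values used in the timing test. The names rows, pieces, df_from_rows and df_from_pieces are only illustrative, and no timings are claimed here.

```python
import pandas as pd

# Option 1: collect plain rows in a Python list, then build the dataframe once.
rows = []
for i in range(30):
    rows.append({'A': i * 2})  # this is list.append, not DataFrame.append
df_from_rows = pd.DataFrame(rows)

# Option 2: when the pieces are already dataframes, collect them in a list
# and call concat a single time at the end.
pieces = [pd.DataFrame({'A': [i * 2]}) for i in range(30)]
df_from_pieces = pd.concat(pieces, ignore_index=True)

# Both approaches should hold the same 30 values.
print(df_from_rows['A'].tolist() == df_from_pieces['A'].tolist())
```

Either way, the growing dataframe is never rebuilt inside the loop.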


We will meet again with a new Python tip. Thank you! 👍

Like to support? Just click the heart icon ❤️.

Happy Programming!🎈.

Asha Ponraj

Data science and Machine Learning enthusiast | Software Developer | Blog Writer
