A Tip A Day – Python Tip #6 – Pandas Merge

Pandas concat & append works like an array concatenation either column wise or row wise.

But Merge & Join works similar to Database table joins. Lets see Merge in this article and learn about Join in next post!

So, Merge can be used when we need to join two data frames having different columns and there is a key column in both the data frames.

For example, if we have two data frames, one with store details and another with product details of all the stores, we can merge the data frames using shop Id.

Signature:
pd.merge(left,
right,
how: str = ‘inner’,
on=None,
left_on=None,
right_on=None,
left_index: bool = False,
right_index: bool = False,
sort: bool = False,
suffixes=(‘_x’, ‘_y’),
copy: bool = True,
indicator: bool = False,
validate=None,
)

  • left & right params are data.
  • how -> mentions how the merge should happen. It takes values ‘left’, ‘right’, ‘inner’, ‘outer’. default value is ‘inner’.
    • ‘left’ -> Collects all data from left dataframe and common data of left and right.
    • ‘right’ -> Collects all data from right dataframe and common data of left and right.
    • ‘inner’ -> Collects only the common data from both the left and right dataframes. This more like an intersection.
    • ‘outer’ -> Collects all from left and right including common.
  • on -> Mentions which column should be taken as key column that exists in both the left and right dataframe.
Pandas Merge
Pandas Merge – Image by Author
import pandas as pd
shop = {
    'Shop_id' : ['SP01', 'SP02', 'SP03', 'SP04'],
    'City' : ['Chennai', 'Madurai', 'Trichy', 'Coimbatore'],
    'ZipCode' : [600001, 625001, 620001, 641001]
}

shop = pd.DataFrame(shop)
shop
Shop_idCityZipCode
0SP01Chennai600001
1SP02Madurai625001
2SP03Trichy620001
3SP04Coimbatore641001
product = {
    'Shop_id' : ['SP01', 'SP02', 'SP02', 'SP03', 'SP03', 'SP03', 'SP05'],
    'product_id' : ['p01', 'p02', 'p03', 'p01', 'p02', 'p03', 'p02'],
    'price' : [220, 500, 145, 225, 510, 150, 505]
}

product = pd.DataFrame(product)
product
Shop_idproduct_idprice
0SP01p01220
1SP02p02500
2SP02p03145
3SP03p01225
4SP03p02510
5SP03p03150
6SP05p02505

‘left’ -> Collects all data from left dataframe and common data of left and right.

pd.merge(shop, product, how = 'left', on = 'Shop_id')
Shop_idCityZipCodeproduct_idprice
0SP01Chennai600001p01220.0
1SP02Madurai625001p02500.0
2SP02Madurai625001p03145.0
3SP03Trichy620001p01225.0
4SP03Trichy620001p02510.0
5SP03Trichy620001p03150.0
6SP04Coimbatore641001NaNNaN

‘right’ -> Collects all data from right dataframe and common data of left and right.

pd.merge(shop, product, how = 'right', on = 'Shop_id')
Shop_idCityZipCodeproduct_idprice
0SP01Chennai600001.0p01220
1SP02Madurai625001.0p02500
2SP02Madurai625001.0p03145
3SP03Trichy620001.0p01225
4SP03Trichy620001.0p02510
5SP03Trichy620001.0p03150
6SP05NaNNaNp02505

‘inner’ -> Collects only the common data from both the left and right dataframes. This more like an intersection.

pd.merge(shop, product, how = 'inner', on = 'Shop_id') 
Shop_idCityZipCodeproduct_idprice
0SP01Chennai600001p01220
1SP02Madurai625001p02500
2SP02Madurai625001p03145
3SP03Trichy620001p01225
4SP03Trichy620001p02510
5SP03Trichy620001p03150

‘outer’ -> Collects all from left and right including common.

pd.merge(shop, product, how = 'outer', on = 'Shop_id')
Shop_idCityZipCodeproduct_idprice
0SP01Chennai600001.0p01220.0
1SP02Madurai625001.0p02500.0
2SP02Madurai625001.0p03145.0
3SP03Trichy620001.0p01225.0
4SP03Trichy620001.0p02510.0
5SP03Trichy620001.0p03150.0
6SP04Coimbatore641001.0NaNNaN
7SP05NaNNaNp02505.0

Excited to try this? Enjoy!

We will meet with a new tip in Python. Thank you! 👍

Like to support? Just click the heart icon ❤️.

Happy Programming!🎈.

Asha Ponraj
Asha Ponraj

Data science and Machine Learning enthusiast | Software Developer | Blog Writter

Articles: 86

Leave a Reply

Your email address will not be published. Required fields are marked *