Removing multiple rows from your data with Python


With A/B testing, there are times when you have to remove data from your dataset because of issues that skew your results. Pandas df.drop is pretty well covered, but I had a hard time finding out how to drop multiple rows based on multiple values. Below is a walkthrough of how I approached that problem.

A preview of my dataframe

 print(test_data.shape) 
 (84, 9) 

I have to remove two specific days, but only for our mobile traffic. First I collected the index labels of the rows for those days. Here’s the final code, but I’ll break it down in a minute.

srm_mobile = test_data[
    ((test_data["date"] == "20210223") & (test_data["devicecategory"] == "mobile"))
    | ((test_data["date"] == "20210302") & (test_data["devicecategory"] == "mobile"))
].index

You start with whatever criteria you want to filter on, in this case a date and a device type.

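(test_data["date"] == "20210223") & (test_data["devicecategory"] == "mobile")

The parentheses around each comparison matter: & and | bind more tightly than == in Python, so leaving them out raises an error instead of combining the two tests. Then all you have to do is wrap each date-and-device pair in its own parentheses and separate the pairs with a pipe | to indicate “or”.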
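To make the pattern concrete, here is a minimal sketch on a small made-up dataframe. The `toy` frame, its `sessions` column, and its values are assumptions for illustration only; the column names `date` and `devicecategory` are the ones used in this post.

```python
import pandas as pd

# Hypothetical toy data standing in for test_data (values are invented).
toy = pd.DataFrame(
    {
        "date": ["20210223", "20210223", "20210302", "20210302", "20210303"],
        "devicecategory": ["mobile", "desktop", "mobile", "tablet", "mobile"],
        "sessions": [10, 20, 30, 40, 50],
    }
)

# Each comparison sits in its own parentheses because & and | bind
# more tightly than == in Python.
first_day = (toy["date"] == "20210223") & (toy["devicecategory"] == "mobile")
second_day = (toy["date"] == "20210302") & (toy["devicecategory"] == "mobile")

# The pipe combines the two masks: a row matches if either mask is True.
either_day = first_day | second_day

# .index returns the row labels of the matching rows.
matching_index = toy[either_day].index
print(list(matching_index))
```

Naming the intermediate masks is optional, but it tends to keep longer filters readable.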

I’ve only tried it on a few lines but I believe you can add as many conditions as you need.
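If the list of dates grows, chaining pipes gets long. One possible alternative, which is not what I used above, is `Series.isin`, which checks a column against a whole list at once. This is a sketch on made-up data; the `toy` frame and the `bad_dates` list are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data standing in for test_data (values are invented).
toy = pd.DataFrame(
    {
        "date": ["20210223", "20210223", "20210302", "20210302", "20210303"],
        "devicecategory": ["mobile", "desktop", "mobile", "tablet", "mobile"],
    }
)

# Hypothetical list of dates to remove; add as many as needed.
bad_dates = ["20210223", "20210302"]

# isin() is True where the date is any value in the list, so a single &
# with the device check replaces the chain of | conditions.
mask = toy["date"].isin(bad_dates) & (toy["devicecategory"] == "mobile")

to_drop = toy[mask].index
print(list(to_drop))
```

This only works when the other condition (here, the device type) is the same for every date; mixed criteria still need the pipe-separated form.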

Next, pass that index to the standard Pandas drop function.

test_data.drop(srm_mobile, inplace=True)

print(test_data.shape)
(80, 9) 
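If you would rather not modify the dataframe in place, a sketch like the following shows two equivalent options on made-up data: calling `drop` without `inplace=True`, which returns a new dataframe, or keeping the rows where the mask is False using `~`. The `toy` frame and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for test_data (values are invented).
toy = pd.DataFrame(
    {
        "date": ["20210223", "20210223", "20210302", "20210302", "20210303"],
        "devicecategory": ["mobile", "desktop", "mobile", "tablet", "mobile"],
    }
)

mask = toy["date"].isin(["20210223", "20210302"]) & (
    toy["devicecategory"] == "mobile"
)

# Option 1: drop by index label; without inplace=True the original is untouched.
dropped = toy.drop(toy[mask].index)

# Option 2: invert the mask with ~ and keep everything that does not match.
kept = toy[~mask]

# Compare the row counts before and after, as in the shape check above.
print(toy.shape, dropped.shape, kept.shape)
```

Both options leave the original frame intact, which makes it easy to compare the row counts before and after.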

As you can see, I’ve dropped the necessary four rows that were skewing the overall data. I can go on and finish the analysis.