With A/B testing, there are times when you have to remove rows from your dataset because of issues that skew your results. Pandas df.drop is pretty well covered, but I had a hard time finding out how to drop multiple rows that match multiple conditions. Below is a walkthrough of how I approached that problem.
A preview of my dataframe:
print(test_data.shape)
(84, 9)
So I have to remove two specific days, and only for our mobile traffic. First I filtered the dataframe down to the rows for those days and grabbed their index. Here’s the final code, but I’ll break it down in a minute.
srm_mobile = test_data[
    (test_data["date"] == "20210223") & (test_data["devicecategory"] == "mobile")
    | (test_data["date"] == "20210302") & (test_data["devicecategory"] == "mobile")
].index
You start with the criteria you want to filter on, in this case a date and a device type. Each comparison goes in its own parentheses, joined with & for “and”:
(test_data["date"] == "20210223") & (test_data["devicecategory"] == "mobile")
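Each comparison needs its own parentheses, as in the full expression above, because Python's & binds more tightly than ==. Here is a minimal sketch of the difference. The `demo` dataframe is a made-up stand-in with only the two relevant columns, not my real data.

```python
import pandas as pd

# Hypothetical stand-in for test_data with only the two columns used in the filter.
demo = pd.DataFrame({
    "date": ["20210223", "20210223", "20210224"],
    "devicecategory": ["mobile", "desktop", "mobile"],
})

# Parenthesized: each comparison is evaluated first, then combined element-wise.
mask = (demo["date"] == "20210223") & (demo["devicecategory"] == "mobile")
print(mask.tolist())  # [True, False, False]

# Unparenthesized: Python evaluates "20210223" & demo["devicecategory"] first,
# which is not a valid operation, so the expression raises (typically a TypeError).
try:
    demo["date"] == "20210223" & demo["devicecategory"] == "mobile"
    unparenthesized_ok = True
except Exception as exc:
    unparenthesized_ok = False
    print(type(exc).__name__)
```

If you see an error about unsupported operands on a filter like this, missing parentheses are a likely cause.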
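Then all you have to do is repeat that for each set of conditions and separate the sets with a pipe, |, which acts as “or”. Because & binds more tightly than |, each date-and-device pair is evaluated before the pairs are combined.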
I’ve only tried it on a few lines but I believe you can add as many conditions as you need.
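If the list of dates grows, chaining one pair per date gets long. One alternative, which is not what I used above, is `Series.isin` with a list of dates. This is a sketch on a small made-up dataframe. The names `sample` and `bad_dates` and the `sessions` column are placeholders for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real test_data has 84 rows and 9 columns.
sample = pd.DataFrame({
    "date": ["20210223", "20210223", "20210302", "20210302", "20210303"],
    "devicecategory": ["mobile", "desktop", "mobile", "tablet", "mobile"],
    "sessions": [10, 20, 30, 40, 50],
})

# Dates to exclude; add more entries here instead of more "|" clauses.
bad_dates = ["20210223", "20210302"]

# Rows that fall on any bad date AND are mobile traffic.
srm_mobile = sample[
    sample["date"].isin(bad_dates) & (sample["devicecategory"] == "mobile")
].index

print(list(srm_mobile))  # [0, 2]
```

This only works as a shortcut when every date shares the same second condition, as it does here with mobile.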
Next, you pass that index to the standard Pandas drop function.
test_data.drop(srm_mobile, inplace=True)
print(test_data.shape)
(80, 9)
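To see the whole flow on something you can run, here is a small end-to-end sketch. The `sample` dataframe and its `sessions` column are invented for illustration. It also shows an alternative I did not use above: inverting the boolean mask with ~ to keep the other rows, which avoids `inplace=True` and leaves the original dataframe untouched.

```python
import pandas as pd

# Hypothetical sample data: three days, two device categories each.
sample = pd.DataFrame({
    "date": ["20210223", "20210223", "20210302", "20210302", "20210303", "20210303"],
    "devicecategory": ["mobile", "desktop", "mobile", "desktop", "mobile", "desktop"],
    "sessions": [120, 95, 130, 90, 110, 100],
})

# Same pattern as above: (date AND device) OR (date AND device).
mask = (
    (sample["date"] == "20210223") & (sample["devicecategory"] == "mobile")
    | (sample["date"] == "20210302") & (sample["devicecategory"] == "mobile")
)

# Option 1: drop by index labels, returning a new dataframe.
dropped = sample.drop(sample[mask].index)

# Option 2: keep every row where the mask is False.
filtered = sample[~mask]

print(dropped.shape)             # (4, 3)
print(dropped.equals(filtered))  # True
```

Both options give the same result. The index-based drop reads closer to the steps above, while the inverted mask skips the intermediate index.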
As you can see, I’ve dropped the necessary four rows that were skewing the overall data. I can go on and finish the analysis.