I have a strange problem (at least until I find whatever silly mistake I am making). A short summary is presented below, along with a link to the Stack Overflow question (https://stackoverflow.com/questions/51129600/python-pycharm-runtimes).
Essentially, I generate a large dataframe (temp_df1, 10 million rows x 18 columns) upstream in my code and then pass it to a for loop that groups it by a single column and appends each of the resulting 2.8 million groups to a list. This groupby operation takes 25 minutes to run, which is significantly long. So I tested the code by saving temp_df1 to a CSV and then reading it back in. The same groupby operation, when run on this pre-saved and reloaded CSV, takes only 7 minutes.
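To make the comparison concrete, here is a minimal, scaled-down sketch of the experiment described above. The column names, sizes, and the use of random data are all placeholders (the original code is not shown in the question); the structure is the single-column groupby-to-list pattern, timed once on the in-memory frame and once after a CSV round-trip.

```python
import io
import time

import numpy as np
import pandas as pd

# Placeholder sizes, scaled down from 10M rows / 2.8M groups.
n_rows, n_groups = 1_000_000, 280_000
temp_df1 = pd.DataFrame({
    "key": np.random.randint(0, n_groups, size=n_rows),
    "value": np.random.rand(n_rows),
})

def collect_groups(df):
    # The pattern in question: iterate a single-column groupby
    # and append each group to a list.
    return [group for _, group in df.groupby("key")]

t0 = time.perf_counter()
groups_direct = collect_groups(temp_df1)
t_direct = time.perf_counter() - t0

# Round-trip through CSV (in memory here, a file in the original),
# as in the 7-minute variant.
buf = io.StringIO()
temp_df1.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)

t0 = time.perf_counter()
groups_csv = collect_groups(reloaded)
t_csv = time.perf_counter() - t0

print(f"direct: {t_direct:.1f}s  csv round-trip: {t_csv:.1f}s")
# Comparing dtypes shows whether read_csv normalized any columns,
# which is one thing that could differ between the two runs:
print(temp_df1.dtypes.equals(reloaded.dtypes))
```

With this synthetic frame the two timings come out similar, so whatever causes the 25-minute vs 7-minute gap presumably depends on how the real temp_df1 is built upstream (dtypes, index, fragmentation), which the sketch does not reproduce.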
So what is it that I am doing that is causing such a drastic difference in run times?
I am using the latest version of PyCharm and get no SettingWithCopyWarning messages. I am running on a machine with 256 GB of RAM and 24 cores, so there are no memory or CPU limitations either. Thanks.