Hello, respectable minds;)
I am analyzing SNP data where rows represent patients and columns represent SNPs. There are some NaN values, because some SNPs exist in some patients but not in others. Initially, every sample is represented by 2 rows: one showing the reference allele and the other showing the alternative allele at every SNP.
I am trying to replace the NaN values in each column with the reference (REF) allele of that specific SNP (column), so my approach was to:
1- Take each column as a pd.Series and get its first valid value
2- Use this value to replace the NaNs in that column. After that I will remove all rows representing the REF allele
My code uses a for loop over every column to get the REF allele and use it to replace the NaNs, as follows:
    for col in df3.columns:
        s = pd.Series(df3[col])
        first_valid_Ref_value = s.loc[s.first_valid_index()]
        print(first_valid_Ref_value)
        df3[[col]] = df3[[col]].fillna(first_valid_Ref_value)
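For comparison, the two steps described above can also be written without a Python-level loop. This is only a minimal sketch on a made-up toy frame (the column names and values are illustrative, not my real data), and it assumes, as in my approach, that the first valid value in each column is the REF allele. A column that is entirely NaN would simply stay NaN here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df3: rows are samples, columns are SNPs (names are made up).
df3 = pd.DataFrame(
    {
        "snp_1": ["A", np.nan, "G", np.nan],
        "snp_2": [np.nan, "C", np.nan, "T"],
        "snp_3": ["T", "T", np.nan, "C"],
    }
)

# Back-filling pulls the first valid value of each column up to row 0,
# so row 0 of the back-filled frame holds one fill value per column.
first_valid = df3.bfill().iloc[0]

# fillna with a Series fills each column with the value stored under its label.
filled = df3.fillna(first_valid)
print(filled)
```

The idea is that `bfill` and `fillna` each work on the whole frame in one call, instead of assigning back into `df3` once per column and printing 151865 values.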
This code ran for more than 7 hours over 151865 SNPs (columns) and did not finish; then Windows suddenly required a restart and shut down my Linux VM, which is hosted on Windows 10.
Now I have 2 questions:
1- Is there a better, faster way to loop over the columns and replace the NaN values than the way I'm doing it?
2- How can I protect my code from being stopped while working in a Jupyter notebook? Is there a command, like 'nohup' in the terminal, that we can use in a Jupyter notebook so that if the notebook stops suddenly the code keeps running in the background? Or else, is there a way to resume from where it stopped instead of restarting from the beginning?
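To make the "resume from where it stopped" idea concrete, here is a rough sketch of what I have in mind: process the columns in chunks, save each finished chunk to disk, and skip chunks that already exist on the next run. The function names `fill_in_chunks` and `load_chunks`, the file naming, and the chunk size are all hypothetical, and the toy frame is made up; it also assumes the first valid value per column is the REF allele.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def fill_in_chunks(df, out_dir, chunk_size=1000):
    """Fill NaNs chunk by chunk, skipping chunks already saved in out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    processed = 0
    for start in range(0, df.shape[1], chunk_size):
        path = os.path.join(out_dir, f"chunk_{start:08d}.pkl")
        if os.path.exists(path):
            continue  # this chunk was finished in an earlier run
        chunk = df.iloc[:, start:start + chunk_size]
        filled = chunk.fillna(chunk.bfill().iloc[0])
        # Write to a temporary name first so a crash never leaves a
        # half-written file that looks like a finished chunk.
        tmp_path = path + ".tmp"
        filled.to_pickle(tmp_path)
        os.replace(tmp_path, path)
        processed += 1
    return processed


def load_chunks(out_dir):
    """Reassemble the saved chunks in column order."""
    files = sorted(f for f in os.listdir(out_dir) if f.endswith(".pkl"))
    parts = [pd.read_pickle(os.path.join(out_dir, f)) for f in files]
    return pd.concat(parts, axis=1)


# Toy data (made up): 2 samples, 3 SNP columns.
demo = pd.DataFrame(
    {"snp_1": ["A", np.nan], "snp_2": [np.nan, "C"], "snp_3": ["T", np.nan]}
)
out_dir = tempfile.mkdtemp()
first_run = fill_in_chunks(demo, out_dir, chunk_size=2)   # does the work
second_run = fill_in_chunks(demo, out_dir, chunk_size=2)  # finds saved chunks
result = load_chunks(out_dir)
print(first_run, second_run)
```

Something like this could also live in a plain .py script started with nohup from a terminal, which would keep running if the browser tab or notebook closes. That would not survive the VM itself being shut down, which is why the chunks are saved to disk.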