In this script I use genode GTF for mouse (last realase vM18)
At the beginning, I wanted to modify the mouse GTF annotation using a python script (modify some positions in chr12 annotation)
Of course I simplified my script to focus on my issue. I'll try to make it clear.
This is my script :
import os
import sys
import getopt
import os.path
import numpy as np
import pandas as pd
def usage():
print('Usage:\n')
print('\tpython ' +
sys.argv[0] + ' -g <genome type> -y <gtf file>')
print('\t\t-h or --help : display this help')
print('\t\t-g or --genome : will add the genome to the cytoband file name')
print('\t\t-y or --file_gtf : gtf file')
def main(argv):
genome = ""
file_gtf = ""
try:
opts, args = getopt.getopt(sys.argv[1:], 'g:y:', ['genome=', 'file_gtf=', 'help'])
except getopt.GetoptError:
usage()
sys.exit(2)
##############################OPTIONS##############################
for opt, arg in opts:
if opt in ('-h', '--help'):
usage()
sys.exit(2)
elif opt in ('-g', '--genome'):
genome = arg
elif opt in ('-y', '--file_gtf'):
file_gtf = arg
else:
print("Error : Bad option -> " + opt)
usage()
sys.exit(2)
##############################PROGRAM##############################
new_vcf_file = open(genome + ".gtf", "a")
with open(file_gtf, 'r') as vcf_f:
for line in vcf_f:
if line[0] != '#':
array_line = line.split("\t")
if array_line[0] == "chr12":
new_vcf_file.write(line)
else:
new_vcf_file.write(line)
else:
new_vcf_file.write(line)
new_vcf_file.close()
if __name__ == '__main__':
main(sys.argv[1:])
I know it does nothing because I simplified it. Just read and write lines.
I run my script with this command line :
python test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
At the end of this command a new GTF file is generated (testBiostar.gtf). To show my issue I count the number of line of this file and I delete it to re-run easily.
In this script I do not delete lines from the gencode GTF, so I expect to have the same number of lines
17:32:15 [ubuntu16@ubuntu16-VirtualBox Data]$ wc -l gencode.vM18.chr_patch_hapl_scaff.annotation.gtf
1818071 gencode.vM18.chr_patch_hapl_scaff.annotation.gtf
Then I run my script multiple times :
17:31:21 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1754744 testBiostar.gtf
17:31:47 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1754735 testBiostar.gtf
Everytime I launch the script I got different results (sometimes I got a result I already got...)
EDIT : While typing I found the answer but I don't know why (computer science as its finest)
I opened the testBiostar.gtf result file and I found that some lines were cut and glue to others. My hints were on the read/write process... I do not know where exactly, but I replaced :
new_vcf_file = open(genome + ".gtf", "a")
To :
new_vcf_file = open(genome + ".gtf", "w")
17:54:10 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1818071 testBiostar.gtf
17:55:10 [ubuntu16@ubuntu16-VirtualBox Data]$ python /media/sf_raid/Projects/PTCB/test_biostar.py -g testBiostar -y gencode.vM18.chr_patch_hapl_scaff.annotation.gtf && wc -l testBiostar.gtf && rm testBiostar.gtf
1818071 testBiostar.gtf
My question is why it is working now ? I know the 'a' is for 'append to the end of the file' but after each run I delete this file
I bet it is a stupid answer... but I can not find it
Thanks
This have no relation to the answer, why you do not use python
argparse? this will save you theusage()function and help more in processing input args.Habits mostly, I learned python with
getoptso I keep using it. I will give a try toargparsethanksYou will not regret :D
I stopped counting how many time this happens to me ;)