Cannot Modify Fasta File description
10 weeks ago

I wanted to generate genome indexes with a tool called STAR, which requires reference genome sequences (FASTA files) and annotations (a GTF file). When I launched STAR, this message appeared:

Fatal INPUT FILE error, no valid exon lines in the GTF file: /content/drive/MyDrive/gencode.v34.annotation.gtf
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.


So I checked the chromosome naming and found that chromosomes in the reference genome are named [chromosome 1], while in the GTF file they are named [chr1]. I therefore tried to rename the chromosomes in the reference genome using this code:

But when I checked the file, I found the names unchanged:

I just want to know what I'm doing wrong.

biopython • 487 views
When you modify a file's contents in memory, the file doesn't change. You'll need to write the in-memory contents to a new file for that file to contain the changes you make.

Save time and if possible download matching annotation files from whichever location you choose to get the sequence data from. Everything will match without having to mess with this sort of thing.

From the 2nd image, it looks like the chromosome names are not in the form chromosome1, chromosome2, and so on; they are in the form NC_***. Replacing chromosome1 with chr1 would therefore not help, because the chromosome names in the FASTA file and the GTF file would still differ. I would follow GenoMax's suggestion for ease.
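
A quick way to confirm a mismatch like this is to list the sequence names on both sides and compare them. The sketch below uses only the standard library and assumes that the name STAR matches against is the FASTA header text before the first whitespace (worth verifying in the STAR manual). The function names and file paths are hypothetical, chosen only for illustration.

```python
def fasta_names(path):
    """Return the first whitespace-delimited token of every FASTA header."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Return the distinct values of column 1 (seqname) in a GTF file."""
    names = set()
    with open(path) as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_path, gtf_path):
    """Report which sequence names are shared and which appear on one side only."""
    fasta = fasta_names(fasta_path)
    gtf = gtf_names(gtf_path)
    return {
        "shared": fasta & gtf,
        "fasta_only": fasta - gtf,
        "gtf_only": gtf - fasta,
    }


# Hypothetical usage:
# report = compare_names("genome.fa", "gencode.v34.annotation.gtf")
# print(sorted(report["fasta_only"])[:5], sorted(report["gtf_only"])[:5])
```

If "shared" comes back empty, the two files use different naming schemes, and a simple text replacement in the description is unlikely to be enough.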

10 weeks ago
Alban Nabla ▴ 20

If you are using the records on the fly during the for loop iteration, then you could simply change the assignment to:

    from Bio import SeqIO

    for record in SeqIO.parse('yourfile.fsa', 'fasta'):
        record.description = record.description.replace('chromosome', 'chr')
        ## do something with the record here


If you can't use these edited records on the fly, then you will need to save the changes to a new FASTA file using Bio.SeqIO.write(), as mentioned by Ram.
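
Saving the edited records could look roughly like the sketch below. It assumes Biopython is installed; the function name `rewrite_descriptions` and the file names are hypothetical. Note that it only edits `record.description`, as in the loop above. `record.id` (the header text before the first space) is left untouched, so this alone will not help if the IDs themselves are what differ from the GTF.

```python
from Bio import SeqIO


def rewrite_descriptions(in_path, out_path, old="chromosome", new="chr"):
    """Copy a FASTA file, replacing `old` with `new` in each record description.

    Returns the number of records written.
    """

    def edited_records():
        for record in SeqIO.parse(in_path, "fasta"):
            # This changes the record in memory only.
            record.description = record.description.replace(old, new)
            yield record

    # SeqIO.write is the step that actually puts the edited records on disk.
    return SeqIO.write(edited_records(), out_path, "fasta")


# Hypothetical usage:
# count = rewrite_descriptions("yourfile.fsa", "yourfile.renamed.fsa")
```

Writing to a new file rather than overwriting the input keeps the original available in case the replacement does not do what was intended.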