Question: rename fasta headers in FastaAlternateReferenceMaker output
3.1 years ago by
mosquitoes0
United States
mosquitoes0 wrote:

Hi,

I would like to create a new fasta file from the original genome fasta and a vcf file. The new fasta file should contain only full gene sequences.

I can use the gatk FastaAlternateReferenceMaker to accomplish this:

java -jar -Xmx16g ~/bin/GenomeAnalysisTK-3.6/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R ref_genome.fasta -o sample_SNV.fasta -V sample_SNV_selected.vcf -L ref_gene.bed

But I would like the output fasta to have the gene names as the headers. For instance, the current fasta output from gatk is:

>1 chr01:2350
AGAAAGGACAGAAAAAAAGATGGTGAAGTAGAAAGAGGGCGAAATGAAAAAAGGGAAAGC
AAAAGAGATGATGAAAGTCATAGAGAGAGAGATGAAAAAAGGGAAAGCAAAAGAGATGAT

I would like the output headers to 1) drop the sequential number and 2) contain the gene name from column 4 of the .bed file.

Is there a way to either modify 1) the input bed file or 2) the output fasta file by giving 'some tool' the fasta and the bed file?

Thanks!


There are many threads on Biostars about renaming fasta file headers. Here are a couple, but search for others:
renaming all fasta headers in a file
replace fasta headers with another name in a text file

written 3.1 years ago by genomax70k
3.1 years ago by
mosquitoes0
United States
mosquitoes0 wrote:

Thanks!

I used this python script and it worked great:

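# Replace each fasta header with the next line from list.txt.
# list.txt is expected to hold one new header per line (each starting
# with '>'), in the same order as the records in file.fasta.
with open('file.fasta') as fasta, \
     open('list.txt') as newnames, \
     open('file_annot.fasta', 'w') as newfasta:
    for line in fasta:
        if line.startswith('>'):
            newname = newnames.readline()
            if not newname:
                # Stop instead of silently dropping a header.
                raise ValueError('list.txt has fewer names than fasta records')
            # Make sure the header ends with exactly one newline.
            newfasta.write(newname.rstrip('\n') + '\n')
        else:
            newfasta.write(line)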
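
A variation on the script above, as a minimal sketch: rather than preparing list.txt by hand, the names can be read straight from column 4 of the .bed file. This assumes the fasta records come out in the same order, and in the same number, as the intervals in the .bed file. I have not verified that gatk guarantees this (for example if intervals overlap or the .bed file is unsorted), so the sketch raises an error when the counts differ. The function names `read_bed_names` and `rename_headers` are made up for illustration.

```python
def read_bed_names(bed_path):
    """Return the name field (column 4) of each BED interval, in file order."""
    names = []
    with open(bed_path) as bed:
        for line in bed:
            # Skip blank lines and optional BED header/comment lines.
            if not line.strip() or line.startswith(('#', 'track', 'browser')):
                continue
            fields = line.split()
            if len(fields) < 4:
                raise ValueError('BED line has no name column: ' + line.strip())
            names.append(fields[3])
    return names


def rename_headers(fasta_path, bed_path, out_path):
    """Replace the i-th fasta header with the i-th BED name.

    Returns the number of records written. Assumes (not verified) that
    fasta records and BED intervals are in the same order.
    """
    names = read_bed_names(bed_path)
    count = 0
    with open(fasta_path) as fasta, open(out_path, 'w') as out:
        for line in fasta:
            if line.startswith('>'):
                if count >= len(names):
                    raise ValueError('more fasta records than BED intervals')
                out.write('>' + names[count] + '\n')
                count += 1
            else:
                # Sequence lines are copied unchanged.
                out.write(line)
    if count != len(names):
        raise ValueError('fewer fasta records than BED intervals')
    return count
```

With the file names from the question, the call would look like `rename_headers('sample_SNV.fasta', 'ref_gene.bed', 'sample_SNV_renamed.fasta')`, where the output file name is only a placeholder. It is worth spot-checking a few renamed records against the original `chr01:2350`-style headers before trusting the result.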