Question: VCF to FASTA CDS converter
mosquitoes (United States) wrote, 4.4 years ago:

I need to create a FASTA file containing only the CDS for each sample that I sequenced (NGS) and genotyped with GATK. I've used GATK FastaAlternateReferenceMaker, followed by BEDtools and the .gff, to pull out all the exons (or CDSs), but this does not join the coding sequences of each gene together. Also, FastaAlternateReferenceMaker outputs a FASTA whose sequence names are listed as chr1, etc. (i.e., they don't match the names in the .gff). My genome has many contigs, and renaming them by hand is time consuming. Is there a better way to do this? Do any tools exist to go from a VCF file to a per-sample FASTA file containing the CDS of each gene?

I eventually need to feed this FASTA into PAML to calculate dN/dS for each gene. If there's a better way to do that as well, please let me know.

Thanks!


Tags: sequencing, next-gen
Devon Ryan (Freiburg, Germany) answered, 4.4 years ago:

Just use gtf2fasta (I think TopHat comes with such an executable; if not, there are Python scripts out there), after first renaming the chromosomes in either the FASTA files or the GTF file so that the two match (you could just use awk for that). You'll also need to modify the GTF so that it contains only the CDS entries and relabel those as "exon", since most conversion programs expect to build transcripts from exons (again, awk can do this).
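If you'd rather not chain several tools, the same three steps (rename contigs, keep only CDS rows, join the CDS pieces of each transcript) can be sketched in plain Python. This is a minimal sketch, not gtf2fasta itself, and it rests on assumptions not stated above: the annotation is GTF with a `transcript_id` attribute (GFF3 `Parent=` attributes are not handled), `name_map` is a hypothetical dict you build from FASTA contig names to GTF contig names, each sample FASTA fits in memory, and the sample FASTA shares reference coordinates (SNPs only; incorporated indels would shift the GTF coordinates). CDS phase and partial CDS are not handled.

```python
import re
from collections import defaultdict

# IUPAC-aware complement table for reverse-complementing minus-strand CDS
COMPLEMENT = str.maketrans("ACGTRYKMBVDHNacgtrykmbvdhn",
                           "TGCAYRMKVBHDNtgcayrmkvbhdn")


def read_fasta(lines, name_map=None):
    """Return {contig: sequence}, renaming contigs via the (hypothetical) name_map."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name = line[1:].split()[0]  # keep only the first word of the header
            if name_map:
                name = name_map.get(name, name)
            chunks = []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def cds_records(lines):
    """Yield (transcript_id, contig, start, end, strand) for GTF rows of type CDS."""
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            yield (match.group(1), fields[0], int(fields[3]),
                   int(fields[4]), fields[6])


def stitch_cds(genome, records):
    """Concatenate the CDS segments of each transcript in transcription order."""
    by_tx = defaultdict(list)
    for tx, contig, start, end, strand in records:
        by_tx[tx].append((contig, start, end, strand))
    out = {}
    for tx, segs in by_tx.items():
        strand = segs[0][3]
        segs.sort(key=lambda seg: seg[1])  # genomic order by start position
        # GTF coordinates are 1-based and inclusive
        seq = "".join(genome[c][s - 1:e] for c, s, e, _ in segs)
        if strand == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        out[tx] = seq
    return out


def format_fasta(seqs, width=60):
    """Render {name: sequence} as FASTA text with fixed-width lines."""
    parts = []
    for name, seq in seqs.items():
        parts.append(f">{name}\n")
        for i in range(0, len(seq), width):
            parts.append(seq[i:i + width] + "\n")
    return "".join(parts)
```

For each sample, you would pass an open file handle to `read_fasta` (together with your contig-name mapping), pass the open GTF to `cds_records`, and write `format_fasta(stitch_cds(genome, records))` to a per-sample file; all file names are placeholders. Reading CDS rows directly makes the "rename CDS to exon" step unnecessary.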
