Question: Make FASTA sequences of a specific length from a VCF file
2.2 years ago by shreyasibiswas88 (United States) wrote:

Hi all,

I have a multi-sample VCF file of 350 loci spread across the genome from 85 individuals. I ultimately want to use this to build phylogenetic trees. For this, I first want to make FASTA sequences of 100 kb length for each of the 350 loci for the 85 individuals, something similar to what GATK FastaAlternateReferenceMaker does. I don't know how to specify the sequence length in this tool, though. Is it even possible to do this with this tool or any other tool? I would appreciate suggestions on how to handle this.

Thank you.


You could make the fasta files and then trim them to be of the length you want. 100kb sounds long if you are going to try and build phylogenetic trees.
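The trimming idea could look roughly like the sketch below. It is only an illustration in plain Python: the helper names `read_fasta` and `trim_records` are made up for this example, and it assumes you simply want the first N bases of each record (trimming around a specific position would need different slicing).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_records(records, length):
    """Keep only the first `length` bases of every sequence (assumption: trim from the start)."""
    for header, seq in records:
        yield header, seq[:length]
```

With a real file you would pass an open file handle to `read_fasta` and write the trimmed records back out in FASTA format.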

written 2.2 years ago by genomax
2.2 years ago by WouterDeCoster wrote:

I don't know if your approach for phylogenetic trees with fragments like that makes sense, but I'll leave that to others.

How I would handle your project:

  1. Use the VCF and GATK FastaAlternateReferenceMaker to make a FASTA file per individual
  2. Make a BED file with one interval per locus (100 kb per interval)
  3. Use bedtools getfasta to slice out your 100 kb fragments from the FASTA files
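
Steps 2 and 3 can be sketched in plain Python to show the coordinate logic. This is a toy illustration under stated assumptions, not a replacement for the real tools: the function names are hypothetical, the 100 kb window is assumed to be centered on a 1-based VCF position (the thread does not say how the window should be placed), and the "genome" is an in-memory dict rather than a FASTA file.

```python
def locus_window(chrom, pos, chrom_len, size=100_000):
    """Return a BED interval (0-based, half-open) of `size` bp around a 1-based position.

    Assumption: the window is centered on the position and shifted inward
    when it would run off either end of the chromosome.
    """
    size = min(size, chrom_len)          # a window cannot exceed the chromosome
    start = (pos - 1) - size // 2        # convert to 0-based, then center
    start = max(0, min(start, chrom_len - size))
    return chrom, start, start + size


def to_bed_line(interval):
    """Format an interval as a tab-separated BED line (step 2)."""
    chrom, start, end = interval
    return f"{chrom}\t{start}\t{end}"


def slice_intervals(genome, intervals):
    """Yield (name, subsequence) per interval, like step 3 does on real files.

    `genome` is a dict mapping chromosome name to sequence string.
    """
    for chrom, start, end in intervals:
        yield f"{chrom}:{start}-{end}", genome[chrom][start:end]


# Toy example: a 10 bp "chromosome" and a 4 bp window around position 5.
genome = {"chr1": "ACGTACGTAC"}
window = locus_window("chr1", 5, len(genome["chr1"]), size=4)
fragments = list(slice_intervals(genome, [window]))
```

On real data you would write the `to_bed_line` output to a BED file and let bedtools getfasta do the slicing on each per-individual FASTA. One caveat: if the VCF contains indels, positions in the alternate-reference FASTA may be shifted relative to the original reference, so the BED coordinates may need checking.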