Extracting variants from genic region
2.2 years ago
nagarsaggi ▴ 40

Dear community, I am doing a whole-genome phylogenomic analysis of a diploid fungus. I used only the primary contigs of a phased assembly as the reference (treating them as a collapsed assembly) and aligned whole-genome sequencing reads from more than 150 isolates. Despite extensive variant filtering, the number of variants across all 150 isolates is still too high to handle. I am therefore thinking of using only variants from genic regions for further analysis, and I have the following questions:

Is it acceptable to use only variants in genic regions for further analysis?

If it is acceptable to use genic regions for population genetic analysis, could someone suggest how best to extract the variants in genic regions (with the help of a GFF file, I guess) from the multi-sample VCF file that I generated with freebayes? Thanks.

2.2 years ago
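# Exons from the GTF as BED: GTF coordinates are 1-based inclusive, BED is 0-based half-open, hence start-1
awk -F '\t' '($3=="exon") {printf("%s\t%d\t%s\n",$1,int($4)-1,$5);}' file.gtf |\
     sort -T . -t $'\t' -k1,1 -k2,2n | bedtools merge > exons.bed

# Keep only variants inside those intervals; the input VCF must be bgzip-compressed and indexed
bcftools view -O z -o exons.vcf.gz --regions-file exons.bed indexed.vcf.gz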
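
The question asks about whole genic regions and a GFF file, while the commands above take exons from a GTF. Below is a minimal sketch of the gene-level variant of the first step, assuming a GFF3 whose third column uses the feature type `gene`. The file names `toy.gff3` and `genes.bed` and the toy records are made up for illustration, so replace them with your own annotation.

```shell
# Hypothetical toy GFF3 so the sketch runs on its own (not real annotation data).
printf '##gff-version 3\n' > toy.gff3
printf 'ctg1\tsrc\tgene\t101\t200\t.\t+\t.\tID=g1\n' >> toy.gff3
printf 'ctg1\tsrc\tmRNA\t101\t200\t.\t+\t.\tID=m1;Parent=g1\n' >> toy.gff3
printf 'ctg1\tsrc\tgene\t11\t50\t.\t-\t.\tID=g2\n' >> toy.gff3
printf 'ctg2\tsrc\tgene\t1\t30\t.\t+\t.\tID=g3\n' >> toy.gff3

# Keep "gene" features only, skipping comment lines, and convert the
# 1-based inclusive GFF coordinates to 0-based half-open BED (start-1, end).
awk -F '\t' '!/^#/ && $3=="gene" {printf("%s\t%d\t%s\n", $1, $4-1, $5)}' toy.gff3 |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n > genes.bed

cat genes.bed
```

The resulting `genes.bed` can be passed to the same `bcftools view --regions-file` command in place of `exons.bed`. Piping it through `bedtools merge` first, as in the commands above, collapses overlapping genes into single intervals.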