Mutation hunting in 100 exome samples
Posted by soniabedi.07, 4 months ago

Hi,

I have two main questions about fishing out information from 100 VCF files:

1) How can I look for a particular gene across 100 VCF files?
2) How can I look for particular mutations across 100 VCF files?

How can I achieve these tasks (not necessarily together) quickly? Should I use R? If so, which package, and how should I go about it?

Tags: vcf, R, mutations

bcftools should be your go-to choice. Take a look at this page for querying and this one for filtering.
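As a minimal sketch of the querying route, the Python below builds one `bcftools view` call per file so the same region can be checked in all 100 samples. It assumes each file is bgzip-compressed and indexed (with bcftools index or tabix), because `-r` needs the index; `-H` suppresses the header so only matching records are printed. The file names and the region are hypothetical placeholders.

```python
import shlex
import shutil
import subprocess
from typing import List


def region_query_commands(vcf_paths: List[str], region: str) -> List[List[str]]:
    """Build one `bcftools view` command per VCF, restricted to `region`.

    `region` uses bcftools syntax such as "chr1:1000-2000" (hypothetical here).
    -H drops the header lines; -r requires each input file to be indexed.
    """
    return [["bcftools", "view", "-H", "-r", region, path] for path in vcf_paths]


def count_records(commands: List[List[str]]) -> None:
    """Run each command and print how many records fall in the region."""
    if shutil.which("bcftools") is None:
        raise RuntimeError("bcftools was not found on PATH")
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(cmd[-1], len(result.stdout.splitlines()))


if __name__ == "__main__":
    # Hypothetical sample names: sample001.vcf.gz ... sample100.vcf.gz
    paths = [f"sample{i:03d}.vcf.gz" for i in range(1, 101)]
    # Dry run: only print the first two commands instead of executing them.
    for cmd in region_query_commands(paths, "chr1:1000-2000")[:2]:
        print(shlex.join(cmd))
```

To check a single known mutation rather than a whole gene, a one-position region such as "chr1:1234" (also hypothetical) can be passed in the same way.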

Thanks @GenoMax.

So how do I look for mutations/genes shared across the 100 VCF files? Do I combine all the VCFs into one, or search them one by one?

I am assuming that your VCF files are annotated and contain all the information you need. The best approach is to combine all the VCFs into a multisample VCF file and then filter it. The links provided above will be helpful; specifically, look into bcftools merge.
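A minimal sketch of the merge step, assuming single-sample, bgzip-compressed inputs with hypothetical file names: `bcftools merge` expects indexed inputs, so each file is indexed first (`-f` overwrites an existing index so the steps can be rerun), the files are merged into one compressed multisample VCF (`-O z`, written with `-o`), and the merged file is indexed for later region queries. The Python only builds the command lists; each one can be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex
from typing import List


def merge_commands(vcf_paths: List[str], merged_path: str) -> List[List[str]]:
    """Commands to index every input, merge them, then index the merged file."""
    # -f: overwrite an existing index instead of failing.
    commands = [["bcftools", "index", "-f", path] for path in vcf_paths]
    # -O z: write bgzip-compressed VCF; -o: output file name.
    commands.append(["bcftools", "merge", "-O", "z", "-o", merged_path, *vcf_paths])
    commands.append(["bcftools", "index", "-f", merged_path])
    return commands


if __name__ == "__main__":
    paths = [f"sample{i:03d}.vcf.gz" for i in range(1, 101)]  # hypothetical names
    cmds = merge_commands(paths, "merged.vcf.gz")
    print(len(cmds), "commands")
    print(shlex.join(cmds[-1]))  # the final indexing step
```

In the merged file, a sample that had no record at a given site is filled in as missing, which is worth remembering when interpreting genotypes afterwards.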

Once you have a multisample VCF file, a variant in a given sample is usually identified by its genotype (0/1 or 1/1). If the genotype is 0/0, the sample does not carry that variant. You should also convert genotypes to missing (./.) or 0/0 when the genotype quality (GQ) is below 20 or 30; this is a basic quality-control step that should always be followed. I am assuming that you are working with a diploid organism.
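To make the genotype logic concrete, here is a small pure-Python sketch that reads one tab-separated VCF data line, sets any call with GQ below a threshold to missing (./.), and reports whether each sample carries a non-reference allele. It assumes diploid GT strings and a FORMAT column containing GT and GQ, and it uses the "set to missing" option rather than 0/0. On real files, the bcftools setGT plugin is the more practical way to apply this kind of mask.

```python
from typing import List


def masked_genotypes(vcf_line: str, min_gq: int = 20) -> List[str]:
    """Return one GT per sample, setting calls with GQ < min_gq to './.'.

    Assumes a tab-separated VCF data line (not a header) with a FORMAT column.
    Calls without a numeric GQ are left unchanged.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    genotypes = []
    for sample in fields[9:]:
        values = dict(zip(format_keys, sample.split(":")))
        gt = values.get("GT", "./.")
        gq = values.get("GQ", ".")
        if gq != "." and float(gq) < min_gq:
            gt = "./."  # low-confidence call: treat as missing
        genotypes.append(gt)
    return genotypes


def carries_variant(gt: str) -> bool:
    """True if any called allele is non-reference (0/1, 1/1, 1|2, ...)."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


if __name__ == "__main__":
    line = "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:35\t1/1:12\t0/0:99"
    gts = masked_genotypes(line)
    print(gts)                                # ['0/1', './.', '0/0']
    print([carries_variant(g) for g in gts])  # [True, False, False]
```

Note that the second sample's 1/1 call is dropped only because its GQ (12) is under the default threshold of 20; raising `min_gq` to 30 makes the filter stricter.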

Then, you can keep only the loci related to your genes of interest. If you know your gene coordinates, you can use bedtools intersect (https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html) to keep only the loci that fall in or around your genes. Alternatively, check the links above. Hope this helps.
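bedtools intersect does this efficiently on whole files. Purely to show the coordinate logic involved, here is a small Python sketch (gene names and coordinates are hypothetical) that checks which BED intervals contain a VCF position. The detail that often trips people up is that BED starts are 0-based with exclusive ends, while VCF POS is 1-based.

```python
from typing import List, Tuple

BedRecord = Tuple[str, int, int, str]  # (chrom, start, end, name)


def parse_bed_line(line: str) -> BedRecord:
    """Parse chrom, start, end and name from the first four BED columns."""
    chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
    return chrom, int(start), int(end), name


def overlapping_genes(chrom: str, pos: int, genes: List[BedRecord], flank: int = 0) -> List[str]:
    """Names of BED intervals containing a 1-based VCF position.

    A 1-based position `pos` lies inside the BED interval [start, end)
    when start < pos <= end. `flank` widens every interval on both sides
    to also catch variants "around" a gene.
    """
    return [
        name
        for g_chrom, start, end, name in genes
        if g_chrom == chrom and start - flank < pos <= end + flank
    ]


if __name__ == "__main__":
    # Hypothetical gene coordinates, for illustration only.
    genes = [parse_bed_line("chr1\t999\t2000\tGENE_A"), parse_bed_line("chr2\t0\t500\tGENE_B")]
    print(overlapping_genes("chr1", 1000, genes))           # ['GENE_A']
    print(overlapping_genes("chr1", 999, genes))            # []
    print(overlapping_genes("chr1", 999, genes, flank=10))  # ['GENE_A']
```

The linear scan is fine for a handful of genes; for thousands of intervals, bedtools (or a sorted interval index) is the better tool.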

Based on previous experience, I would recommend bcftools isec. Consider reading through this post.
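bcftools isec works directly on indexed VCFs; for example, `-n +2` reports sites present in two or more files. To show the underlying idea only, here is a pure-Python sketch that treats each variant as a (CHROM, POS, REF, ALT) key and counts how many files contain it. Splitting multiallelic ALT fields into separate keys and skipping ALT "." are choices made for this sketch; they do not reproduce bcftools isec's own matching rules.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from a VCF's data lines."""
    keys: Set[VariantKey] = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multiallelic sites
            if allele != ".":          # "." means no alternate allele
                keys.add((chrom, int(pos), ref, allele))
    return keys


def shared_variants(per_file: List[Set[VariantKey]], min_files: int) -> Set[VariantKey]:
    """Variants present in at least `min_files` of the input files."""
    counts = Counter(key for keys in per_file for key in keys)
    return {key for key, n in counts.items() if n >= min_files}


if __name__ == "__main__":
    file_a = ["##fileformat=VCFv4.2", "chr1\t100\t.\tA\tG\t.\tPASS\t.", "chr1\t200\t.\tC\tT,G\t.\tPASS\t."]
    file_b = ["chr1\t100\t.\tA\tG\t.\tPASS\t.", "chr1\t200\t.\tC\tG\t.\tPASS\t."]
    file_c = ["chr1\t300\t.\tG\tA\t.\tPASS\t."]
    sets = [variant_keys(f) for f in (file_a, file_b, file_c)]
    print(sorted(shared_variants(sets, min_files=2)))
    # [('chr1', 100, 'A', 'G'), ('chr1', 200, 'C', 'G')]
```

With real files, each list of lines would come from reading (or decompressing) one VCF, and `min_files` plays the role of the `+N` threshold.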