Question: Extract intervals shared by fewer than 100% of samples from WGS data
17 days ago, FL512 wrote:

I am trying to work out how to extract intervals from WGS data that are shared by an arbitrary percentage of samples.

According to the bedtools documentation, overlapping intervals can be extracted with bedtools intersect: https://bedtools.readthedocs.io/en/latest/content/tools/intersect.html This is very useful and works well for me when I have only a few samples.

However, I am analyzing several hundred samples, and with that many no overlapping interval is detected. This is understandable: if 99 samples have a T/A variant at Chr1 position 1 but 1 sample does not, there is no interval shared by all samples. To overcome this, I would like to extract variants that are shared by at least 99% of the samples, then 95%, 90% or even less, until I find overlapping intervals.

Does anyone know how to do this, or could you point me to helpful resources? Or could GATK SelectVariants do it?

Thank you!

gatk wgs bedtools • 77 views
17 days ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

Filter on the output of samtools depth to build a BED file, then use that BED file to filter the VCF:

samtools depth S*.bam | awk '{N=0;for(i=3;i<=NF;i++) {if(int($i)>0) N+=1.0;} if((N/(NF-2)) >= 0.9) printf("%s\t%d\t%s\n",$1,int($2)-1,$2);}' | bedtools merge

RF01    10  3295
RF02    20  2668
RF03    9   2585
RF04    21  2352
RF05    15  1565
RF06    31  1348
RF07    12  1063
RF08    8   1056
RF09    11  1036
RF10    6   340
RF10    397 741
RF11    2   272
RF11    390 663
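
To see what the awk filter in the pipeline above does, here is a minimal sketch that runs without any BAM files. It assumes a made-up depth table for four samples in the layout samtools depth prints (chromosome, 1-based position, one depth column per sample). The function names depth_table and shared_positions are hypothetical, and the threshold is passed as an argument instead of being hard-coded to 0.9.

```shell
#!/bin/sh
# Toy stand-in for `samtools depth` output over 4 samples (invented data):
# chrom, 1-based position, then one depth column per sample.
depth_table() {
    printf 'chr1\t1\t5\t3\t0\t7\n'
    printf 'chr1\t2\t4\t2\t1\t6\n'
    printf 'chr1\t3\t0\t0\t2\t1\n'
    printf 'chr1\t4\t9\t8\t7\t0\n'
    printf 'chr1\t5\t3\t3\t3\t3\n'
}

# Keep positions where at least min_frac of the samples have depth > 0,
# printed as 0-based, half-open BED records (start = pos - 1, end = pos).
shared_positions() {
    awk -v min_frac="$1" 'BEGIN { FS = OFS = "\t" }
    {
        n = 0
        for (i = 3; i <= NF; i++) if ($i + 0 > 0) n++   # samples covering this position
        if (n / (NF - 2) >= min_frac) print $1, $2 - 1, $2
    }'
}

# Positions covered in at least 75% of the toy samples.
depth_table | shared_positions 0.75
```

Position 3 is dropped because only two of the four samples cover it; the other four positions pass. On real data the single-base records would then go through bedtools merge, as in the pipeline above, to collapse adjacent positions into intervals. One way to apply the merged BED to the variants, assuming placeholder file names calls.vcf and shared.bed, is bedtools intersect -header -a calls.vcf -b shared.bed. Lowering the threshold step by step (0.99, 0.95, 0.90) follows the approach described in the question.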

Dear Pierre, Thank you very much for your quick & kind response. I appreciate it. I will give it a try tonight and let you know the results.

Reply written 17 days ago by FL512