Question: Filter variants supported by 20% of total reads
gravatar for Ks-seq
11 months ago by
Ks-seq10 wrote:

Hi All

How can I get all variants (.vcf) found in ≤20% of total reads in .bam?

Explanation :

  1. Single-End RNAseq, I have STAR 2-pass aligned .bam files (several samples).

  2. Competed Marked duplicates & sort (Picard), and SplitNCigarReads, BaseRecalibration (GATK)

  3. I generated .vcf (samtool mpileup and BCFTools call) containing number of reads supporting a variant (DP=raw_depth, AD=Total Allelic depth).

My aim is to:

A:get variant.vcf present in ≤20% of total reads in .bam file.

B: detect rare variants.

Is there any program or script available ?

Thanks in advance.

rna-seq snp sequence alignment • 220 views
ADD COMMENTlink modified 11 months ago by Pierre Lindenbaum133k • written 11 months ago by Ks-seq10

Your data is from an RNA-seq experiment, but you are calling variants with SAMtools...?

ADD REPLYlink written 11 months ago by Kevin Blighe69k

Thanks for your comment, Yes, I want to detect rare variant so I want to see everything before any filtering. Therefore, samtools mpileup will give me everything (i think so, if not then please c) And then correct me). Thereafter, I want to study variant present in ≤20% of total reads. Basically, I want get all variants found in ≤20% of total reads in .bam file.

ADD REPLYlink written 11 months ago by Ks-seq10

You could probably do this manually via mpileup output and even combinations of cut, uniq -c, and awk. There are likely many ways.

ADD REPLYlink written 11 months ago by Kevin Blighe69k

Could you please guide me with an example? THis is how my vcf looks lke:


Y 5325732 . T G,<*> 0 . DP=1;ADF=0,1,0;ADR=0,0,0;AD=0,1,0;I16=0,0,1,0,0,0,0,0,0,0,20,400,0,0,1,1;QS=0,1,0;SGB=-0.379885;MQ0F=0 PL:DP:SP:ADF:ADR:AD 4,3,0,4,3,4:1:0:0,1,0:0,0,0:0,1,0

Y 5325733 . T G,<*> 0 . DP=1;ADF=0,1,0;ADR=0,0,0;AD=0,1,0;I16=0,0,1,0,0,0,0,0,0,0,20,400,0,0,2,4;QS=0,1,0;SGB=-0.379885;MQ0F=0 PL:DP:SP:ADF:ADR:AD 4,3,0,4,3,4:1:0:0,1,0:0,0,0:0,1,0

Y 5325734 . A G,<*> 0 . DP=1;ADF=0,1,0;ADR=0,0,0;AD=0,1,0;I16=0,0,1,0,0,0,0,0,0,0,20,400,0,0,3,9;QS=0,1,0;SGB=-0.379885;MQ0F=0 PL:DP:SP:ADF:ADR:AD 4,3,0,4,3,4:1:0:0,1,0:0,0,0:0,1,0

Y 5325735 . T C,<*> 0 . DP=1;ADF=0,1,0;ADR=0,0,0;AD=0,1,0;I16=0,0,1,0,0,0,0,0,0,0,20,400,0,0,4,16;QS=0,1,0;SGB=-0.379885;MQ0F=0 PL:DP:SP:ADF:ADR:AD 4,3,0,4,3,4:1:0:0,1,0:0,0,0:0,1,0

Y 5325736 . G A,<*> 0 . DP=1;ADF=0,1,0;ADR=0,0,0;AD=0,1,0;I16=0,0,1,0,0,0,0,0,0,0,20,400,0,0,5,25;QS=0,1,0;SGB=-0.379885;MQ0F=0 PL:DP:SP:ADF:ADR:AD 4,3,0,4,3,4:1:0:0,1,0:0,0,0:0,1,0

I tried samtools mpileup and bcftools mpileup and bcftools call but cant get anything out. Appreciating you much for your help.

ADD REPLYlink modified 10 months ago • written 10 months ago by Ks-seq10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1854 users visited in the last hour