Question: filtering VCF files
Bogdan (Palo Alto, CA, USA) wrote, 4.3 years ago:

Dear all,

After reading the submissions to the SMC (Somatic Mutation Challenge), we identified one that filters the VCF files as follows. Can anyone suggest a package that implements these filtering strategies? Thanks!

  • Read depth filtering: remove mutations when at least 2/3 of the mutant allele bases in the tumor sample have base quality < 25
  • Mapping quality filter: remove mutations when the median mapping quality of the reads supporting the mutant allele is < 20
  • Read position filter: remove mutations when the mutant allele is located only at the extremities of reads (within 8 bases of either end)
  • Strand bias filter: remove mutations when Fisher's exact test indicates strand imbalance for the mutant allele only (threshold 0.001)
  • Matched normal filter: remove mutations when the mutant allele is present in more than 3% of the reads with base quality > 25 in the matched normal sample
  • Simple repeats filter: remove mutations that fall into a repeated region of the genome
  • Centromere filter: remove mutations that fall into centromeric or telomeric regions of the genome
  • Panel of normals filter: remove mutations that appear to be a SNP (mutant allele in 3% of reads) in at least 2 of the other normal genomes, or that look like a frequent sequencing error (more than 1 read carrying the mutant allele) in at least half of the genomes in the panel
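A minimal Python sketch of how the per-read criteria in this list could be expressed, assuming the tumor and normal reads covering each candidate site have already been extracted (for example from a pileup). The ReadObs structure and all function names are hypothetical; they do not come from an existing package or from the SMC submission. Some thresholds had to be interpreted: "within 8 bases" is read as an offset of less than 8 from either read end, the strand bias test compares the forward/reverse counts of the reference and mutant alleles with a two-sided Fisher's exact test, and the panel of normals is summarized as one (mutant reads, depth) pair per normal. The simple repeats and centromere filters are plain interval overlaps and are better left to a tool such as bedtools.

```python
from dataclasses import dataclass
from math import comb
from statistics import median
from typing import List, Tuple


@dataclass
class ReadObs:
    """One read's base at a candidate site (hypothetical structure, e.g. built from a pileup)."""
    base: str        # base observed at the site
    base_qual: int   # Phred base quality of that base
    mapq: int        # mapping quality of the read
    offset: int      # 0-based position of the base within the read sequence
    read_len: int    # length of the read sequence
    reverse: bool    # True if the read is aligned to the reverse strand


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of the table whose top-left cell is x (margins fixed).
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def fails_base_quality(alt: List[ReadObs], min_bq: int = 25) -> bool:
    # At least 2/3 of the mutant-allele bases have base quality < min_bq.
    low = sum(1 for r in alt if r.base_qual < min_bq)
    return bool(alt) and 3 * low >= 2 * len(alt)


def fails_mapping_quality(alt: List[ReadObs], min_median: int = 20) -> bool:
    # Median MAPQ of the reads supporting the mutant allele is below min_median.
    return bool(alt) and median(r.mapq for r in alt) < min_median


def fails_read_position(alt: List[ReadObs], margin: int = 8) -> bool:
    # Every mutant base lies less than `margin` bases from either end of its read.
    return bool(alt) and all(
        min(r.offset, r.read_len - 1 - r.offset) < margin for r in alt)


def fails_strand_bias(ref: List[ReadObs], alt: List[ReadObs], alpha: float = 0.001) -> bool:
    # Compare forward/reverse counts of the reference allele with those of the mutant allele.
    ref_fwd = sum(1 for r in ref if not r.reverse)
    alt_fwd = sum(1 for r in alt if not r.reverse)
    p = fisher_exact_two_sided(ref_fwd, len(ref) - ref_fwd, alt_fwd, len(alt) - alt_fwd)
    return p < alpha


def fails_matched_normal(normal: List[ReadObs], alt_base: str,
                         min_bq: int = 25, max_frac: float = 0.03) -> bool:
    # Mutant allele in more than max_frac of the normal reads with base quality > min_bq.
    good = [r for r in normal if r.base_qual > min_bq]
    return bool(good) and sum(r.base == alt_base for r in good) > max_frac * len(good)


def fails_panel_of_normals(panel: List[Tuple[int, int]], snp_frac: float = 0.03) -> bool:
    # panel: one (mutant_reads, depth) pair per normal genome.
    # SNP-like: mutant allele in >= snp_frac of the reads, in at least 2 normals.
    # Error-like: more than 1 mutant read, in at least half of the normals.
    snp_like = sum(1 for alt, depth in panel if depth and alt >= snp_frac * depth)
    error_like = sum(1 for alt, _ in panel if alt > 1)
    return snp_like >= 2 or (bool(panel) and 2 * error_like >= len(panel))
```

For instance, fails_base_quality flags a site whose three mutant reads have base qualities 20, 22 and 30, since two of the three are below 25. A real implementation would also need to decide how to handle sites with very few mutant reads, where these tests have little power.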
written 4.3 years ago by Bogdan

Please validate or comment on your previous questions:

written 4.3 years ago by Pierre Lindenbaum

Thank you, gentlemen!

written 4.3 years ago by Bogdan
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.3 years ago:

For almost all of those filters, I would use samtools view (restricted to the SNP position) piped into BioAlcidae (https://github.com/lindenb/jvarkit/wiki/BioAlcidae) to parse the reads and their CIGAR strings and create a BED file of the failing sites. Those sites would then be filtered out of the VCF using this BED file and bedtools.

written 4.3 years ago by Pierre Lindenbaum
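The key step in the pipeline described above is locating, in each read, the base aligned to the variant position, which means walking the CIGAR string. The Python sketch below does this on the tab-separated records printed by samtools view and emits a BED line when a site fails the read position filter. It is a plain-Python stand-in, not BioAlcidae itself; SiteBase, the function names and the "read_position" label in the BED name column are hypothetical. Base qualities are decoded assuming the standard Phred+33 SAM encoding.

```python
import re
from typing import Iterable, NamedTuple, Optional

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class SiteBase(NamedTuple):
    base: str
    base_qual: int
    mapq: int
    offset: int      # 0-based offset of the base within SEQ
    read_len: int
    reverse: bool


def query_offset(read_start: int, cigar: str, ref_pos: int) -> Optional[int]:
    """0-based SEQ offset aligned to 1-based ref_pos, or None if ref_pos is not aligned."""
    ref, query = read_start, 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":          # consumes reference and query
            if ref <= ref_pos < ref + n:
                return query + (ref_pos - ref)
            ref += n
            query += n
        elif op in "IS":         # consumes query only
            query += n
        elif op in "DN":         # consumes reference only
            if ref <= ref_pos < ref + n:
                return None      # site lies in a deletion or skipped region
            ref += n
        # H and P consume neither reference nor query
    return None


def base_at(sam_line: str, ref_pos: int) -> Optional[SiteBase]:
    """Base, qualities and read offset at ref_pos for one SAM record."""
    f = sam_line.rstrip("\n").split("\t")
    flag, pos, mapq, cigar, seq, qual = int(f[1]), int(f[3]), int(f[4]), f[5], f[9], f[10]
    if cigar == "*" or seq == "*":
        return None              # unmapped read or no stored sequence
    off = query_offset(pos, cigar, ref_pos)
    if off is None:
        return None
    bq = ord(qual[off]) - 33 if qual != "*" else 0   # Phred+33
    return SiteBase(seq[off], bq, mapq, off, len(seq), bool(flag & 16))


def read_position_bed(sam_lines: Iterable[str], chrom: str, pos: int, alt: str,
                      margin: int = 8) -> Optional[str]:
    """BED line flagging the site if every alt-carrying read has it near a read end."""
    hits = [b for b in (base_at(line, pos) for line in sam_lines) if b and b.base == alt]
    if hits and all(min(b.offset, b.read_len - 1 - b.offset) < margin for b in hits):
        return f"{chrom}\t{pos - 1}\t{pos}\tread_position"   # BED is 0-based, half-open
    return None
```

The BED lines collected for all flagged sites could then be removed from the VCF, for example with bedtools intersect -v -header -a calls.vcf -b flagged.bed (the file names are placeholders).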

Thanks, Pierre! Please consider the previous questions validated. BioAlcidae looks a bit too complicated, although a specific example would certainly help. Is there perhaps an alternative to BioAlcidae? Thanks!

written 4.3 years ago by Bogdan

By "validate", Pierre means marking the correct answers and useful comments on your original questions, using the green arrow or thumbs-up buttons. That allows subsequent readers of your posts to assess the validity of the responses.

You may find it useful to review the posts on how to use Biostars (available here and here).

written 4.3 years ago by harold.smith.tarheel

I didn't say it was easy :-) but as far as I can see, almost all the filters you need require writing a new tool.

written 4.3 years ago by Pierre Lindenbaum

Yes, quite complicated, especially when the various somatic callers do not provide all the needed fields in their VCF files ;)

written 4.3 years ago by Bogdan

If it helps: I quickly wrote a tool to load a BAM file into an SQLite3 database (https://github.com/lindenb/jvarkit/wiki/BamTosql). You'll be able to filter the data using MAPQ, clipping, base quality, etc.

written 4.3 years ago by Pierre Lindenbaum
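The SQLite route mentioned in the comment above can be sketched with Python's built-in sqlite3 module. The two tables used here (sites and site_reads) are a hypothetical, simplified layout invented for this example, not the schema that BamToSql actually produces. The point is only that some filters, such as the base quality filter, become a single SQL query, while others are easier to finish in Python: core SQLite has no median aggregate, so the median MAPQ is computed with statistics.median.

```python
import sqlite3
from statistics import median
from typing import List, Tuple

# Hypothetical layout: one row per candidate site, one row per read base observed at a site.
SCHEMA = """
CREATE TABLE sites (chrom TEXT, pos INTEGER, alt TEXT);
CREATE TABLE site_reads (chrom TEXT, pos INTEGER, base TEXT, base_qual INTEGER, mapq INTEGER);
"""

# Sites where at least 2/3 of the mutant-allele bases have base quality below the threshold.
LOW_BQ_SQL = """
SELECT s.chrom, s.pos
FROM sites AS s JOIN site_reads AS r
  ON r.chrom = s.chrom AND r.pos = s.pos AND r.base = s.alt
GROUP BY s.chrom, s.pos
HAVING 3 * SUM(r.base_qual < ?) >= 2 * COUNT(*)
ORDER BY s.chrom, s.pos
"""


def build_demo_db(sites: List[tuple], reads: List[tuple]) -> sqlite3.Connection:
    """Create an in-memory database filled with the given sites and read observations."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO sites VALUES (?, ?, ?)", sites)
    con.executemany("INSERT INTO site_reads VALUES (?, ?, ?, ?, ?)", reads)
    return con


def low_base_quality_sites(con: sqlite3.Connection, min_bq: int = 25) -> List[Tuple[str, int]]:
    return con.execute(LOW_BQ_SQL, (min_bq,)).fetchall()


def low_mapq_sites(con: sqlite3.Connection, min_median: int = 20) -> List[Tuple[str, int]]:
    """Sites where the median MAPQ of the mutant-allele reads is below min_median."""
    failing = []
    sites = con.execute("SELECT chrom, pos, alt FROM sites ORDER BY chrom, pos").fetchall()
    for chrom, pos, alt in sites:
        mapqs = [m for (m,) in con.execute(
            "SELECT mapq FROM site_reads WHERE chrom = ? AND pos = ? AND base = ?",
            (chrom, pos, alt))]
        if mapqs and median(mapqs) < min_median:
            failing.append((chrom, pos))
    return failing
```

With a real BamToSql database, the table and column names in the queries would have to be adapted to whatever that tool actually creates.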

Thanks, Pierre, that was a great effort; thanks for sharing the result!

written 4.3 years ago by Bogdan