How can I create an R8 (homopolymer repeat) filter without using the Illumina pipeline?
6.5 years ago
Can Holyavkin

Illumina instruments come with built-in (or online) variant analysis software (CASAVA). This software can flag likely false-positive variants near homopolymer repeats (e.g. AAAAAAAA) by tagging them with an "R8" filter.
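One rough approximation of such a filter (not CASAVA's actual implementation) is to find the longest single-base run touching the variant position and tag the variant R8 when that run reaches 8 bases. A minimal sketch, assuming a 0-based position into an in-memory reference string and a threshold of 8 taken from the AAAAAAAA example; the function names are hypothetical:

```python
def run_length_at(seq: str, i: int) -> int:
    """Length of the run of identical bases (case-insensitive) containing seq[i]."""
    base = seq[i].upper()
    left = i
    while left > 0 and seq[left - 1].upper() == base:
        left -= 1
    right = i
    while right + 1 < len(seq) and seq[right + 1].upper() == base:
        right += 1
    return right - left + 1


def max_adjacent_run(seq: str, pos: int) -> int:
    """Longest homopolymer touching 0-based pos: the run containing pos,
    or the run immediately to its left or right (relevant for indels)."""
    candidates = [run_length_at(seq, pos)]
    if pos > 0:
        candidates.append(run_length_at(seq, pos - 1))
    if pos + 1 < len(seq):
        candidates.append(run_length_at(seq, pos + 1))
    return max(candidates)


def r8_filter(seq: str, pos: int, threshold: int = 8) -> str:
    """Return "R8" if a run of >= threshold identical bases touches pos, else "PASS".
    The threshold of 8 is an assumption based on the R8 tag name."""
    return "R8" if max_adjacent_run(seq, pos) >= threshold else "PASS"


ref = "GC" + "A" * 8 + "TC"   # 8-base poly-A at 0-based positions 2..9
print(r8_filter(ref, 1))   # variant right before the run
print(r8_filter(ref, 11))  # variant not touching the run
```

Looking at neighbouring runs as well as the run containing the position is a deliberate choice: an insertion or deletion placed just next to a homopolymer is exactly the kind of call that tends to be an artefact.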

Is it possible to build a homopolymer repeat filter without using Illumina's own pipeline (e.g. with other software)?

P.S. I noticed that GATK has a HomopolymerRun annotation that does a similar job, but it is no longer supported or recommended.

Edit: I posted the same question on StackExchange, but I have not found an answer yet.

6.5 years ago

I wrote a tool that reports the number of homopolymer bases around each variant: https://github.com/lindenb/jvarkit/wiki/VCFPolyX

$ java -jar dist/vcfpolyx.jar -R reference.fa input.vcf
(...)
2   1133956 .   A   G   2468.84 .   POLYX=23
2   1133956 .   A   AG  3604.25 .   POLYX=23
2   2981671 .   T   G   47.18   .   POLYX=24
(...)
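VCFPolyX only annotates the variants; to turn the POLYX value into an R8-style FILTER tag, its output can be post-processed. A minimal sketch, assuming tab-separated VCF text with POLYX in the INFO column as in the output above, and a threshold of 8 chosen to mirror CASAVA's R8 name; `apply_r8` and the header description are hypothetical:

```python
# Declares the new FILTER value so downstream tools accept it (wording is illustrative).
R8_HEADER = '##FILTER=<ID=R8,Description="Homopolymer run (POLYX) >= 8 around the variant">'


def polyx_value(info):
    """Return the integer POLYX value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("POLYX="):
            return int(field.split("=", 1)[1])
    return None


def apply_r8(lines, threshold=8):
    """Yield VCF lines with R8 added to FILTER where POLYX >= threshold."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#CHROM"):
            yield R8_HEADER          # meta lines must precede the #CHROM line
            yield line
            continue
        if line.startswith("#"):
            yield line
            continue
        cols = line.split("\t")      # VCF columns: CHROM..QUAL, FILTER=6, INFO=7
        polyx = polyx_value(cols[7])
        if polyx is not None and polyx >= threshold:
            if cols[6] in (".", "PASS"):
                cols[6] = "R8"
            else:
                cols[6] += ";R8"     # keep existing filters, append ours
        yield "\t".join(cols)
```

For example, save the VCFPolyX output with `java -jar dist/vcfpolyx.jar -R reference.fa input.vcf > polyx.vcf`, then pass the lines of polyx.vcf to `apply_r8` and write the results to a new file. If bcftools is available, `bcftools filter -s R8 -e 'INFO/POLYX>=8'` should give a similar soft filter without custom code.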


Thank you, this will help a lot. :)


I get the error "no sequence dictionary in the reference".

I created one with Picard CreateSequenceDictionary (it is in SAM format), but the error persists. Your issue tracker link was broken, which is why I am asking here. What might be the problem?


Did you create an index with samtools faidx?


Initially I indexed the reference with samtools, but then this error appeared telling me to create a dictionary with Picard, so I did. It is still not resolved (screenshot).


What is the name of the dictionary created by Picard? It should be hg19.dict, i.e. the reference file name with its FASTA extension replaced by .dict.
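The naming rule can be checked before running the tool: the Picard dictionary must sit next to the FASTA and share its base name, while samtools faidx appends .fai to the full file name. A minimal pre-flight sketch; the list of recognised FASTA extensions is an assumption, and `expected_index_files` / `missing_index_files` are hypothetical helpers:

```python
from pathlib import Path

# Assumed FASTA extensions; longer suffixes first so ".fa.gz" wins over ".fa".
FASTA_SUFFIXES = (".fasta.gz", ".fa.gz", ".fasta", ".fa", ".fna")


def expected_index_files(reference):
    """Return (dict_path, fai_path) expected next to a FASTA reference.

    hg19.fa -> hg19.dict (extension replaced) and hg19.fa.fai (suffix appended).
    """
    ref = Path(reference)
    name = ref.name
    stem = name
    for suffix in FASTA_SUFFIXES:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            break
    return ref.with_name(stem + ".dict"), ref.with_name(name + ".fai")


def missing_index_files(reference):
    """List the expected companion files that do not exist on disk."""
    return [p for p in expected_index_files(reference) if not p.exists()]
```

For example, `missing_index_files("hg19.fa")` returning `[PosixPath('hg19.dict')]` would point to a dictionary that was created under a different name.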


Thank you, it is working now. I had used a reference file name other than hg19, while the VCF file refers to hg19, and I thought that was the problem. Anyway, thank you once again.