6 months ago
Maxine • 0
I have a VCF file containing millions of structural variations. I want to filter those SVs by their location (exonic, intronic, intergenic or mixed). Is there a mature pipeline I can follow? Thanks in advance.
Get a BED file of exons and then extract the SVs using "bcftools view --regions-file exons.bed SV.indexed.vcf.gz"
As I understand it, that would work for determining whether the POS of a VCF record falls in an exon region. Am I right? The problem is that POS only stores the start position of an SV; its end position is stored in the INFO column.
The VCF should contain the INFO/END attribute in the INFO column.
Yes, INFO/END exists in the VCF file. But does "bcftools view --regions-file ..." subset the VCF based only on the POS column? I wonder whether "bcftools view" takes INFO/END into account.
And you can just try it.
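To make the POS-versus-END concern concrete, here is a minimal Python sketch of what an interval-aware check looks like. It is only an illustration, not how bcftools or SnpEff work internally. The function names (`parse_end`, `merge_intervals`, `classify_sv`) and the labels "exonic", "mixed" and "non-exonic" are made up for this example. It assumes exons are given as BED-style 0-based half-open intervals on a single chromosome, and that the SV end is taken from INFO/END when present. Separating intronic from intergenic would additionally need gene intervals, which are left out here.

```python
def parse_end(pos, ref, info):
    """Return the 1-based inclusive end of a VCF record.

    Uses INFO/END when present, otherwise falls back to POS + len(REF) - 1.
    """
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[len("END="):])
    return pos + len(ref) - 1


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals so bases are counted once."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def classify_sv(pos, end, exons):
    """Classify an SV spanning [pos, end] (1-based, inclusive) against exons.

    exons: list of (start, end) BED-style intervals on the same chromosome.
    """
    sv_start, sv_end = pos - 1, end  # convert to 0-based half-open
    covered = sum(
        max(0, min(sv_end, e_end) - max(sv_start, e_start))
        for e_start, e_end in merge_intervals(exons)
    )
    if covered == 0:
        return "non-exonic"
    if covered == sv_end - sv_start:
        return "exonic"
    return "mixed"


# Hypothetical exons and a deletion whose POS lies outside every exon.
exons = [(100, 200), (150, 300), (500, 600)]
end = parse_end(50, "A", "SVTYPE=DEL;END=400")
label = classify_sv(50, end, exons)
```

In this made-up case the deletion starts at POS 50, outside every exon, yet it spans the first two exons, so a POS-only test would miss it while the interval test reports it as "mixed".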
You could try running it through SnpEff (provided your genome is available for it).
As I posted above, for SVs the end position should be considered. I'm not familiar with SnpEff; can it handle SV data?
As lieven.sterck noted, your most straightforward option is SnpEff. If you have a standard VCF format, you don't need to manipulate anything. Do you have any idea what variant caller was used for calling the SVs?