Question: Filtering a multi-person structural variant VCF file by type of SV
1
gravatar for omid.alavijeh
5 weeks ago by
omid.alavijeh20 wrote:

Hello all,

First post so apologies for any etiquette breaches.

I have a large cohort of over 1000 patients who have had WGS done and then Canvas/Manta structural variant caller applied to their data. I have then merged these into a single VCF file.

In the ID (3rd column) I have the types of call made by one of the two alogrithms e.g. MantaDEL, MantaINV, CanvasGAIN etc etc. After each piece of text there is a string of number representing the graph plot associated with how that call was generated e.g. MantaDel0:1:3:00 which is unique to that call.

I am trying to create individual VCFs of each unique type of call. However, if I run the following code to see what the unique SV types are:

bcftools query -f '%ID/n' SV.vcf.gz

I unsurprisingly get a massive list of each "ID" with the unique number string after it when all I want is the first part, as in to know what are the types of SV call so I can then use bcftools to create individual file types for each.

I hope this makes sense and apologies if this has been covered elsewhere.

Many thanks

bcftools sv vcf • 121 views
ADD COMMENTlink modified 5 weeks ago by Pierre Lindenbaum124k • written 5 weeks ago by omid.alavijeh20

just curious: how did you merge those SV calls ?

ADD REPLYlink written 5 weeks ago by Pierre Lindenbaum124k

I used bcftools to merge the files.

ADD REPLYlink written 5 weeks ago by omid.alavijeh20

it won't work. The very "same" CNVs can be localized at the same place but with a few difference of number of bases.

eg:

chr1 100 . DEL . . END=1000
chr1 101 . DEL . . END=999
ADD REPLYlink written 5 weeks ago by Pierre Lindenbaum124k

I was hoping once I had created the filed to merge calls based on similar distances. I was going to use Plink for the CNV analysis and potentially a few other tools. What would your approach be?

All the best

Omid

ADD REPLYlink written 5 weeks ago by omid.alavijeh20

when all I want is the first part, as in to know what are the types of SV call

an example is needed (input / output )

ADD REPLYlink written 5 weeks ago by Pierre Lindenbaum124k

Input would be a standard vcf file with the ID column being made up of calls similar to this:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

1 2827693 MantaDEL:0:2:5000 CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA C . PASS SVTYPE=DEL;END=2827680;BKPTID=Pindel_LCS_D1099159;HOMLEN=1;HOMSEQ=C;SVLEN=-66 GT:GQ 1/1:13.9

2 321682 . T MantaBND:5:7:1:0 6 PASS IMPRECISE;SVTYPE=DEL;END=321887;SVLEN=-105;CIPOS=-56,20;CIEND=-10,62 GT:GQ 0/1:12

2 14477084 . C MantaINS:88:22:00:3 12 PASS IMPRECISE;SVTYPE=DEL;END=14477381;SVLEN=-297;MEINFO=AluYa5,5,307,+;CIPOS=-22,18;CIEND=-12,32 GT:GQ 0/1:12

3 9425916 . C MantaINS:5:333:000 23 PASS IMPRECISE;SVTYPE=INS;END=9425916;SVLEN=6027;CIPOS=-16,22;MIINFO=L1HS,1,6025,- GT:GQ 1/1:15

And the output in this case would be three VCFs for "MantaINS" "MantaBND" and "MantaDEL".

I was planning on generating a list of all the IDs and then using Grep e.g. grep "MantaINS*" but was wondering if it could be done in one line from BCFtools.

Many thanks

ADD REPLYlink modified 5 weeks ago • written 5 weeks ago by omid.alavijeh20
3
gravatar for Pierre Lindenbaum
5 weeks ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:
for T in  MantaDEL MantaINS MantaBND
do
   bcftools view input.vcf  | awk -v T=${T} '{split($3,a,/\:/);if(a[1]==T) print $1,$2,$3;}'  > ${T}.txt
done
ADD COMMENTlink written 5 weeks ago by Pierre Lindenbaum124k
1

Dear Pierre,

Many thanks for this, it worked delightfully well (I changed to "print $0" as I wanted the whole line). I have upvoted and accepted your answer (is that the right thing to do?, let me know if I should celebrate your helpfulness any further)

All the best

Omid

ADD REPLYlink written 5 weeks ago by omid.alavijeh20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1988 users visited in the last hour