Remove a row if two specific columns share contents
3.2 years ago
waqasnayab ▴ 250

Hi,

I have a line in a VCF file:

chr1    11785112        .       T       G,T     .       str10   AC=6,0;ADP=72;AN=8;SF=0f,1f;STATUS=.    GT:ADR:ABQ:RDR:FREQ:RDF:ADF:PVAL:AD:SDP:RBQ:DP:RD:GQ    0/0:0:0:0:0%:0:0:1E0:0:46:0:46:0:0      1/1:5:47:0:100%:0:82:6.9142E-52:87:87:0:87:0:255     1/1:4:50:0:100%:0:51:1.0149E-32:55:84:0:84:0:255        1/1:3:49:0:100%:0:31:3.5146E-20:34:71:0:70:0:194

and there might be many more like it. I want to remove lines where the REF base is repeated among the ALT bases. I tried:

awk '!($4~$5)' FA-MO-D1B-D1S.mpileup.output.snps.indel_srt_smplrnme_d1b_d1s.vcf > family.vcf

but no luck.

Any help is appreciated.

Waqas.

SNP next-gen • 514 views

There are dedicated tools such as bcftools, GATK or vcftools, as suggested in "Remove positions that are non-variant in a subset of samples from a vcf file".


Thanks Pierre, it worked like a charm!

Regards,

Waqas.

3.2 years ago
awk -F '\t' '/^#/ {print;next;} {N=split($5,a,/[,]/);P=1;for(i=1;i<=N;i++) {if(a[i]==$4) {P=0;break;}} if(P) print}' in.vcf
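
For anyone who wants to see what the one-liner above does step by step, here is a sketch of the same filter wrapped in a shell function. The function name `drop_ref_in_alt` is made up for illustration, and the logic assumes tab-separated VCF records with REF in column 4 and comma-separated ALT alleles in column 5.

```shell
# Hypothetical helper name; wraps the awk filter so it reads files or stdin.
drop_ref_in_alt() {
  awk -F '\t' '
    /^#/ { print; next }            # keep header lines untouched
    {
      n = split($5, alt, ",")       # ALT may list several alleles
      for (i = 1; i <= n; i++)
        if (alt[i] == $4) next      # an ALT allele equals REF: drop the record
      print                         # no ALT allele equals REF: keep the record
    }
  ' "$@"
}

# Usage: drop_ref_in_alt in.vcf > filtered.vcf
```

As far as I can tell, the original attempt removed nothing because `$4~$5` uses the ALT field (`G,T`) as the pattern and looks for it inside REF (`T`), which never matches. Swapping the operands to `$5~$4` would catch this line, but it matches substrings, so it would probably also drop records such as REF `T` with ALT `TA`. Splitting ALT on commas and comparing each allele exactly avoids that.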