Question: Bedtools intersect/bcftools annotate dublicate the SNPs
0
gravatar for User000
4 weeks ago by
User000290
User000290 wrote:

Hello,

I have a VCF file with SNPs and BED file with genes. I would like to extract only SNPs that reside on the genes. With the script below I could achieve this goal. However, in the output file some SNPs get doubled then and some are not:

chr1B   812345  .   G   A   214.635
chr1B   812345  .   G   A   214.635
chr1B   812413  .   A   T   601.948
chr1B   812413  .   A   T   601.948
chr1B   824119  .   T   C   349.381
chr1B   824169  .   T   C   241.202
chr1B   824175  .   C   T   225.089
chr1B   824186  .   C   G   1480.12
chr1B   824591  .   T   C   164.335

This is very strange, do you think there is some bug there? How to solve this? I am using bcftools version: 1.10.2-27-g9d66868 and bedtools version: v2.29.2

(SAMPLES,) =glob_wildcards('/path/to/{sample}.vcf.gz')
rule all:
    input: 
        expand("{sample}.flanking.vcf", sample=SAMPLES)

rule bedtools:
    input:
        invcf="/path/to/{sample}.vcf.gz",
        bedgz="/path/to/input.bed.gz"
    output:
        outvcf="{sample}.flanking.vcf"
    shell:
        """
        /Tools/bedtools2/bin/bedtools intersect -a {input.invcf} -b {input.bedgz} -header -wa | /Tools/bcftools/bcftools annotate -a {input.bedgz} -c CHROM,FROM,TO,GENE -h <(echo '##INFO=<ID=GENE,Number=1,Type=String,Description="Gene name">') > {output.outvcf}
        """
snp bcftools bedtools • 104 views
ADD COMMENTlink modified 29 days ago • written 4 weeks ago by User000290
1
gravatar for User000
29 days ago by
User000290
User000290 wrote:

I found a solution here: link. Hope could be useful for the community. So I modified my shell like this:

/Tools/bedtools2/bin/bedtools intersect -a {input.invcf} -b {input.bedgz} -u -header -wa | /Tools/bcftools/bcftools annotate -a {input.bedgz} -c CHROM,FROM,TO,GENE -l GENE:append -h <(echo '##INFO=<ID=GENE,Number=.,Type=String,Description="Gene name">') > {output.outvcf}

This would give just one single snp with all the genes that overlap for that snp.

ADD COMMENTlink written 29 days ago by User000290
1
gravatar for Carlo Yague
4 weeks ago by
Carlo Yague4.9k
Canada
Carlo Yague4.9k wrote:

Could it be that some SNP fall onto multiple genes ? Because of overlapping genes or sense/antisense gene pairs. To check this, I would just look at the output of

intersect -a {input.invcf} -b {input.bedgz} -header -wa -wb > out.bed

where the option -wb ensure that the full gene entry of input.bedgz is outputed for every match. Btw I'm not sure to see the point of using bcftools annotate here, when you already matched the SNP with the genes using bedtools.

ADD COMMENTlink written 4 weeks ago by Carlo Yague4.9k

oh, I use annotate only to append the gene name in the INFO field...

ADD REPLYlink written 4 weeks ago by User000290

The gene is exactly the same, the scores everything

ADD REPLYlink written 4 weeks ago by User000290
1

I can see that the gene is the same after you use bcftools annotate, but is it the same using bedtools interesct -wa -wb ? It shouldn't be the same then.

ADD REPLYlink written 4 weeks ago by Carlo Yague4.9k

you are right, there are overlapping genes, basically -wa -wb did not show anything, but I checked the positions on duplicated SNPs manually and see that the genes are overlapping (because I am using flanking +/- 1kb). Any ideas how to solve this? :/

ADD REPLYlink written 4 weeks ago by User000290

Sorry, I just realized it is the last column showing the genes! and it is showing the right genes, but I need this information (the name of the gene) to be inside the INFO field......how can I do that?

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by User000290
0
gravatar for User000
4 weeks ago by
User000290
User000290 wrote:

Found the solution: https://groups.google.com/forum/#!topic/bedtools-discuss/Vi1-PVGqH18

ADD COMMENTlink written 4 weeks ago by User000290
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1084 users visited in the last hour