Question: Incorporating Raw Read Coverage Per Sample In Merged Vcf
0
gravatar for Drio
5.9 years ago by
Drio910
United States
Drio910 wrote:

I have a merged vcf file and I'd like to add the raw depth coverage per each sample for each of the snps. My plan was to extract the raw coverage from each bam and then add a new field to the merged vcf.

Notice that I am not asking for the DP field. I want to know what was the raw depth coverage at that locus on each of the samples.

Have you guys done that before? If so what approach do you follow? Is there any tool available for that or you have done it by hand?

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by Drio910
1
gravatar for Drio
5.9 years ago by
Drio910
United States
Drio910 wrote:

I think this feature should be incorporated in the vcf-tools, concretely in the merge tool. In the meantime I wrote some code that implements what I described.

Thanks for your post Pierre. I found some "issues" with your solution:

  1. No header describing the new INFO field is added to the metadata. Maybe you added it but didn't pasted it?
  2. We have to pass the bams to your java tool. There is no need to do that since the path to the bams should be contained in the original vcf header. So the tool can extract them.
  3. It won't scale for big vcf files. Too many IOs to retrieve the coverage at site.

My solution can be found here. It implements 1 and 2 and overcomes 3 by only computing the RDP field when the functional consequence of the SNP is interesting. More details in the gist.

The usage looks like this:

$ gzip -cd merged.vcf.gz | java vcfAddCoverage

A gzipped input example can be found here.

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by Drio910
0
gravatar for Pierre Lindenbaum
5.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum112k wrote:

For one sample the following java program reads a VCF from stdin and appends 'REALDP' to each sample (one bam per sample)

(warning: Not fully tested, I've played with some random bams/reference):

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import net.sf.samtools.SAMFileReader;
import net.sf.samtools.SAMRecordIterator;
import net.sf.samtools.SAMFileReader.ValidationStringency;


public class Biostar42544 {

    public static void main(String[] args) throws Exception
        {
        if(args.length==01) return;
        Pattern tab=Pattern.compile("[\t]");
        List<SAMFileReader> inputSams=new ArrayList<SAMFileReader>();
        for(int i=0;i< args.length;++i)
            {
            SAMFileReader inputSam=new SAMFileReader(new File(args[i]));
            inputSam.setValidationStringency(ValidationStringency.SILENT);
            inputSams.add(inputSam);
            }
        String line;
        BufferedReader in=new BufferedReader(new InputStreamReaderSystem.in));
        while((line=in.readLine())!=null)
            {
            if(line.startsWith("#"))
                {
                System.out.println(line);
                continue;
                }
            String tokens[]=tab.split(line);
            int pos=Integer.parseInt(tokens[1]);
            tokens[8]+=":REALDP";
            for(int i=0;i< tokens.length-9  && i< inputSams.size();++i)
                {
                int count=0;
                SAMRecordIterator iter=inputSams.get(i).query(tokens[0],pos, pos, false);
                while(iter.hasNext())
                    {
                    iter.next();
                    count++;
                    }
                iter.close();
                tokens[9+i]+=(":"+count);
                }
            for(int i=0;i< tokens.length;++i)
                {
                if(i>0) System.out.print('\t');
                System.out.print(tokens[i]);
                }
            System.out.println();
            }
        for(SAMFileReader i:inputSams) i.close();
        }

    }

Compile:

$ javac -cp .:sam-1.63.jar Biostar42544.java

Usage:

$ samtools mpileup -g -f ex1.fa sorted.bam sorted1.bam sorted2.bam | \ bcftools view -vcg - |\ java -cp .:sam-1.63.jar Biostar42544 sorted1.bam sorted2.bam

##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
(...)
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROMPOSIDREFALTQUALFILTERINFOFORMATex1ex1b
seq138.CA27.3.DP=6;VDB=0.0286;AF1=0.5002;AC1=2;DP4=3,0,3,0;MQ=60;FQ=25.6;PV4=1,1,1,1GT:PL:GQ:REALDP0/1:40,0,38:39:450/1:22,0,21:22:45
seq1107.CG5.83.DP=18;VDB=0.0029;AF1=0.4946;AC1=2;DP4=6,0,3,0;MQ=60;FQ=8.15;PV4=1,6.9e-06,1,1GT:PL:GQ:REALDP0/1:25,0,65:25:240/1:14,0,40:15:24
seq1288.AACATAG999.INDEL;DP=78;VDB=0.1070;AF1=0.5;AC1=2;DP4=18,12,24,12;MQ=60;FQ=999;PV4=0.62,0.36,1,0.017GT:PL:GQ:REALDP0/1:255,0,255:99:780/1:255,0,255:99:78
seq1548.CA999.DP=117;VDB=0.0921;AF1=0.5;AC1=2;DP4=33,24,12,39;MQ=60;FQ=999;PV4=0.0004,0.0075,1,1
(...)
seq2784.CAATTCAATTAATT999.INDEL;DP=147;VDB=0.1121;AF1=1;AC1=4;DP4=0,0,54,54;MQ=57;FQ=-147GT:PL:GQ:REALDP1/1:255,217,0:99:1471/1:255,108,0:99:147
seq21344.AC280.DP=90;VDB=0.1087;AF1=0.5;AC1=2;DP4=15,27,9,27;MQ=59;FQ=283;PV4=0.34,0.0082,0.0027,1GT:PL:GQ:REALDP0/1:179,0,215:99:960/1:136,0,172:99:96
ADD COMMENTlink written 5.9 years ago by Pierre Lindenbaum112k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 776 users visited in the last hour