Coverage in whole genome sequencing
6.5 years ago
Tania

Hi everyone

A range of positions is annotated as no-call in the whole-genome var file (var-GSxx-ASM.tsv.bz2) from the cgatools output. I am interested in a variant in this region that doesn't show up in the variants file, so I need to check the coverage. Does no-call mean that there are no reads here? I read the definition in the cgatools documentation but did not completely understand it.

Thank you

It would have been good to mention that this is data from Complete Genomics (right?).

I think so, yes. It is old data in the lab; I have to post-process it and only have the output files from cgatools.

From the Complete Genomics (now BGI) pipeline, a 'no-call' indicates that they are "uncertain as to whether the genome contains this variant". This is taken from documentation that I had when I was last analysing Complete Genomics data.

Your particular call is 'no-call-rc', which means 'no-call, reference-consistent':

Please explain “no-call-ri”, “no-call-rc”, “ref-consistent” and “ref-inconsistent” in the var file. How should I use these?

All no-call variant types indicate that the sequence could not be fully resolved, either because of limited or no information, or because of contradictory information. When some portions of the allele sequence can be called but others not, we indicate this as “no-call-rc” (no-call, reference-consistent) if those called portions are the same as the reference. We use no-call-ri (no-call, reference-inconsistent) if they are not. Ref-consistent and ref-inconsistent are the names for no-call-rc and no-call-ri, respectively, used by versions of Complete Genomics pipeline versions prior to 1.7. We changed the names to highlight the fact that these alleles contain no-calls. In some cases, one may wish to be conservative and consider any such region entirely no-called, and thus neither a match nor a mismatch between sample and reference.

[source: http://www.completegenomics.com/documents/Small+Variants+FAQ.pdf]
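
To see exactly which call covers the variant position, one option is to pull the overlapping records out of the var file itself. The helper below is a minimal shell/awk sketch; the name var_records_at is hypothetical. It assumes that the column header line begins with ">" and names the columns chromosome, begin, end, allele and varType, and that begin/end are 0-based, half-open coordinates. Check both assumptions against the header of your own file and the format documentation for your pipeline version.

```shell
# var_records_at CHROM POS
# Hypothetical helper: reads a decompressed var file on stdin and prints
# chromosome, begin, end, allele and varType for every record overlapping
# the 1-based position POS.
# ASSUMPTIONS: the ">"-prefixed header names the columns chromosome, begin,
# end, allele and varType; begin/end are 0-based, half-open coordinates.
var_records_at() {
  awk -F'\t' -v chrom="$1" -v pos="$2" '
    # Map column names to field indices using the header line
    /^>/ {
      sub(/^>/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Skip metadata lines, blank lines and anything before the header
    /^#/ || NF == 0 || !("chromosome" in col) { next }
    # A 1-based POS lies in the 0-based interval [begin, end) when begin < POS <= end
    $col["chromosome"] == chrom && $col["begin"] < pos && pos <= $col["end"] {
      print $col["chromosome"] "\t" $col["begin"] "\t" $col["end"] "\t" $col["allele"] "\t" $col["varType"]
    }
  '
}
```

A possible invocation, with a placeholder position: `bzcat var-GSxx-ASM.tsv.bz2 | var_records_at chr1 123456`. The allele column is printed because a no-call on one allele of a diploid locus reads differently from a no-call on both.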

The coverage information should be in one of the many files that cgatools produces, if not in the var-GSxx-ASM.tsv file itself.
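
If the delivery includes per-chromosome coverage files, the depth at the site can be read from them directly. The sketch below assumes files named like coverageRefScore-chr1-GSxx-ASM.tsv.bz2 (assumed to sit under ASM/REF), with a ">"-prefixed header that includes offset and uniqueSequenceCoverage columns, and offsets that use the same 0-based convention as the var file. The file layout and the offset convention are assumptions to verify against your data; the helper name coverage_at is hypothetical.

```shell
# coverage_at OFFSET [COLUMN]
# Hypothetical helper: prints COLUMN (default uniqueSequenceCoverage) for the
# row whose "offset" equals OFFSET, reading a decompressed coverage file on stdin.
# Exit status: 0 found, 1 offset not present, 2 required column missing.
# ASSUMPTIONS: ">"-prefixed header naming the columns; offsets are 0-based.
coverage_at() {
  awk -F'\t' -v want="$1" -v name="${2:-uniqueSequenceCoverage}" '
    # Map column names to field indices and check the ones we need exist
    /^>/ {
      sub(/^>/, "", $1)
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("offset" in col) || !(name in col)) {
        print "missing column: offset or " name > "/dev/stderr"
        bad = 1
        exit
      }
      next
    }
    # Skip metadata lines, blank lines and anything before the header
    /^#/ || NF == 0 || !("offset" in col) { next }
    # Print the requested value for the matching offset and stop reading
    $col["offset"] == want { print $col[name]; found = 1; exit }
    END {
      if (bad) exit 2
      if (!found) exit 1
    }
  '
}
```

For a 1-based position 123456 (a placeholder), and assuming 0-based offsets: `bzcat coverageRefScore-chr1-GSxx-ASM.tsv.bz2 | coverage_at 123455`. A value of 0 would point to "no reads here", while a non-zero value at a no-called site points to reads that could not be resolved.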

Yes, I read this, Kevin, thank you :) I just want to understand better what "uncertain" means: are there no reads covering this region at all, or are there reads but with low confidence? "Uncertain" is too open-ended to tell what is going on here.

The reason is that I am checking a variant. It is found in the exome data of the affected, but for the proband we have the whole genome. I really want to verify whether the variant is truly absent here or whether there are simply no reads, etc.

My understanding is that these no-call tags relate more to low base qualities and inconsistent base calls than to low depth of coverage. In that sense, my feeling is that they cannot be trusted. It has been quite some time since I last looked at CG data, though; we were interacting with them long before they were purchased by BGI in China.
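
The distinction asked about above (no reads at all versus reads that could not be resolved) can be made explicit by combining the var-file call at a position with the read depth there. The helper below is a hypothetical sketch, not Complete Genomics logic: the default minimum depth of 10 is an arbitrary illustration rather than a pipeline threshold, and the messages are rough interpretations. It assumes varType values such as ref, snp, no-call, no-call-rc and no-call-ri.

```shell
# interpret_site VARTYPE DEPTH [MIN_DEPTH]
# Hypothetical helper: combines a var-file varType with the read depth at the
# same position. MIN_DEPTH (default 10) is an arbitrary illustrative cutoff.
interpret_site() {
  vartype=$1 depth=$2 min_depth=${3:-10}
  case $vartype in
    no-call*)
      # Any no-call flavour: use depth to separate "no data" from "unresolved data"
      if [ "$depth" -eq 0 ]; then
        echo "no reads: the site cannot be assessed"
      elif [ "$depth" -lt "$min_depth" ]; then
        echo "low depth: too few reads for a confident call"
      else
        echo "reads present but unresolved: inspect them, e.g. in IGV"
      fi ;;
    ref)
      echo "called reference: evidence against the variant" ;;
    *)
      echo "called $vartype: compare with the expected variant" ;;
  esac
}
```

For example, `interpret_site no-call-rc 0` and `interpret_site no-call-rc 25` land in different branches, which is exactly the question the no-call label alone cannot answer.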

6.5 years ago
Sharon

Hi Tania

This is how I got around this issue: I used evidence2sam from cgatools to convert the evidence file produced by cgatools into a sorted BAM, and then looked at this BAM in IGV. Here is my command:

# Makefile rule: convert the chr1 evidence DNBs to SAM, then compress, sort and index as BAM
chr1-GSNN.sorted.bam:
        ${CGATOOLS}/cgatools evidence2sam \
        --beta \
        --evidence-dnbs=evidenceDnbs-chr1-GSNN-ASM.tsv.bz2 \
        --reference=build37.crr | samtools view -u - | \
        samtools sort -T gsnn_1.sorted -o chr1-GSNN.sorted.bam - && samtools index chr1-GSNN.sorted.bam
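
Once the BAM is indexed, the depth around the variant can also be checked on the command line alongside IGV. `samtools depth -a -r REGION` prints one line per position (chromosome, 1-based position, depth), and `-a` makes it include zero-depth positions, which matters here. The depth_summary helper below (a hypothetical name) condenses that output into the number of positions and the minimum, mean and maximum depth.

```shell
# depth_summary: summarise `samtools depth` output read on stdin
# (tab-separated columns: chrom, 1-based pos, depth) as
# "positions min mean max". Exits 1 if no positions were read.
depth_summary() {
  awk -F'\t' '
    NF >= 3 {
      n++
      sum += $3
      if (n == 1 || $3 < min) min = $3
      if (n == 1 || $3 > max) max = $3
    }
    END {
      if (n == 0) { print "0 NA NA NA"; exit 1 }
      printf "%d %d %.1f %d\n", n, min, sum / n, max
    }
  '
}
```

A possible invocation, where the region is a placeholder and the chromosome name must match the one used in the BAM: `samtools depth -a -r chr1:123446-123466 chr1-GSNN.sorted.bam | depth_summary`.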

Good answer, Sharon

I am your student Kevin :) :)

Thanks a lot, Sharon and Kevin. I will try that!
