Question: Single Cell Sequencing Quality
0
gravatar for hshah
10 weeks ago by
hshah20
hshah20 wrote:

I have some single cell sequencing data and I did some quick statistics on them to give feedback about the sequencing kits used (RepliG and PicoPlex). For some samples, I'm getting some discrepancies in the numbers and I was wondering if that's expected or is it an issue.

Here's some data for one of the sample:

samtools flagstat results

66783127 + 0 in total (QC-passed reads + QC-failed reads) 
8141047 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
66760206 + 0 mapped (99.97% : N/A)
58642080 + 0 paired in sequencing
29321040 + 0 read1
29321040 + 0 read2
53618530 + 0 properly paired (91.43% : N/A)
58597314 + 0 with itself and mate mapped
21845 + 0 singletons (0.04% : N/A)
274662 + 0 with mate mapped to a different chr
190889 + 0 with mate mapped to a different chr (mapQ>=5)
  1. According to samtools stats read length is 151 for 58,642,080 reads. So total bases = 151 * 58642080 = 8,854,954,080

  2. Breadth and depth calculated based on samtools depth :

    breadth = 35.44%
    depth = 26.49
    
  3. Total bytes / sum of all depths as per samtools depth for all bases = 2,234,214,697

  4. Total bytes / sum of all depths as per bedtools genomecov for all bases = 7,909,149,477

  5. Breadth and depth as per bedtools coverage (depth is higher than before when calculated using samtools depth which is expected)

    breadth = 35.34%
    depth = 123.304
    

As per my understanding, samtools depth only counts the reads with primary alignment and ignore secondary/supplementary alignments whereas bedtools genomecov counts everything that's in the bam file.

Now, my question is this, since there's a huge difference between the numbers in points 3 & 4 and point 4 is closer to point 1, does this mean this sample only has about 1/4th reads (designated as primary alignments) which are useful? I'm still trying to figure out if we need to eliminate the secondary/supplementary alignments and just keep primary ones or use all of them.

Plotting % targeted bases v read depth shows only about 5% bases with read depths 10+. Do you have any suggestions how this data can be improved or what is causing such issues? I see this in few single cell samples and not all of them.

TIA

single cell samtools bedtools • 139 views
ADD COMMENTlink modified 10 weeks ago by RamRS19k • written 10 weeks ago by hshah20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1661 users visited in the last hour