Single Cell Sequencing Quality
0
0
Entering edit mode
5.6 years ago
hsbinf ▴ 30

I have some single cell sequencing data and I did some quick statistics on them to give feedback about the sequencing kits used (RepliG and PicoPlex). For some samples, I'm getting some discrepancies in the numbers and I was wondering if that's expected or is it an issue.

Here's some data for one of the sample:

samtools flagstat results

66783127 + 0 in total (QC-passed reads + QC-failed reads) 
8141047 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
66760206 + 0 mapped (99.97% : N/A)
58642080 + 0 paired in sequencing
29321040 + 0 read1
29321040 + 0 read2
53618530 + 0 properly paired (91.43% : N/A)
58597314 + 0 with itself and mate mapped
21845 + 0 singletons (0.04% : N/A)
274662 + 0 with mate mapped to a different chr
190889 + 0 with mate mapped to a different chr (mapQ>=5)
  1. According to samtools stats read length is 151 for 58,642,080 reads. So total bases = 151 * 58642080 = 8,854,954,080

  2. Breadth and depth calculated based on samtools depth :

    breadth = 35.44%
    depth = 26.49
    
  3. Total bytes / sum of all depths as per samtools depth for all bases = 2,234,214,697

  4. Total bytes / sum of all depths as per bedtools genomecov for all bases = 7,909,149,477

  5. Breadth and depth as per bedtools coverage (depth is higher than before when calculated using samtools depth which is expected)

    breadth = 35.34%
    depth = 123.304
    

As per my understanding, samtools depth only counts the reads with primary alignment and ignore secondary/supplementary alignments whereas bedtools genomecov counts everything that's in the bam file.

Now, my question is this, since there's a huge difference between the numbers in points 3 & 4 and point 4 is closer to point 1, does this mean this sample only has about 1/4th reads (designated as primary alignments) which are useful? I'm still trying to figure out if we need to eliminate the secondary/supplementary alignments and just keep primary ones or use all of them.

Plotting % targeted bases v read depth shows only about 5% bases with read depths 10+. Do you have any suggestions how this data can be improved or what is causing such issues? I see this in few single cell samples and not all of them.

TIA

single cell samtools bedtools • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 2510 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6