Using samtools with Long Read RNASeq data
2.3 years ago
Joshi • 0

Hi - Would appreciate help with this one ..

• The original file size for ENCFF653FOQ.bam is 300 MB
• To view the RNASeq file in IGV, I first needed to index it
• When I tried to index this using samtools index, it notified me that the BAM file wasn't sorted
• After sorting, the size of ENCFF653FOQ.sorted.bam is 88 MB

I ran samtools flagstat on both the original and the sorted BAM files and see no difference.

What is being lost or removed when sorting the Long Read RNASeq file? Is samtools the right tool for handling long read rna-seq data?

$ samtools flagstat ENCFF653FOQ.bam
647063 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
647063 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

$ samtools flagstat ENCFF653FOQ.sorted.bam
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
647063 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

$ ls -l ENCFF653FOQ.bam ENCFF653FOQ.sorted.bam
-rw-r--r-- 1 287M Apr 28 14:09 ENCFF653FOQ.bam
-rw-r--r-- 1 84M Apr 28 19:10 ENCFF653FOQ.sorted.bam

$ samtools --version
samtools 1.9
Using htslib 1.9
Copyright (C) 2018 Genome Research Ltd.

RNA-Seq Long read samtools • 747 views
2.3 years ago
GenoMax 118k

What is being lost or removed when sorting the Long Read RNASeq file?

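Nothing is being lost or gained: your flagstat output reports the same 647063 mapped reads for both files. By default, samtools sort orders records by reference coordinate, so reads that align to the same region, and therefore tend to have similar sequences, end up next to each other. BAM files are compressed, and similar neighbouring data compresses better, so that is one likely reason your sorted file is smaller.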
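To see the effect without any BAM file, here is a toy sketch. It is only an analogy: plain text lines stand in for alignment records and gzip stands in for the compression inside BAM, and the "transcripts" and file names are made up for illustration. It writes the same lines in random and in sorted order and compares the compressed sizes.

```shell
#!/bin/sh
# Toy illustration only: text lines + gzip stand in for BAM records + BAM compression.
tmp=$(mktemp -d)

# Build 1000 hypothetical "transcripts" of 100 random bases, then emit
# 20000 "reads", each one a copy of a randomly chosen transcript.
awk 'BEGIN {
    srand(42)
    split("A C G T", base, " ")
    for (t = 1; t <= 1000; t++) {
        s = ""
        for (i = 0; i < 100; i++) s = s base[int(rand() * 4) + 1]
        tx[t] = s
    }
    for (r = 0; r < 20000; r++) print tx[int(rand() * 1000) + 1]
}' > "$tmp/unsorted.txt"

# Sorting changes only the order of the lines, not their content.
LC_ALL=C sort "$tmp/unsorted.txt" > "$tmp/sorted.txt"

# Same number of lines before and after sorting.
lines_unsorted=$(wc -l < "$tmp/unsorted.txt" | tr -d ' ')
lines_sorted=$(wc -l < "$tmp/sorted.txt" | tr -d ' ')

# Compressed size in bytes for each ordering.
size_unsorted=$(gzip -c "$tmp/unsorted.txt" | wc -c | tr -d ' ')
size_sorted=$(gzip -c "$tmp/sorted.txt" | wc -c | tr -d ' ')

echo "lines:      unsorted=$lines_unsorted sorted=$lines_sorted"
echo "gzip bytes: unsorted=$size_unsorted sorted=$size_sorted"
rm -rf "$tmp"
```

The line counts should match while the sorted copy compresses to fewer bytes, because identical lines sit next to each other. The exact numbers depend on your awk and gzip versions.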

As a general suggestion, do not use file sizes as a metric, unless it is to ensure that the file is non-zero bytes, i.e. a tool ran and produced output.
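If you want a check that does not depend on file size, one option is to compare record counts, which is what your flagstat run already did in effect. A minimal sketch, assuming samtools is on your PATH; `compare_bam_counts` is a hypothetical helper name, not something samtools provides, and it relies only on `samtools view -c`, which prints the number of records in a file.

```shell
# Hypothetical helper (not part of samtools): compare two BAM files by
# record count instead of by size on disk.
# Returns 0 if the counts match, 1 if they differ, 2 if samtools fails.
compare_bam_counts() {
    before=$(samtools view -c "$1") || return 2
    after=$(samtools view -c "$2") || return 2
    echo "records: $1=$before $2=$after"
    [ "$before" -eq "$after" ]
}

# Usage, with the file names from the question:
# compare_bam_counts ENCFF653FOQ.bam ENCFF653FOQ.sorted.bam && echo "same record count"
```

Equal counts do not prove the two files are identical in every field, but they do tell you whether any records went missing during sorting.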