I am trying to generate fastqs ( to realign them with different parameters) from RNA-seq bams aligned using STAR (run with --outSAMunmapped Within flag). The original fastq was paired end and stranded and I don't have access to that.
I used bedtools bam2fastq (
bedtools bamtofastq -i $bam -fq R1.fq -fq R2.fq ) to get the fastqs.
I got 113,214,136 reads for each fastq file. No. of reads in the bam file also matches this as the output from samtools view -c $bam is 226,428,272. (2*113,214,136).
samtools flagstat $bam 226428272 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 223692746 + 0 mapped (98.79%:-nan%) 226428272 + 0 paired in sequencing 113214136 + 0 read1 113214136 + 0 read2 223692746 + 0 properly paired (98.79%:-nan%) 223692746 + 0 with itself and mate mapped 0 + 0 singletons (0.00%:-nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
However, this is strange because I have the fastqc files and the STAR log of these runs and both showed no. of input reads as 107,979,993. The STAR *.log.final.out is as --
Number of input reads 107,979,993 Average input read length 150 Uniquely mapped reads number 103,168,845 Uniquely mapped reads % 95.54% Number of reads mapped to multiple loci 3,443,385 % of reads mapped to multiple loci 3.19% Number of reads mapped to too many loci 36,808 % of reads mapped to too many loci 0.03% % of reads unmapped: too many mismatches 0.00% % of reads unmapped: too short 1.16% % of reads unmapped: other 0.07%
I can't figure what I am I missing here that no of input reads is less than the no. of reads in the BAM file and regenerated fastq ?!