Question: Mismatch number of reads after bedtools bam2fastq on RNA-seq data
0
gravatar for aditi.qamra
17 months ago by
aditi.qamra260
Toronto
aditi.qamra260 wrote:

Hi,

I am trying to generate fastqs ( to realign them with different parameters) from RNA-seq bams aligned using STAR (run with --outSAMunmapped Within flag). The original fastq was paired end and stranded and I don't have access to that.

I used bedtools bam2fastq ( bedtools bamtofastq -i $bam -fq R1.fq -fq R2.fq ) to get the fastqs.

I got 113,214,136 reads for each fastq file. No. of reads in the bam file also matches this as the output from samtools view -c $bam is 226,428,272. (2*113,214,136).

samtools flagstat $bam

226428272 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
223692746 + 0 mapped (98.79%:-nan%)
226428272 + 0 paired in sequencing
113214136 + 0 read1
113214136 + 0 read2
223692746 + 0 properly paired (98.79%:-nan%)
223692746 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

However, this is strange because I have the fastqc files and the STAR log of these runs and both showed no. of input reads as 107,979,993. The STAR *.log.final.out is as --

Number of input reads        107,979,993
Average input read length       150
Uniquely mapped reads number        103,168,845
Uniquely mapped reads %         95.54%
Number of reads mapped to multiple loci          3,443,385
% of reads mapped to multiple loci      3.19%
Number of reads mapped to too many loci          36,808 
% of reads mapped to too many loci      0.03%
% of reads unmapped: too many mismatches        0.00%
% of reads unmapped: too short      1.16%
% of reads unmapped: other      0.07%

I can't figure what I am I missing here that no of input reads is less than the no. of reads in the BAM file and regenerated fastq ?!

Thanks!

rna-seq alignment • 669 views
ADD COMMENTlink modified 17 months ago by lakhujanivijay4.6k • written 17 months ago by aditi.qamra260

Hello aditi.qamra,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.
code_formatting

Thank you!

ADD REPLYlink written 17 months ago by lakhujanivijay4.6k

Thanks Vijay! Sorry I dint know about this option before. Will take care

ADD REPLYlink written 17 months ago by aditi.qamra260
1
gravatar for h.mon
17 months ago by
h.mon28k
Brazil
h.mon28k wrote:

Probably the extra reads are due to multi-mappers or other secondary alignments being pulled from the bam more than once. If this is undesired, you have to filter the bam to contain only primary alignments, or you may use picard SamToFastq, which by default includes only primary alignments into the output.

Also, did you sort the bam file by name before bedtools bamtofastq? The online documentation states:

When using this option, it is required that the BAM file is sorted/grouped by the read name. This keeps the resulting records in the two output FASTQ files in the same order. One can sort the BAM file by query name with samtools sort -n aln.bam aln.qsort.

ADD COMMENTlink modified 17 months ago • written 17 months ago by h.mon28k

Thanks h.mon. I did sort them before using bedtools. But you are right it is because of multimappers in the bam.

ADD REPLYlink written 17 months ago by aditi.qamra260
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 899 users visited in the last hour