Question: merging fastq, sam or bam?
2.3 years ago, ceboral wrote:

Hi all! I have some single-read RNA-seq data split across two different SRA entries, one with ~30 million reads and the other with ~15 million reads. I have read that I could merge the FASTQ, SAM, or BAM files, and I would like to know whether the choice makes any difference to the quality of the final dataset. Thanks!

sam bam fastq • 1.2k views

There should not be any, as long as you process both datasets identically before merging the BAM files.

Comment written 2.3 years ago by genomax
2.3 years ago, ATpoint wrote:

I recommend quality-trimming and aligning them independently, piping the aligner output directly into SAMtools sort (this avoids writing intermediate SAM files). Then check the alignment rate for every file and keep only those you are comfortable with. I have seen technical replicates (the same library sequenced over multiple lanes over several years as part of a large published study) with strikingly different quality: the first replicate had an alignment rate of about 95%, and the last one about 40%, with a lot of junk reads (maybe the sample degraded over time in the freezer, I don't know). In any case, do not merge too early, or you may lose the ability to discard bad runs if necessary. Do not assume that published data are always of good quality; there are a lot of junk datasets in the SRA.
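For illustration, the workflow above could be sketched roughly as follows. This is only a sketch under several assumptions: HISAT2 stands in as the aligner (the answer does not name one), the file names, the index name, and the 0.8 cutoff are made up, and the helper functions are hypothetical. The code only builds the shell commands and parses `samtools flagstat` text; it does not run any tool itself.

```python
import re
import shlex


def align_command(fastq, index, out_bam, threads=4):
    """Build a pipeline: aligner piped into samtools sort, no SAM on disk.

    Assumption: HISAT2 in single-end mode (-U); swap in your own aligner.
    """
    return (
        f"hisat2 -p {threads} -x {shlex.quote(index)} -U {shlex.quote(fastq)} "
        f"| samtools sort -@ {threads} -o {shlex.quote(out_bam)} -"
    )


def mapped_fraction(flagstat_text):
    """Return mapped / total from the text printed by `samtools flagstat`.

    Note: flagstat counts alignment records, so with secondary alignments
    this is only an approximation of the per-read alignment rate.
    """
    total = mapped = None
    for line in flagstat_text.splitlines():
        m = re.match(r"(\d+) \+ (\d+) (.*)", line)
        if not m:
            continue
        count, label = int(m.group(1)), m.group(3)
        if label.startswith("in total"):
            total = count
        elif label.startswith("mapped ("):
            mapped = count
    if not total or mapped is None:
        raise ValueError("could not find total/mapped counts in flagstat text")
    return mapped / total


def runs_to_keep(rates, min_rate=0.8):
    """Keep runs whose alignment rate passes a cutoff (0.8 is arbitrary)."""
    return [run for run, rate in rates.items() if rate >= min_rate]


def merge_command(out_bam, bams):
    """Build the `samtools merge` call for the runs that passed the check."""
    return "samtools merge " + " ".join(shlex.quote(b) for b in [out_bam, *bams])


if __name__ == "__main__":
    # Hypothetical file and index names, one BAM per SRA run.
    for run in ("run1", "run2"):
        print(align_command(f"{run}.trimmed.fastq.gz", "genome_index", f"{run}.bam"))
    print(merge_command("merged.bam", ["run1.bam", "run2.bam"]))
```

The point of keeping one BAM per run until after the alignment-rate check is exactly the one made above: a bad run can still be dropped before `samtools merge` combines the rest.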
