Question: merging fastq, sam or bam?
13 months ago
ceboral wrote:

Hi all! I have some single-read RNA-seq data split across two SRA runs, one with ~30 million reads and the other with ~15 million reads. I have read that I could merge the FASTQ, SAM or BAM files, and I would like to know whether the choice makes any difference to the quality of the final dataset. Thanks!!

modified 13 months ago by ATpoint • written 13 months ago by ceboral

There should not be any difference, as long as you process the runs identically before merging the BAM files.

written 13 months ago by genomax
13 months ago
ATpoint wrote:

I recommend quality-trimming and aligning them independently, with the aligner piped directly into SAMtools sort (this avoids writing unnecessary SAM files). Then check the alignment rate for every file and keep only those you are comfortable with. I have seen technical replicates (the same library sequenced on multiple lanes over several years as part of a large published study) with strikingly different quality: the first replicate had an alignment rate of about 95%, while the last had about 40%, with a lot of trash reads (maybe the sample degraded over time in the freezer, I don't know). In any case, do not merge too early, as you may lose the ability to discard bad samples if necessary. Do not assume that published data are always of good quality; there are a lot of junk datasets in the SRA.
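
As a rough illustration of that workflow, here is a minimal Python sketch. Several things in it are assumptions rather than anything stated above: HISAT2 stands in for "the aligner" (any aligner that writes SAM to stdout would do), the 0.8 cutoff is an arbitrary placeholder for "comfortable with", and all file names and function names are hypothetical. The quality-trimming step is left out. The functions only build command strings and parse the text printed by `samtools flagstat`; they do not run anything themselves.

```python
import re
import shlex


def align_command(fastq, index, out_bam, threads=4):
    """Build a pipeline that streams the aligner's SAM output straight
    into `samtools sort`, so no intermediate SAM file is written.
    HISAT2 is an assumed aligner; swap in whichever one you use."""
    return (
        f"hisat2 -p {threads} -x {shlex.quote(index)} -U {shlex.quote(fastq)} "
        f"| samtools sort -@ {threads} -o {shlex.quote(out_bam)} -"
    )


def alignment_rate(flagstat_text):
    """Return mapped/total as a fraction from `samtools flagstat` output.
    Both counts include secondary alignments, so treat this as a rough
    per-run quality indicator rather than an exact read-level rate."""
    total = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.MULTILINE)
    # Matches the "mapped" line only, not "primary mapped".
    mapped = re.search(r"^(\d+) \+ \d+ mapped \(", flagstat_text, re.MULTILINE)
    if not total or not mapped or int(total.group(1)) == 0:
        raise ValueError("could not read total/mapped counts from flagstat output")
    return int(mapped.group(1)) / int(total.group(1))


def merge_command(rates, out_bam, min_rate=0.8):
    """Given {bam_path: alignment_rate}, build a `samtools merge` command
    over the runs that pass the (arbitrary) threshold, or None if none do."""
    keep = sorted(bam for bam, rate in rates.items() if rate >= min_rate)
    if not keep:
        return None
    return "samtools merge " + " ".join(shlex.quote(p) for p in [out_bam, *keep])
```

In practice you would run `align_command(...)` once per SRA run through a shell, feed the output of `samtools flagstat <run>.bam` to `alignment_rate`, look at the numbers yourself, and only then run the string returned by `merge_command`. Keeping the per-run BAMs until after that check is what preserves the option of dropping a bad run.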

13 months ago by ATpoint
