For a set of BAM files downloaded from PRJNA625920 in SRA, I used 10x Genomics' bamtofastq tool but got 25 FASTQ files per sample per lane per read, like this (the same goes for R1 and R2):
Donor10OC.bam_S1_L001_I1_001.fastq.gz
Donor10OC.bam_S1_L001_I1_001.fastq_2.gz
Donor10OC.bam_S1_L001_I1_001.fastq_3.gz
..
..
Donor10OC.bam_S1_L001_I1_001.fastq_24.gz
Donor10OC.bam_S1_L001_I1_001.fastq_25.gz
I assume that these are technical replicates as they represent the same sample (S1) and the same lane (L001).
Is my assumption correct? If so, is simply concatenating them enough, or should something else be done? If not, how should I handle this situation?
Your expert advice is highly appreciated.
It looks like bamtofastq defaults to
Have you checked to see how many reads there are in each file? 25 is a large number of files if you did not change the default above.
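One way to check the per-file read counts is sketched below. It is a minimal example, not part of bamtofastq: the function name count_fastq_reads is made up for illustration, and it assumes standard four-line FASTQ records with no wrapped sequence lines.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file.

    Assumes every record is exactly four lines (header, sequence, '+',
    qualities), which is the usual layout for Illumina-style FASTQ.
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4
```

Calling it on each of the 25 pieces (for example count_fastq_reads("Donor10OC.bam_S1_L001_I1_001.fastq_2.gz")) shows whether all pieces but the last hold the same number of reads, which would point to the output simply being split into chunks.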
Thanks for your quick reply GenoMax! I checked the read counts for the largest and smallest files using the code shared here and got the following numbers:
As long as the files came from one BAM that you know belonged to one sample, it should be fine to cat the files together. Are you planning to run cellranger? It may understand the file pieces, so you may not need to do anything.
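If you do merge the pieces yourself, the one thing to watch is that I1, R1 and R2 are joined in the same chunk order, so that reads stay paired across files. Here is a rough sketch of that, assuming the naming pattern in the listing above (the first piece ends in .fastq.gz, later ones in .fastq_N.gz). The helper names chunk_index and merge_chunks are hypothetical. It relies on the fact that gzip files concatenated byte for byte form a valid multi-member gzip file.

```python
import os
import re
import shutil


def chunk_index(name):
    """Return the chunk number encoded in a file name.

    Assumed pattern: '...fastq.gz' is chunk 1, '...fastq_7.gz' is chunk 7.
    """
    match = re.search(r"\.fastq(?:_(\d+))?\.gz$", name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match.group(1) or 1)


def merge_chunks(paths, out_path):
    """Concatenate gzipped FASTQ pieces in numeric chunk order.

    The compressed bytes are copied as-is (no decompression), which is
    what 'cat' would do. Returns the paths in the order they were written.
    """
    ordered = sorted(paths, key=lambda p: chunk_index(os.path.basename(p)))
    with open(out_path, "wb") as out:
        for path in ordered:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return ordered
```

Sorting numerically matters because a plain alphabetical sort (or a shell glob) would put fastq_10.gz before fastq_2.gz. Run it once per read type (I1, R1, R2) with the matching set of pieces.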