8 months ago
kcarey
Hey! I could use some insight. I am converting my BAM files back to FASTQ so I can re-align them to a new genome, and I want to make sure I extract the R1, R2, and unmapped reads. I have already sorted and indexed my files, but I am not sure what I am doing wrong.
#extract paired end reads
samtools view -b -f 1 -o paired_end_reads.bam input_sorted.bam
#Get files
java -jar picard.jar SamToFastq -I file_sorted.bam -F file_R1.fastq -F2 file_R2.fastq -FU file_unpaired.fastq -TMP_DIR /projects/home/fastq
Can someone help?
Use samtools collate followed by samtools fastq in the same pipe, as shown here: samtools collate
Hey! Thank you for your response.
I am giving this a try now. I have been reading to understand the difference between using collate followed by samtools fastq and using samtools sort, index, then Picard. So far, all I can see is that collate is the faster way to group reads with the same name together, but without any ordering between the groups. Would this pose a problem down the line?
Could you explain the difference, or why collate is better suited for this?
My fear has been that I will not get the correct paired-end read data. The BAM files come from a collaborator who no longer has the original FASTQ files. I wanted to use Picard to capture the singletons, which I could then merge with my files to get all possible reads. However, I was told this may be pointless because singletons from the original alignment make up only about 3% of the file.
I appreciate any information or links you can add! Thanks
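For reference, the collate-then-fastq pipe suggested above could look roughly like the sketch below. It is modelled on the example in the samtools fastq manual, not on anything posted in this thread. The function name bam_to_fastq and the output file names are made up for illustration, and it assumes a reasonably recent samtools on the PATH. As far as I know, samtools fastq only drops secondary and supplementary alignments by default, so unmapped reads should still end up in the output, but it is worth checking the read counts against samtools flagstat.

```shell
#!/usr/bin/env bash
# Sketch only: convert a paired-end BAM back to FASTQ with samtools.
# bam_to_fastq and the output names are hypothetical, chosen for illustration.

bam_to_fastq() {
    local bam="$1"      # input BAM, any sort order
    local prefix="$2"   # prefix for the output FASTQ files

    # collate: group reads with the same name next to each other
    #   -u  uncompressed output (faster inside a pipe)
    #   -O  write to stdout
    # fastq: split the name-grouped stream into files
    #   -1 / -2  reads flagged READ1 / READ2 whose mate is also present
    #   -s       singletons (mate missing from the input)
    #   -0       reads flagged as both or neither of READ1 / READ2
    #   -n       do not append /1 and /2 to read names
    #   -        read the BAM stream from stdin
    samtools collate -u -O "$bam" \
        | samtools fastq \
            -1 "${prefix}_R1.fastq" \
            -2 "${prefix}_R2.fastq" \
            -s "${prefix}_singletons.fastq" \
            -0 "${prefix}_other.fastq" \
            -n -
}
```

Calling bam_to_fastq input_sorted.bam file would then write file_R1.fastq, file_R2.fastq, file_singletons.fastq and file_other.fastq. The reason for collate rather than a coordinate sort is that samtools fastq needs both mates of a pair to sit next to each other in the stream to pair them up, and a coordinate-sorted BAM places mates at their separate mapping positions.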