Hello All,
I used the code below to align my sample reads, running 10 parallel jobs and converting the aligned reads directly into sorted BAM files.
ls *_R1_val_1.fq.gz | sed 's/_R1_val_1.fq.gz//g' | sort -u | parallel -j 10 '
hisat2 --summary-file "'"$output_mapped"'"/{1}.hisat2.summary \
--dta -x /home/grch38_tran/genome_tran \
-q -1 {1}_R1_val_1.fq.gz -2 {1}_R2_val_2.fq.gz \
| samtools sort -o "'"$output_mapped"'"/{1}_sorted.bam
'
While reviewing the output for my first sample, I noticed that initially 32 temporary BAM files were generated. However, upon checking the directory again, I found that temporary files 1 to 30 had been removed, and only temporary files 31 to 36 remained. I'm aware that samtools removes intermediate files during the merging process, but I want to ensure that no data was lost during alignment—particularly due to potential memory issues.
Given that no index files were generated, could you please advise how I can verify the integrity and completeness of the resulting BAM files?
Thanks
I'd even rather run a for loop instead of a parallel job, but using hisat2's and samtools' multi-threading capabilities.
Thanks for your response. I produced an alignment summary.
I ran this code to check the stats of the BAM file.
Considering this result, is it possible that I still have truncation?
While the read numbers seem to match, I would be wary of using data from an analysis step that did not complete normally.
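For what it's worth, here is a rough sketch of how the two suggestions in this thread could be combined: a plain for loop that relies on the tools' own threading, plus a check that each step actually finished. It assumes a reasonably recent samtools (one that provides `quickcheck`), and the thread count, the default output directory and the helper names `sample_name` and `align_sample` are placeholders I made up for illustration. The index path and file suffixes are taken from the command in the question.

```bash
#!/usr/bin/env bash
# Sketch only: THREADS and the default output directory are placeholder values.
THREADS=8
output_mapped=${output_mapped:-mapped}
index=/home/grch38_tran/genome_tran   # index path from the original command

# Strip the read-1 suffix to get the sample name.
sample_name() {
    printf '%s\n' "${1%_R1_val_1.fq.gz}"
}

# Align one sample, then verify the BAM; returns non-zero on any failure.
align_sample() {
    local s=$1
    local bam="$output_mapped/${s}_sorted.bam"
    mkdir -p "$output_mapped"
    (
        # Without pipefail, a hisat2 crash would be hidden by samtools' exit code.
        set -o pipefail
        hisat2 -p "$THREADS" --dta -x "$index" \
            --summary-file "$output_mapped/${s}.hisat2.summary" \
            -q -1 "${s}_R1_val_1.fq.gz" -2 "${s}_R2_val_2.fq.gz" \
        | samtools sort -@ "$THREADS" -o "$bam" -
    ) || return 1
    # quickcheck fails if the header is unreadable or the BGZF EOF block is missing.
    samtools quickcheck -v "$bam" || return 1
    # Keep the read counts so they can be compared with the hisat2 summary.
    samtools flagstat "$bam" > "$output_mapped/${s}.flagstat.txt"
}

# One sample at a time; parallelism comes from -p and -@ instead of GNU parallel.
for r1 in *_R1_val_1.fq.gz; do
    [ -e "$r1" ] || continue          # no matching files in this directory
    s=$(sample_name "$r1")
    if align_sample "$s"; then
        echo "OK: $s"
    else
        echo "FAILED: $s" >&2
    fi
done
```

Note that `quickcheck` only catches truncated or unreadable files, so it is worth also comparing the flagstat counts against the hisat2 summary, and rerunning any sample reported as FAILED rather than trusting its BAM.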
Thank you for the information. I'll follow your advice and make adjustments to my code.