Question

Converting BAM to Fastq with HTSlib htscmd

10.9 years ago

trakhtenberg ▴ 160

I found two posts recommending HTSlib's htscmd for converting BAM to FASTQ, but both commands write the paired reads only as a single interleaved file. I used this tool to deinterleave the output.

After deinterleaving the paired output (pe.fq.gz) of this command (Source), I got only 41k pairs:

htscmd bamshuf -Ou input.bam tmp-prefix | htscmd bam2fq -s se.fq.gz - | gzip > pe.fq.gz

After deinterleaving the output of this command (Source), I got 232.7k reads for one end and 214.4k reads for the other end:

htscmd bamshuf -uOn 128 aln_reads.bam tmp | htscmd bam2fq -a - | gzip > interleaved_reads.fq.gz
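Counts like the ones above can be double-checked directly on the FASTQ files. This is only a minimal sketch: it assumes exactly four lines per record (no wrapped sequence lines), and the function name count_fastq_records is made up for illustration.

```shell
# Count FASTQ records, ASSUMING exactly four lines per record.
# Accepts plain or gzip-compressed files (detected by a .gz extension).
# count_fastq_records is a hypothetical helper name, not part of any tool.
count_fastq_records() {
    if [ "${1##*.}" = "gz" ]; then
        gzip -dc "$1"
    else
        cat "$1"
    fi | awk 'END { print NR / 4 }'
}
```

If the two mate files report different record counts, the interleaved input most likely contained reads without a mate.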

Is there a parameter in HTSlib htscmd that makes it output deinterleaved reads? Or is there a more fitting tool for deinterleaving?
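For reference, deinterleaving itself can be sketched in a few lines of awk. This is not an htscmd feature, just an illustration: it assumes the input alternates strictly between first and second mates with four lines per record, and the function name deinterleave_fastq is hypothetical. If singletons are mixed into the stream, the alternation breaks and mates end up in the wrong file, which would be one way to arrive at unequal read counts for the two ends.

```shell
# Split a strictly interleaved FASTQ (R1 record, then its R2 record,
# four lines each) into two files.
# ASSUMPTION: no orphan reads are mixed in; this sketch does not check names.
# deinterleave_fastq is a hypothetical helper name.
deinterleave_fastq() {
    # $1 = interleaved FASTQ ("-" for stdin)
    # $2 = output for first mates, $3 = output for second mates
    awk -v r1="$2" -v r2="$3" '
        BEGIN { printf "" > r1; printf "" > r2 }   # create both outputs even for empty input
        { if ((NR - 1) % 8 < 4) print > r1; else print > r2 }
    ' "$1"
}
```

For a gzip-compressed file, it could be called as `gzip -dc interleaved_reads.fq.gz | deinterleave_fastq - reads_1.fq reads_2.fq`.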

My BAM file is from Tophat, and I would like to re-analyze these reads with Tophat after filtering them again. Is it important to include the singletons alongside the paired reads for re-analysis? It appears from here that Tophat does not pass singletons on to Cufflinks for FPKM, but since they mapped, I would think that the Tophat/Cufflinks pipeline would make use of them? Do singletons tend to be splice-junction reads?

Fastq Bam2Fastq Tophat BAM HTSlib • 5.5k views

Ram · Answer 1 · 2014-09-03

10.9 years ago

aniketd86 ▴ 150

Q] Or is there a more fitting tool for deinterleaving?

Apart from HTSlib, you could use Picard's SamToFastq command to create two paired-end FASTQ files from your BAM file.

For example:

java -jar <path_to>/SamToFastq.jar \
    INPUT=<Input_file.bam> \
    FASTQ=<output_pe1.fastq> \
    SECOND_END_FASTQ=<output_pe2.fastq> \
    UNPAIRED_FASTQ=<output_up.fastq> \
    VALIDATION_STRINGENCY=SILENT

I ran into an issue with this approach too (even with VALIDATION_STRINGENCY=SILENT); see here: "Picard error Illegal Mate State in converting BAM to Fastq". Let me know if you have a solution for this. Thanks.

10.9 years ago by trakhtenberg ▴ 160

Hi, @aniketd86. I tried your code, and the error "Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Found 52323 unpaired mates" did disappear, but I think that is probably just due to the parameter VALIDATION_STRINGENCY=SILENT. The unpaired reads were still discarded: output_up.fastq was empty, and the reads in output_pe1.fastq and output_pe2.fastq were the same as with my previous command (yours without the last two lines).

4.8 years ago by greekkey ▴ 30

score 0 · Answer 2 · 2014-10-18

10.7 years ago

piet ★ 1.9k

You may use bedtools for extracting paired reads from a BAM file. Bedtools is able to write out pairs into separate FASTQ files. Before extracting the reads, the BAM file must be sorted by read-name (samtools option -n):

samtools sort -m 6000000000 -n myfile.bam myfile_resorted
bedtools bamtofastq -i myfile_resorted.bam -fq reads_1.fastq -fq2 reads_2.fastq
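
Once the two files are written, it can be worth checking that the mates really line up record by record. The following is a rough sketch only: it assumes four lines per record and that the names of a pair differ only by a trailing /1 or /2, and the function name count_mismatched_pairs is made up for illustration.

```shell
# Compare read names record by record between two FASTQ files and print
# the number of records whose names disagree.
# ASSUMPTIONS: four lines per record; mate names differ only by a trailing
# /1 or /2. count_mismatched_pairs is a hypothetical helper name.
count_mismatched_pairs() {
    # $1 = first-mate FASTQ, $2 = second-mate FASTQ
    paste "$1" "$2" | awk -F '\t' '
        NR % 4 == 1 {                      # header line of each record
            a = $1; b = $2
            sub(/\/[12]$/, "", a)          # drop the mate suffix before comparing
            sub(/\/[12]$/, "", b)
            if (a != b) bad++
        }
        END { print bad + 0 }
    '
}
```

A result of 0 from `count_mismatched_pairs reads_1.fastq reads_2.fastq` suggests the files are properly paired; files of different length also show up as mismatches, because paste pads the shorter one with empty fields.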