Question: Converting BAM to Fastq with HTSlib htscmd
0
gravatar for trakhtenberg
5.0 years ago by
trakhtenberg150
United States
trakhtenberg150 wrote:

I found 2 posts recommending to use HTSlib htscmd for converting BAM to Fastq, but both output only interleaved file paired reads. I used this tool to deinterleave http://code.google.com/p/popgentools/source/browse/trunk/misc/split-interleaved-fastq.pl

After deinterleaving the paired.fq from this command: htscmd bamshuf -Ou input.bam tmp-prefix | htscmd bam2fq -s se.fq.gz - | gzip > pe.fq.gz A: Samtofastq: Net.Sf.Picard.Picardexception: "Found N Unpaired Mates " I got only 41k pairs.

After deinterleaving output of this command htscmd bamshuf -uOn 128 aln_reads.bam tmp | htscmd bam2fq -a - | gzip > interleaved_reads.fq.gz http://gatkforums.broadinstitute.org/discussion/2908/howto-revert-a-bam-file-to-fastq-format I got 232.7k reads for one end and 214.4k reads for the other end.

Is there a parameter in HTSlib htscmd which could instruct to output dedeinterleaved reads? Or is there a more fitting tool for deinterleaving?

My Bam file is from Tophat, and I would like to re-analyze these reads after filtering again with Tophat. Is it important to integrate them back with paired reads for re-analysis? It appears from here http://www.arrayserver.com/wiki/index.php?title=FPKM_Transcript that singletons are not passed on to Cufflinks by Tophat for FPKM, but since they mapped, I would think that the Tophat/Cufflinks pipeline would make use of them? Are singletons tend to be splice-junction reads?

 

htslib bam2fastq fastq bam tophat • 3.2k views
ADD COMMENTlink modified 4.9 years ago by piet1.7k • written 5.0 years ago by trakhtenberg150
0
gravatar for aniketd86
5.0 years ago by
aniketd86150
Cambridge, MA
aniketd86150 wrote:

Q] Or is there a more fitting tool for deinterleaving?

Apart from HTSlib, you could use Picard's SamToFastq command to create 2 paired end fastq files from your bam file.

For example:

java  -jar <path_to>/SamToFastq.jar \

INPUT=<Input_file.bam> \

FASTQ=<output_pe1.fastq> \

SECOND_END_FASTQ=<output_pe2.fastq>  \

UNPAIRED_FASTQ=<output_up.fastq> \

VALIDATION_STRINGENCY=SILENT \

 

ADD COMMENTlink written 5.0 years ago by aniketd86150

I encountered issue with this approach too (even using VALIDATION_STRINGENCY=SILENT), see here Picard error Illegal Mate State in converting BAM to Fastq Let me know if you have a solution for this. thanks

ADD REPLYlink written 5.0 years ago by trakhtenberg150
0
gravatar for piet
4.9 years ago by
piet1.7k
planet earth
piet1.7k wrote:

You may use bedtools for extracting paired reads from a BAM file. Bedtools is able to write out pairs into separate FASTQ files. Before extracting the reads, the BAM file must be sorted by read-name (samtools option -n):

samtools sort -m 6000000000 -n myfile.bam myfile_resorted
bedtools bamtofastq -i myfile_resorted.bam -fq reads_1.fastq -fq2 reads_2.fastq

See also:

ADD COMMENTlink modified 4.9 years ago • written 4.9 years ago by piet1.7k

samtools sort -m 6000000000 -n myfile.bam myfile_resorted

What is the meaning of -m 6000000000.

ADD REPLYlink written 2.8 years ago by Bioinfonext160

http://www.htslib.org/doc/samtools.html

ADD REPLYlink written 2.8 years ago by piet1.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1695 users visited in the last hour