Question: Error when trying to convert BAM file to FASTQ.
gravatar for DNAngel
24 days ago by
DNAngel100 wrote:

I thought I was finally in the stretch but lo and behold a new issue arises. Here is my whole pipeline that I used:

bwa index refgenome.fa
bwa mem -B 2 -M refgenome.fa cert1.R1.fastq cert1.R2.fastq > cert1_aln.sam
samtools view -Sb cert1_aln.sam > cert1_aln.bam

# I do a sort here but maybe I don't need to? I have just seen many pipelines that sort and index at this point so I did it too.
samtools sort cert1_aln.bam > cert1_aln_sorted.bam | rm -f cert1_aln.bam
samtools index cert1_aln_sorted.bam

# Gather some stats
samtools flagstat cert1_aln_sorted.bam > cert1_stats.txt

# Extract unmapped read whose mate is mapped
samtools view -b -f 4 -F 264 cert1_aln_sorted.bam > cert1_tmp1_unmapped.bam
# Extract mapped read whose mate is unmapped
samtools view -b -f 8 -F 260 cert1_aln_sorted.bam > cert1_tmp2_unmapped.bam
# Extract unmapped read with unmapped mate
samtools view -b -f 12 -F 256 cert1_aln_sorted.bam > cert1_tmp3_unmapped.bam

# Merge the three tmp files into 1
samtools merge cert1_unmapped.bam cert1_tmp*_unmapped.bam

# Extract mapped reads from BAM file
samtools view -b -F 12 cert1_aln_sorted.bam > cert1_mapped.bam

# Sort the BAM files by name
samtools sort -n cert1_unmapped.bam > cert1_unmapped_sorted.bam
samtools sort -n cert1_mapped.bam > cert1_mapped_sorted.bam

# Finally, convert to Fastq
bamToFastq -i cert1_unmapped_sorted.bam -fq cert1_unmapped.R1.fastq -fq2 cert1_unmapped.R2.fastq
bamToFastq -i cert1_mapped_sorted.bam -fq cert1_mapped.R1.fastq -fq2 cert1_mapped.R2.fastq

The error I get towards the end is:

seqname is marked as paired, but its mate does not occur next to it in your BAM file. Skipping

It spams my terminal and it's endless. My fastq files are tiny and clearly missing sequences. Most biostars posts on this say the issue is with sorting without -n flag, but I tried sorting with it and without it and I get the same error regardless. I just want one fastq file with paired mapped reads, and then another fastq file with all unmapped reads/mates.

samtools bedtools • 145 views
ADD COMMENTlink modified 23 days ago • written 24 days ago by DNAngel100

what is the output of file *.bam ?

ADD REPLYlink written 24 days ago by Pierre Lindenbaum134k

While you are checking it also show us output of samtools view -H cert1.bam | head -8.

ADD REPLYlink written 24 days ago by GenoMax96k

The tmp#_unmapped.bam files give me "gzip compressed data, extra field". For cert1_unmapped.bam I just get "empty". Seems like the the merge step is what is giving me issues actually.

ADD REPLYlink written 24 days ago by DNAngel100
gravatar for swbarnes2
24 days ago by
United States
swbarnes29.6k wrote:

I don't think your merge command is right. You've only got three files, something isn't working so just do it the easy way, and the way the software expects...with the name of the output file listed first, then the three input files. If that works, you can see if cert1_tmp*_unmapped.bam will work

samtools merge cert1_unmapped.bam cert1_tmp1_unmapped.bam cert1_tmp2_unmapped.bam cert1_tmp3_unmapped.bam
ADD COMMENTlink written 24 days ago by swbarnes29.6k

Eh that seemed to have worked! And it also worked using tmp* instead of having to list all the files out.

Thank you so much! I guess samtools merge has updated since the tutorial I am using? Tricky to catch these things.

ADD REPLYlink written 24 days ago by DNAngel100

It's always wise to just run the command once with --help so you can understand the command line you are putting together.

ADD REPLYlink written 23 days ago by swbarnes29.6k

I have - but sometimes the help language is not very easy to understand for someone not experienced with bioinformatics. I have run into a new problem. I constantly get the warning "seq## is marked as paired, but its mate does not occur next to it in your BAM file. Skipping"

I am at the last couple of steps - it seems to be an issue with sorting. I've tried samtools sort with the -n flag and without it on cert_unmapped.bam. Again just reading --help doesn't really provide any clarity on this. Most tutorials/code I've seen on biostars use -n flag but it's not working. Might have to make this a separate post so I can post my whole code again.

ADD REPLYlink written 23 days ago by DNAngel100

Well, you sliced and diced your input bam, so it's no surprise that you've got loose ends. You are going to have to investigate those reads for yourself.

ADD REPLYlink written 23 days ago by swbarnes29.6k

But extracting unmapped reads into separate files was how most tutorials online show it to work (like no novocraft). But none of them seem to report an issue once using bamToFastq after merging them

ADD REPLYlink written 23 days ago by DNAngel100

One thread, one question.

ADD REPLYlink written 23 days ago by swbarnes29.6k

My new question got removed because they said it was too similar to this. So I just reposted it here.

ADD REPLYlink written 22 days ago by DNAngel100
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1790 users visited in the last hour