Question: How to convert realign BAM file into fastq file
0
gravatar for Daier
7 weeks ago by
Daier0
Daier0 wrote:

Hi At present, I only have realign BAM files, not fastq files. Those BAM files contain the whole genome information, and what we need now is mitochondrial genes. So I want to ask if I can convert realign BAM file into fastq file and extract mitochondrial gene? I've tried the following:

samtools sort -n input.bam -o input.name.bam

bedtools bamtofastq -i input.name.bam \
-fq out.R1.fq \
-fq2 out.R2.fq

The results are as follows:

*****WARNING: Query D3NJ6HQ1:424:HA49WADXX:1:1103:21194:65285 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query D3NJ6HQ1:424:HA49WADXX:1:1107:12296:11434 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query D3NJ6HQ1:424:HA49WADXX:1:1113:14106:44374 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.

But there are fastq files in the folder, and the file size is not small. I don't know how I should change, in each forum did not find the answer, I hope you give some suggestions, thank you!

alignment software error • 207 views
ADD COMMENTlink modified 6 weeks ago by opplatek60 • written 7 weeks ago by Daier0

it's just a warning. samtools cannot find some reads associated to a pair of reads. It can happen if it's a sub region of the bam , or some reads were removed.

ADD REPLYlink written 7 weeks ago by Pierre Lindenbaum133k

I'm sorry. Do you mean to ignore the warning? But when I do bwa mem after conversion, it can't run. Is there any way to solve this problem?

ADD REPLYlink written 7 weeks ago by Daier0

Maybe try samtools fixmate first, and then bamtofastq?

ADD REPLYlink written 7 weeks ago by Qiongyi100

Thank you! I've tried your method, but there is still warning.

ADD REPLYlink modified 6 weeks ago by _r_am32k • written 7 weeks ago by Daier0

Has there been any kind of filtering applied to the bam file that you want to convert?

ADD REPLYlink written 7 weeks ago by ATpoint44k

Sorry, I don't quite understand what you mean. I used the above method.

ADD REPLYlink written 7 weeks ago by Daier0

Yes, but after initial alignment of the data to produce that bam file, has there been any filtering applied that might explain why certain mates are missing.

ADD REPLYlink written 7 weeks ago by ATpoint44k

I use BWA and samtools software to get the initial bam file, and then use Picard (RG, mark duplicate, build BAM index) and gatk (I use BWA and sampools software to get the initial BAM file, and then use Picard (RG, mark duplicate, build BAM index) and gatk ( RealignerTargetCreator, IndelRealigner) to get the final BAM file. Now I want to convert the final BAM file to fastq file (the previous fastq file has been lost), so I used the above method. Other filters are not used.

ADD REPLYlink written 7 weeks ago by Daier0

You obviously lost some of the mates during your processing. My guess is that one of the steps you've used filtered out one of the mates but kept the other one. Without having the whole code it's impossible to know where they got lost. And since it seems you don't have any of the previous BAMs you will have to go with fixing the mate pairs as commented below.

ADD REPLYlink written 6 weeks ago by opplatek60
0
gravatar for Qiongyi
6 weeks ago by
Qiongyi100
The University of Queensland
Qiongyi100 wrote:

Hi Daier,

You may use my custom PERL script to fix your problem. After you performed the "bamtofastq", you can run the below command to clean your fastq files.

fastq_check.pl inf_read1 inf_read2 outf_read1 outf_read2

# You can download the script @ https://github.com/Qiongyi/custom_PERL_scripts/ 
# This script is used to check the paired-end fastq files, 
#     - to remove the read if its mate is missing;
#     - to remove both read1 and read2 if one read's sequence is empty or the length of sequence doesn't match the length of quality;
#     - to remove both read1 and read2 if duplicate read IDs exist in either the read1 file or the read2 file.

Cheers,

Qiongyi

ADD COMMENTlink written 6 weeks ago by Qiongyi100
0
gravatar for opplatek
6 weeks ago by
opplatek60
USA
opplatek60 wrote:

Based on the other comments samtools fixmate didn't help so it seems you have some singleton reads (reads without a mate). That the most likely cause of BWA failing afterward.

To synchronize them back (=remove singleton reads) You can also use repair.sh from BBMap. For more information, you can check these links How to resync paired-end data? and How to use BBtools repair.sh on multiple files.

repair.sh in1=out.R1.fq in2=out.R2.fq out1=out.R1.paired.fq out2=out.R2.paired.fq outsingle=singleton.fq

The out.R1.paired.fq and out.R2.paired.fq are your resynchronized fastq files which you can use for the additional steps you want to run. The singletons will be placed in singleton.fq.

ADD COMMENTlink written 6 weeks ago by opplatek60

Thank you for your explanation. But I tried the method you said and got the following results. I don't know whether this result is right or not or some sequences will be lost.

Set INTERLEAVED to false
Started output stream.
Input:                      93882272 reads      13772616099 bases.
Result:                     93882272 reads (100.00%)    13772616099 bases (100.00%)
 Pairs:                     93882272 reads (100.00%)    13772616099 bases (100.00%)
 Singletons:                0 reads (0.00%)     0 bases (0.00%)

 Time:                          156.499 seconds.
 Reads Processed:      93882k   599.89k reads/sec
 Bases Processed:      13772m   88.00m bases/sec
ADD REPLYlink written 6 weeks ago by Daier0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1757 users visited in the last hour
_