How to convert realign BAM file into fastq file
2
0
Entering edit mode
3.4 years ago
Daier ▴ 20

Hi At present, I only have realign BAM files, not fastq files. Those BAM files contain the whole genome information, and what we need now is mitochondrial genes. So I want to ask if I can convert realign BAM file into fastq file and extract mitochondrial gene? I've tried the following:

samtools sort -n input.bam -o input.name.bam

bedtools bamtofastq -i input.name.bam \
-fq out.R1.fq \
-fq2 out.R2.fq

The results are as follows:

*****WARNING: Query D3NJ6HQ1:424:HA49WADXX:1:1103:21194:65285 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query D3NJ6HQ1:424:HA49WADXX:1:1107:12296:11434 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.
*****WARNING: Query D3NJ6HQ1:424:HA49WADXX:1:1113:14106:44374 is marked as paired, but it's mate does not occur next to it in your BAM file.  Skipping.

But there are fastq files in the folder, and the file size is not small. I don't know how I should change, in each forum did not find the answer, I hope you give some suggestions, thank you!

software error alignment • 1.6k views
ADD COMMENT
0
Entering edit mode

it's just a warning. samtools cannot find some reads associated to a pair of reads. It can happen if it's a sub region of the bam , or some reads were removed.

ADD REPLY
0
Entering edit mode

I'm sorry. Do you mean to ignore the warning? But when I do bwa mem after conversion, it can't run. Is there any way to solve this problem?

ADD REPLY
0
Entering edit mode

Maybe try samtools fixmate first, and then bamtofastq?

ADD REPLY
0
Entering edit mode

Thank you! I've tried your method, but there is still warning.

ADD REPLY
0
Entering edit mode

Has there been any kind of filtering applied to the bam file that you want to convert?

ADD REPLY
0
Entering edit mode

Sorry, I don't quite understand what you mean. I used the above method.

ADD REPLY
0
Entering edit mode

Yes, but after initial alignment of the data to produce that bam file, has there been any filtering applied that might explain why certain mates are missing.

ADD REPLY
0
Entering edit mode

I use BWA and samtools software to get the initial bam file, and then use Picard (RG, mark duplicate, build BAM index) and gatk (I use BWA and sampools software to get the initial BAM file, and then use Picard (RG, mark duplicate, build BAM index) and gatk ( RealignerTargetCreator, IndelRealigner) to get the final BAM file. Now I want to convert the final BAM file to fastq file (the previous fastq file has been lost), so I used the above method. Other filters are not used.

ADD REPLY
0
Entering edit mode

You obviously lost some of the mates during your processing. My guess is that one of the steps you've used filtered out one of the mates but kept the other one. Without having the whole code it's impossible to know where they got lost. And since it seems you don't have any of the previous BAMs you will have to go with fixing the mate pairs as commented below.

ADD REPLY
0
Entering edit mode
3.4 years ago
Qiongyi ▴ 180

Hi Daier,

You may use my custom PERL script to fix your problem. After you performed the "bamtofastq", you can run the below command to clean your fastq files.

fastq_check.pl inf_read1 inf_read2 outf_read1 outf_read2

# You can download the script @ https://github.com/Qiongyi/custom_PERL_scripts/ 
# This script is used to check the paired-end fastq files, 
#     - to remove the read if its mate is missing;
#     - to remove both read1 and read2 if one read's sequence is empty or the length of sequence doesn't match the length of quality;
#     - to remove both read1 and read2 if duplicate read IDs exist in either the read1 file or the read2 file.

Cheers,

Qiongyi

ADD COMMENT
0
Entering edit mode
3.4 years ago
opplatek ▴ 290

Based on the other comments samtools fixmate didn't help so it seems you have some singleton reads (reads without a mate). That the most likely cause of BWA failing afterward.

To synchronize them back (=remove singleton reads) You can also use repair.sh from BBMap. For more information, you can check these links How to resync paired-end data? and How to use BBtools repair.sh on multiple files.

repair.sh in1=out.R1.fq in2=out.R2.fq out1=out.R1.paired.fq out2=out.R2.paired.fq outsingle=singleton.fq

The out.R1.paired.fq and out.R2.paired.fq are your resynchronized fastq files which you can use for the additional steps you want to run. The singletons will be placed in singleton.fq.

ADD COMMENT
0
Entering edit mode

Thank you for your explanation. But I tried the method you said and got the following results. I don't know whether this result is right or not or some sequences will be lost.

Set INTERLEAVED to false
Started output stream.
Input:                      93882272 reads      13772616099 bases.
Result:                     93882272 reads (100.00%)    13772616099 bases (100.00%)
 Pairs:                     93882272 reads (100.00%)    13772616099 bases (100.00%)
 Singletons:                0 reads (0.00%)     0 bases (0.00%)

 Time:                          156.499 seconds.
 Reads Processed:      93882k   599.89k reads/sec
 Bases Processed:      13772m   88.00m bases/sec
ADD REPLY
0
Entering edit mode

This looks like all your reads were properly paired and no singletons were present. Have you tried to use the output R1 and R2 fastq files? Did the same command fail again? If yes, the problem is somewhere else.

ADD REPLY

Login before adding your answer.

Traffic: 2071 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6