samtools -F 0x2 corrupts the file
1
0
5.2 years ago
aleka ▴ 110

Hello,

I used the following command to extract the properly paired reads from a BAM file: samtools view -F 0x2 file.bam > proper_paired.bam

But the resulting file does not pass validation when I run Picard's ValidateSamFile on it with the following command: java -jar /apps/picard-tools/2.1.0/picard.jar ValidateSamFile I=proper_paired.bam MODE=SUMMARY

The dictionary for the FASTA file exists in my folder. It seems that samtools view -F removes the SAM header.

The initial file.bam is fine; I double-checked. I only need the properly paired reads in the BAM file.

Thank you.

next-gen alignment genome sequence • 2.3k views
0

Thanks, both. It now has the correct format. However, I was wondering whether you could help me clarify something. The result of the samtools flagstat command on the original BAM file is:

748518727 + 0 in total (QC-passed reads + QC-failed reads)
47793497 + 0 secondary
0 + 0 supplementary
163410882 + 0 duplicates
745241565 + 0 mapped (99.56% : N/A)
700725230 + 0 paired in sequencing
**673521832 + 0 properly paired (96.12% : N/A)**
695574366 + 0 with itself and mate mapped
1873702 + 0 singletons (0.27% : N/A)
13092690 + 0 with mate mapped to a different chr
10005816 + 0 with mate mapped to a different chr (mapQ>=5)


And when I run samtools view -h -b -F 0x2 original_file.bam > proper_paired.bam, the result of samtools flagstat on the resulting proper_paired.bam is:

36070927 + 0 in total (QC-passed reads + QC-failed reads)
8867529 + 0 secondary
0 + 0 supplementary
4099574 + 0 duplicates
32793765 + 0 mapped (90.91% : N/A)
27203398 + 0 paired in sequencing
**0 + 0 properly paired (0.00% : N/A)**
22052534 + 0 with itself and mate mapped
1873702 + 0 singletons (6.89% : N/A)
13092690 + 0 with mate mapped to a different chr
10005816 + 0 with mate mapped to a different chr (mapQ>=5)


How is it possible to have 0 properly paired reads after the 0x2 filter? 0x2 is supposed to keep only the properly paired reads in the BAM file. Am I missing something?
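
As a sanity check on the two reports above, the counts are consistent with -F 0x2 having removed exactly the properly paired reads. A minimal shell sketch, using only numbers quoted in the flagstat output (the variable names are illustrative):

```shell
# Counts copied from the flagstat report of the original BAM file
paired_in_sequencing=700725230
properly_paired=673521832

# Paired reads that are NOT flagged as properly paired
not_proper=$(( paired_in_sequencing - properly_paired ))
echo "$not_proper"   # 27203398, the "paired in sequencing" count after -F 0x2
```

In other words, what survived the filter is the complement of what was wanted.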

2

Lowercase -f keeps reads that have the flag set; uppercase -F removes them.
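
In terms of bits, samtools keeps a read under -f N only if every bit of N is set in the read's FLAG, and under -F N only if none of them is. A small sketch of that test in shell arithmetic; the function name `passes` and the example FLAG values 99 and 65 are illustrative, not taken from this thread:

```shell
# passes FLAG REQUIRE EXCLUDE
# Succeeds if FLAG has all REQUIRE bits set (-f) and no EXCLUDE bits set (-F).
passes() {
    flag=$(( $1 )); require=$(( $2 )); exclude=$(( $3 ))
    [ $(( flag & require )) -eq "$require" ] && [ $(( flag & exclude )) -eq 0 ]
}

# FLAG 99 = 0x1 + 0x2 + 0x20 + 0x40: paired, properly paired, mate reverse, first in pair
# FLAG 65 = 0x1 + 0x40: paired, first in pair, not properly paired
passes 99 0x2 0 && echo "-f 0x2 keeps 99"
passes 65 0x2 0 || echo "-f 0x2 drops 65"
passes 99 0 0x2 || echo "-F 0x2 drops 99"
passes 65 0 0x2 && echo "-F 0x2 keeps 65"
```

So -F 0x2 returns everything except the properly paired reads, which matches the flagstat report showing 0 properly paired.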

0

I also tried

samtools view -h -b -f 0x2 file.bam > new_file.bam

samtools view -h -b -F 0x4 -F 0x8 -F 0x400 -F 0x200 file.bam > new_file.bam
but the result that I get from flagstat is:
parse error at line 1
[bam_flagstat_core] Truncated file? Continue anyway.
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (N/A : N/A)
0 + 0 paired in sequencing
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


So the -f option or multiple -F options corrupt the file. Please let me know how I can get only the properly paired reads from a BAM file, if you know. Thank you.

0

Use the ADD COMMENT/ADD REPLY buttons on previous posts to add additional information like this. Don't add "New answers".

0

Hehehe, corrupts the file. I think "it corrupts the file!" will be my new go-to phrase for when people ask me a difficult question which has a long, complicated and boring answer.

You can't chain -F flags up like that. You need to add the 0x4 and 0x8 and 0x400 and 0x200 up, which as we all know is 0x60c. Obviously. You could also do -F 1548, but you can't use 11000001100, because that would be too easy.
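
The addition can be left to the shell rather than done by hand. A minimal sketch, assuming only standard shell arithmetic (the variable name `exclude` is illustrative):

```shell
# Combine the four exclusion bits: unmapped (0x4), mate unmapped (0x8),
# QC fail (0x200) and duplicate (0x400)
exclude=$(( 0x4 | 0x8 | 0x200 | 0x400 ))
printf 'decimal: %d  hex: 0x%x\n' "$exclude" "$exclude"   # decimal: 1548  hex: 0x60c
```

Because each flag is a distinct bit, a bitwise OR and a plain sum give the same result here.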

You can have an -f and an -F at the same time though.
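
For example, requiring the proper-pair bit while excluding the four bits mentioned above could look like the sketch below. The file names are the ones used earlier in this thread; the command is only assembled and printed here, not executed, so check it before running it on real data:

```shell
exclude=$(( 0x4 | 0x8 | 0x200 | 0x400 ))   # 1548, i.e. 0x60c

# -f 0x2 requires the proper-pair bit; -F removes reads with any excluded bit.
# -b writes BAM output, which always carries the header.
cmd="samtools view -b -f 0x2 -F $exclude file.bam"
echo "$cmd > new_file.bam"
```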

1

I can't think/calculate in hexadecimal, so no, I did not know it would be 0x60c. I am sure a lot of people on this forum may not either :-)

1

<sarcasm /> :P hehehe, sorry. I think making people use hex or even base 10 to talk/think about flags was a huge user interface mistake for samtools. Actually i've gone on record saying it a lot stronger than that, and in all of my tools I use letters instead of flags because honestly, summing numbers is what computers should do, not humans.

0

Many (including me) would totally agree with you.
Is that an irreversible decision? It would be a big help to be able to specify letter flags.
Is the samtools flags command only meant for translating flags? If it can do that, then surely letter/word options could easily be enabled for -f and -F.

1
5.2 years ago

You forgot -h or -b. If you make a SAM file, you need to include the header (-h). Presumably you wanted a BAM file, though (-b).
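
One quick way to see which of the two you ended up with: BAM is BGZF-compressed, so a BAM file begins with the gzip magic bytes 1f 8b, whereas SAM is plain text. A rough sketch of that check (the function name `looks_like_bam` is made up for illustration, and it only tests for gzip compression, not for a valid BAM):

```shell
# Succeeds if the file starts with the gzip magic bytes 1f 8b,
# as every BGZF-compressed BAM file does.
looks_like_bam() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example use on the file from the question:
#   looks_like_bam proper_paired.bam && echo "compressed (BAM-like)" || echo "plain text (SAM-like)"
```

A file written without -b would fail this check despite its .bam extension, which would explain why tools expecting BAM reject it.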