samtools view -bT hg19.fa OR -bhs -q 30 -F1548 A.sam > A.bam, which one is better and what's the difference btw them?
2
0
Entering edit mode
6.4 years ago
Ming Lu ▴ 30

I have seen two methods to transform .sam to .bam, wonder which one is better, "ref to genome sequence" OR "directly set -q and -F".

              samtools view -bT hg19.fa  A.sam > A.bam
              samtools view -bhs -q30 -F1548  A.sam > A.bam

which one is better?

sequence next-gen alignment • 2.7k views
ADD COMMENT
1
Entering edit mode
6.4 years ago

Step 1:

Check if the SAM file contains headers:

head A.sam

The first 10 lines on your terminal after typing "head test.sam" will be lines starting with the "@" symbol which is an indicator for a header line. If you don't see lines starting with the "@" sign, the header information is most likely missing.

The headers look like this:

@HD     VN:1.0  SO:coordinate
@SQ     SN:1    LN:248956422
@SQ     SN:2    LN:242193529
@SQ     SN:3    LN:198295559
@SQ     SN:4    LN:190214555
@SQ     SN:5    LN:181538259
@SQ     SN:6    LN:170805979
@SQ     SN:7    LN:159345973
@SQ     SN:8    LN:145138636
@SQ     SN:9    LN:138394717
@SQ     SN:10   LN:133797422
@SQ     SN:11   LN:135086622
@SQ     SN:12   LN:133275309

Step 2:

If the header is absent then run the command below, where hg19.fa is the reference fasta file used to map the reads:

samtools view -bT hg19.fa  A.sam > A.bam

If header is available:

samtools view -bS A.sam > A.bam
ADD COMMENT
0
Entering edit mode

Thanks a lot but still wonder when head available, why not samtools view -bhs -q30 -F1548 A.sam > A.bam

ADD REPLY
1
Entering edit mode

From samtools help

-f INT   only include reads with all  of the FLAGs in INT present [0]
-F INT   only include reads with none of the FLAGS in INT present [0]
-q INT   only include reads with mapping quality >= INT [0]

From this link samtools flags explained

-q 30 - include reads with mapping quality equal to or greater than 30

-F 154 - only include reads with none of the FLAGS in the image

I hope this helps !!

ADD REPLY
1
Entering edit mode
6.4 years ago
ATpoint 81k

Both commands produce a proper bam file. The first command uses all entries in the sam. The second one filters out (-F 1548) unmapped reads and mates as well as reads that failed the quality control of the sequencer and marked duplicates. Also, -q 30 only takes reads qith a mapping quality above 30.

ADD COMMENT

Login before adding your answer.

Traffic: 2704 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6