Question: Tophat 2 - Both Pairs Map Concordantly
2
gravatar for AW
7.2 years ago by
AW350
United Kingdom
AW350 wrote:

Hi,

I have just run the following command

tophat2 --solexa1.3-quals -p 12 -r 80 --max-multihits 1 --no-mixed --no-discordant /home/Turkey/Index/turkeyindex /home/Turkey/WTCHG24920061sequence1.fa /home/Turkey/WTCHG24920062sequence_2.fa

I understand this is instructing Tophat 2 to map the reads so that no multiple hits are allowed, and both pairs of reads have to map concordantly.

However, when I examine the stats of the output in Bamtools I get the following

Total reads: 33102389
Mapped reads: 33102389 (100%)
Forward strand: 16600010 (50.1475%)
Reverse strand: 16502379 (49.8525%)
Failed QC: 0 (0%)
Duplicates: 0 (0%)
Paired-end reads: 33102389 (100%)
'Proper-pairs': 28865628 (87.201%)
Both pairs mapped: 30696760 (92.7328%)
Read 1: 16559354
Read 2: 16543035
Singletons: 2405629 (7.26724%)

Why do only 92.7% of the reads fall under both pairs are mapped and 7.2% are singletons? I would expect there to be no singleton reads and 100% where both pairs have mapped, given the options I specified?

Any help would be much appreciated.

Thanks

read tophat paired-end • 3.7k views
ADD COMMENTlink modified 7.2 years ago by Istvan Albert ♦♦ 81k • written 7.2 years ago by AW350

I don't know the answer, but are you sure that you want to eliminate unpaired reads? When you're using tophat, even single ends will be split and mapped and will help define exon-exon boundaries (or may span fusions)

ADD REPLYlink written 7.2 years ago by Chris Miller21k

that's a question for the developers, but in any case, you can filter the singletons:

samtools view file.bam | perl -lane 'print if ($F[6] eq "=")' > filtered.sam
ADD REPLYlink modified 7.2 years ago • written 7.2 years ago by JC8.8k

thank you very much for all the helpful comments. When I try this command, and try to view the output in samtools it says it is not a value sam file as it cannot detect headers? Why might this be?

Thanks

ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by AW350

That's because the header is omitted, to keep the header you need to do this:

  samtools view -h file.bam | perl -lane 'print if ($F[6] eq "=" or m/^\@/)' > filtered.sam

The -h flag shows the header and the "m/^\@/" will keep those lines from the filtering.

This output is a simple text SAM, to convert againt to BAM:

  samtools view -bS filtered.sam > filtered.bam
ADD REPLYlink modified 7.1 years ago • written 7.1 years ago by JC8.8k

The stats indicate 100% of reads mapped but some mapped as singletons. Perhaps this has to do with the definitions of 'concordant', 'discordant', and 'proper pair'. Perhaps some alignments are being considered concordant (same chromosome) but are not properly paired according to orientation and insert size and are being counted as singletons for that reason? Maybe view and post the alignments for some of these reads...

ADD REPLYlink written 7.2 years ago by Malachi Griffith17k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2662 users visited in the last hour