Question: sam file not showing multi mapped reads
1
gravatar for Moses
2.1 years ago by
Moses120
united states/ Bloomingtion/ Indiana University Bloomington
Moses120 wrote:

Hi,

I have paired end reads Illumina 1.9 encoding. I am trying to figure out a way to see the reads that are multimapped, from the sam file.

To test this I created a database (an index file) using bowtie2 from a fasta file that contains a whole genome for Bacteroides vulgatus ATCC 8482 species but I copied the same genome twice in the same fasta file before creating the index file. I did this intentionally to check for duplicate reads. I deally any read that will be mapped to this genome should be mapped twice since I have the exact same sequence in my database twice. To create the index file I issued the following command:

$BT2_HOME/bowtie2-build /home/mstambou/represantative_gut_species/duplidate_genomes.fna duplicate_genomes_indices/duplicate_genomes_db

and then to map the reads to the created database I issued the following command:

$BT2_HOME/bowtie2 --threads 24 -x duplicate_genomes_indices/duplicate_genomes_db -1 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.1.fq -2 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.2.fq -S X317802115/mapping_duplicate/SRR769540_duplicate.sam

the output of my last command is the following:

4280027 reads; of these:
  4280027 (100.00%) were paired; of these:
    4144369 (96.83%) aligned concordantly 0 times
    2 (0.00%) aligned concordantly exactly 1 time
    135656 (3.17%) aligned concordantly >1 times
    ----
    4144369 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    4144369 pairs aligned 0 times concordantly or discordantly; of these:
      8288738 mates make up the pairs; of these:
        8251346 (99.55%) aligned 0 times
        16 (0.00%) aligned exactly 1 time
        37376 (0.45%) aligned >1 times
3.61% overall alignment rate

This output is in agreement of what I expect to see, it's saying non of the reads (other than 2 for some reason) are mapped 1 time, and most of them (the ones that are mapped) are mapped more than 1 time. But when I later check the sam output file I only see each read ID once (i.e. each read only has one row in the sam output). I thought I should see more than one occurrence of the read IDs that are multi mapped. I really don't understand the reason behind this. I am pasting a sample of my sam file just for reference (the rows are sorted alphabetically using the read IDs).

@HD VN:1.0  SO:unsorted
@SQ SN:CP000139.1   LN:5163189
 @SQ    SN:copy_2   LN:5163189
@PG ID:bowtie2  PN:bowtie2  VN:2.3.4.2  CL:"/home/mstambou/bowtie2/bowtie2-2.3.4.2/bowtie2-align-s --wrapper basic-0 --threads 24 -x duplicate_genomes_indices/duplicate_genomes_db -S X317802115/mapping_duplicate/SRR769540_duplicate.sam -1 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.1.fq -2 /home/team/hutternhower14/info/trimmed_1/gut_DNA/SRR769540-trimmed.2.fq"
SRR769540.1001033.1 89  copy_2  4066024 0   99M =   4066024 0   GTATCACATTTATCACAGTAGGTCTGATGGCAATCGCATTTATGTGTTTCTCTGGTTTGAACATCTAAAAAGAAAGGAATAAGACAATGGATATGAGCA     
SRR769540.1001033.2 165 copy_2  4066024 0   *   =   4066024 0   TGGCTTACTCTGACGTGCCTGCTCCGTTGAAGGGCTTGGGTATCACATTTATCACAGTAGGTCTGATGGCAATCGCATTTATGTGTTTCTCTGGTTTGAAC   B   
SRR769540.1001041.1 99  copy_2  2179441 1   101M    =   2179495 155 TATCTCCATTAGCGAAAACAGGCGCATGATACATCTTTCCACAGGCCTTTGCATAAGCAATCTGAGTGGACTCTTCATCCAAAGCCTCCCAGCGAGCTTTT   :??BDB:2C<<A;EEEAECD??+1???;B4?B>B*BA   
SRR769540.1001041.2 147 copy_2  2179495 1   101M    =   2179441 -155    AAGCAATCTGAGTGGACTCTTCATCCAAAGCCTCCCAGCGAGCTTTTACACGCATCTCCTGCACTGCTTCATAAGCTAACTTCTGTACGTGAAAGCGATCG   CCAC>A>>>A>C    
SRR769540.1001043.1 101 copy_2  2179939 0   *   =   2179939 0   TAGCGGTTCTGAAGACCAGCTAAGTTCTGGAACACGCATGACATGGACAGATTTCTGGAACGAATGACCGAAAT  7>=AA=7<=B<4,<<7+233+?>=2?= 
SRR769540.1001043.2 153 copy_2  2179939 1   101M    =   2179939 0   CCAATAAATCCCCGTCAACTCCAGATAAAACACCTAAGAGCTTGCAACTGATCGGATAGCTATCAATATAGTTCTTTTAAAAAAGACGCAAAATCTTGCGT       
SRR769540.1001566.1 89  copy_2  4992489 0   7M1I5M1D88M =   4992489 0   GATTTCTCTTTCTCTTTTTAGTTTTCTTAGTCAAAATATGACTATGAAAAGCGTGCTTTCTTTTGATTTTACCTGTTCCGGTAAGGGTAAACCTTTTTTTA   DCDCDCDDCC?CDDDDCDDCDDDDCCDDDDDDEECEC   
SRR769540.1001566.2 165 copy_2  4992489 0   *   =   4992489 0   GTTCCTTAACCTGGCTTAGATTTGTTGTATCAACAGTTGTAGAGTAGCACAGATTTCTCTTTCTCTTTTTAGTTTTCTTAGTCAAAATATGACTATGAAAA       
SRR769540.1001567.1 89  copy_2  4992500 0   101M    =   4992500 0   CTCTTCTTAGTTTTCTTAGTCAAAATGTGACTGTGAAAAGCGTGCTTTCTTTTGATTTTACCTGTTCCGGTAAGGGTAAACCTTTTTTTAGAACCGGAGTT   A?>CCCCACDCCCCACDDDDCDBCC:CAC   
SRR769540.1001567.2 165 copy_2  4992500 0   *   =   4992500 0   TTTACGCTTTTGATTACTTCATTGCTAAGAGTTCCTTAACCTGGCTTACATTTGTTGTATCAACAGTTGTAGAGTAGCACAGATTTCTTTTTCTCTTCTTA       
SRR769540.1001573.1 69  CP000139.1  4992236 0   *   =   4992236 0   CGCTGCGTGAAGTAGCGGCGCTAACATTCAAAAAAAGTAAAACTATGCCAAGATCAGTAAATCATGTTGCTTCAAAAGCAAGAAGAAAGAAAATTTTGAAA       
SRR769540.1001573.2 137 CP000139.1  4992236 0   84M1D17M    =   4992236 0   AAGTAACCTCTGGTTAATTTCAAAATTTTCTTTCTTCTTGCTTTTGAAGCAACATGATTTACTGATCTTGGCATAGTTTTACTTTTTTTGAATGTTAGCGC   C

As you can see each read ID is only present once. Am I issuing the wrong command in bowtie? that maybe it's only showing me the primary hits? Im very new to this field and am sorry in advance for the novice question.

thanks.

sam bowtie2 • 966 views
ADD COMMENTlink modified 24 months ago by Biostar ♦♦ 20 • written 2.1 years ago by Moses120
3
gravatar for Devon Ryan
2.1 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:

The default is to show only the primary hits (it's usually what you want). If you want it to report the secondary alignments then use -k, though you may need to filter groups of reads afterward and you're not guaranteed that the primary hit will be the best possible hit that could be reported (though in practice it's highly likely to be).

ADD COMMENTlink written 2.1 years ago by Devon Ryan97k

Thank you for your answer. Yeah right after the post I read more about bowtie2 and found that out. I used -a parameter that reports all the alignments that it finds for the reads. So your saying if I leave it to the default settings then bowtie2 will guarantee to return the best possible hit for that read?

ADD REPLYlink written 2.1 years ago by Moses120
2

Yes, by default it'll return the best possible hit (with a few caveats, having to do with seed mismatches and the -D and -R options).

ADD REPLYlink written 2.1 years ago by Devon Ryan97k

I see. Thanks for all the input

ADD REPLYlink written 2.1 years ago by Moses120

How would it determine what the 'best' mapping is if both regions are (as in this toy model) exactly the same? My initial guess is that it would simply report the first mapping if there are two equally good mappings. But in the above example, copy 2 is reported. Maybe after indexing, copy 2 is placed ahead of copy 1? Or maybe bowtie2 does not report the first mapping but the last mapping?

ADD REPLYlink modified 20 months ago • written 20 months ago by BioinformaticsLad160
1

In the case of equally high scoring alignments the result is pseudo-random.

ADD REPLYlink written 20 months ago by Devon Ryan97k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1652 users visited in the last hour