extract unique sequences on bam file
1
0
Entering edit mode
4.8 years ago
SeaStar ▴ 50

I ve aligned a list of sequence to a transcript to observe if this sequences are present in the transcript. I produced a BAM file. This is the result of my Alignement.

Left reads:
          Input     :  27849344
           Mapped   :     16442 ( 0.1% of input)
            of these:      7259 (44.1%) have multiple alignments (56 have >100)
Right reads:
          Input     :  27849344
           Mapped   :     15211 ( 0.1% of input)
            of these:      8390 (55.2%) have multiple alignments (56 have >100)
 0.1% overall read mapping rate.

Aligned pairs:      5975
     of these:      3776 (63.2%) have multiple alignments
                     194 ( 3.2%) are discordant alignments
 0.0% concordant pair alignment rate.
~

Now I have to see that I don't count the same read twice, because I have changed the multiple alignment parameters. If a read maps on two different sequences, it is always the same read. I have to count the reads as unique. So how can I made this check?

sequence bam r align • 1.8k views
ADD COMMENT
4
Entering edit mode

Now I have to see that I don't count the same read twice

Before doing that, I would worry about the 0.1% overall read mapping rate.... Do you think it is normal to have such a low alignment rate ?

ADD REPLY
1
Entering edit mode

I ve aligned a list of sequence to a transcript

This is likely because full dataset is being aligned to a transcript. Not generally recommended but probable explanation for that 0.1% figure in this case.

ADD REPLY
0
Entering edit mode

I'VE aligned a set of transposon to a transcript to observe if these transposable elements are presents. It is a experimental test. I've used tophat

ADD REPLY
0
Entering edit mode

What exactly are you trying to show? Is the following interpretation correct.

  1. Threre are reads that align to the transcript of interest.
  2. In case where 1 is true the read aligns only once?
ADD REPLY
0
Entering edit mode

Yes, it is correct. I want to check if the reads don't align 2 or more times the same sequence

ADD REPLY
0
Entering edit mode

Depends on the alignment tool you used and whether or it includes such tags.

Also please use the search function.

ADD REPLY
0
Entering edit mode

I'VE used tophat. I would make this analysis on R

ADD REPLY
0
Entering edit mode

I'VE used tophat. I would make this analysis on R

There is no logical overlap between those two. One is a splice-aware alignment program and other is a statistical programming environment.

ADD REPLY
0
Entering edit mode

Sorry, I would to analize the bam file produced with tophat in r environment

ADD REPLY
0
Entering edit mode

What kind of analysis?

ADD REPLY
0
Entering edit mode

A kind of analysis that allows me to count the repetitive elements in the trascripts. Anyway I ve tried a solution. I opened the bam with samtools, and I made unique() to count the unique reads that drop in some elements of interest. For example I made unique to all the reads presents in SINEs

ADD REPLY
1
Entering edit mode
4.8 years ago
GenoMax 141k

I assume you used tophat2 for this alignment. Refer to this post which explains mapping qualities for tophat2: Tophat2 Mapping Qualities

I suggest that you try to see how many reads show up with following commands (you are just getting counts):

$ samtools view -c -q 30 your.bam
$ samtools view -c -q 40 your.bam
$ samtools view -c -q 45 your.bam
$ samtools view -c -q 50 your.bam

That last command will likely show reads that are uniquely mapped. You can remove -c if you want to capture those to a file.

ADD COMMENT

Login before adding your answer.

Traffic: 2485 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6