extract unique sequences on bam file
1
0
Entering edit mode
2.4 years ago
SeaStar ▴ 50

I ve aligned a list of sequence to a transcript to observe if this sequences are present in the transcript. I produced a BAM file. This is the result of my Alignement.

Left reads:
Input     :  27849344
Mapped   :     16442 ( 0.1% of input)
of these:      7259 (44.1%) have multiple alignments (56 have >100)
Input     :  27849344
Mapped   :     15211 ( 0.1% of input)
of these:      8390 (55.2%) have multiple alignments (56 have >100)

Aligned pairs:      5975
of these:      3776 (63.2%) have multiple alignments
194 ( 3.2%) are discordant alignments
0.0% concordant pair alignment rate.
~


Now I have to see that I don't count the same read twice, because I have changed the multiple alignment parameters. If a read maps on two different sequences, it is always the same read. I have to count the reads as unique. So how can I made this check?

sequence bam r align • 861 views
4
Entering edit mode

Now I have to see that I don't count the same read twice

Before doing that, I would worry about the 0.1% overall read mapping rate.... Do you think it is normal to have such a low alignment rate ?

1
Entering edit mode

I ve aligned a list of sequence to a transcript

This is likely because full dataset is being aligned to a transcript. Not generally recommended but probable explanation for that 0.1% figure in this case.

0
Entering edit mode

I'VE aligned a set of transposon to a transcript to observe if these transposable elements are presents. It is a experimental test. I've used tophat

0
Entering edit mode

What exactly are you trying to show? Is the following interpretation correct.

1. Threre are reads that align to the transcript of interest.
2. In case where 1 is true the read aligns only once?
0
Entering edit mode

Yes, it is correct. I want to check if the reads don't align 2 or more times the same sequence

0
Entering edit mode

Depends on the alignment tool you used and whether or it includes such tags.

Also please use the search function.

0
Entering edit mode

I'VE used tophat. I would make this analysis on R

0
Entering edit mode

I'VE used tophat. I would make this analysis on R

There is no logical overlap between those two. One is a splice-aware alignment program and other is a statistical programming environment.

0
Entering edit mode

Sorry, I would to analize the bam file produced with tophat in r environment

0
Entering edit mode

What kind of analysis?

0
Entering edit mode

A kind of analysis that allows me to count the repetitive elements in the trascripts. Anyway I ve tried a solution. I opened the bam with samtools, and I made unique() to count the unique reads that drop in some elements of interest. For example I made unique to all the reads presents in SINEs

1
Entering edit mode
2.4 years ago
GenoMax 110k

I assume you used tophat2 for this alignment. Refer to this post which explains mapping qualities for tophat2: Tophat2 Mapping Qualities

I suggest that you try to see how many reads show up with following commands (you are just getting counts):

$samtools view -c -q 30 your.bam$ samtools view -c -q 40 your.bam
$samtools view -c -q 45 your.bam$ samtools view -c -q 50 your.bam


That last command will likely show reads that are uniquely mapped. You can remove -c if you want to capture those to a file.