Question: extract unique sequences on bam file
0
gravatar for SeaStar
12 months ago by
SeaStar20
Ocean
SeaStar20 wrote:

I ve aligned a list of sequence to a transcript to observe if this sequences are present in the transcript. I produced a BAM file. This is the result of my Alignement.

Left reads:
          Input     :  27849344
           Mapped   :     16442 ( 0.1% of input)
            of these:      7259 (44.1%) have multiple alignments (56 have >100)
Right reads:
          Input     :  27849344
           Mapped   :     15211 ( 0.1% of input)
            of these:      8390 (55.2%) have multiple alignments (56 have >100)
 0.1% overall read mapping rate.

Aligned pairs:      5975
     of these:      3776 (63.2%) have multiple alignments
                     194 ( 3.2%) are discordant alignments
 0.0% concordant pair alignment rate.
~

Now I have to see that I don't count the same read twice, because I have changed the multiple alignment parameters. If a read maps on two different sequences, it is always the same read. I have to count the reads as unique. So how can I made this check?

bam align sequence R • 426 views
ADD COMMENTlink modified 12 months ago • written 12 months ago by SeaStar20
4

Now I have to see that I don't count the same read twice

Before doing that, I would worry about the 0.1% overall read mapping rate.... Do you think it is normal to have such a low alignment rate ?

ADD REPLYlink written 12 months ago by Carlo Yague5.0k
1

I ve aligned a list of sequence to a transcript

This is likely because full dataset is being aligned to a transcript. Not generally recommended but probable explanation for that 0.1% figure in this case.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax85k

I'VE aligned a set of transposon to a transcript to observe if these transposable elements are presents. It is a experimental test. I've used tophat

ADD REPLYlink written 12 months ago by SeaStar20

What exactly are you trying to show? Is the following interpretation correct.

  1. Threre are reads that align to the transcript of interest.
  2. In case where 1 is true the read aligns only once?
ADD REPLYlink written 12 months ago by genomax85k

Yes, it is correct. I want to check if the reads don't align 2 or more times the same sequence

ADD REPLYlink written 12 months ago by SeaStar20

Depends on the alignment tool you used and whether or it includes such tags.

Also please use the search function.

ADD REPLYlink written 12 months ago by benformatics1.6k

I'VE used tophat. I would make this analysis on R

ADD REPLYlink written 12 months ago by SeaStar20

I'VE used tophat. I would make this analysis on R

There is no logical overlap between those two. One is a splice-aware alignment program and other is a statistical programming environment.

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax85k

Sorry, I would to analize the bam file produced with tophat in r environment

ADD REPLYlink written 12 months ago by SeaStar20

What kind of analysis?

ADD REPLYlink written 12 months ago by genomax85k

A kind of analysis that allows me to count the repetitive elements in the trascripts. Anyway I ve tried a solution. I opened the bam with samtools, and I made unique() to count the unique reads that drop in some elements of interest. For example I made unique to all the reads presents in SINEs

ADD REPLYlink modified 11 months ago • written 11 months ago by SeaStar20
1
gravatar for genomax
12 months ago by
genomax85k
United States
genomax85k wrote:

I assume you used tophat2 for this alignment. Refer to this post which explains mapping qualities for tophat2: Tophat2 Mapping Qualities

I suggest that you try to see how many reads show up with following commands (you are just getting counts):

$ samtools view -c -q 30 your.bam
$ samtools view -c -q 40 your.bam
$ samtools view -c -q 45 your.bam
$ samtools view -c -q 50 your.bam

That last command will likely show reads that are uniquely mapped. You can remove -c if you want to capture those to a file.

ADD COMMENTlink modified 12 months ago • written 12 months ago by genomax85k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 963 users visited in the last hour