Question: Is my approach to removing rRNA contamination correct?
2.3 years ago, Fereshteh (2.8k) wrote:

Hello all,

My adviser asked me to do the following:

I downloaded the FASTA file containing the rRNA genes of my organism of interest, built an index using the rRNA genes as the reference, and mapped the trimmed FASTQ reads to it. I then built a second index from the coding-gene sequences and mapped the reads left unmapped in the previous step to this new index.

Do you think I am wasting time? Is there another way to get rid of rRNA contamination?

Thank you.

 

rrna myposts ribo-seq • 1.2k views
modified 2.2 years ago by frank1987lee1987 (10) • written 2.3 years ago by Fereshteh (2.8k)
2.3 years ago, seta (920), Sweden, wrote:

Try the SortMeRNA tool; it is definitely easier than your approach.

written 2.3 years ago by seta (920)
2.3 years ago, geek_y (8.1k), Barcelona/London, wrote:

If you want to know the percentage of rRNA contamination, one approach is to include everything in the reference and then count and remove the reads mapping to rRNA from the SAM/BAM file.

If you just want to get rid of rRNA reads, don't include the rRNA sequences in the reference; those reads will then remain unmapped.
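
The counting approach can be sketched in a few lines of Python. This is only an illustration, not a tested pipeline: it assumes an uncompressed SAM file, and the reference names `rRNA_18S` and `rRNA_28S` are hypothetical stand-ins for whatever the rRNA sequences are called in your reference. Secondary and supplementary records are skipped so that each read (or mate) is counted once.

```python
def rrna_fraction(sam_lines, rrna_refs):
    """Return (rrna_reads, total_reads, percent) over primary SAM records."""
    total = 0
    rrna = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with "@").
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # 0x100 = secondary alignment, 0x800 = supplementary alignment.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        # 0x4 = read unmapped; fields[2] is the reference sequence name.
        if not flag & 0x4 and fields[2] in rrna_refs:
            rrna += 1
    percent = 100.0 * rrna / total if total else 0.0
    return rrna, total, percent


# Tiny made-up SAM excerpt; the reference names are hypothetical.
example_sam = [
    "@SQ\tSN:rRNA_18S\tLN:1869",
    "r1\t0\trRNA_18S\t10\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r1\t256\trRNA_28S\t55\t0\t5M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
]
rrna, total, percent = rrna_fraction(example_sam, {"rRNA_18S", "rRNA_28S"})
print(f"{rrna} of {total} reads map to rRNA ({percent:.1f}%)")
```

With a real file, an open file handle can be passed in place of the list. A BAM file would first need to be converted to SAM text, for example with `samtools view -h`.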

modified 2.3 years ago • written 2.3 years ago by geek_y (8.1k)

thank you...

written 2.3 years ago by Fereshteh (2.8k)

Can you please elaborate more on how to calculate contamination? I have a file that shows the intersect between TSS CAGE data and rRNA overlaps. I am not sure how to measure rRNA contamination from this. I would like to do so in R. 

written 2.3 years ago by espop2350

show the first few lines of the file.

written 2.3 years ago by geek_y (8.1k)

   V1   V2        V3        V4                         V5 V6
1  chr1 108113121 108113122 chr1:108113121-108113122,- 3  -
2  chr1 108113470 108113471 chr1:108113470-108113471,- 1  -
3  chr1 237766677 237766678 chr1:237766677-237766678,+ 1  +
4  chr1 91853110  91853111  chr1:91853110-91853111,-   1  -

written 2.3 years ago by espop2350

$BT2/bowtie2 -N 0 -L 15 -x rRNA --un SRR1211041_trimmed_unmapped.fastq -U SRR1211041_trimmed.fastq -S mapped_and_unmapped.sam

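Using the above command, I first mapped the reads to the rRNA index. This gives me SRR1211041_trimmed_unmapped.fastq (the reads that did not map to rRNA), which I then aligned to the index built from the coding-gene sequences using this syntax (coding_genes is a placeholder for the name of that index):

$BT2/bowtie2 -x coding_genes -U SRR1211041_trimmed_unmapped.fastq -S my.sam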
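
The two-step filtering described here (map to the rRNA index first, keep the reads that fail to align, then map only those to the coding-gene index) could be scripted roughly as follows. This is a sketch under assumptions: the index name `coding_genes` and the output SAM file names are hypothetical, and only bowtie2 options that already appear in this thread are used. The code builds the command lines without running them.

```python
def two_pass_commands(sample, rrna_index, gene_index, bowtie2="bowtie2"):
    """Build the two bowtie2 calls: rRNA filter first, then gene mapping."""
    trimmed = f"{sample}_trimmed.fastq"
    unmapped = f"{sample}_trimmed_unmapped.fastq"
    # Pass 1: align to the rRNA index; --un writes the reads that did not align.
    rrna_pass = [
        bowtie2, "-N", "0", "-L", "15",
        "-x", rrna_index,
        "--un", unmapped,
        "-U", trimmed,
        "-S", f"{sample}_rRNA.sam",  # hypothetical output name
    ]
    # Pass 2: align only the rRNA-free reads to the coding-gene index.
    gene_pass = [
        bowtie2,
        "-x", gene_index,
        "-U", unmapped,
        "-S", f"{sample}_genes.sam",  # hypothetical output name
    ]
    return rrna_pass, gene_pass


# "coding_genes" is an assumed index name, not one given in the thread.
rrna_cmd, gene_cmd = two_pass_commands("SRR1211041", "rRNA", "coding_genes")
print(" ".join(rrna_cmd))
print(" ".join(gene_cmd))
# To run them for real (requires bowtie2 on PATH and both indexes built):
#   import subprocess
#   for cmd in (rrna_cmd, gene_cmd):
#       subprocess.run(cmd, check=True)
```

The point to check is that the file written by `--un` in the first pass is the same file given to `-U` in the second pass, and that the second pass uses a different index from the first.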

modified 22 months ago • written 2.3 years ago by Fereshteh (2.8k)

Hello Goutham, I am new to the bioinformatics field. I want to know the percentage of reads matching rRNA genes. Can you please give me the steps involved and explain how to do it?

Also, can I get the percentage of reads from any specific gene as well?

written 2.2 years ago by frank1987lee1987 (10)

For the percentage of reads matching rRNA genes, first build an index from the whole rRNA.fasta (you can get this FASTA file from Ensembl), then map your reads against it. From the mapping result you can find the percentage of reads that mapped to the rRNA genes.

written 2.2 years ago by Fereshteh (2.8k)

Dear Fereshteh, below is the bowtie2 output I received after running the following command.

What does "aligned concordantly 0, 1, >1 times" mean?

0.12% overall alignment rate - Does this 0.12% refer to percentage of reads mapping to rRNA genes?

Frank$ bowtie2 -N 0 -L 15 -x rRNA_genes -1 Project/Sample/DH558-1_GTGGCC_L005_R1.all.fastq.gz -2 Project/Sample/DH558-1_GTGGCC_L005_R2.all.fastq.gz -S Project/DH558-1.sam

66113117 reads; of these:
  66113117 (100.00%) were paired; of these:
    66061268 (99.92%) aligned concordantly 0 times
    58 (0.00%) aligned concordantly exactly 1 time
    51791 (0.08%) aligned concordantly >1 times
    ----
    66061268 pairs aligned concordantly 0 times; of these:
      13 (0.00%) aligned discordantly 1 time
    ----
    66061255 pairs aligned 0 times concordantly or discordantly; of these:
      132122510 mates make up the pairs; of these:
        132068318 (99.96%) aligned 0 times
        13629 (0.01%) aligned exactly 1 time
        40563 (0.03%) aligned >1 times
0.12% overall alignment rate

modified 2.2 years ago • written 2.2 years ago by frank1987lee1987 (10)

You know, Frank, I am also new to NGS, but I think you are right: in total, 0.12% of the reads mapped to the rRNA genes. As for 0, 1, and >1 times, these say how many places a pair aligned: not at all, exactly once, or more than once. "Concordantly" means that both mates of a pair aligned with the expected relative orientation and distance. Aligning more than once is multimapping, which is more common in eukaryotes because of repeated sequences in the genome; short reads in particular tend to map to several places.
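
As a sanity check, the 0.12% figure can be recomputed from the counts in the bowtie2 summary posted above. The sketch below assumes that the overall alignment rate for paired-end data counts individual mates: two per concordantly or discordantly aligned pair, plus each mate that aligned on its own, divided by twice the number of pairs. The helper `count` is a hypothetical name introduced for this example.

```python
import re

SUMMARY = """\
66113117 reads; of these:
  66113117 (100.00%) were paired; of these:
    66061268 (99.92%) aligned concordantly 0 times
    58 (0.00%) aligned concordantly exactly 1 time
    51791 (0.08%) aligned concordantly >1 times
    ----
    66061268 pairs aligned concordantly 0 times; of these:
      13 (0.00%) aligned discordantly 1 time
    ----
    66061255 pairs aligned 0 times concordantly or discordantly; of these:
      132122510 mates make up the pairs; of these:
        132068318 (99.96%) aligned 0 times
        13629 (0.01%) aligned exactly 1 time
        40563 (0.03%) aligned >1 times
0.12% overall alignment rate
"""


def count(label, text):
    """Return the integer that precedes `label` (and an optional percentage)."""
    match = re.search(r"(\d+) (?:\([\d.]+%\) )?" + re.escape(label), text)
    if match is None:
        raise ValueError(f"label not found: {label!r}")
    return int(match.group(1))


pairs = count("reads; of these:", SUMMARY)
aligned_pairs = (
    count("aligned concordantly exactly 1 time", SUMMARY)
    + count("aligned concordantly >1 times", SUMMARY)
    + count("aligned discordantly 1 time", SUMMARY)
)
single_mates = (
    count("aligned exactly 1 time", SUMMARY)
    + count("aligned >1 times", SUMMARY)
)
# Each aligned pair contributes two mates; lone mates contribute one each.
aligned_mates = 2 * aligned_pairs + single_mates
rate = 100.0 * aligned_mates / (2 * pairs)
print(f"{aligned_mates} of {2 * pairs} mates aligned: {rate:.2f}%")
```

Under that assumption the recomputed rate rounds to the 0.12% that bowtie2 reported. Because the index here contains only rRNA genes, this supports reading the figure as the share of reads that hit rRNA.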

modified 2.2 years ago • written 2.2 years ago by Fereshteh (2.8k)

Thanks Fereshteh for your suggestions. I have created a new post based on this to confirm our understanding.

Hopefully somebody confirms it :)

written 2.2 years ago by frank1987lee1987 (10)
2.2 years ago, frank1987lee1987 (10), United States, wrote:

Thanks Fereshteh for your help. Currently, I am running the bowtie2 alignment between my fastq reads and rRNA gene index. Once it is completed, will go through the output and get back to you.

 

written 2.2 years ago by frank1987lee1987 (10)

great job Frank

written 2.2 years ago by Fereshteh (2.8k)