Question

Suggestion on Improving mRNA and sRNA mapping

6.1 years ago

kamel ▴ 70

Hello friends, I am new to RNA-seq analysis. I have single-end (1x50) FASTQ files of mRNA and sRNA (each file contains reads from two organisms), and I want to study gene expression in each sample.

I ran a test mapping of the first sample with STAR using the following command:

$ Linux_x86_64/STAR --runThreadN 12 --genomeDir /index --sjdbOverhang 49 --readFilesIn /sample1.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix /output --outSAMtype BAM SortedByCoordinate

but I got:

Uniquely mapped reads% | 58.83%

% of reads mapped to multiple loci | 38.77%

% of reads mapped to too many loci | 0.46%

I do not understand why I got such a high percentage of reads mapped to multiple loci. Do you have any idea how to improve the mRNA mapping with the STAR command I used?

What is the best way to map sRNA reads to the reference genome?

Thank you in advance

RNA-Seq rna-seq alignment • 2.0k views

ADD COMMENT • link updated 6.0 years ago by ataulhaleem • 0 • written 6.1 years ago by kamel ▴ 70

Start troubleshooting your reads. Check for rRNA contamination, and check which genes have a particularly high number of multi-mappers.

ADD REPLY • link 6.1 years ago by h.mon 35k

Can you give me more detail on the methodology I should follow?

ADD REPLY • link 6.1 years ago by kamel ▴ 70

Examine the alignments with IGV, or filter the multi-mappers with samtools (there are many methods for doing so) and examine the multi-mapper alignments with IGV.

Use bbduk with the ribokmers.fa.gz file to check for rRNA contamination.
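As a rough sketch of the multi-mapper filtering mentioned above: with its default SAM attributes, STAR writes an NH:i tag (the number of loci a read was reported at) on every alignment, so multi-mappers can be pulled out of the SAM text with awk. The function name multimappers_only and the BAM file names in the usage comment are made up for illustration, and the sketch assumes the NH tag is present.

```shell
#!/bin/sh
# Keep SAM header lines plus alignments whose NH:i tag is greater than 1
# (reads reported at more than one locus). Reads SAM text on stdin.
# Assumption: the aligner wrote an NH:i tag, as STAR does by default.
multimappers_only() {
    awk -F '\t' '
        /^@/ { print; next }               # pass header lines through
        {
            for (i = 12; i <= NF; i++) {   # optional tags start at column 12
                if ($i ~ /^NH:i:/) {
                    split($i, t, ":")
                    if (t[3] + 0 > 1) print
                    break
                }
            }
        }'
}

# Hypothetical usage (file names are placeholders, samtools must be installed):
# samtools view -h sample1.bam | multimappers_only | samtools view -b -o sample1.multi.bam -
```

The resulting BAM can then be sorted, indexed and loaded in IGV to see where the multi-mapped reads pile up.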

ADD REPLY • link 6.1 years ago by h.mon 35k

Another question, please. For sRNA mapping, I see that some people select reads of 18-30 nt before mapping and others select 18-26 nt. Which size range should I select?

ADD REPLY • link 6.1 years ago by kamel ▴ 70

You mentioned that each file contains reads from two organisms. What are these organisms? If their genomes are similar, you will get a high number of multi-mapped reads.

ADD REPLY • link 6.1 years ago by grant.hovhannisyan ★ 2.6k

These two organisms are not similar: the samples are human cells infected by a pathogenic bacterium, so the sequenced mRNA contains both bacterial and human reads (with more human reads than bacterial ones, of course). Is it normal to have this many multi-mapped reads? Or, for an expression study, should I request paired-end sequencing with a read length longer than 50 bp?

ADD REPLY • link 6.1 years ago by kamel ▴ 70

score 0 · Answer 1 · 2018-03-18

6.1 years ago

kamel ▴ 70

Another question, please. For sRNA mapping, I see that some people select reads of 18-30 nt before mapping and others select 18-26 nt. Which size range should I select?

ADD COMMENT • link 6.1 years ago by kamel ▴ 70

I don't think you need to select at all. Are your sRNAs from human or bacteria? If they are from bacteria, they should be long (up to ~150 nt).

ADD REPLY • link 6.1 years ago by Asaf 10k

score 0 · Answer 2 · 2018-05-01

6.0 years ago

ataulhaleem • 0

sRNAs shorter than 15 bp don't make sense. Reads shorter than 15 bp are most likely part of the RNA degradome.
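If you do want to drop such short reads before mapping, a minimal sketch with awk could look like the following. It assumes plain 4-line FASTQ records with adapters already trimmed; the function name filter_fastq_by_length and the output file name are hypothetical, and the 15 nt lower bound is simply the threshold mentioned above, not a recommendation for every sRNA class.

```shell
#!/bin/sh
# Keep 4-line FASTQ records whose sequence length lies within [min, max].
# Reads uncompressed FASTQ on stdin; min and max are positional arguments.
# Assumption: every record is exactly 4 lines (no wrapped sequences).
filter_fastq_by_length() {
    awk -v min="$1" -v max="$2" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 {
            n = length(seq)
            if (n >= min && n <= max)
                print header "\n" seq "\n" plus "\n" $0
        }'
}

# Hypothetical usage: keep reads of 15 to 50 nt from a gzipped single-end file
# gunzip -c sample1.fastq.gz | filter_fastq_by_length 15 50 | gzip > sample1.len15-50.fastq.gz
```

Changing the two arguments (for example to 18 and 30) gives the other size windows discussed in this thread.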

ADD COMMENT • link 6.0 years ago by ataulhaleem • 0