Suggestion on Improving mRNA and sRNA mapping
6.1 years ago
kamel ▴ 70

Hello friends, I am new to RNA-seq analysis. I have single-end (1x50) FASTQ files of mRNA and sRNA (each file contains reads from two organisms), and I want to study gene expression in each sample.

I mapped the first sample with STAR using the following command:

$ Linux_x86_64/STAR --runThreadN 12 --genomeDir /index --sjdbOverhang 49 --readFilesIn /sample1.fastq.gz --readFilesCommand gunzip -c --outFileNamePrefix /output --outSAMtype BAM SortedByCoordinate

but I got:

Uniquely mapped reads% | 58.83%

% of reads mapped to multiple loci | 38.77%

% of reads mapped to too many loci | 0.46%

I do not understand why I got such a high percentage of reads mapped to multiple loci. Do you have any idea how to improve the mRNA mapping with the STAR command I used?

What is the best way to map sRNA reads to the reference genome?

Thank you in advance

RNA-Seq rna-seq alignment • 2.0k views

Start troubleshooting your reads. Check for rRNA contamination, and check which genes have a particularly high number of multi-mappers.
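As a first sanity check, the three percentages quoted in the question can be added up to see how little is left unmapped. The snippet below is only a sketch: it parses `name | value%` lines like those quoted above, the function name `parse_star_percentages` is hypothetical, and the input is just the three lines from the question.

```python
def parse_star_percentages(text):
    """Parse 'name | 12.34%' lines into a {name: float} dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        key, value = (part.strip() for part in line.split("|", 1))
        if value.endswith("%"):
            stats[key] = float(value.rstrip("%"))
    return stats


# The three lines quoted in the question.
summary = """
Uniquely mapped reads% | 58.83%
% of reads mapped to multiple loci | 38.77%
% of reads mapped to too many loci | 0.46%
"""

stats = parse_star_percentages(summary)
aligned = sum(stats.values())      # reads that aligned somewhere
remaining = 100.0 - aligned        # everything not covered by these three lines
print(f"aligned somewhere: {aligned:.2f}%  not accounted for: {remaining:.2f}%")
```

Roughly 98% of the reads align somewhere, so the problem looks like ambiguity (reads fitting several loci) rather than reads failing to align, which is why rRNA and other repeated sequences are worth checking first.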


Can you give me more detail on the methodology I should follow?


Examine the alignments with IGV, or filter the multi-mappers with samtools (there are many methods for doing so) and examine the multi-mapper alignments with IGV.
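A minimal sketch of the filtering step, assuming the BAM has been converted to SAM text (for example with `samtools view`) and that the aligner wrote the `NH:i:` tag, which STAR does by default. The function names are hypothetical and the records are made-up examples, not data from this thread.

```python
from collections import Counter


def hit_count(sam_line):
    """Return the NH (number of reported alignments) value, or None if absent."""
    # Optional tags start at the 12th tab-separated column of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith("NH:i:"):
            return int(field[5:])
    return None


def split_by_multimapping(sam_lines):
    """Separate alignment records into unique (NH == 1) and multi (NH > 1)."""
    unique, multi = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        nh = hit_count(line)
        if nh is None:
            continue  # no NH tag, so the record cannot be classified
        (unique if nh == 1 else multi).append(line)
    return unique, multi


def multimappers_per_reference(multi_lines):
    """Count multi-mapped alignment records per reference sequence (column 3)."""
    return Counter(line.split("\t")[2] for line in multi_lines)


# Made-up records for illustration only.
records = [
    "@HD\tVN:1.4\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\tNH:i:1\tHI:i:1",
    "r2\t0\tchr1\t500\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:1",
    "r2\t256\tchrM\t900\t3\t50M\t*\t0\t0\t*\t*\tNH:i:2\tHI:i:2",
]
unique, multi = split_by_multimapping(records)
print(len(unique), len(multi), dict(multimappers_per_reference(multi)))
```

References that collect a disproportionate share of multi-mapped records are the natural candidates to open in IGV. With two organisms in one file, the per-reference tally also shows whether the multi-mappers come mainly from the human or the bacterial sequences.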

Use bbduk with the ribokmers.fa.gz file to check for rRNA contamination.
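The bbduk step could be scripted roughly as follows. This is a sketch under assumptions: `bbduk.sh` from BBTools is on the PATH, `ribokmers.fa.gz` is in the working directory, and the input and output file names are placeholders rather than anything from this thread. The helper `bbduk_rrna_command` is hypothetical and only builds the command line; nothing is executed here.

```python
import shlex


def bbduk_rrna_command(reads, ribo_ref="ribokmers.fa.gz",
                       matched="ribo.fq.gz", unmatched="nonribo.fq.gz", k=31):
    """Build a bbduk.sh call that splits reads into rRNA-like and other reads."""
    return [
        "bbduk.sh",
        f"in={reads}",        # input FASTQ (placeholder name)
        f"ref={ribo_ref}",    # rRNA k-mer reference
        f"k={k}",             # k-mer length used for matching
        f"outm={matched}",    # reads matching the rRNA k-mers
        f"outu={unmatched}",  # reads without a match
    ]


cmd = bbduk_rrna_command("sample1.fastq.gz")
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`. bbduk prints how many reads matched the reference, which gives the fraction of rRNA-like reads.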


Another question, please. For sRNA mapping, I have seen some people select reads of 18-30 nt and others 18-26 nt before mapping. Which size range should I select?


You mentioned that each file contains reads from two organisms. What are these organisms? If their genomes are similar, you will get a high number of multi-mapped reads.


These two organisms are not similar: it is a human host infected by a pathogenic bacterium, so the sequenced mRNA contains both bacterial and human reads (of course, there are more human reads than bacterial ones). Is it normal to have this many multi-mapped reads? Or, for an expression study, should I request paired-end sequencing with a read length longer than 50 bp?

6.1 years ago
kamel ▴ 70

Another question, please. For sRNA mapping, I have seen some people select reads of 18-30 nt and others 18-26 nt before mapping. Which size range should I select?


I don't think you need to select at all. Are your sRNAs from human or bacteria? If they are from bacteria, they should be long (up to ~150 nt).
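One way to judge whether size selection matters for a given library is to look at the read-length distribution after adapter trimming. The sketch below tallies lengths from FASTQ text and applies an optional length window. The function names and the tiny in-memory FASTQ are hypothetical, and the 18-30 window is simply one of the ranges mentioned in the question.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def length_histogram(records):
    """Count how many reads have each sequence length."""
    return Counter(len(seq) for _, seq, _ in records)


def select_by_length(records, min_len, max_len):
    """Keep reads whose length lies in [min_len, max_len], inclusive."""
    return [r for r in records if min_len <= len(r[1]) <= max_len]


def make_record(name, base, n):
    """Build a made-up FASTQ record of length n (illustration only)."""
    return f"@{name}\n{base * n}\n+\n{'I' * n}\n"


fastq_text = (make_record("read1", "A", 12)
              + make_record("read2", "C", 22)
              + make_record("read3", "G", 45))

records = list(read_fastq(io.StringIO(fastq_text)))
hist = length_histogram(records)
kept = select_by_length(records, 18, 30)
print(sorted(hist.items()), [header for header, _, _ in kept])
```

For a real file, `gzip.open(path, "rt")` can replace the `io.StringIO` handle. If the histogram shows a clear population above 30 nt, a narrow window would discard it, which matters if the longer bacterial sRNAs are of interest.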

6.0 years ago

sRNAs shorter than 15 bp don't make any sense. Reads shorter than 15 bp are most likely part of the RNA degradome.
