Question: Low mapping percentage after mapping RNA-seq reads to a closely related species
unawaz (Australia) wrote 16 months ago:

Hi,

I have Illumina 100 bp paired-end RNA-Seq data from a non-model species. I mapped it to the closely related genome available, using STAR. I mainly did this to see if I could use the genome of the organism for a genome guided assembly. It turns out that I got an overall alignment rate of 0.93% with default parameters; adjusting the parameters only increased the rate to 1-2%. The species I'm working with is a cephalopod.

I'm not really interested in increasing the mapping rate at this point (since this was mainly for exploratory analysis). However I wanted to know what other downstream analysis can I do on the reads that actually did map (I'm assuming these would be tRNAs, histones etc). Basically I want to be able to make plots for the sequences that did align, but not sure what programs I can use to represent my data. Any ideas would be great :)

I also wanted to know if others have done a similar analysis and got similar results? What conclusions did you derive from this sort of analysis?

Tags: rna-seq, alignment, genome

use the genome of the organism for a genome guided assembly

Guided assembly from RNA-seq? For what? However, if the mapping percentage to the related species is low, you can perform a de novo transcriptome assembly, predict ORFs, and BLAST them to predict functions, etc. I don't think plotting your current results makes sense, because the few reads that mapped may just reflect sequencing noise. However, you can try using the .sam file with htseq or even samtools view.
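As a rough illustration of looking at the .sam file directly, here is a minimal Python sketch that tallies primary mapped alignments per reference sequence, which gives a small table that could then be plotted. It relies only on the standard SAM columns (FLAG in column 2, RNAME in column 3) and the flag bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary). The function name `mapped_per_reference` is made up for this example, and it counts alignment records, so each mate of a pair is counted separately.

```python
from collections import Counter


def mapped_per_reference(sam_lines):
    """Count primary mapped alignment records per reference name (RNAME)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        rname = fields[2]
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[rname] += 1
    return counts
```

Passing an open SAM file handle as `sam_lines` works, since the function only iterates over lines; `counts.most_common(20)` would then list the references that attracted the most reads.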

written 16 months ago by Buffo (modified 16 months ago)

I think the first thing to do when you get a very low mapping percentage is to take some of your reads and BLAST them against the NCBI nr database, to see what kind of organisms you get hits to. Your reads may not be what you expected them to be.
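One way to "take some of your reads" is to draw a random subset from the FASTQ file and write it out as FASTA for BLAST. The sketch below is only an illustration under a few assumptions: the FASTQ is uncompressed with exactly four lines per record, and the function names (`read_fastq`, `sample_reads`, `to_fasta`) are invented for this example. Note that nr is a protein database, so nucleotide reads would be searched with blastx (or with blastn against nt instead).

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield (read_id, sequence) pairs from a 4-line-per-record FASTQ iterator."""
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq = record[0].strip(), record[1].strip()
        # Drop the leading '@' and anything after the first whitespace
        yield header[1:].split()[0], seq


def sample_reads(records, n, seed=0):
    """Reservoir-sample up to n records without loading the whole file."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

With an open file handle `fq`, something like `to_fasta(sample_reads(read_fastq(fq), n=100))` would give a FASTA string of 100 randomly chosen reads; a few hundred reads is usually enough to see which organisms dominate the hits.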

written 14 months ago by mastal511

Can't upvote what mastal511 said enough. You literally have less than 1% idea of what your data is. For all you know, it could be contaminated.

On a side note: if there is no reference genome, why don't you make an attempt at de novo assembly and try to get it published?

written 14 months ago by YaGalbi

Friederike (United States) wrote 14 months ago:

However I wanted to know what other downstream analysis can I do on the reads that actually did map (I'm assuming these would be tRNAs, histones etc)

There is no one-size-fits-all solution to the question "what are my genes of interest?". I would probably go about this by pretending you're looking at a normal RNA-seq data set, using the annotation of the model species whose genome you used with STAR. (This would be the GTF file that you presumably supplied to STAR in addition to the FASTA file that contains the genome sequence.) You could use that GTF with featureCounts (part of the Subread package) to get the genes with non-zero coverage. To find out more about the genes (such as GO terms), you could, for example, follow the descriptions in Chapter 7 of this Bioconductor workflow.
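To make the "genes with non-zero coverage" step concrete, here is a minimal Python sketch that reads a featureCounts result table and keeps the genes with at least one assigned read. It assumes featureCounts was already run with an annotation (`-a`) and an output file (`-o`) on a single BAM, so the table has a `#` comment line, a header row, and the columns Geneid, Chr, Start, End, Strand, Length followed by one count column. The function name `nonzero_genes` and the `min_count` parameter are made up for this example.

```python
import csv


def nonzero_genes(lines, min_count=1):
    """Return {gene_id: count} for genes with at least min_count reads.

    Assumes featureCounts' tab-separated output for a single BAM file,
    where the last column holds the read counts.
    """
    # Skip the leading '# Program:featureCounts ...' comment line
    rows = csv.reader(
        (ln for ln in lines if not ln.startswith("#")), delimiter="\t"
    )
    header = next(rows)
    gene_col = header.index("Geneid")
    count_col = len(header) - 1  # single BAM: counts are in the last column
    hits = {}
    for row in rows:
        if not row:
            continue  # ignore blank lines
        count = int(row[count_col])
        if count >= min_count:
            hits[row[gene_col]] = count
    return hits
```

Calling it on an open handle to the featureCounts output returns a dictionary of gene IDs and counts; with a mapping rate around 1%, raising `min_count` above 1 may help separate genes with real support from stray alignments.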
