Question: Genome type to build transcript reference with RSEM?
rna-seq_researcher wrote (5.4 years ago):

Hi all, 

I am currently preparing the reference transcriptome used by RSEM in RNA-seq experiments. For this, I run rsem-prepare-reference with the GTF and FASTA files downloaded from Ensembl (latest release, v80).

However, I have a question about the masking level of the genome, which can be unmasked (complete) or soft- or hard-masked for repetitive sequences. Does the masking level matter when I build the transcript reference? For example, if I use a hard-masked genome instead of the complete genome, will that have a large impact on my final transcript set, given that I use the same GTF coordinates in both cases?

I ask because I have seen that human transcripts can contain repetitive sequences, and I don't know whether these sequences are completely lost in the hard-masked genome.

Does anyone have some insight on that matter?



Tags: rna-seq, rsem, alignment, genome

Reply by rna-seq_researcher (5.4 years ago):

True! I just checked my transcripts.fa file and there are some small sequences (~10-20) full of Ns...

Thank you very much!
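
For anyone who wants to repeat this check, here is a minimal sketch in plain Python that reports transcripts made up mostly of Ns. The function names and the 0.5 threshold are my own choices for illustration, not anything RSEM provides, and the commented-out file name simply follows the transcripts.fa mentioned above.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the identifier, i.e. the text before the first space.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_fraction(seq: str) -> float:
    """Fraction of bases in seq that are N (case-insensitive)."""
    if not seq:
        return 0.0
    return seq.upper().count("N") / len(seq)


def flag_n_rich(records: Iterable[Tuple[str, str]],
                threshold: float = 0.5) -> Dict[str, float]:
    """Return {name: N fraction} for sequences at or above the threshold.

    The default threshold is an arbitrary illustration value.
    """
    flagged = {}
    for name, seq in records:
        frac = n_fraction(seq)
        if frac >= threshold:
            flagged[name] = frac
    return flagged


# Usage on a real file (file name assumed from the post above):
# with open("transcripts.fa") as handle:
#     for name, frac in flag_n_rich(read_fasta(handle)).items():
#         print(name, round(frac, 3))
```

Raising the threshold to 1.0 reports only sequences that consist entirely of Ns.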
Devon Ryan (Freiburg, Germany) wrote (5.4 years ago):

I would strongly encourage you not to use the hard-masked genome for this. Hard masking replaces repeats with Ns, so you're pretty much guaranteed to end up with a bunch of excess Ns in the resulting transcript sequences. Either the soft-masked or the plain FASTA file will work fine (in fact, they should produce equivalent results).
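
To illustrate the point, here is a toy sketch in Python, not RSEM's actual code. It extracts one transcript from an unmasked, a soft-masked, and a hard-masked copy of the same made-up sequence using identical exon coordinates. The genome string, repeat interval, and exon coordinates are invented for the example, and the equivalence of the soft-masked and plain results assumes the downstream tool ignores letter case.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates, as in GTF


def mask(seq: str, repeats: List[Interval], hard: bool) -> str:
    """Mask repeat intervals: lowercase if soft, N if hard."""
    chars = list(seq)
    for start, end in repeats:
        for i in range(start - 1, end):
            chars[i] = "N" if hard else chars[i].lower()
    return "".join(chars)


def extract_transcript(genome: str, exons: List[Interval]) -> str:
    """Concatenate exon sequences (forward strand only, for simplicity)."""
    return "".join(genome[start - 1:end] for start, end in exons)


# Made-up data: a 20 bp "chromosome" with one repeat overlapping an exon.
plain = "ACGTACGTACGTACGTACGT"
repeats = [(5, 10)]
exons = [(3, 8), (13, 16)]

soft = mask(plain, repeats, hard=False)
hard = mask(plain, repeats, hard=True)

tx_plain = extract_transcript(plain, exons)
tx_soft = extract_transcript(soft, exons)
tx_hard = extract_transcript(hard, exons)

print(tx_plain)  # GTACGTACGT
print(tx_soft)   # GTacgtACGT
print(tx_hard)   # GTNNNNACGT

# Soft masking only changes case, so the bases are still recoverable.
print(tx_soft.upper() == tx_plain)  # True
```

The hard-masked transcript loses the four bases where the exon overlaps the repeat, while the soft-masked one differs from the plain one only in case.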


