Question: Problems with the reference genome in Stringtie
6 months ago, iraia.munoa wrote:

Hi everybody! I am using an RNA-Seq protocol to identify differentially expressed lncRNAs. I used the reference from GENCODE, gencode.vM20.lncRNA_transcripts.fa, built the index, and then ran hisat2:

> hisat2 --dta -q -x mm10_lncRNA_genome -U C-P1_28454_ACAGTG_trimmed.fq.gz -S C-P1_54_L4.sam

Then I converted the SAM files to BAM, sorted them, and created the BAI index.

> samtools view -bS -o C-P1_54_L4_lncRNA.bam C-P1_54_L4_lncRNA.sam
> samtools sort -o C-P1_54_L4_lncRNA_sorted.bam C-P1_54_L4_lncRNA.bam
> samtools index -b C-P1_54_L4_lncRNA_sorted.bam C-P1_54_L4_lncRNA_sorted.bai

Finally, I tried to use stringtie with the GTF file that is also available from GENCODE for lncRNAs: gencode.vM20.long_noncoding_RNAs.gtf

But when running stringtie I get a WARNING message:

> stringtie -G gencode.vM20.long_noncoding_RNAs.gtf -l C-P1_54_lncRNA_sorted -B -C C-P1_54_lncRNA_cov.gtf -o C-P1_54_lncRNA_transcripts.gtf -A C-P1_54_lncRNA_gene-abundance.tsv C-P1_54_lncRNA_sorted.bam

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!
Please make sure the -G annotation file uses the same naming convention for the genome sequences.

I don't understand why I am having this problem, as I am using both reference files (.fa and .gtf) from the same source.

Can someone help me?

Thanks in advance,


6 months ago, Devon Ryan (Freiburg, Germany) wrote:

You need to download this fasta file and redo the mapping. The results will make vastly more sense then.


Thanks for your answer, Devon. So this is the general reference genome, the last option on the GENCODE web page (Genome sequence, primary assembly (GRCm38): nucleotide sequence of the GRCm38 primary genome assembly (chromosomes and scaffolds))? I used the lncRNA reference described in the question because of a comment here on Biostar: A: Any One please provide protocol for Analysing long noncoding RNA illumina NGS da

Could someone explain why it didn't work?

Thanks again Devon, I will try it!

Reply written 6 months ago by iraia.munoa

The links in that answer are to the lincRNA annotation file. As a rule, annotation files refer to genomes rather than transcriptomes.
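To see the mismatch concretely, one can compare the sequence names in the FASTA that was indexed with the names in column 1 of the GTF. The sketch below uses only awk, sort and grep; the function names are hypothetical helpers, not part of HISAT2 or StringTie. As far as I know, the headers in a GENCODE transcript FASTA are transcript identifiers, while column 1 of the GTF holds chromosome names such as chr1, so the two files would share no names at all.

```shell
# Sketch only: helper names are made up for illustration.

# Print the sequence names of a FASTA file: the header text after ">"
# up to the first whitespace, sorted and de-duplicated.
fasta_seqnames() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the sequence names used by a GTF: column 1 of every non-comment line.
gtf_seqnames() {
    awk -F'\t' '!/^#/ { print $1 }' "$1" | LC_ALL=C sort -u
}

# Print the names that occur in both files.
# Empty output means the FASTA and the GTF have no sequence name in common.
shared_seqnames() {
    tmp=$(mktemp)
    gtf_seqnames "$2" > "$tmp"
    fasta_seqnames "$1" | grep -Fx -f "$tmp" || true
    rm -f "$tmp"
}
```

For the files in this thread, the call would be `shared_seqnames gencode.vM20.lncRNA_transcripts.fa gencode.vM20.long_noncoding_RNAs.gtf`. If that prints nothing, the reads were mapped to sequences that the annotation never mentions, which is what the StringTie warning is complaining about.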

Reply written 6 months ago by Devon Ryan

Well, when I do the mapping with your file and then run stringtie with the lincRNA annotation file, the warning disappears. But when I look at the stringtie output, I only see the gene names from the lincRNA annotation file. My question, which may come from a misunderstanding of the files I am working with, is whether all the genes that match the annotation are lncRNAs. When I open the GTF annotation file, there is an attribute "gene_type" with values such as TEC, lincRNA, antisense, processed transcript... Is there an option in stringtie to maintain this information in the output file?

Another thing: I have a BED file from a lncRNA database (coordinates and feature descriptions). Is there a way to use it for annotation in an RNA-seq pipeline? For example, after identifying the DEGs, could I run bedtools intersect between the coordinates of those genes and the lncRNA BED file? Or is that not the correct way to get differentially expressed lncRNAs, and the correct way is the one described in the first question of this post?

Reply written 6 months ago by iraia.munoa

"Is there an option in stringtie to maintain this information in the output file?"

I don't think such an option exists.

You can parse the GTF file to just subset it for lincRNAs. That's easier if transcripts/exons also have the gene_type annotation, of course.
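A rough sketch of that parsing with awk, assuming GENCODE-style attributes of the form `gene_type "lincRNA";` in column 9. The function names are hypothetical. The first helper only keeps a line if the line itself carries the gene_type attribute, so it is worth checking that transcript and exon records in your file have it. The second helper builds a gene_id to gene_type lookup table that could be joined onto downstream results, since StringTie does not carry the attribute over.

```shell
# Sketch only: helper names are made up for illustration.

# Keep header comments plus every record whose attribute column (column 9)
# contains gene_type "<value>";
#   $1 = GTF file, $2 = gene_type value, e.g. lincRNA
gtf_subset_by_gene_type() {
    awk -F'\t' -v want="gene_type \"$2\";" '/^#/ || index($9, want) > 0' "$1"
}

# Print a two-column table (gene_id <TAB> gene_type) built from "gene" records.
gene_type_table() {
    awk -F'\t' '$3 == "gene" {
        id = ""; type = ""
        # "gene_id \"" is 9 characters long, "gene_type \"" is 11.
        if (match($9, /gene_id "[^"]+"/))   id   = substr($9, RSTART + 9,  RLENGTH - 10)
        if (match($9, /gene_type "[^"]+"/)) type = substr($9, RSTART + 11, RLENGTH - 12)
        if (id != "" && type != "") print id "\t" type
    }' "$1"
}
```

For example, `gtf_subset_by_gene_type gencode.vM20.long_noncoding_RNAs.gtf lincRNA > lincRNA_only.gtf` would write a lincRNA-only annotation (the output file name is just a placeholder), and `gene_type_table gencode.vM20.long_noncoding_RNAs.gtf` prints the lookup table to standard output.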

Reply written 6 months ago by Devon Ryan