Stringtie output shows no expression for reference genes
3.1 years ago
samhairle ▴ 20

Hi all,

I am using the StringTie-Ballgown pipeline described at https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de and have run into a problem with the output.

In the resulting Ballgown object, the reference genes are present but have 0 coverage in every sample, while 'new' transcripts with StringTie IDs do have coverage. Crucially, these new transcripts do not overlap the reference genes.

I know the reference transcripts are present because BUSCO can identify the expected single-copy orthologues from the GTF file. I do not understand why StringTie is not accepting them or detecting any overlap.

I have read other threads on StringTie annotation issues, but the lack of overlap between the identified transcripts and the reference doesn't seem to have come up before. At this point I am lost. Is there some step I am missing or have overlooked? I would greatly appreciate any advice.

Code is below:

  1. For each RNA-Seq sample, map the reads to the genome with HISAT2 using the --dta option (I used an index built with HISAT2 from the reference genome .fa file).

hisat2 -p 10 --no-discordant --no-mixed --dta --rna-strandness RF --mp 4,2 --rdg 5,3 -x hisat2_gen_index -q -1 1_1.Q20.fastq -2 1_2.Q20.fastq -S 1_AN_.sam 2> 1_AN_report.txt

samtools sort -@ 8 -o 1_AN_.sorted.bam 1_AN_.sam

  2. For each RNA-Seq sample, run StringTie to assemble the read alignments obtained in the previous step (I used an annotation file with the same chromosome naming convention as the reference genome. I have tried this file as both GFF and GTF; neither worked).

stringtie 1_AN_.sorted.bam -G gtf_anno.gtf -A 1_gtf_stringtie_assembly_abundances.tab -f 0.005 -p 10 -o 1_gtf_stringtie_assembly.gtf

  3. Run StringTie with --merge to generate a non-redundant set of transcripts observed in any of the RNA-Seq samples assembled previously (I used the annotation file again).

stringtie --merge -m 20 -p 10 -f 0.005 -G gtf_anno.gtf -o merged_gtf_stringtie_assembly.gtf gtf_mergelist.txt

  4. For each RNA-Seq sample, run StringTie using the -B/-b options to estimate transcript abundances and generate read coverage tables for Ballgown.

stringtie -B -f 0.005 -p 10 1_AN_loomismatch_defgap.sorted.bam -G merged_gtf_stringtie_assembly.gtf -A ballgown_gtf/1_gtf/1_gtf_stringtie_merged_assembly_abundances.tab -o ballgown_gtf/1_gtf/1_gtf_stringtie_merged_assembly.gtf
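
One sanity check on the naming claim in step 2 is to compare the sequence names in the annotation with those in the alignment header. The following is only a minimal sketch: the function names are hypothetical helpers (not part of StringTie or samtools), and it assumes a tab-delimited GTF/GFF file and a SAM header that has already been written to a text file.

```shell
# Hypothetical helpers for comparing sequence names; not part of any tool.

# Print the unique sequence names (column 1) of a GTF/GFF file,
# skipping comment lines and lines with fewer than 9 tab-separated fields.
gtf_seqnames() {
    awk -F'\t' '!/^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# Print the unique sequence names found in @SQ lines of SAM header text on stdin.
sam_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }' | sort -u
}

# Print names that occur in the annotation ($1) but not in the SAM header file ($2).
missing_from_alignment() {
    tmp_gtf=$(mktemp) || return 1
    tmp_sam=$(mktemp) || { rm -f "$tmp_gtf"; return 1; }
    gtf_seqnames "$1" > "$tmp_gtf"
    sam_seqnames < "$2" > "$tmp_sam"
    # comm -23 keeps lines unique to the first (sorted) file
    comm -23 "$tmp_gtf" "$tmp_sam"
    rm -f "$tmp_gtf" "$tmp_sam"
}
```

With the files from the steps above, the header could be dumped with `samtools view -H 1_AN_.sorted.bam > header.txt` and the check run as `missing_from_alignment gtf_anno.gtf header.txt`. Empty output would mean every annotation sequence name also appears in the BAM header; any names printed are present only in the annotation.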

stringtie programming annotation transcriptome rna • 574 views