Finding Ribo-seq reads in annotated UTRs
0
0
Entering edit mode
4.9 years ago
jpcchoy • 0

Hi, everyone.

I would like to work out which of my Ribo-seq reads map to UTRs, but haven't been able to find any annotation files (GTF/GFF3) including UTRs or ORFs.

I was hoping to align my FASTQ reads against a Gencode.v30 reference, then take the SAM file and use htseq-count on it with a Gencode.v30 GTF/GFF3, but it looks as if those annotations don't include UTRs/ORFs. Have I misunderstood anything?

I would love if anyone could provide me with some pointers! I'm so sorry if I haven't found a piece of germane documentation. I've searched around a lot and can't find any tips.

Thanks so much.

Ribo-Seq UTR Alignment HTSeq Annotation • 1.2k views
ADD COMMENT
1
Entering edit mode

Are you sure Gencode annotation doesn't include UTRs?

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_30/gencode.v30.annotation.gtf.gz
zcat gencode.v30.annotation.gtf.gz | grep "UTR" | head -n5
chr1  HAVANA  UTR 65419   65433   .   +   .   gene_id "ENSG00000186092.6"; transcript_id "ENST00000641515.2"; gene_type "protein_coding"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_name "OR4F5-202"; exon_number 1; exon_id "ENSE00003812156.1"; level 2; protein_id "ENSP00000493376.2"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001094.4"; havana_transcript "OTTHUMT00000003223.4";
chr1  HAVANA  UTR 65520   65564   .   +   .   gene_id "ENSG00000186092.6"; transcript_id "ENST00000641515.2"; gene_type "protein_coding"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_name "OR4F5-202"; exon_number 2; exon_id "ENSE00003813641.1"; level 2; protein_id "ENSP00000493376.2"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001094.4"; havana_transcript "OTTHUMT00000003223.4";
chr1  HAVANA  UTR 70006   71585   .   +   .   gene_id "ENSG00000186092.6"; transcript_id "ENST00000641515.2"; gene_type "protein_coding"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_name "OR4F5-202"; exon_number 3; exon_id "ENSE00003813949.1"; level 2; protein_id "ENSP00000493376.2"; tag "RNA_Seq_supported_partial"; tag "basic"; havana_gene "OTTHUMG00000001094.4"; havana_transcript "OTTHUMT00000003223.4";
chr1  ENSEMBL UTR 69055   69090   .   +   .   gene_id "ENSG00000186092.6"; transcript_id "ENST00000335137.4"; gene_type "protein_coding"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_name "OR4F5-201"; exon_number 1; exon_id "ENSE00002319515.2"; level 3; protein_id "ENSP00000334393.3"; transcript_support_level "NA"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS30547.1"; havana_gene "OTTHUMG00000001094.4";
chr1  ENSEMBL UTR 70006   70108   .   +   .   gene_id "ENSG00000186092.6"; transcript_id "ENST00000335137.4"; gene_type "protein_coding"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_name "OR4F5-201"; exon_number 1; exon_id "ENSE00002319515.2"; level 3; protein_id "ENSP00000334393.3"; transcript_support_level "NA"; tag "basic"; tag "appris_principal_1"; tag "CCDS"; ccdsid "CCDS30547.1"; havana_gene "OTTHUMG00000001094.4";
  
ADD REPLY
0
Entering edit mode

Yes! I had not noticed that h.mon, but you're absolutely right. Unfortunately, I've run htseq-count in the following way and all of my sequences map to 0 features...

htseq-count -f bam -r pos sorted.bam gencode.v30.annotation.gtf

All I get is this:

__no_feature 1993081 
__ambiguous 0 
__too_low_aQual 31427753 
__not_aligned 2662011 
__alignment_not_unique 0

I got my sorted.bam from rsemoutput.transcript.bam, which I sorted with samtools sort. Any idea what might be going wrong? I haven't found any hint of what's going on in RSEM/SAMtools/HTSeq documentation, and would hugely appreciate any help.

ADD REPLY
1
Entering edit mode

If you want to count reads over utr region, try featureCounts -t UTR -f. I don't know if it will work, but its worth a shot.

ADD REPLY
0
Entering edit mode

I just tried this a few times with different parameters, and every time I get a

Segmentation fault (core dumped)

Do you know why this might be?

ADD REPLY
0
Entering edit mode

Context:

Process BAM file rsemout.transcript.sorted.bam...    
Single-end reads are included.  
Assign alignments to features...        
Segmentation fault (core dumped)
ADD REPLY
0
Entering edit mode

And I can provide more details if helpful!

ADD REPLY

Login before adding your answer.

Traffic: 2653 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6