Question: Collecting 3' UTR and neighboring exonic sequence from GENCODE GTF files
mdwain.uw0 wrote:

Hi,

I performed RNA sequencing using a poly(A) 3' tagging/sequencing approach, so I expect the sensible reads to map only to the 3' ends of the transcripts in my sample. I want to subset the GENCODE gene definitions to include only the UTR + 1 kb of exon. What is the best way to do this? I had the following ideas:

1) "grep" the GENCODE definition files for "UTR" lines, then find exons whose coordinates are immediately adjacent, and keep going backwards (to get more neighboring exons) until I get my 1 kb.

2) "grep" the GENCODE files for "stop_codon" lines, then keep getting exons whose coordinates are immediately adjacent to the "stop_codon" coordinates, until I get my 1kb.

3) find the "transcript" lines of the GENCODE file, try to match them to the "UTR" lines, then select the last 1kb of the 'transcript' definitions (and add on the coordinates for the UTR).
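As a rough illustration of the "walk backwards through the exons" part shared by these ideas, here is a minimal Python sketch. It assumes the exon coordinates of one transcript have already been read from the GTF as 1-based inclusive (start, end) pairs; the function name `three_prime_window` and the parameter `n_bases` are made up for this example. It returns the exon pieces covering the last `n_bases` of spliced sequence, so to get "UTR + 1 kb" you would pass the UTR length plus 1000.

```python
def three_prime_window(exons, strand, n_bases=1000):
    """Return exon pieces covering the last n_bases of a transcript.

    exons: list of (start, end) tuples, 1-based inclusive as in GTF.
    strand: "+" or "-".
    """
    # Visit exons starting at the 3' end: highest coordinates first on
    # the plus strand, lowest coordinates first on the minus strand.
    ordered = sorted(exons, reverse=(strand == "+"))
    remaining = n_bases
    pieces = []
    for start, end in ordered:
        if remaining <= 0:
            break
        take = min(end - start + 1, remaining)
        if strand == "+":
            # Keep the 3'-most part of the exon (its right-hand side).
            pieces.append((end - take + 1, end))
        else:
            # On the minus strand the 3' side is the left-hand side.
            pieces.append((start, start + take - 1))
        remaining -= take
    return sorted(pieces)


exons = [(100, 199), (300, 399), (500, 1099)]
print(three_prime_window(exons, "+", 650))
print(three_prime_window(exons, "-", 150))
```

If the transcript is shorter than `n_bases`, all exons are returned unchanged.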

Besides figuring out the best way to get these 3' end coordinates, I also have the following questions:

1) should all transcript definitions have a "UTR" line in the GENCODE definition files? 2) should all "UTR" definitions have adjacent "exons"?

Thanks!

gencode utr gtf • 1.2k views
modified 2.6 years ago by Emily_Ensembl16k • written 2.7 years ago by mdwain.uw0

My experience is that UTR annotation is not as good as you would hope. Are you using Lexogen Quantseq by any chance?

written 2.7 years ago by WouterDeCoster35k

nope, new tech. What would you suggest?

written 2.7 years ago by mdwain.uw0

Nothing conclusive yet, but I'm trying things like extending my UTR sequences (1kb) starting from the stop codon... (my sequencing is stranded so that's quite safe).
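For illustration only, a strand-aware version of "extend 1 kb from the stop codon" could look like the sketch below. It assumes the stop codon coordinates come from a "stop_codon" line in the GTF (1-based inclusive); the function name `downstream_of_stop` is invented for this example. It returns a plain genomic window, so it ignores any introns in the 3' UTR and does not check the chromosome length.

```python
def downstream_of_stop(stop_start, stop_end, strand, length=1000):
    """Return the (start, end) window of `length` bases 3' of a stop codon.

    Coordinates are 1-based inclusive, as in GTF.
    """
    if strand == "+":
        # Downstream means higher coordinates on the plus strand.
        return (stop_end + 1, stop_end + length)
    # On the minus strand, downstream means lower coordinates;
    # clamp at position 1 so the window never goes below the chromosome start.
    return (max(1, stop_start - length), stop_start - 1)


print(downstream_of_stop(1000, 1002, "+"))
print(downstream_of_stop(5000, 5002, "-"))
```

Getting the strand right matters here: with the strand flipped, the window would cover coding sequence instead of the UTR.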

written 2.6 years ago by WouterDeCoster35k
EMBL-EBI
Emily_Ensembl16k wrote:

Not sure about the best tactics, but I can tell you about GENCODE genes:

1) should all transcript definitions have a "UTR" line in the GENCODE definition files?

No. UTRs are only annotated if there is evidence for the UTRs of that transcript. Many transcripts are annotated based only on protein data, so they have no UTRs. There are also loads of non-coding transcripts, which of course have no UTRs.
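One way to see how many transcripts this affects in a given file is to list the transcripts that have no UTR line at all. The following is a minimal sketch, assuming standard tab-separated GTF lines with a `transcript_id "..."` attribute in the ninth column; the function name `transcripts_without_utr` is made up for this example. It treats any feature type containing "UTR" as a UTR line.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_without_utr(gtf_lines):
    """Return the set of transcript IDs that have no UTR feature line."""
    all_tx, with_utr = set(), set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue  # e.g. gene lines carry no transcript_id
        feature = fields[2]
        if feature == "transcript":
            all_tx.add(match.group(1))
        elif "UTR" in feature:
            with_utr.add(match.group(1))
    return all_tx - with_utr
```

The function accepts any iterable of lines, so an open file handle on the GTF can be passed in directly.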

2) should all "UTR" definitions have adjacent "exons"?

The UTRs are part of the exons. If there is a UTR, it should have an adjacent CDS.
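Because the UTRs sit inside the exons, the 3' UTR can also be derived from the exon and CDS lines: it is whatever exonic sequence lies downstream of the last coding base. Here is a minimal sketch under that assumption, with 1-based inclusive coordinates and an invented function name `three_prime_utr_pieces`. Whether the stop codon ends up inside the result depends on whether the CDS intervals you pass in include it, so check that against your file.

```python
def three_prime_utr_pieces(exons, cds, strand):
    """Return the exon pieces that lie 3' of the coding sequence.

    exons, cds: lists of (start, end) tuples, 1-based inclusive.
    strand: "+" or "-".
    """
    if strand == "+":
        # Last coding base is the highest CDS coordinate.
        boundary = max(end for _, end in cds)
        pieces = [(max(s, boundary + 1), e) for s, e in exons if e > boundary]
    else:
        # On the minus strand the last coding base is the lowest coordinate.
        boundary = min(start for start, _ in cds)
        pieces = [(s, min(e, boundary - 1)) for s, e in exons if s < boundary]
    return sorted(pieces)


exons = [(100, 200), (300, 500)]
cds = [(150, 200), (300, 400)]
print(three_prime_utr_pieces(exons, cds, "+"))
print(three_prime_utr_pieces(exons, cds, "-"))
```

A transcript whose last exon ends exactly at the CDS boundary yields an empty list, which matches the case of transcripts annotated without UTRs.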

modified 2.6 years ago • written 2.6 years ago by Emily_Ensembl16k