Question: Probably a very simple question about gffread
Beginner30 wrote (11 days ago):

Hi everyone,

I need to re-analyze some old data following a previous protocol that is not easy to understand for a beginner. It says: "We used the EnsEMBL mouse genome assembly GRCm38.p6, where all non-coding regions were excluded, and all fully contained shorter coding sequences were collapsed (gffread -C -M -K)".

Since it says that the genome assembly was used, I am a little confused about whether this refers to the GTF file or the FASTA sequence.

Am I right that it refers to the GTF file, so that I can simply run:

gffread GRCm38.p6File.gtf -C -M -K -o Modified_GRCm38.p6File.gtf

and thereby correctly exclude the non-coding regions and collapse all fully contained shorter coding sequences?

Thank you

Do you know why this was done? Why would you collapse the "fully contained shorter coding sequences" in the GTF file and exclude the non-coding regions?

Not knowing the aims of the study in question, it's hard to say for sure. Presumably they were only interested in analyzing coding regions. As for the collapsing, perhaps it was done to avoid processing duplicate and/or spurious annotations that are fully encompassed by other annotations.
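To make "fully encompassed by other annotations" a bit more concrete, here is a toy Python model of what the protocol describes: `-C` keeps only transcripts that have CDS features, and `-M -K` collapses a transcript whose exon/intron structure fits entirely inside another one. This is a simplified sketch under my own assumptions (the `Transcript` class, the containment rule and the coordinates are all hypothetical); gffread's real clustering logic covers more cases, so treat it as an illustration, not as what gffread literally does.

```python
from dataclasses import dataclass
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive as in GTF


@dataclass(frozen=True)
class Transcript:
    tid: str
    chrom: str
    strand: str
    exons: Tuple[Exon, ...]  # sorted by start coordinate
    has_cds: bool  # True if the annotation has CDS features


def introns(t: Transcript) -> Tuple[Exon, ...]:
    """Intron chain: the gaps between consecutive exons."""
    return tuple((a[1] + 1, b[0] - 1) for a, b in zip(t.exons, t.exons[1:]))


def contained_in(inner: Transcript, outer: Transcript) -> bool:
    """Simplified rule (assumption): `inner`'s intron chain is a contiguous
    run of `outer`'s chain and `inner`'s ends stay inside the matching exons."""
    if (inner.chrom, inner.strand) != (outer.chrom, outer.strand):
        return False
    in_int, out_int = introns(inner), introns(outer)
    n = len(in_int)
    # k = index of the exon of `outer` that lines up with `inner`'s first exon
    for k in range(len(outer.exons) - n):
        if (out_int[k:k + n] == in_int
                and inner.exons[0][0] >= outer.exons[k][0]
                and inner.exons[-1][1] <= outer.exons[k + n][1]):
            return True
    return False


def collapse(transcripts: List[Transcript]) -> List[str]:
    """Drop transcripts without CDS (the idea behind -C), then drop any
    transcript contained in an already kept one (the idea behind -M -K)."""
    coding = [t for t in transcripts if t.has_cds]
    # Widest span first, so a container is kept before what it contains.
    coding.sort(key=lambda t: (t.exons[0][0] - t.exons[-1][1], t.tid))
    kept: List[Transcript] = []
    for t in coding:
        if not any(contained_in(t, c) for c in kept):
            kept.append(t)
    return [t.tid for t in kept]


txs = [
    Transcript("T1", "chr1", "+", ((100, 200), (300, 400), (500, 600)), True),
    Transcript("T2", "chr1", "+", ((150, 200), (300, 400)), True),  # fits inside T1
    Transcript("T3", "chr1", "+", ((150, 400), (500, 550)), True),  # first exon covers T1's first intron
    Transcript("T4", "chr1", "+", ((100, 600),), False),  # no CDS
]
print(collapse(txs))  # ['T1', 'T3']
```

T4 is dropped for having no CDS, T2 is collapsed into T1, and T3 survives because it retains an intron of T1, so it is a genuinely different structure rather than a shorter copy.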

(reply by Dave Carlson, 11 days ago)

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized. This comment should go below @Beginner30's comment further down.

(reply by genomax, 11 days ago)

Dave Carlson (Stony Brook University, NY) wrote (11 days ago):

Usually, each FASTA assembly is released with its own set of annotations (in GTF or GFF format), so I suspect that the protocol is simply specifying which annotation/assembly version they're talking about.

Your command looks fine, except there is an extra space between "-" and "M" (should be "-M" without a space).

Edit: Just noticed a second typo. There shouldn't be a space after the "_" in your output filename.
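If you want to double-check the result, one option is to list any transcripts in the output that still have no CDS lines; after `-C` that list should be empty. Note that gffread writes GFF3 by default, so add its `-T` option if you want the output to actually be GTF, as the `.gtf` file name suggests. Below is a minimal Python sketch that assumes GTF input with `transcript_id` attributes; the function name and the toy records are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, List

# The GTF attribute column holds e.g.: transcript_id "T1"; gene_id "G1";
TID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcripts_without_cds(gtf_lines: Iterable[str]) -> List[str]:
    """Return sorted transcript IDs that have no CDS feature line."""
    feature_types = defaultdict(set)  # transcript_id -> set of column-3 types
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete 9-column GTF record
        match = TID_RE.search(cols[8])
        if match:
            feature_types[match.group(1)].add(cols[2])
    return sorted(t for t, types in feature_types.items() if "CDS" not in types)


# Tiny hypothetical GTF: T1 is coding, T2 has only an exon line.
toy_gtf = (
    'chr1\tsrc\ttranscript\t100\t600\t.\t+\t.\ttranscript_id "T1"; gene_id "G1";\n'
    'chr1\tsrc\texon\t100\t600\t.\t+\t.\ttranscript_id "T1"; gene_id "G1";\n'
    'chr1\tsrc\tCDS\t150\t500\t.\t+\t0\ttranscript_id "T1"; gene_id "G1";\n'
    'chr1\tsrc\texon\t700\t800\t.\t+\t.\ttranscript_id "T2"; gene_id "G2";\n'
)
print(transcripts_without_cds(toy_gtf.splitlines()))  # ['T2']

# On the real output (expected to print [] if -C worked as described):
# with open("Modified_GRCm38.p6File.gtf") as fh:
#     print(transcripts_without_cds(fh))
```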

Thank you for the fast answer! I think the spaces arose from a copy-paste issue, and I have corrected them.

(reply by Beginner30, 11 days ago)