Question: Probably a very simple question about gffread
Beginner wrote (10 months ago):

Hi everyone,

I need to re-analyze some old data following a previous protocol that is not easy to understand as a beginner. It says: "We used the EnsEMBL mouse genome assembly GRCm38.p6, where all non-coding regions were excluded, and all fully contained shorter coding sequences were collapsed (gffread -C -M -K)".

Since it says that the genome assembly was used, I am a little confused about whether this refers to the GTF file or the FASTA sequence.

Am I right that it refers to the GTF file, and that I can simply run:

gffread GRCm38.p6File.gtf -C -M -K -o Modified_GRCm38.p6File.gtf

and that this correctly excludes the non-coding regions and collapses all fully contained shorter coding sequences?

Thank you

(Question written 10 months ago by Beginner; modified 10 months ago by Dave Carlson. Tag: gffread)

Do you know why this was done? Why would you collapse the "fully contained shorter coding sequences" of the GTF file and exclude the non-coding regions?

Not knowing the aims of the study in question, it's hard to say for sure. Presumably they were only interested in analyzing coding regions. As for the reasons for collapsing, perhaps it was to avoid processing duplicate and/or spurious annotations that are fully contained within other annotations.
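To make "fully contained" a bit more concrete: as I read the gffread usage text, -M clusters transcripts into loci and drops redundant ones, and -K additionally treats a shorter transcript as redundant when its intron chain matches part of a longer one. Below is a minimal Python sketch of that containment idea only, not gffread's actual code. The function names `introns` and `is_contained` are hypothetical, and it assumes both transcripts are on the same chromosome and strand, with exons given as 1-based inclusive (start, end) pairs as in a GTF.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end), as in GTF


def introns(exons: List[Interval]) -> List[Interval]:
    """Intron chain implied by a transcript's exons."""
    ex = sorted(exons)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(ex, ex[1:])]


def is_contained(short: List[Interval], long: List[Interval]) -> bool:
    """Simplified containment test (hypothetical, not gffread's code).

    `short` counts as redundant if its intron chain is a contiguous run of
    `long`'s intron chain and its outer ends stay inside the matching
    exons of `long`. Same chromosome and strand are assumed.
    """
    s_ex, l_ex = sorted(short), sorted(long)
    s_start, s_end = s_ex[0][0], s_ex[-1][1]
    s_introns, l_introns = introns(s_ex), introns(l_ex)
    n = len(s_introns)
    if n == 0:
        # Single-exon transcript: it must sit inside one exon of `long`.
        return any(a <= s_start and s_end <= b for a, b in l_ex)
    for i in range(len(l_introns) - n + 1):
        if l_introns[i:i + n] == s_introns:
            # Exon i of `long` precedes the matched introns; exon i + n follows them.
            if l_ex[i][0] <= s_start and s_end <= l_ex[i + n][1]:
                return True
    return False


long_tx = [(100, 200), (300, 400), (500, 600)]
print(is_contained([(150, 200), (300, 400)], long_tx))  # True
print(is_contained([(150, 200), (300, 450)], long_tx))  # False: spills into an intron
```

In this model the two-exon transcript is absorbed by the three-exon one because it adds no splice junction of its own, which is the kind of duplicate-looking annotation that collapsing is meant to remove.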

(Reply written 10 months ago by Dave Carlson)

Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized. This comment belongs under @Beginner's reply further down.

(Reply written 10 months ago by genomax)
Dave Carlson (Stony Brook University, NY) wrote (10 months ago):

Usually, each FASTA assembly is released with its own set of annotations (in GTF or GFF format), so I suspect the protocol is simply specifying which annotation/assembly version was used.
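Because the FASTA and the annotation should come from the same release, one quick sanity check is that every sequence name in column 1 of the GTF also appears as a FASTA header. Ensembl names mouse chromosomes "1", "2", and so on, whereas UCSC uses "chr1", "chr2", so mixing sources is a common mismatch. A minimal sketch, with hypothetical helper names and assuming plain (uncompressed) files read line by line:

```python
from typing import Iterable, List, Set


def fasta_seqids(fasta_lines: Iterable[str]) -> Set[str]:
    """Sequence IDs: the first word after '>' on each header line."""
    return {line[1:].split()[0] for line in fasta_lines
            if line.startswith(">") and line[1:].strip()}


def gtf_seqids(gtf_lines: Iterable[str]) -> Set[str]:
    """Column 1 (seqname) of every non-comment, non-empty GTF line."""
    return {line.split("\t")[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def missing_seqids(gtf_lines: Iterable[str], fasta_lines: Iterable[str]) -> List[str]:
    """GTF seqnames that have no matching FASTA record."""
    return sorted(gtf_seqids(gtf_lines) - fasta_seqids(fasta_lines))


# Toy input: one Ensembl-style and one UCSC-style seqname.
gtf = [
    "#!genome-build GRCm38.p6",
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "g1";',
    'chr2\tensembl\texon\t5\t50\t.\t-\t.\tgene_id "g2";',
]
fasta = [">1 dna:chromosome", "ACGT", ">2 dna:chromosome", "ACGT"]
print(missing_seqids(gtf, fasta))  # ['chr2']
```

With real files you could pass open file handles directly, since each function iterates its input only once; an empty result suggests the two files at least agree on naming.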

Your command looks fine, except there is an extra space between "-" and "M" (should be "-M" without a space).

Edit: Just noticed a second typo. There shouldn't be a space after the "_" in your output filename.
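For the -C part specifically, gffread's usage text describes it as discarding transcripts that have no CDS features. The sketch below mimics only that step on GTF lines so the effect is easy to inspect; it is not how gffread works internally. `coding_transcript_ids` and `keep_coding` are hypothetical names, the code assumes `transcript_id "..."` attributes in column 9, and it drops lines that carry no transcript_id (such as `gene` records), which gffread may handle differently.

```python
import re
from typing import List, Set

# GTF attribute of the form: transcript_id "T1";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def coding_transcript_ids(gtf_lines: List[str]) -> Set[str]:
    """IDs of transcripts that have at least one CDS feature."""
    ids = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9 and cols[2] == "CDS":
            m = TRANSCRIPT_ID.search(cols[8])
            if m:
                ids.add(m.group(1))
    return ids


def keep_coding(gtf_lines: List[str]) -> List[str]:
    """Keep header comments plus every line of a transcript that has a CDS.
    Lines without a transcript_id (e.g. 'gene' records) are dropped."""
    coding = coding_transcript_ids(gtf_lines)
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        m = TRANSCRIPT_ID.search(line)
        if m and m.group(1) in coding:
            kept.append(line)
    return kept


demo = [
    "#!genome-build GRCm38.p6",
    '1\tensembl\tCDS\t150\t300\t.\t+\t0\tgene_id "G1"; transcript_id "T1";',
    '1\tensembl\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    '1\tensembl\texon\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2";',
]
print(len(keep_coding(demo)))  # 3: the header plus T1's CDS and exon
```

Note that `keep_coding` walks the input twice, so pass it a list (for example from `f.readlines()`), not a file handle. One more thing worth checking with the real tool: gffread writes GFF3 by default, so to get GTF content in Modified_GRCm38.p6File.gtf you would add -T, e.g. gffread GRCm38.p6File.gtf -C -M -K -T -o Modified_GRCm38.p6File.gtf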

(Answer written 10 months ago by Dave Carlson; modified 10 months ago)

Thank you for the fast answer! I think the spaces arose from a copy-paste issue, and I have corrected them.

Do you know why this was done? Why would you collapse the "fully contained shorter coding sequences" of the GTF file and exclude the non-coding regions?

(Reply written 10 months ago by Beginner)