Probably very simple question for gffread
2.2 years ago
Beginner ▴ 50

Hi everyone,

I need to re-analyze some old data using an earlier protocol that is hard to follow as a beginner. It says: "We used the EnsEMBL mouse genome assembly GRCm38.p6, where all non-coding regions were excluded, and all fully contained shorter coding sequences were collapsed (gffread -C - M -K)".

Since it says that the genome assembly was used, I am unsure whether this refers to the GTF file or the FASTA sequence.

Am I right that it refers to the GTF file and that I can simply run:

gffread GRCm38.p6File.gtf -C -M -K -o Modified_GRCm38.p6File.gtf

to correctly exclude the non-coding regions and collapse all fully contained shorter coding sequences?

Thank you

gffread • 386 views

Do you know why this was done? Why would you collapse the "fully contained shorter coding sequences" of the GTF file and exclude the non-coding regions?

Not knowing the aims of the study in question, it's hard to say for sure. Presumably they were only interested in analyzing coding regions. As for the reasons for collapsing, perhaps it was to avoid processing duplicate and/or spurious annotations that are fully encompassed by other annotations.


Please use ADD REPLY/ADD COMMENT when responding to existing posts to keep threads logically organized. This comment should go under @Beginner's comment below.

2.2 years ago
Dave Carlson ▴ 640

Usually, each FASTA assembly is released with its own set of annotations (in GTF or GFF format), so I suspect that the protocol is simply specifying which annotation/assembly version they're talking about.

Your command looks fine, except there is an extra space between "-" and "M" (should be "-M" without a space).

Edit: Just noticed a second typo. There shouldn't be a space after the "_" in your output filename.
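
One more thing worth checking, offered as a hedged sketch rather than a confirmed fix: as far as I know, gffread writes GFF3 by default, so an output file named .gtf would need the -T option to actually contain GTF. The snippet below uses the file names from the question. The count_transcripts helper is only an illustration (it is not part of gffread) for comparing the number of distinct transcript IDs before and after filtering. Please confirm the option descriptions with gffread -h for your installed version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gffread): count distinct transcript_id
# values in a GTF file, ignoring comment lines.
count_transcripts() {
  grep -v '^#' "$1" | grep -o 'transcript_id "[^"]*"' | sort -u | wc -l | tr -d ' '
}

in_gtf="GRCm38.p6File.gtf"            # input name taken from the question
out_gtf="Modified_GRCm38.p6File.gtf"  # output name taken from the question

# Only run when gffread is installed and the input file is present.
if command -v gffread >/dev/null 2>&1 && [ -f "$in_gtf" ]; then
  # -C: keep coding transcripts only (those with CDS features)
  # -M: cluster transcripts into loci and collapse redundant ones
  # -K: with -M, also collapse shorter, fully contained transcripts
  # -T: write GTF instead of the default GFF3 (assumption: check gffread -h)
  gffread "$in_gtf" -C -M -K -T -o "$out_gtf"
  echo "transcripts before: $(count_transcripts "$in_gtf")"
  echo "transcripts after:  $(count_transcripts "$out_gtf")"
fi
```

If non-coding transcripts were dropped and contained ones collapsed, the second count should come out lower than the first.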


Thank you for the fast answer! I think the spaces arose from a copy-paste issue, and I have corrected them.

Do you know why this was done? Why would you collapse the "fully contained shorter coding sequences" of the GTF file and exclude the non-coding regions?
