I need to re-analyze some old data using a previous protocol that, as a beginner, I find hard to understand. It says: "We used the EnsEMBL mouse genome assembly GRCm38.p6, where all non-coding regions were excluded, and all fully contained shorter coding sequences were collapsed (gffread -C -M -K)".
Since it says that the genome assembly was used, I am a little confused about whether this refers to the GTF annotation file or the FASTA sequence file.
Am I right that it refers to the GTF file, and that I can simply run:
gffread GRCm38.p6File.gtf -C -M -K -o Modified_GRCm38.p6File.gtf
and that this correctly excludes the non-coding regions and collapses all fully contained shorter coding sequences?
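
To check the result myself, I assume I could compare how often each feature type (third column) occurs in the file before and after running gffread. This is only a rough sketch: count_features is a helper name I made up, the file names are the placeholders from my command above, and it only gives a before/after picture rather than proving the filtering is right.

```shell
# Count how often each feature type (column 3) occurs in a GTF/GFF file.
# Comment lines starting with '#' and lines with fewer than 9 tab-separated
# fields are skipped.
count_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Hypothetical file names taken from the command above; adjust as needed.
for f in GRCm38.p6File.gtf Modified_GRCm38.p6File.gtf; do
    if [ -f "$f" ]; then
        echo "== $f"
        count_features "$f"
    fi
done
```

If the two listings look the same, I would take that as a sign that the filtering did not do what I expected.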