8 weeks ago
ki
Hi all,
I'm working with a GTF file for a new species where some entries have a "?" in the 7th column, which represents the strand. For example:
NC_011033.1 RefSeq transcript 11024 315294 . ? . gene_id "OrsajM_p01"; transcript_id "unassigned_transcript_653"; db_xref "GeneID:6450162"; exception "trans-splicing, RNA editing"; gbkey "mRNA"; gene "nad1"; locus_tag "OrsajM_p01"; transcript_biotype "mRNA";
My questions:
What is the best practice for handling "?" in the strand column of GTF files?
Should I:
remove those entries (although I am particularly interested in this gene), or
replace "?" with a default value (like "+")?
Any advice or experience on this would be greatly appreciated.
Thanks!
K
I think it is allowed to use a dot ( . ) in that column as a sort of "unknown strand".
Personally, I would therefore replace the "?" with a ".".
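If you go the replacement route, a minimal sketch of the edit could look like the following. It assumes a tab-delimited GTF with the strand in column 7; the function name `fix_unknown_strand` is made up for illustration and is not part of any library. Only the strand field is touched, so a "?" inside the attributes column is left alone.

```python
def fix_unknown_strand(lines, replacement="."):
    """Yield GTF lines with a '?' strand (column 7) swapped for `replacement`.

    Hypothetical helper: assumes tab-delimited records with 9 columns.
    Comment lines and blank lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Only touch well-formed records whose strand is exactly "?"
        if len(fields) >= 9 and fields[6] == "?":
            fields[6] = replacement
        yield "\t".join(fields) + "\n"


# The record from the question, with tabs between the nine columns
example = (
    "NC_011033.1\tRefSeq\ttranscript\t11024\t315294\t.\t?\t.\t"
    'gene_id "OrsajM_p01"; transcript_id "unassigned_transcript_653";\n'
)
fixed = list(fix_unknown_strand([example]))
print(fixed[0].split("\t")[6])
```

The same generator can be fed an open file handle and its output written to a new file, so the original annotation stays untouched. Passing `replacement="+"` or `replacement="-"` works the same way if you later decide on a strand.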

A "." would indicate a feature that is not stranded. That is unlikely to be the case here (it would mean that expression occurs on both strands). Since you included STAR as a tag, do you intend to use the file for RNA-seq read counting? Having a "?" in that column is valid, at least per the GFF3 spec (strandedness relevant but unknown). You could look at how your reads align in that region and at their orientation. Then, taking the library type into account, you may need to change the strand value to + or -, in case reads there are not counted by default.

Thank you all for great suggestions.
Yes, I am using STAR for alignment and after that salmon for counting. I usually remove "?" entries, but this time this gene is important for the analysis and my library type is unstranded. So, will it make any impact if I replace it with +, - or "."?

If your library is unstranded then replacing the "?" with "." may be a good place to start. BTW: have you tried to count with "?" in the file? What happens in that case?
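
Before changing anything, it may help to see exactly which records carry a "?" strand, so you know how many genes are affected and where to look at the read alignments. Here is a rough sketch, again assuming a tab-delimited GTF; `unknown_strand_records` is a hypothetical helper name, and the second record in the demo is invented purely for contrast.

```python
import re

# Pull the gene_id value out of the GTF attributes column
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def unknown_strand_records(lines):
    """Return (gene_id, feature, start, end) for every record with a '?' strand.

    Hypothetical helper: assumes tab-delimited records with 9 columns.
    """
    hits = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[6] != "?":
            continue
        match = GENE_ID.search(fields[8])
        gene_id = match.group(1) if match else None
        hits.append((gene_id, fields[2], int(fields[3]), int(fields[4])))
    return hits


records = [
    # Record from the question (strand "?")
    "NC_011033.1\tRefSeq\ttranscript\t11024\t315294\t.\t?\t.\t"
    'gene_id "OrsajM_p01"; transcript_id "unassigned_transcript_653";\n',
    # Invented record with a known strand, for contrast only
    "NC_011033.1\tRefSeq\ttranscript\t100\t900\t.\t+\t.\t"
    'gene_id "example_gene"; transcript_id "example_tx";\n',
]
for hit in unknown_strand_records(records):
    print(hit)
```

The coordinates it reports can then be opened in a genome browser to check read orientation, as suggested above.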