Initial post title: How does cufflinks find the strand of a novel transcript?
I am using cufflinks to create a RABT assembly of a genome.
I have my newly created
Most of the new transcripts found by cufflinks are present on both strands: very similar transcripts (in terms of sequence) are reported twice in the GTF, once on the + strand and once on the - strand.
How does cufflinks find the strand of each novel transcript? If it cannot determine it, is there a way to report the strand as "unknown" and write only one transcript instead of both?
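If cufflinks itself offers no such option, one workaround is to post-process the GTF. The GTF format uses "." in the strand column for an unknown strand. The sketch below is a hypothetical helper, not a cufflinks feature. It assumes that duplicated transcripts share identical coordinates (your "very close" copies may differ slightly and would need a fuzzier match), and it only looks at `transcript` lines, not exons. When a coordinate group contains both + and - records, it keeps the first record and sets its strand to ".".

```python
from collections import defaultdict

def collapse_strand_pairs(gtf_lines):
    """Hypothetical helper: merge 'transcript' records with identical
    (seqname, start, end) that occur on both strands into one record whose
    strand is '.', i.e. unknown. Exon lines and comments are skipped."""
    groups = defaultdict(list)
    order = []  # preserve the input order of coordinate groups
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue
        key = (cols[0], cols[3], cols[4])  # seqname, start, end
        if key not in groups:
            order.append(key)
        groups[key].append(cols)
    out = []
    for key in order:
        records = groups[key]
        strands = {r[6] for r in records}
        first = list(records[0])  # keep the first record of each group
        if {"+", "-"} <= strands:
            first[6] = "."  # seen on both strands: report as unknown
        out.append("\t".join(first))
    return out
```

Exact-coordinate matching keeps the sketch simple; a real cleanup would probably compare exon chains with some tolerance.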
I found that the strand of my transcripts was determined by the XS field of my SAM input.
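To check what the aligner actually wrote, here is a minimal sketch that reads the XS tag from one SAM text record. It assumes plain-text SAM (not BAM) and the usual `XS:A:+` / `XS:A:-` form of the tag; `xs_strand` is a hypothetical name.

```python
def xs_strand(sam_line):
    """Return '+' or '-' from the optional XS:A tag of a SAM record, or None
    if the record carries no XS tag."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields start at column 12
        if tag.startswith("XS:A:"):
            return tag[5:]
    return None
```

Counting the results over all spliced records is a quick way to see whether reads from a single locus were split between the two strands.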
I also found that my data is unstranded, but I had chosen a stranded mode during alignment. That explains why I end up with transcripts on both strands.
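A toy illustration of that mismatch, under a deliberately simplified, hypothetical stranded rule ("a read mapped in reverse, SAM FLAG bit 0x10, belongs to the - strand"). Real stranded protocols have their own orientation rules, but the effect is the same: in an unstranded library, reads from one transcript map in either orientation, so a stranded rule splits them between + and -, and the assembler can then build one copy per strand.

```python
def strand_from_orientation(flag):
    """Hypothetical stranded rule: reverse-mapped reads (FLAG bit 0x10) are '-'."""
    return "-" if flag & 0x10 else "+"

# Unstranded library: reads from ONE transcript map in either orientation.
flags = [0, 16, 0, 16, 16, 0]
counts = {"+": 0, "-": 0}
for f in flags:
    counts[strand_from_orientation(f)] += 1
print(counts)  # {'+': 3, '-': 3}
```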
I would like to rerun my alignment in unstranded mode and then run cufflinks with lib-type unstranded.
But cufflinks requires the XS field in the SAM file for spliced alignments.
How can I get the strand (XS field) if my data is unstranded?
Why does cufflinks require an XS value only for spliced alignments?
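One plausible reading of the "spliced only" requirement: with unstranded data, a read's orientation says nothing about the transcript's strand, but a spliced read's intron boundaries do, because canonical introns read GT...AG on the transcribed strand (which appears as CT...AC on the reference + strand for a - strand gene). The sketch below infers a strand that way. It is a hypothetical helper, not what any particular aligner does internally. It assumes the reference chromosome is available as a Python string and, for brevity, only checks the GT-AG motif (GC-AG and AT-AC introns are ignored). Aligners may expose this as an option (for example STAR's `--outSAMstrandField intronMotif` or HISAT2's `--dta-cufflinks`); check your aligner's manual.

```python
import re

# Canonical intron motifs as seen on the reference + strand
# (assumption: only GT-AG and its reverse complement CT-AC are checked).
MOTIF_STRAND = {("GT", "AG"): "+", ("CT", "AC"): "-"}

def introns(pos, cigar):
    """Yield 1-based inclusive (start, end) of each N (skipped region) in a CIGAR."""
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            yield ref, ref + length - 1
        if op in "MDN=X":  # operations that consume the reference
            ref += length

def infer_xs(ref_seq, pos, cigar):
    """Return '+', '-', or None for one alignment.
    ref_seq: chromosome sequence as a str; pos: 1-based SAM POS."""
    strands = set()
    for start, end in introns(pos, cigar):
        donor = ref_seq[start - 1:start + 1].upper()  # first two intron bases
        acceptor = ref_seq[end - 2:end].upper()      # last two intron bases
        strands.add(MOTIF_STRAND.get((donor, acceptor)))
    if len(strands) == 1:
        return strands.pop()  # None if the single motif is non-canonical
    return None  # unspliced read, or conflicting motifs: strand unknown
```

An unspliced read has no intron, so this approach returns None for it; that is consistent with XS being meaningful only for spliced alignments.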
EDIT 2:
Aligner used: