Question: How to get an XS field in SAM with unstranded data?
16 months ago, corend wrote:

Initial post title: How does Cufflinks find the strand of a novel transcript?

I am using Cufflinks to create a RABT (reference annotation based transcript) assembly of a genome.

I have my newly created merged.gtf file.

Most of the new transcripts found by Cufflinks are present on both strands: transcripts that are very close in terms of sequence are reported twice in the GTF, once with strand + and once with strand -.

How does Cufflinks find the strand of each novel transcript? If it cannot tell, is there a way to report "unknown" and write only one transcript instead of both?

EDIT:

I found that the strand of my transcripts was determined by the XS field of my SAM input.

I also found that my data is unstranded but that I chose a stranded mode during alignment, which explains why I end up with transcripts on both strands.

I would like to rerun my alignment in unstranded mode and run Cufflinks with an unstranded library type (--library-type fr-unstranded). But Cufflinks requires an XS field in the SAM for spliced alignments.

How can I get the strand (XS field) given that my data is unstranded?

Why does Cufflinks require a value in the XS field only for spliced alignments?

EDIT 2: Aligner used: HISAT2

Tags: rna-seq, alignment, cufflinks

Is this a prokaryotic/bacterial species?

When aligning, Bowtie/TopHat will attempt to align each read as it appears in your input file. If it doesn't align, it will check whether the reverse complement of the read aligns. In this way, it can infer the original strand (+/-, plus/minus, coding/non-coding, sense/antisense) from which each read derived. A read that does not align provides no information on strand, so there is no 'in between' state in which we have an aligned read but no strand information.

You can choose 'unstranded' in TopHat, in which case strand orientation will not be reported and reads are instead piled up indiscriminately over each genomic locus, whether they are sense or antisense.

written 16 months ago by Kevin Blighe

That was helpful, thanks. I just edited my post.

written 16 months ago by corend

Which aligner did you use for the unstranded alignment?

written 16 months ago by Friederike

I used HISAT2 for the unstranded alignment.

written 16 months ago by corend

16 months ago, Friederike (United States) wrote:

If you're using HISAT2, it seems that you need to set the --dta flag. Disclaimer: I haven't used HISAT2 myself.
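
For illustration, here is a minimal sketch of how the two command lines might be assembled for an unstranded library. It only builds and prints the commands; nothing is executed. The file names and the helper names `hisat2_command` and `cufflinks_command` are hypothetical. It uses `--dta-cufflinks` rather than plain `--dta`, since that is the variant the HISAT2 manual describes as adding XS:A:+/- to spliced alignments, and it passes no `--rna-strandness` option, which leaves HISAT2 in its default unstranded mode.

```python
import shlex


def hisat2_command(index, reads1, reads2, out_sam, threads=4):
    """Build a HISAT2 command line for unstranded paired-end reads.

    No --rna-strandness option is given, so HISAT2 stays in its default
    unstranded mode. --dta-cufflinks requests Cufflinks-friendly output.
    """
    return [
        "hisat2",
        "-p", str(threads),
        "--dta-cufflinks",
        "-x", index,
        "-1", reads1,
        "-2", reads2,
        "-S", out_sam,
    ]


def cufflinks_command(alignments, annotation_gtf, out_dir, threads=4):
    """Build a Cufflinks RABT command line for an unstranded library."""
    return [
        "cufflinks",
        "-p", str(threads),
        "--library-type", "fr-unstranded",
        "-g", annotation_gtf,  # -g/--GTF-guide switches on RABT assembly
        "-o", out_dir,
        alignments,
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace them with real files.
    print(shlex.join(hisat2_command("genome_index", "reads_1.fq",
                                    "reads_2.fq", "aligned.sam")))
    print(shlex.join(cufflinks_command("aligned.sorted.bam",
                                       "annotation.gtf", "cufflinks_out")))
```

Note that Cufflinks expects coordinate-sorted alignments, so the SAM written by HISAT2 would need to be sorted (for example with samtools) before the second command is run.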

"Why does Cufflinks require a value in the XS field only for spliced alignments?"

Presumably (and based on the comments in the reference in my first line) because it has become a convention for spliced-read aligners to store information that is valuable for transcript assemblers in the XS field. More generally, XS is one of the optional, aligner-defined fields in SAM files (the SAM specification reserves tags starting with X for local use), which is why you'll see all sorts of values there, including the strand (TopHat's choice) or the suboptimal alignment score (BWA).
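
To make the link between spliced alignments and strand concrete: with unstranded reads, the only strand evidence a read carries is the splice-site motif of any intron it spans. The sketch below illustrates that general idea and is not HISAT2's or TopHat's actual implementation. The function name `infer_xs` and the toy reference sequence are made up for the example. An intron that reads GT..AG (or GC..AG, AT..AC) on the forward genomic strand suggests a plus-strand transcript, and the reverse-complemented motifs suggest a minus-strand transcript. An unspliced read spans no intron and gives no hint, which would explain why the tag is only expected on spliced alignments.

```python
import re

# Intron boundary dinucleotides as seen on the forward genomic strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# Reverse complements of the motifs above (minus-strand transcripts).
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}


def infer_xs(reference, pos, cigar):
    """Guess the transcript strand of one alignment from its intron motifs.

    reference: sequence of the chromosome the read aligned to
    pos:       1-based leftmost position (SAM POS field)
    cigar:     SAM CIGAR string
    Returns '+', '-', or None when there is no or conflicting evidence.
    """
    ref_pos = pos - 1  # convert to a 0-based offset into the reference
    votes = set()
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":  # skipped region, i.e. an intron
            intron = reference[ref_pos:ref_pos + length].upper()
            motif = (intron[:2], intron[-2:])
            if motif in PLUS_MOTIFS:
                votes.add("+")
            elif motif in MINUS_MOTIFS:
                votes.add("-")
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
    return votes.pop() if len(votes) == 1 else None


if __name__ == "__main__":
    # Toy reference: 5 bp exon, 8 bp intron (GT...AG), 5 bp exon.
    toy = "AAAAA" + "GTCCCCAG" + "TTTTT"
    print(infer_xs(toy, 1, "5M8N5M"))  # spliced read across the intron
    print(infer_xs(toy, 1, "5M"))      # unspliced read, no evidence
```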

written 16 months ago by Friederike

Thanks a lot, I'll try this option. I see that there is also a --dta-cufflinks option; it could be what I am looking for.

Still, I don't understand why Cufflinks requires strandedness information when my data does not contain it. Also, if I use an unstranded library type, why would Cufflinks need a strand?

PS: I see here that --dta could change the number of aligned reads. I'll see whether it really is a problem.

written 16 months ago by corend

My understanding was not that HISAT stores the strand in the XS tag, but rather that it stores some details about the spliced alignment.
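
One way to settle what a given aligner run actually puts in the XS tag is to tally it directly from the SAM output. The following is a rough sketch using plain text parsing; the function name `tally_xs` and the file name in the usage note are hypothetical. It treats any alignment whose CIGAR contains an N operation as spliced and reports, separately for spliced and unspliced alignments, whether XS is absent, holds a strand character (type A), or holds some other type.

```python
from collections import Counter


def tally_xs(sam_lines):
    """Count XS tag usage for spliced and unspliced alignments.

    sam_lines: any iterable of SAM text lines (header lines are skipped).
    Returns a Counter keyed by (kind, label), where kind is 'spliced' or
    'unspliced' and label is 'no XS', 'XS:A:+', 'XS:A:-', or 'XS:<type>'.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & 4:
            continue  # FLAG bit 0x4: read is unmapped
        kind = "spliced" if "N" in fields[5] else "unspliced"
        xs = next((f for f in fields[11:] if f.startswith("XS:")), None)
        if xs is None:
            label = "no XS"
        else:
            _, tag_type, value = xs.split(":", 2)
            # Keep the value only for character tags (the strand case).
            label = f"XS:A:{value}" if tag_type == "A" else f"XS:{tag_type}"
        counts[(kind, label)] += 1
    return counts
```

Usage, assuming an uncompressed SAM file named aligned.sam: `with open("aligned.sam") as fh: print(tally_xs(fh))`. If spliced alignments mostly fall under 'no XS', the file is missing the information Cufflinks asks for; if they show XS:A:+ and XS:A:-, the tag holds a strand.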

written 16 months ago by Friederike