How To Obtain Position, Strand Information And Whether A Sequence Is Transcriptionally Active Or Not
10.3 years ago
roll ▴ 350

I have a bunch of short sequences (60 bp each). For each sequence I would like to get its position, its strand (positive or negative), and whether it is transcriptionally active or not.

Is there a way to extract this information?

short sequence position strand transcription • 4.6k views
10.3 years ago
Pavel Senin ★ 1.9k

You can use bwa to align your set of sequences to the reference; this will tell you the strand and the position. Next, by comparing BED files with bedtools (see "Bedtools Compare Multiple Bed Files?"), you can get information on the sequences' activity.
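
As a rough illustration of how an alignment yields position and strand: in SAM output, FLAG bit 0x10 marks a read aligned to the reverse strand, POS is the 1-based leftmost coordinate, and the CIGAR string gives the length covered on the reference. The sketch below turns one SAM line into a BED6-style record. The function name `sam_line_to_bed` and the sample read are made up for illustration, and the code assumes single-end reads; it is not meant to replace bedtools.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING_OPS = "MDN=X"


def sam_line_to_bed(line):
    """Convert one SAM alignment line to (chrom, start, end, name, mapq, strand).

    Returns None for unmapped reads. BED coordinates are 0-based, half-open.
    """
    fields = line.rstrip("\n").split("\t")
    name = fields[0]
    flag = int(fields[1])
    chrom = fields[2]
    pos = int(fields[3])       # 1-based leftmost aligned position
    mapq = int(fields[4])
    cigar = fields[5]

    if flag & 0x4:             # 0x4: read is unmapped
        return None

    # Sum the lengths of the CIGAR operations that advance along the reference.
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING_OPS
    )
    start = pos - 1            # convert to 0-based start
    end = start + ref_len
    strand = "-" if flag & 0x10 else "+"   # 0x10: aligned to reverse strand
    return (chrom, start, end, name, mapq, strand)


# Hypothetical 60 bp read aligned to the reverse strand of chr1.
example = "seq1\t16\tchr1\t124\t255\t60M\t*\t0\t0\t" + "A" * 60 + "\t" + "I" * 60
print(sam_line_to_bed(example))
```

The read name is carried into the fourth field, which is what lets each BED row be traced back to its input sequence.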


Thanks Pavel. What about bowtie? Does it make any difference whether I use bwa or bowtie for this purpose? I am more familiar with bowtie than with bwa.


Well, I guess it really doesn't matter how you get the BED files.


I think I am making some progress. I am running bwa index now. While waiting for that, I went ahead and ran bowtie, converted the SAM to BAM, and then converted the BAM to BED files (using bedtools). My question at this stage: is there a way to keep the original sequence in the BED file as extra information? I am going to do this for many sequences, and it would be nice to know which one is which.


Do you mean the FASTA-formatted sequence? Personally, I don't think so, but you can always extract it from the FASTA file. I guess you are almost there by now. I am not sure how you will run the alignment, and you may already know this, but bwa will report secondary alignments as well; you may want to remove those before making the BED files.
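
A minimal sketch of what "removing those" can look like, assuming the alignments are in SAM text form. In the SAM specification, FLAG bit 0x100 marks a secondary alignment, 0x800 a supplementary one, and 0x4 an unmapped read; the helper names below are hypothetical. The same filter is commonly done with `samtools view -F 0x904`, which excludes records carrying any of those three bits.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_primary_mapped(flag):
    """True if the FLAG describes a mapped, primary (non-supplementary) alignment."""
    return not (flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY))


def primary_alignments(sam_lines):
    """Yield only mapped primary alignment lines, skipping '@' header lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if is_primary_mapped(flag):
            yield line


# Hypothetical records: one primary hit and one secondary hit for the same read.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "seq1\t0\tchr1\t124\t60\t60M\t*\t0\t0\t*\t*",
    "seq1\t256\tchr1\t700\t0\t60M\t*\t0\t0\t*\t*",
]
for rec in primary_alignments(records):
    print(rec)
```

After this step each input sequence contributes at most one row, so the BED file has one position per sequence.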


Hi Pavel, I am not sure I understand you correctly. I managed to run bowtie and extracted the chr:start-end and the strand (positive or negative) as a BED file (from the SAM file). But I do not know which of my sequences corresponds to which entry in the BED file. That is what I meant to ask. I now have the sequences in one file and the information about them in another file, with no connection between the two. Do you know how to link them?


Wait a second, I thought that the fourth column of the BED file is actually the name of your sequence. Did you generate that BED file yourself? Then you can fix that, I guess. I just checked what the documentation says about bam2bed, and it seems that column is populated (and the FASTA is there too?).
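
If the fourth BED column does hold the sequence name, the two files can be linked by that name. The sketch below assumes the input sequences are in a FASTA file whose headers match the names in the BED file; the function names, the sample records, and the "NA" placeholder for unmatched names are all illustrative choices.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence name to sequence."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def annotate_bed(bed_lines, seqs):
    """Append the original sequence as an extra column, matched on BED column 4."""
    annotated = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        name = fields[3]
        fields.append(seqs.get(name, "NA"))   # "NA" when the name is not in the FASTA
        annotated.append("\t".join(fields))
    return annotated


# Hypothetical inputs.
fasta = [">seq1 first probe", "ACGTAC", "GTTT", ">seq2", "GGGCCC"]
bed = ["chr1\t123\t183\tseq1\t255\t+", "chr2\t10\t70\tseq2\t255\t-"]
for row in annotate_bed(bed, read_fasta(fasta)):
    print(row)
```

Extra columns after the sixth are not part of standard BED6, so tools that validate the format strictly may need the plain file instead.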


No, I did not create the BED files myself, as I don't know how to extract the strand information from SAM files. First I converted the SAM to BAM using samtools, and then the BAM files to BED using bedtools. I ended up with BED files in the following format: chr1 123 456 0 255 +


I am sorry, I haven't done this particular task myself, so I can't provide a shrink-wrapped script. I hope everything is resolved by now. If not, let's look at some of your data and get things working.

10.3 years ago

"Transcriptionally active" is not something you can determine computationally. Someone needs to have done an experiment on your tissue to determine whether that bit of sequence is transcribed there.

Bowtie or bwa will tell you whether your read runs forward or reverse with respect to your reference sequence, but what reference sequence are you using? If your reads are from spliced transcripts, you need to align either to a file of spliced transcripts or to the genome using an aligner that is "splice-aware" and will try to align reads across the giant gaps where introns are. Bowtie's sister program TopHat will do this; bwa will not. If you align to the genome and find that your read lies within a transcript, you will still need to determine whether that gene runs forward or backward with respect to the genome.
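
The last step can be sketched as a comparison between the read's strand and the strand of each annotated gene it overlaps. This assumes a gene annotation is available as (chrom, start, end, name, strand) records in 0-based, half-open coordinates; the gene names and coordinates below are invented, and the "sense"/"antisense" labels only make sense if the input sequences themselves carry strand information.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def orientation_to_genes(read, genes):
    """Report each overlapping gene and whether the read is on the same strand.

    read:  (chrom, start, end, strand)
    genes: iterable of (chrom, start, end, name, strand)
    """
    chrom, start, end, strand = read
    hits = []
    for g_chrom, g_start, g_end, g_name, g_strand in genes:
        if g_chrom == chrom and overlaps(start, end, g_start, g_end):
            label = "sense" if strand == g_strand else "antisense"
            hits.append((g_name, label))
    return hits


# Hypothetical annotation and one read aligned to the reverse strand.
genes = [
    ("chr1", 100, 500, "geneA", "-"),
    ("chr1", 150, 900, "geneB", "+"),
    ("chr2", 100, 500, "geneC", "-"),
]
print(orientation_to_genes(("chr1", 123, 183, "-"), genes))
```

An empty result means the read overlaps no annotated gene in the supplied list; it says nothing about whether the region is transcribed in a given tissue.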
