Question: How To Obtain Position, Strand Information And Whether A Sequence Is Transcriptionally Active Or Not
roll (United Kingdom) wrote, 5.7 years ago:

I have a bunch of short sequences (60 bp) and for each sequence I would like to extract the position of this sequence, the strand information (whether it is on positive or negative strand) and whether this sequence is transcriptionally active or not.

Is there a way to extract this information?

Modified 5.7 years ago by swbarnes2 • written 5.7 years ago by roll
Answer by Pavel Senin (Los Alamos, NM), 5.7 years ago:

You can use bwa to align your set of sequences to the reference; this will tell you the strand and the position. Next, by using Bedtools (see "Compare Multiple Bed Files?"), you will get the information on the sequences' activity.

Modified 5.7 years ago • written 5.7 years ago by Pavel Senin
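
To illustrate how the strand and position mentioned in the answer above are encoded in an aligner's SAM output, here is a minimal Python sketch. It relies only on the standard SAM columns (read name, FLAG, reference name, 1-based POS) and on the FLAG bits 0x10 (reverse strand) and 0x4 (unmapped). The names Hit and parse_sam_line and the example record are made up for illustration; they do not come from bwa or bowtie.

```python
from typing import NamedTuple, Optional

REVERSE_FLAG = 0x10   # SAM FLAG bit: read aligned to the reverse strand
UNMAPPED_FLAG = 0x4   # SAM FLAG bit: read is unmapped


class Hit(NamedTuple):
    name: str    # read name (column 1)
    chrom: str   # reference sequence name (column 3)
    pos: int     # 1-based leftmost position, as stored in SAM (column 4)
    strand: str  # "+" or "-", derived from the FLAG (column 2)


def parse_sam_line(line: str) -> Optional[Hit]:
    """Return name, reference, position and strand for one SAM record.

    Header lines and unmapped reads yield None.
    """
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    name, flag, chrom, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
    if flag & UNMAPPED_FLAG:
        return None
    strand = "-" if flag & REVERSE_FLAG else "+"
    return Hit(name, chrom, pos, strand)


# Hypothetical record: a 60 bp read named seq1 on the reverse strand of chr1.
example = "seq1\t16\tchr1\t124\t255\t60M\t*\t0\t0\t" + "A" * 60 + "\t" + "I" * 60
print(parse_sam_line(example))
```

Note that SAM positions are 1-based, while BED start coordinates are 0-based, so the same alignment shows a start that is one lower after conversion to BED.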

Thanks, Pavel. What about bowtie? Does it make any difference whether I use bwa or bowtie for this purpose? I am more familiar with bowtie than with bwa.

Written 5.7 years ago by roll

Well, I guess it really doesn't matter how you get the BED files.

Written 5.7 years ago by Pavel Senin

I think I am making some progress. I am running bwa index now. While waiting for that, I went ahead and ran bowtie, converted the SAM to BAM, and then converted that to BED files (using bedtools). My question at this stage: is there a way to keep the original sequence in the BED file as extra information? I am going to do this for many sequences, and it would be nice to know which one is which.

Written 5.7 years ago by roll

Do you mean the FASTA-formatted sequence? Personally, I don't think so, but you can always extract it from the FASTA file. I guess you are almost there by now. I am not sure how you will run the alignment, and you may already know this, but bwa will report secondary alignments as well; you may want to remove those before making the BED files.

Written 5.7 years ago by Pavel Senin
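
On removing secondary alignments before making BED files, as suggested in the reply above: one way to sketch it, assuming the aligner marks such records with the standard SAM FLAG bits 0x100 (secondary) and 0x800 (supplementary), is a small Python filter. The function names is_primary and primary_records are hypothetical, and whether your aligner actually sets these bits depends on its version and options.

```python
SECONDARY_FLAG = 0x100      # SAM FLAG bit: not the primary alignment
SUPPLEMENTARY_FLAG = 0x800  # SAM FLAG bit: supplementary (split) alignment


def is_primary(flag: int) -> bool:
    """True if neither the secondary nor the supplementary bit is set."""
    return not flag & (SECONDARY_FLAG | SUPPLEMENTARY_FLAG)


def primary_records(sam_lines):
    """Yield only the primary alignment records, skipping header lines."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if is_primary(flag):
            yield line


# Hypothetical records: one primary hit and one secondary hit for the same read.
records = [
    "@SQ\tSN:chr1\tLN:1000",
    "seq1\t0\tchr1\t124\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "seq1\t256\tchr2\t50\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
for record in primary_records(records):
    print(record.split("\t")[:4])
```

With the filter in place, each read contributes at most one primary row to the downstream BED file.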

Hi Pavel, I am not sure I understand you correctly. I managed to run bowtie and extracted the chr:start-end and the strand (positive or negative) as a BED file (from the SAM file). But I do not know which of my sequences corresponds to which entry in the BED file. That is what I meant to ask. I now have the sequences in one file and the information about them in another, with no connection between the two. Do you know how to link them?

Written 5.7 years ago by roll

Wait a second, I thought that the fourth column of the BED file is actually the name of your sequence. Did you generate that BED file yourself? Then you can fix that, I guess. I just checked what the documentation says about bam2bed, and it seems that column is populated (and the FASTA sequence is there too?).

Modified 5.7 years ago • written 5.7 years ago by Pavel Senin

No, I did not create the BED files myself, as I don't know how to extract the strand information from SAM files. First I converted the SAM to BAM using samtools, and then the BAM files to BED using bedtools. I ended up with BED files in the following format: chr1 123 456 0 255 +

Written 5.7 years ago by roll
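
If the fourth column of the BED row above (the 0 in chr1 123 456 0 255 +) is the read name, it can serve as the key that links each BED entry back to its sequence. The following Python sketch assumes exactly that: the name in column 4 matches the first word of a FASTA header in the file the reads came from. That is an assumption about the data, not something confirmed in this thread, and read_fasta, annotate_bed and the toy records are hypothetical.

```python
def read_fasta(lines):
    """Map each FASTA record name (first word of the header) to its sequence."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line
    return sequences


def annotate_bed(bed_lines, sequences):
    """Append the original sequence as an extra column to each BED row."""
    for line in bed_lines:
        fields = line.split()
        read_name = fields[3]  # BED column 4: name
        # "NA" marks rows whose name has no matching FASTA record.
        yield "\t".join(fields + [sequences.get(read_name, "NA")])


# Toy input: two reads named 0 and 1, and one BED row for read 0.
fasta = [">0", "ACGTACGT", ">1", "TTTTCCCC"]
bed = ["chr1\t123\t456\t0\t255\t+"]
for row in annotate_bed(bed, read_fasta(fasta)):
    print(row)
```

Extra columns after the sixth are not part of the basic BED layout, so a tool that validates the format strictly may reject them; keeping the name in column 4 and looking the sequence up when needed is the more conservative choice.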

I am sorry, I haven't done this particular task myself, so I can't provide something like a shrink-wrapped script. I hope everything is resolved by now. If not, let's see some of your data and get things working.

Written 5.7 years ago by Pavel Senin
Answer by swbarnes2 (United States), 5.7 years ago:

"Transcriptionally active" is not something you can determine computationally. Someone needs to have done an experiment on your tissue to determine whether that bit of sequence is transcribed in your tissue.

Bowtie or bwa will tell you whether your read runs forward or reverse with regard to your reference sequence, but what reference sequence are you using? If your reads are from spliced transcripts, you need to align either to a file of spliced transcripts, or to the genome using an aligner that is "splice-aware" and will try to align reads with giant gaps where the introns are. Bowtie's sister program TopHat will do this; bwa will not. If you align to the genome and find that your read is in a transcript, you will need to determine whether that gene runs forward or backward with respect to the genome.

Written 5.7 years ago by swbarnes2
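
The last step in the answer above, comparing the read's strand on the genome with the strand of the gene it falls in, can be sketched as follows. This is a minimal Python illustration under stated assumptions: intervals are 0-based and half-open as in BED, both features lie on the same chromosome, and the gene coordinates and strand come from an annotation you supply. The function names and example numbers are hypothetical.

```python
def overlaps(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True if two half-open intervals [start, end) share at least one base."""
    return start_a < end_b and start_b < end_a


def orientation_to_gene(read_strand: str, gene_strand: str) -> str:
    """Classify a read as 'sense' or 'antisense' relative to a gene."""
    for strand in (read_strand, gene_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand value: {strand!r}")
    return "sense" if read_strand == gene_strand else "antisense"


# Hypothetical example: a read at chr1:123-456 on "+" and a gene at
# chr1:100-1000 annotated on "-".
read_start, read_end, read_strand = 123, 456, "+"
gene_start, gene_end, gene_strand = 100, 1000, "-"

if overlaps(read_start, read_end, gene_start, gene_end):
    print(orientation_to_gene(read_strand, gene_strand))
```

This only reports orientation relative to an annotated gene; as the answer points out, it says nothing about whether the region is actually transcribed in a given tissue.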