Multiple alignment: find and reverse all complement strands
0
0
Entering edit mode
5.8 years ago

Hi,

I have 100.000 16S rRNA sequences (about 1500 bp) in one fasta file. About half are written as forward, and the rest are written in the reverse strand. What is the fastest way to find all the complements and reverse them, so that the single fasta file will have all the sequences written as forward.

Thank you for all the suggestions. Tomaž

alignment reverse complement • 2.0k views
ADD COMMENT
1
Entering edit mode

Blast them all against a reference sequence that is in 5'-3', use

-outfmt '6 qacc sstart send' | awk 'BEGIN{FS="\t"}{if($2>$3){print $1}}' > all_3_5.txt
ADD REPLY
0
Entering edit mode

Hello Tomaž,

are there any information in the header of each sequence whether this is a reverse or forward sequence? Is each sequence in a single line or is it a multi-line fasta? Are the forward and reverse sequences paires or are they completly independend?

It would help if you could show as an extract of your file.

fin swimmer

ADD REPLY
0
Entering edit mode

Hi, The header does not give any information. Each sequence is independent and these are not reverse pairs. The fasta is in in single line. Here is the extract:


>a0bda06a-5edc-410d-8997-c6df46553218 runid=a2d5899809987b8927384e3d9a3a2f5fbebdb425 read=813 ch=186 start_time=2018-06-03T20:45:25Z barcode=barcode12
GGGTTTAACCGGATAATAAGCCTACCGTGACCAGGTAAGAAGAAGCCTTGCTGCAGATCGGTTGGTTTACCTTCGTGGCGTCCCGCTCAGCAGGATTGAACTACCGACTTCAG...
>26739b86-ea20-416d-af6a-5612770ac1c4 runid=a2d5899809987b8927384e3d9a3a2f5fbebdb425 read=8776 ch=12 start_time=2018-06-03T21:08:13Z barcode=barcode12
TTCGTTCCAGTTCGTTTGGGTGTTTTAGCAGGTTAGCCGCTACCGTGACCAGGTAGAAAGAAGCCCAAGAATCAGGAAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGG...
>51ab6aa3-a17e-45f6-88dc-5f38498ce729 runid=a2d5899809987b8927384e3d9a3a2f5fbebdb425 read=9438 ch=331 start_time=2018-06-03T21:08:10Z barcode=barcode12
TTGGGTGTTTAGCCAGTTCGCCACCGTGACGGTAGAAGCAGAATCGGACGGTTACCTTGTTGGGCTTCCCCAGTCATTGGTCTTGCCTTAGACGGCTGGCTCCTTGCGAG...
ADD REPLY
0
Entering edit mode

Hi,

thanks for answering my questions. It is better to format file contents by using the code button (the one 101 010). I did it for you now.

So, if there are no informations in the header about the strand and the reads are all independent: How do you know that have of them are reverse? I guess some kind of mapping is needed. But this leads us to the most important question: What are you trying do to?

fin swimmer

ADD REPLY
0
Entering edit mode

Blasting against 16S database shows me I have sequences in both directions. I want to reverse complement half of them so that all of them are in the same forward direction, which I need for downstream processing.

ADD REPLY
0
Entering edit mode

well, if you have the blast result already, then select the IDs for which end < start (== the complement hits), extract those from your orginal fasta and then pass those through a rev-comp tool (EMBOSS? linux/bash? ....)

ADD REPLY
0
Entering edit mode

Can I also advise to re-name your post title?

It's a bit misleading and you might miss out on people who could contribute if the title would be more accurate

ADD REPLY

Login before adding your answer.

Traffic: 1992 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6