convert bam to clustal
0
0
Entering edit mode
4.2 years ago
bioinfo8 ▴ 180

Hi,

I have multiple paired-end bam files (aligned with reference) and bed file of multiple genes (chr, start, end). I would like to visualize the alignment (I know it can be done in IGV), but I want to get alignment in clustal format for each gene, so that I can know the location of forward and reverse reads and how they mapped to the reference from different samples.

Any guidance would be appreciated.

Thanks.

alignment bam clustal R bioconductor • 2.2k views
ADD COMMENT
2
Entering edit mode

Export a smaller BAM in the regions you are interested in using samtools view region and then convert it to fasta using (should be possible using pipes) samtools fasta. Follow this up by a regular MSA with the reference gene sequence.

ADD REPLY
0
Entering edit mode

That is what I did and thought of MSA:

samtools view -b 1_sorted.bam "1:1000-2000" > 1_sorted_gene.bam
samtools fasta 1_sorted_gene.bam > 1_sorted_gene.fasta

But the resulting fasta file looks like this (including forward and reverse reads):

>AH24_16925:7:1114:8023:93573#7/1
TTTATCAGTGTTGTTTCTTTTCCTAACGATGACGGACTGCTGCTGTTTTTGTTCCTAATATTATAATCGGTTTTA
>AH24_16925:7:1114:8023:93573#7/2
CTCTCCTCTTCTTTACGCAAATCTCACTAAACTCACTCAAAACTTTATGTTTGTCCGCTCACTTTTTAAAACCGA
>AH24_16925:7:2108:20218:73346#7/1
GGTTTTGGTCAAATATCTCTCTTGGTTGTGCCTCAGGTGAGGGGCAGCAGACTAATAAAGCATGTTGTGTCTCAG
>AH24_16925:7:2108:20218:73346#7/2
GGACAAACATACAGGCAGGAGAGCATTTCTGCAGACCTCCTTCATCTCTATTTGACATGTGCTGACTCTCTGCTC

I want to get the reads in sequential order in one file so that I can go for MSA with reference gene. I am stuck here. Also, I don't know whether I should make two separate files for forward and reverse reads. How should I proceed?

ADD REPLY
0
Entering edit mode

I want to get the reads in sequential order in one file

What does that exactly mean?

ADD REPLY
0
Entering edit mode

In the order of their alignment with the reference as well as I should know if there are overlaps and remove those regions to get the final sequence.

ADD REPLY
0
Entering edit mode

I think the reads should be extracted in order (as long as your BAM is sorted). You can easily check that.

I should know if there are overlaps and remove those regions to get the final sequence.

That makes it sound like you are not looking for reads but a consensus. That would be a different operation (BAM format file to FASTA alignment file )

ADD REPLY
0
Entering edit mode
samtools bam2fq 1_sorted_gene.bam > 1_sorted_gene.fq

gives almost similar to that of fasta, so not helpful. :(

@AH24_16923:7:1114:8023:93573#7/1
TTTATCAGTGTTGTTTCTTTTCCTAACGATGACGGACTGCTGCTGTTTTTGTTCCTAATATTATAATCGGTTTTA
+
@@CFFFDDDFFHHIIIFHIJIIJIGIJEBGHIIIIGIIIJJGIJJHGIIIIIIJGIIGHGGIIIIGIGEHA>>DF
@AH24_16923:7:1114:8023:93573#7/2
CTCTCCTCTTCTTTACGCAAATCTCACTAAACTCACTCAAAACTTTATGTTTGTCCGCTCACTTTTTAAAACCGA
+
@CCFFFDFHHGHHJIJJJJJIIGHGIIJIJGHGIIIJIJIJIJGIHIJIIEHGIIIIIIIJJJJJJHHHHGHFFD
@AH24_16923:7:2108:20218:73346#7/1
GGTTTTGGTCAAATATCTCTCTTGGTTGTGCCTCAGGTGAGGGGCAGCAGACTAATAAAGCATGTTGTGTCTCAG
+
@@@DFFFFHHHHHJIJJJJJJJJIJFHIJIIGJJJJJ4?GHIJJIIJGIJJIIJIJJIGJIIHHHHH?BFFFFEE
@AH24_16923:7:2108:20218:73346#7/2
GGACAAACATACAGGCAGGAGAGCATTTCTGCAGACCTCCTTCATCTCTATTTGACATGTGCTGACTCTCTGCTC
+
ADD REPLY
0
Entering edit mode

Just take the fasta file you obtained above and run an MSA with clustal (or any other tool you like) and see what you get. Do not forget to include the reference in fasta format.

ADD REPLY
0
Entering edit mode

Yes, I did and it showed alignments and file is quite big (as there are multiple reads >1000). I realized that I need consensus sequence from all the reads for that region and then can align consensus sequence with reference gene. How to get consensus sequence from all reads?

ADD REPLY
0
Entering edit mode

See this: How to get the consensus sequence from a BAM alignment and follow the post linked in @Istvan's answer.

ADD REPLY
0
Entering edit mode

Also see if this shortens the process: https://support.bioconductor.org/p/64748/#64755

ADD REPLY
0
Entering edit mode

Thanks, but its not that helpful.

ADD REPLY
0
Entering edit mode

Because you changed your requirements.

ADD REPLY

Login before adding your answer.

Traffic: 1030 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6