Question

Is there a way I can upload a reference sequence to Clustal Omega to get aligned protein sequences, or a different way of getting the sequences?

5.7 years ago

vellryba • 0

Hello.

My aim is to find correlated mutations within single read pairs. For example, I need to know whether the read with ID X, which has a mutation at position 800, say, also has a mutation at position 1100. I have BAM and SAM files containing only the reads that span the regions I am interested in. I also have the FASTA sequences, and I used Translator X to translate them into protein FASTA.

I know what I expected to get back, but when I loaded these into Clustal Omega to get an alignment, it did not work well: there are gaps, and some sequences are badly translated. The badly translated sequences are already present in the FASTA file I get from Translator X, while the corresponding nucleotide FASTA sequences look fine. Is there a way I can feed my reference sequence into an alignment tool so that the protein sequences are translated and aligned correctly?

Does anybody have any experience with this type of analysis?

alignment sequencing • 1.3k views

ADD COMMENT • link 5.7 years ago by vellryba • 0
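
One possible cause of badly translated reads, which the thread does not confirm, is that each read starts at a different offset relative to the codon frame of the reference. A minimal sketch of translating a read in the reference frame, assuming a gapless forward-strand alignment that starts at or after the CDS start; the function name and the 0-based coordinates `read_start` and `cds_start` are hypothetical (for a SAM record, `read_start` would be POS minus 1):

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_in_reference_frame(read_seq, read_start, cds_start):
    """Translate a read using the codon frame of the reference CDS.

    read_start and cds_start are 0-based reference coordinates.
    Returns (index of the first complete codon in the CDS, protein string).
    Indels inside the read are NOT handled in this sketch.
    """
    # Number of leading bases to skip so that translation starts on a codon boundary.
    offset = (cds_start - read_start) % 3
    protein = []
    for i in range(offset, len(read_seq) - 2, 3):
        codon = read_seq[i:i + 3].upper()
        protein.append(CODON_TABLE.get(codon, "X"))  # X for codons containing N etc.
    first_codon_index = (read_start + offset - cds_start) // 3
    return first_codon_index, "".join(protein)


# Toy reference CDS ATGGCCAAATGA (M A K *); the read starts one base into it.
print(translate_in_reference_frame("TGGCCAAATGA", read_start=1, cds_start=0))
```

Because every read is translated in the same frame and comes with the index of its first codon, the peptides can be placed against the reference protein directly, without a separate multiple alignment step.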

I don't fully understand your question.

If you have a reference sequence and your reads completely cover the region you are interested in, why is there a need to look at protein translations?

ADD REPLY • link 5.7 years ago by GenoMax 141k

Hi, I know there is a mutation present (sometimes) in some of the reads. I also know that there is a mutation (again, sometimes) a bit further down the genome. I want to see if the second mutation is only present when the first one is present; in other words, whether these mutations are hierarchical. I have SAM and BAM files that contain only the reads that span both regions.

Now I just want to somehow count either nucleotide (or protein) variants in those reads. Something like this:

1st position A, 2nd position C - 1200
1st position A, 2nd position T - 800

etc.

I am just not sure how to go about it.

ADD REPLY • link updated 5.7 years ago by Ram 43k • written 5.7 years ago by vellryba • 0

Use bam-readcount to get this information.

ADD REPLY • link 5.7 years ago by GenoMax 141k

Hi,

this only gives me a count at each position. I need to see whether the bases at the two positions are correlated, like this:

position 800   position 1000   count
A              T               1000
C              T               800
A              G               600

etc.

ADD REPLY • link updated 5.7 years ago by Ram 43k • written 5.7 years ago by vellryba • 0
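
A table like the one requested above could be built directly from the SAM records by looking up, for each read pair, the base aligned to each of the two positions and counting the combinations. A minimal sketch in plain Python, assuming tab-separated SAM text as input and 1-based reference positions; the function names `base_at` and `count_pairs` are hypothetical, and base qualities and multiple reference sequences are ignored:

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(pos, cigar, seq, ref_pos):
    """Return the read base aligned to 1-based ref_pos, '-' if the position
    falls in a deletion/skip, or None if the read does not cover it.
    pos is the SAM POS field (1-based leftmost aligned position)."""
    ref = pos    # next reference position to be consumed
    query = 0    # next index into seq
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":                 # consumes read and reference
            if ref <= ref_pos < ref + length:
                return seq[query + (ref_pos - ref)]
            ref += length
            query += length
        elif op in "DN":                # consumes reference only
            if ref <= ref_pos < ref + length:
                return "-"
            ref += length
        elif op in "IS":                # consumes read only
            query += length
        # H and P consume neither read nor reference
    return None


def count_pairs(sam_lines, pos_a, pos_b):
    """Count (base at pos_a, base at pos_b) combinations per read name."""
    calls = defaultdict(dict)           # read name -> {position: base}
    for line in sam_lines:
        if line.startswith("@"):        # header line
            continue
        f = line.rstrip("\n").split("\t")
        name, flag, pos, cigar, seq = f[0], int(f[1]), int(f[3]), f[5], f[9]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x904 or cigar == "*":
            continue
        for target in (pos_a, pos_b):
            base = base_at(pos, cigar, seq, target)
            if base is not None:
                # If both mates cover a position, the first record seen wins.
                calls[name].setdefault(target, base)
    return Counter((c[pos_a], c[pos_b]) for c in calls.values()
                   if pos_a in c and pos_b in c)


example_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tref\t798\t60\t5M\t=\t998\t205\tGGAGG\t*",
    "r1\t147\tref\t998\t60\t5M\t=\t798\t-205\tGGTGG\t*",
    "r2\t99\tref\t798\t60\t5M\t=\t998\t205\tGGCGG\t*",
    "r2\t147\tref\t998\t60\t5M\t=\t798\t-205\tGGTGG\t*",
    "r3\t99\tref\t798\t60\t5M\t=\t998\t205\tGGAGG\t*",  # mate absent: not counted
]
for (first, second), n in count_pairs(example_sam, 800, 1000).most_common():
    print(first + second, n)
```

On real data the lines could come from `samtools view` output piped into the script. Grouping by read name is what ties the two bases to the same fragment, which a per-position count cannot do.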

Sorry to bother you, but do you have any other suggestions? This one won't work for the reasons below.

ADD REPLY • link 5.7 years ago by vellryba • 0

You can probably do LD/Correlation analysis using PLINK (not my area of strength). This is only a pointer for you to consider.

ADD REPLY • link 5.7 years ago by GenoMax 141k
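
To make the LD idea concrete: PLINK is designed for genotype data across samples, so the following is not PLINK's implementation, only a minimal sketch of the textbook D and r² statistics for two biallelic sites, computed from read-pair counts. The function name `ld_stats` and the four count arguments are hypothetical, and the numbers in the example are made up:

```python
def ld_stats(n_both, n_first_only, n_second_only, n_neither):
    """Return (D, r_squared) for two sites from read-pair counts.

    n_both:        pairs carrying both mutations
    n_first_only:  pairs carrying only the first mutation
    n_second_only: pairs carrying only the second mutation
    n_neither:     pairs carrying neither mutation
    """
    total = n_both + n_first_only + n_second_only + n_neither
    if total == 0:
        raise ValueError("no read pairs supplied")
    p_first = (n_both + n_first_only) / total     # frequency of the first mutation
    p_second = (n_both + n_second_only) / total   # frequency of the second mutation
    p_both = n_both / total                       # frequency of the double mutant
    d = p_both - p_first * p_second               # D: deviation from independence
    denom = p_first * (1 - p_first) * p_second * (1 - p_second)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Made-up counts: the second mutation never appears without the first.
d, r2 = ld_stats(n_both=30, n_first_only=20, n_second_only=0, n_neither=50)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

For a hierarchical pattern, where the second mutation only occurs on top of the first, the `n_second_only` count is the one to watch: it should be zero or close to the sequencing error rate, even when r² is well below 1.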

Do you specifically want to find reads which contain multiple mutations, or are you just interested in co-localised mutations?

ADD REPLY • link 5.7 years ago by Joe 21k

Hi, I need to know that the mutations came from a single paired read. There are particular regions I have in mind.

ADD REPLY • link 5.7 years ago by vellryba • 0

If the pair of reads you are looking at flanks the regions of interest, then they represent a fragment that spans the region. Unless you have reads that go through the region of interest, you have no way of confirming that a particular mutation is present in those fragments.

You will need to use Sanger sequencing on the original sample to confirm that the mutation exists.

ADD REPLY • link 5.7 years ago by GenoMax 141k