Help with NGS
7.9 years ago

Hello everybody,

I'm new to NGS and I have a few questions.

One of them is:

I have many NGS sequences in BAM format.

For example, one of these sequences has the CIGAR string 4M1D1M1D4M2I2M114S. Here are the original sequence and the aligned sequence:

CTTCCCCCACCACCGTTGGCACAGCGCCCCCGGGAACACCCTCCCACCACCACCGTCGGCACAGCGCCCCGGGGAGCACCCCAGCCCAGCTGCACCAGGGCTCTCTGAAGGAGGTGGTGGTCCGGTT
CTTC-C-CCCAAC


I don't understand why a large part of the sequence was lost when the CIGAR was converted to an aligned sequence. Can anyone help me?

My other question is: which sequence should I use to convert it into an amino acid sequence?

CTTCCCCCACCACCGTTGGCACAGCGCCCCCGGGAACACCCTCCCACCACCACCGTCGGCACAGCGCCCCGGGGAGCACCCCAGCCCAGCTGCACCAGGGCTCTCTGAAGGAGGTGGTGGTCCGGTT or CTTC-C-CCCAAC?

What I actually want is to compare amino acid mutations relative to the reference sequence.

Sorry for my English and thanks so much for the answers.

What aligner did you use to produce the BAM file? There are many reasons that 114 bases could have been soft-clipped, but it's rather odd that anything would have produced a 13-base alignment from that and still have such a large edit distance.
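To make the soft-clipping point concrete: in SAM/BAM, `S` and `I` operations consume read bases but no reference positions, while `D` consumes reference positions but no read bases. A reference-coordinate view of the read therefore keeps only the `M` bases, shows deletions as gaps, and drops inserted and soft-clipped bases. The following is only a minimal sketch of that conversion (the function name `aligned_view` is made up for illustration and is not part of TMAP or samtools), applied to the read from the question:

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_view(seq, cigar):
    """Return the read laid out in reference coordinates.

    Hypothetical helper for illustration only:
    - M, =, X : copy read bases (consume read and reference)
    - D, N    : emit '-' (consume reference only)
    - I, S    : skip read bases (consume read only, so they vanish here)
    - H, P    : consume neither, so nothing to do
    """
    out = []
    pos = 0  # current position in the read sequence
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            out.append(seq[pos:pos + n])
            pos += n
        elif op in "DN":
            out.append("-" * n)
        elif op in "IS":
            pos += n
    return "".join(out)


read = (
    "CTTCCCCCACCACCGTTGGCACAGCGCCCCCGGGAACACCCTCCCACCACCACCGTCGGCACAGCGCCCC"
    "GGGGAGCACCCCAGCCCAGCTGCACCAGGGCTCTCTGAAGGAGGTGGTGGTCCGGTT"
)
cigar = "4M1D1M1D4M2I2M114S"

print(aligned_view(read, cigar))  # CTTC-C-CCCAAC
```

Only 11 read bases are matched to the reference (4M + 1M + 4M + 2M); with the two deletion gaps that gives the 13-character string in the question. The 2 inserted bases and the 114 soft-clipped bases are still stored in the BAM record, they just do not appear in this view.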

Regarding amino acids, there may not be any part of that that's even transcribed, let alone translated.

Thanks so much for your considerations.

The aligner is TMAP. As for amino acids, the sequence I showed as an example was aligned to a reference region that is transcribed.

7.8 years ago

To address your second question, one normally does not use aligned reads directly, but, instead, calls variants with respect to the reference. With those variants, tools such as ANNOVAR, Ensembl Variant Effect Predictor, and SnpEff are useful for annotating variant-associated protein-coding changes, etc.
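As a rough illustration of what "annotating a protein-coding change" means, here is a toy sketch that applies a single-base substitution to an in-frame reference coding sequence and reports the resulting amino acid change. Everything in it is an assumption made for the example: the function `protein_change`, the 9-base `toy_cds`, and the 0-based position convention are hypothetical, and the standard genetic code is assumed. The annotation tools named above work from real gene models and handle strand, splicing, indels and much more, so this is not how they are implemented.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def protein_change(cds, pos, alt):
    """Describe the amino acid change caused by one substitution.

    Hypothetical helper: `cds` is a reference coding sequence that starts
    in frame, `pos` is the 0-based position of the substituted base, and
    `alt` is the variant base.
    """
    offset = pos % 3                 # position of the base within its codon
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    codon_number = codon_start // 3 + 1  # 1-based residue number
    return f"{ref_aa}{codon_number}{alt_aa}"


toy_cds = "ATGGCCAAA"  # made-up sequence encoding Met-Ala-Lys

print(protein_change(toy_cds, 4, "T"))  # GCC -> GTC, prints A2V
print(protein_change(toy_cds, 5, "T"))  # GCC -> GCT, synonymous, prints A2A
```

The point of working from called variants rather than from individual reads is visible even in this toy: the change is always expressed relative to the reference codon, so sequencing errors or clipped bases in a single read do not shift the reading frame.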