How to get the position of a single base in a sequencing read in respect to the reference
2
0
Entering edit mode
4.6 years ago
va90 ▴ 50

Hi,

I have a bam file generated from analysis of nanopore data. I also have another file which includes the read IDs and the position of cytosine bases that predicted to be methylated. Now, I want to find the position of that methylated cytosines in the reference genome.
Let say I have a read name @adads-asdasd-asdasd, and I have the position based methylated index bases file like this:
@adads-asdasd-asdasd 2345
@adads-asdasd-asdasd 5676
@adads-asdasd-asdasd 8978
@adads-asdasd-asdasd 9878
The numbers represent the position of predicted methylated cytosines in the read. Now I want to find the position of these bases in the reference genome.
I now that I can get the start and end position of a read in the genome from a bam file using bamtobed and generating a bed file. But this utility accounts for CIGAR string and the start and end is not equal to the length of the read. Therefore I cannot just sum the index number of the base with the start position.

Any help would be much appreciated. Thanks, Vahid.

sequencing • 1.2k views
ADD COMMENT
2
Entering edit mode
4.6 years ago

use sam2tsv : http://lindenb.github.io/jvarkit/Sam2Tsv.html which display the mapping information for each base.

ADD COMMENT
0
Entering edit mode

Thank you, @Pierre Lindenbaum. It seems this tool can satisfy my need. Many Thanks, Vahid.

ADD REPLY
0
Entering edit mode

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. (green mark on the left)

ADD REPLY
0
Entering edit mode

Hi @Pierre Lindenbaum,
I had a question regarding this software. Does it consider the + or - strand? and if I want to combine the extracted results of this software with a bed file which has positions based on + and - strand, how can I do that?
I would be grateful if you could instruct me with this matter.
Thanks,
Vahid.

ADD REPLY
0
Entering edit mode

filter the sam input using 'samtools view -f/-F' or use the sam flag in the sam2tsv output.

ADD REPLY
0
Entering edit mode
4.6 years ago
juanjo75es ▴ 130

I guess you can find an alignment of your sequence to the reference genome and then add these numbers to the initial position of the matches. You need to find alignments that match your complete sequence. You can try with a BLAST search (for example here) but it could happen that it finds a subsequence that does not really match your full sequence.

You can also use this tool that guarantees you to find full matches in the reference genome. But only works for now for a few reference genomes.

ADD COMMENT

Login before adding your answer.

Traffic: 2721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6