Question: How to get the position of a single base in a sequencing read in respect to the reference
0
gravatar for va90
16 days ago by
va9010
va9010 wrote:

Hi,

I have a bam file generated from analysis of nanopore data. I also have another file which includes the read IDs and the position of cytosine bases that predicted to be methylated. Now, I want to find the position of that methylated cytosines in the reference genome.
Let say I have a read name @adads-asdasd-asdasd, and I have the position based methylated index bases file like this:
@adads-asdasd-asdasd 2345
@adads-asdasd-asdasd 5676
@adads-asdasd-asdasd 8978
@adads-asdasd-asdasd 9878
The numbers represent the position of predicted methylated cytosines in the read. Now I want to find the position of these bases in the reference genome.
I now that I can get the start and end position of a read in the genome from a bam file using bamtobed and generating a bed file. But this utility accounts for CIGAR string and the start and end is not equal to the length of the read. Therefore I cannot just sum the index number of the base with the start position.

Any help would be much appreciated. Thanks, Vahid.

sequencing • 153 views
ADD COMMENTlink modified 15 days ago by juanjo75es60 • written 16 days ago by va9010
2
gravatar for Pierre Lindenbaum
16 days ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:

use sam2tsv : http://lindenb.github.io/jvarkit/Sam2Tsv.html which display the mapping information for each base.

ADD COMMENTlink written 16 days ago by Pierre Lindenbaum123k

Thank you, @Pierre Lindenbaum. It seems this tool can satisfy my need. Many Thanks, Vahid.

ADD REPLYlink written 13 days ago by va9010

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted. (green mark on the left)

ADD REPLYlink written 13 days ago by Pierre Lindenbaum123k

Hi @Pierre Lindenbaum,
I had a question regarding this software. Does it consider the + or - strand? and if I want to combine the extracted results of this software with a bed file which has positions based on + and - strand, how can I do that?
I would be grateful if you could instruct me with this matter.
Thanks,
Vahid.

ADD REPLYlink modified 11 days ago • written 11 days ago by va9010

filter the sam input using 'samtools view -f/-F' or use the sam flag in the sam2tsv output.

ADD REPLYlink written 11 days ago by Pierre Lindenbaum123k
0
gravatar for juanjo75es
15 days ago by
juanjo75es60
juanjo75es60 wrote:

I guess you can find an alignment of your sequence to the reference genome and then add these numbers to the initial position of the matches. You need to find alignments that match your complete sequence. You can try with a BLAST search (for example here) but it could happen that it finds a subsequence that does not really match your full sequence.

You can also use this tool that guarantees you to find full matches in the reference genome. But only works for now for a few reference genomes.

ADD COMMENTlink written 15 days ago by juanjo75es60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1930 users visited in the last hour