Tracking The Reads In A De Novo Assembly
11.5 years ago
rehma.ar ▴ 290

Hi all!

I have a question that sounds simple but may not be.

I extracted some reads from a BAM file and made a de novo assembly with SOAPdenovo. I then BLASTed the contigs against my reference genome and got some hits. The reads behind a contig come from different positions in the BAM file, though. So I know the position of each hit in the reference genome, but I also want to know where the contributing reads sit in the BAM file, and SOAPdenovo does not keep track of which reads went into the assembly.

If I could find out which read went into which contig during assembly, that would solve my problem.

Is there another assembler that keeps track of the reads in the assembly, or any other indirect solution?

11.5 years ago
lelle ▴ 830

Velvet can be configured to do read tracking.

See section 3.2.4 of the manual.

11.5 years ago
rehma.ar ▴ 290

Thanks a lot for the response, but I have run into a problem. I used Velvet to make the assembly with this command:

./velveth outputdirectory 25 -bam -shortPaired .bam

and then

./velvetg outputdirectory/ -min_contig_lgth 100 -ins_length_long 300 -read_trkg yes -amos_file yes

and I got the result. The read tracking should be in the file "LastGraph". A very small portion of the file looks like this:

NR 29 113

1294 52 0

1299 49 0

which should be interpreted as

NR $NODE_ID $NUMBER_OF_SHORT_READS

$READ_ID $OFFSET_FROM_START_OF_NODE $START_COORD

$READ_ID2 etc.

But "1294" and "1299" are not the actual read IDs, and I don't have them in my .bam file. What do you think is going wrong here?
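For anyone following along, the NR blocks described above can be pulled out of LastGraph with a few lines of Python. This is only a rough sketch: the function name `parse_read_tracking` is made up for illustration, and it assumes that the count on each NR line equals the number of read lines that follow it. The example input reuses the two read lines quoted above, with the count shortened to 2 so the block is complete.

```python
from collections import defaultdict


def parse_read_tracking(lines):
    """Collect (read_id, offset, start_coord) tuples per node from NR blocks.

    Assumes each "NR <node> <count>" line is followed by exactly <count>
    lines of three integers. All other lines are ignored.
    """
    reads_by_node = defaultdict(list)
    node_id = None
    remaining = 0  # read lines still expected for the current NR block
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "NR":
            node_id = int(fields[1])  # kept signed, exactly as written in the file
            remaining = int(fields[2])
        elif remaining > 0:
            read_id, offset, start = (int(x) for x in fields[:3])
            reads_by_node[node_id].append((read_id, offset, start))
            remaining -= 1
    return dict(reads_by_node)


# Shortened version of the excerpt above (count set to 2 instead of 113).
example = """NR 29 2
1294 52 0
1299 49 0
""".splitlines()

reads = parse_read_tracking(example)
print(reads)
```

For a real run you would pass an open file handle instead of the example list, e.g. `with open("outputdirectory/LastGraph") as fh: reads = parse_read_tracking(fh)`. The read IDs you get back are still the numbers from the file, not the names in the BAM.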


It has been a while since I worked with this, but I think these are the read IDs Velvet uses internally.

In the output folder there should be a file called "Sequences". It is basically a fasta file with the input reads. The fasta header should look like this:

> M00298:13:000000000-A1BJE:1:1101:17340:1510 2:N:0:1    6       1

It starts with the original read ID, followed by Velvet's internal read ID (6 in this case). I do not know at the moment what the last number means.
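If it helps, here is a small Python sketch for building a lookup table from Velvet's internal read ID back to the original read name, based only on the header layout shown above. The function name `velvet_id_to_read_name` is made up for illustration, and the sketch assumes every header ends with two whitespace-separated numbers (internal ID, then the last number), with everything before them being the original read name.

```python
def velvet_id_to_read_name(lines):
    """Map Velvet's internal read ID to the original read name.

    Assumes header lines look like ">name<whitespace>id<whitespace>number",
    where the name itself may contain spaces.
    """
    mapping = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Split off the last two fields from the right so that spaces
        # inside the read name are preserved.
        parts = line[1:].strip().rsplit(None, 2)
        if len(parts) != 3:
            continue  # header does not match the assumed layout
        name, velvet_id, _last_number = parts
        mapping[int(velvet_id)] = name
    return mapping


# Header taken from the example above; the sequence line is a placeholder.
header = "> M00298:13:000000000-A1BJE:1:1101:17340:1510 2:N:0:1    6       1"
mapping = velvet_id_to_read_name([header, "ACGT"])
print(mapping[6])
```

With this table, the numbers in the NR blocks of LastGraph can be translated to read names and then looked up in the BAM file. Note that read names in a BAM file contain no whitespace, so you may need to keep only the part before the first space (`name.split()[0]`) when matching.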


That was really helpful. There was actually no description of the "Sequences" file in the manual.

Thanks a lot!
