I am trying to select some reference genome region of a bam file, but got an error
0
0
Entering edit mode
3.7 years ago
schlogl ▴ 160

Hi there, I am trying to use this command:

samtools view -b sorted_SRR1972739.bwa.bam AF086833:3050-3199 > selected.bam

And got this error:

[main_samview] region "AF086833:3050" specifies an unknown reference name. Continue anyway.

Using this command:

samtools view -c selected.bam
0

The file I used as ref genoma was AF086833 (AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome).

All the anterior commands worked well and were:

bwa mem AF086833.fa SRR1972739_1.fastq SRR1972739_2.fastq  > SRR1972739.bwa.sam
samtools view -S -b SRR1972739.bwa.sam > SRR1972739.bwa.bam
samtools sort SRR1972739.bwa.bam -o sorted_SRR1972739.bwa.bam
samtools index sorted_SRR1972739.bwa.bam

Do you have any ideia what I am doing wrong to get this error.

Thank you,

Paulo.

SAMTOOLS • 1.5k views
ADD COMMENT
1
Entering edit mode

no need to shout (== caps in the title, I changed it for you this time)

ADD REPLY
0
Entering edit mode

sorry. Thank you. Paulo

ADD REPLY
1
Entering edit mode

did you double check the correct syntax for querying a specific region?

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

can you post the first line of the fasta file you're using? (= the header line)

ADD REPLY
0
Entering edit mode

AF086833.2 Ebola virus - Mayinga, Zaire, 1976, complete genome

ADD REPLY
1
Entering edit mode

as you figured out by now, you were not using the correct sequence name ;)

ADD REPLY
1
Entering edit mode

What's the output of samtools idxstats sorted_SRR1972739.bwa.bam?

By the way, your command could be a lot shorter, avoiding unnecessary intermediate files. This assumes you are using a fairly recent version of samtools, but you should make sure to work with up-to-date tools anyway.

You can replace the following

bwa mem AF086833.fa SRR1972739_1.fastq SRR1972739_2.fastq  > SRR1972739.bwa.sam
samtools view -S -b SRR1972739.bwa.sam > SRR1972739.bwa.bam
samtools sort SRR1972739.bwa.bam -o sorted_SRR1972739.bwa.bam

by

bwa mem AF086833.fa SRR1972739_1.fastq SRR1972739_2.fastq  | samtools sort -o sorted_SRR1972739.bwa.bam
ADD REPLY
0
Entering edit mode

The result is:

samtools idxstats sorted_SRR1972739.bwa.bam

AF086833.2  18959   15279   11
*   0   0   5450
ADD REPLY
2
Entering edit mode

AF086833.2 =\= AF086833

ADD REPLY
0
Entering edit mode

You got it bro. Thank you so much! 8)

ADD REPLY
1
Entering edit mode

see my post above, the "correct" name of your reference is AF086833.2 (note the .2)

ADD REPLY
0
Entering edit mode

still not worked even with the correct name! I don't get thet error, but also get a empty file!

ADD REPLY
1
Entering edit mode

it's a fairly small region you're querying, perhaps there are just no reads mapped in that region?

try querying a bigger region (or one that you now for sure has reads mapped) to verify that your command is working (which it should as far as I can see)

ADD REPLY
1
Entering edit mode

First thing to check if the chromosome name matches in all locations. If that checks out try

samtools view sorted_SRR1972739.bwa.bam "AF086833:3050-3199" > selected.bam
ADD REPLY
0
Entering edit mode

Hi bro...well! I tried that but didn't work too Genomax! 8(

ADD REPLY

Login before adding your answer.

Traffic: 2437 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6