Question: trouble understanding what a warning means while building a .bcf file with samtools
0
gravatar for ricfoz
17 months ago by
ricfoz20
National School of Antropology and History, Mexico city, Mexico
ricfoz20 wrote:

Hello there,

I am trying to transform a .bam file with the lenght of a genic region into .fasta format, as i've been reading, y may have to transform it into .vcf file to see the variant calls, and then create a consensus using bcf tools.

the pipeline seems very straightforward when transforming into vcf format, first i need to run mpileup from samtools to the original .bam file, then call command with bcftools, to call the variants.

Now, my problem comes when i run the mpileup, i ran the next argument:

samtools mpileup -g -f chr19ref.fa filename.bam > desiredfile.bcf

it actually gave me a desidedfile.bcf , but on the terminal, it gave me loads of lines saying:

[E: :faidx_fetch_seq] The sequence "6" not found

this message filled the screen, and at the end i got a .bcf file, which when converting to .vcf gave an empty document, i guess there's an issue with the conversion to .bcf, but i can't find out what it is, it has something to do with the warning message i describe, can someone help me with that?

greetings.

ADD COMMENTlink modified 17 months ago by Devon Ryan89k • written 17 months ago by ricfoz20

Hello everybody, i already solved the problem that i had. as Pierre writes lines bellow, my problem was that the column on my .bam file described the chromosome number as "6", and the .fasta reference file i provided had the "chr6" notation, thus ,sending the error described in my original post. I tried to realize the hack i was adviced, but as it seems, it is not possible (as far as i know), to create a symbolic link running a virtual machine, which was my case, so i dug out into some forums and found the next solution, which allows you to write whatever script you want in the column of the chromosome number. In this case i just added the preffix "chr", which was lacking:

samtools view -H strand.bam | sed -e 's/SN:SN:chr/' | samtools reheader - your_troubled_file_name.bam > output_file_name.bam

i hope it helps you out people with similar problems.

Cheers.

ADD REPLYlink written 17 months ago by ricfoz20
1
gravatar for Pierre Lindenbaum
17 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

in your sam header I would say that there is a dictionary (lines starting with @SQ) about the chromosome "6" while your chr19ref.fa uses a different notation ("chr6").

ADD COMMENTlink written 17 months ago by Pierre Lindenbaum119k

that makes a lot of sense ! ... when i started working with "samtools view" i had trouble running it because i tried writing the region like: "chr6:xxx-xxx+1", and it just ran until i changed the notation to: " 6:xxx-xxx+1"

how can i solve that? ... changing headers on the sam file would be difficult i guess, don't know, any suggestions to analogize the notations on both files?

ADD REPLYlink written 17 months ago by ricfoz20
1

an ugly hack : create a symbolic link to an alternate reference sequence: http://plindenbaum.blogspot.fr/2011/10/reference-genome-with-or-without-chr.html

ADD REPLYlink written 17 months ago by Pierre Lindenbaum119k

Ugly hack for ugly trouble... fair enough!

I‘ll try that as soon as i reach my pc

Cheers!

ADD REPLYlink written 17 months ago by ricfoz20
1
gravatar for Devon Ryan
17 months ago by
Devon Ryan89k
Freiburg, Germany
Devon Ryan89k wrote:

It's highly likely that your BAM file does not match your fasta file. In particular, it looks like your bam file has a chromosome 6, but I'm guessing that your fasta file only has a chromosome chr19.

ADD COMMENTlink written 17 months ago by Devon Ryan89k

hello, it is something like that ... but i can't sort out how to make the samtools work correctly, i have built my original .bam myself by running the "samtools view" command, then i downloaded a fasta file of the chr6 of the hg19 assembly, which i have in my working folder.

When i run my script, i give "samtools mpileup" the reference genome (which is the one i described lines ago), and it still gives the error.

for sake of clarity, in the script i described in the original post, i wrote chr19, when i should've written hg19.

To me, it seems like when creating the bcf, the .bam file is lacking some info or something, maybe i have give a certain flag when "cutting" the desired genomic region when using "samtools view Region:initial-final" ?

ADD REPLYlink written 17 months ago by ricfoz20
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1180 users visited in the last hour