Question: vcfutils.pl issue: FASTQ file contains only nnnnn
Posted 3.7 years ago by duoduoo (United States):

Hi,

I'm using samtools 1.2 and bcftools 1.2.

I'm having a similar issue to https://github.com/samtools/bcftools/issues/50 (none of the replies there solves my problem):
samtools mpileup -uf ref.fa my.bam | bcftools call -c - | vcfutils.pl vcf2fq > my.fq 

I'm getting nothing but nnnnnnnnn and !!!!!!!!!!!!!!!!!! in the final fq file.
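
A quick way to confirm the symptom is to count how many sequence characters in the output are anything other than n/N. The helper below is hypothetical (`count_called_bases` is not part of vcfutils.pl). It assumes each record is an `@name` header, one or more sequence lines (the sample output later in this thread shows several sequence lines under one header), a `+` line, then quality lines whose total length equals the sequence length. Any letter other than n/N counts as "called", including lowercase or IUPAC letters.

```shell
# Hypothetical helper (not part of vcfutils.pl): count how many sequence
# characters in a vcf2fq-style FASTQ are something other than n/N.
# Assumes: "@name", one or more sequence lines, a "+" line, then quality
# lines whose combined length equals the sequence length.
count_called_bases() {
  awk '
    state == 0 && /^@/ { state = 1; seqlen = 0; next }     # header line
    state == 1 && /^[+]/ { state = 2; quallen = 0; next }  # separator line
    state == 1 {
      seqlen += length($0)
      total += length($0)
      line = $0
      called += gsub(/[^nN]/, "", line)   # gsub returns the number of matches removed
      next
    }
    state == 2 {
      quallen += length($0)
      if (quallen >= seqlen) state = 0    # quality block complete, expect a header next
    }
    END { printf "%d of %d bases called\n", called, total }
  ' "$1"
}
```

Running `count_called_bases my.fq` and getting "0 of N bases called" would confirm that the file really contains no called bases at all.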

Is something wrong with vcfutils.pl itself? I googled around, and other people seem to have the same question, but I found no solution.

How can I get a correct FASTQ file?

P.S. Besides vcfutils.pl, I also tried bcftools consensus, and it worked fine for me. But my problem is that my BAM file is supposed to contain some missing data. Since the consensus sequence is derived from the human reference genome, I guess all missing or low-quality sites are simply treated as identical to the reference? (Even if this works, is it a dead end? Also, I already have the VCF file I want, so I don't need to generate it from the BAM file myself.)
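
On the missing-data concern, one possible approach (an assumption, not something tested in this thread) is to mask poorly covered positions with N. The hypothetical helper below turns per-position depth lines of the form `chrom<TAB>pos<TAB>depth` (the layout printed by `samtools depth`) into BED intervals where depth is below a minimum. Two caveats: `samtools depth` omits zero-coverage positions by default (newer releases offer `-a` to print them), and recent bcftools releases document a `--mask` option for `bcftools consensus` that takes such a BED file; check that both exist in the versions you run. The default threshold of 3 is an arbitrary illustrative value.

```shell
# Hypothetical helper: read "chrom<TAB>pos<TAB>depth" lines (1-based positions,
# the layout printed by samtools depth) and print BED intervals (0-based start,
# exclusive end) covering consecutive positions whose depth is below MIN.
# MIN defaults to 3, an arbitrary illustrative threshold.
low_coverage_bed() {
  awk -v min="${2:-3}" '
    BEGIN { OFS = "\t" }
    # print the interval currently being built, if any
    function emit() {
      if (inblock) print iv_chrom, iv_start, iv_end
      inblock = 0
    }
    $3 < min + 0 {
      # extend the open interval when this position directly follows it
      if (inblock && $1 == iv_chrom && $2 == iv_end + 1) {
        iv_end = $2
      } else {
        emit()
        iv_chrom = $1; iv_start = $2 - 1; iv_end = $2; inblock = 1
      }
      next
    }
    { emit() }          # a well-covered position closes any open interval
    END { emit() }
  ' "$1"
}
```

Usage would be along the lines of `low_coverage_bed depth.txt 5 > lowcov.bed`, where `depth.txt` and the threshold are placeholders.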

Thanks a lot!!!

Tags: genome, bcftools, samtools, vcfutils

Did you check the output of bcftools call -c? For example: samtools mpileup -uf ref.fa my.bam | bcftools call -c - -o output.vcf -O v

(reply by cpad0112, 3.7 years ago)

Hi, I checked the output VCF from bcftools, and it looks fine. But indeed, it doesn't distinguish missing data from the other sites (or is that simply how it works?): all sites without an alternative allele are shown as reference alleles. So I wondered whether I should add "-g INT", but then it only outputs variable sites, and that still doesn't solve the problem.

(reply by duoduoo, 3.7 years ago)
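
To see what the VCF from `bcftools call -c` actually contains, a hypothetical tally like the one below counts reference-only records (ALT column `.`) against records that carry an alternative allele. It assumes an uncompressed, tab-separated VCF, for example one written with `-O v` as in the command suggested above.

```shell
# Hypothetical helper: summarise an uncompressed VCF by counting records whose
# ALT column is "." (reference-only) and records with an alternative allele.
vcf_site_summary() {
  awk -F '\t' '
    /^#/ { next }                     # skip meta-information and header lines
    $5 == "." { ref_only++; next }    # column 5 is ALT
    { variant++ }
    END { printf "ref_only=%d variant=%d\n", ref_only, variant }
  ' "$1"
}
```

Running `vcf_site_summary output.vcf` shows how the call set splits between the two kinds of record; positions that have no VCF record at all are not counted.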

Well, I think you need to look at your file again. You should see stretches of sequence interspersed among the Ns; the trailing ? and ! characters are quality scores.

(reply by cpad0112, 3.7 years ago)

No, there are no stretches of sequence between the Ns and the ?! characters. I checked what a normal FASTQ file should look like, and this is not it. The generated FASTQ file looks like this:

@1
nnnnnnnnnnnn
nnnnnnnnnnnn
nnnnnnnnnnnn
nnnnnnnnnnnn

and

!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!

(reply by duoduoo, 3.7 years ago)

Only Ns in the entire file? What I got was Ns with contiguous stretches of sequence and quality scores in between, and !! ?? at the end. Because this FASTQ is built from a VCF, I expected it to contain Ns and low quality scores in addition to the bases from the VCF. This is the command I ran, and it seems to work for me:

samtools mpileup -uf rnaseq/reference/chr12.fa rnaseq/MeOH_REP1_picard/q20.cutadapt.sorted.dedup.rg.bam  | bcftools call -c - | vcfutils.pl vcf2fq > meoh.rep1.fq

Update: FASTQ validation is failing. I suspect the Perl script writes the entire sequence and the quality scores as two long blocks instead of the four lines expected per FASTQ record.

(reply by cpad0112, 3.7 years ago)
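
If the validation failure comes from the sequence and quality strings being wrapped over several lines (the sample output earlier in the thread shows one header followed by many sequence lines), a hypothetical helper like the one below re-joins each record into the standard four lines. It relies on the quality string having the same length as the sequence, which is how it tells the end of a quality block apart from a quality line that happens to start with `@`.

```shell
# Hypothetical helper: rewrite a FASTQ whose sequence and quality strings are
# wrapped over several lines into one four-line record per sequence.
unwrap_fastq() {
  awk '
    state == 0 && /^@/ { header = $0; seq = ""; state = 1; next }
    state == 1 && /^[+]/ { qual = ""; state = 2; next }
    state == 1 { seq = seq $0; next }       # accumulate wrapped sequence lines
    state == 2 {
      qual = qual $0
      # the quality string is complete once it is as long as the sequence
      if (length(qual) >= length(seq)) {
        print header; print seq; print "+"; print qual
        state = 0
      }
    }
  ' "$1"
}
```

For example, `unwrap_fastq my.fq > my.4line.fq` would produce a file that line-based FASTQ validators can parse, assuming the input follows the layout described above.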

Yes, I'm getting only N, "!" and "~" characters. Something must be wrong with either vcfutils.pl itself, my input BAM file, or the BCF generated by mpileup.

Your command is the same as the one I ran. Would you mind telling me which version of samtools you used? Thanks~

(reply by duoduoo, 3.7 years ago)
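
To see exactly which quality values appear, a hypothetical helper can tally the quality characters and convert each to a Phred score, assuming the usual Phred+33 encoding, under which `!` is Q0 and `~` is Q93 (the highest printable value). It assumes four-line records, so wrapped output would need to be joined into single lines first.

```shell
# Hypothetical helper: tally the characters on the quality lines of a FASTQ
# with four-line records and print "char Qscore count" for each, assuming
# the Phred+33 encoding.
quality_histogram() {
  awk '
    BEGIN {
      # printable ASCII 33..126, so index(ascii, c) - 1 = code - 33 = Phred score
      for (i = 33; i <= 126; i++) ascii = ascii sprintf("%c", i)
    }
    NR % 4 == 0 {                     # every fourth line is a quality line
      for (i = 1; i <= length($0); i++) count[substr($0, i, 1)]++
    }
    END {
      for (c in count) printf "%s Q%d %d\n", c, index(ascii, c) - 1, count[c]
    }
  ' "$1" | LC_ALL=C sort
}
```

A file that reports only `! Q0` and `~ Q93` lines would match the symptom described here.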