5.7 years ago
miqrom
First, thanks to Kevin for his answer to my last question (I can't answer my post, I tried 3 times) about grouping columns from different samples (same individual) in a merged VCF.
The only command that puts all positions in the same column is bcftools roh, although the output is a plain text file with 2 columns to remove, first ST and second sampleName.
I think it can be converted to VCF with bcftools convert --tsv2vcf
EDIT (@RamRS): Added context.
Can you paste an example of your input and what you would like your output to be?
bcftools concat doesn't work with samples aligned to different fasta (I indexed every GRCh37 chromosome fasta with bwa index and samtools faidx). I don't know if GATK CatVariants could run on my PC (3 GB RAM). I have all my SNP and indel variants by chromosome with bcftools view -v snps,indels -o output.vcf input.bcf and then I get GTs with bcftools call -m --ploidy GRCh37 -Oz -o outputGT.vcf.gz input.vcf.
When Kevin told me about bcftools norm I was frustrated because I haven't found any option to place all loci in the same column. Then I checked bcftools roh merged.vcf and I got almost all I want: all positions in the same column (I have to remove 2,000 regions or RT). Python has commands to remove the columns I don't need (samples, sites-regions) and I can merge GT info after converting the text file to VCF with bcftools isec -p mergedROH roh.vcf.gz outputGT.vcf.gz
Do you understand what each of those bcftools sub-programs does? roh is not a formatter. What norm does can be called formatting if you stretch the meaning of the word, but roh does not merge anything. It detects runs of homozygosity. You might see output in a format you wish to see but that does not mean you're getting an accurate result. I don't see how your use of bcftools isec is valid either.
Also, when you say
I don't see that in the manual. Can you show me an error message? And please use the code formatting option to format your posts better.
miqrom, you have never definitively explained what you want. Some test input and expected output would be a great help for both Ram and me. We are volunteers and are aiming to help you in our own free time.
bcftools concat always shows: Different sample names in input.chr2.vcf.gz. Perhaps "bcftools merge" is what you are looking for? I checked "bcftools norm" with my merged VCF (bcftools merge). After that I ran "bcftools roh merged.norm.vcf" again and all regions have disappeared (I have 113,888,324 ST lines in 22 merged autosomal chromosomes). I know that runs of autozygosity are not the right option to make a GT VCF, although I can get an intersection after converting to VCF: all variants shared with merged.VCF (bcftools isec) can easily have their GT obtained in a new column.
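The message quoted above is printed when the input files do not carry the same sample columns. A minimal sketch for checking this before running concat follows. It assumes standard VCF headers (nine fixed columns, then one column per sample); the function names and the second sample name, chr2.sorted.bam, are hypothetical and only there for illustration.

```python
def sample_names(header_lines):
    """Return the sample columns from the #CHROM line of a VCF header."""
    for line in header_lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed (CHROM ... FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM line found")


def mismatched_files(headers_by_file):
    """Return the files whose sample names differ from the first file's."""
    items = list(headers_by_file.items())
    expected = sample_names(items[0][1])
    return {
        name: sample_names(lines)
        for name, lines in items[1:]
        if sample_names(lines) != expected
    }


fixed = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
headers = {
    "input.chr1.vcf.gz": ["##fileformat=VCFv4.2\n", fixed + "chr1.sorted.bam\n"],
    "input.chr2.vcf.gz": ["##fileformat=VCFv4.2\n", fixed + "chr2.sorted.bam\n"],
}
print(mismatched_files(headers))
```

With real files the header lines can be read with gzip.open(path, "rt"); on the command line, bcftools query -l file.vcf.gz lists the same sample names.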
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
If you are posting this from China and are not able to use the ADD COMMENT/ADD REPLY buttons (ones with gray background highlight) then switch to the Chrome browser. People have said that works in the past.
Randomly throwing bcftools sub-programs will not get you anywhere. Define exactly what you need, and we can help you find the tool for the task. To define what you need, you have to give us an overview of the salient points in your workflow, the exact step where you're facing problems in the workflow, sample input (that includes a few entries where you face a problem) and expected output. Actual output that you're getting at the moment would also help. Please give us these instead of telling us how you threw every bcftools sub-program at your files.
I think that you need to do the following (assuming that you have separate VCF files):
1. Rename the samples so that every file has the same sample name (bcftools reheader)
2. Concatenate the files (bcftools concat VCF1.vcf.gz VCF2.vcf.gz VCF3.vcf.gz ... --allow-overlaps --remove-duplicates)
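The steps above could be sketched roughly as follows. This is a sketch under stated assumptions, not a tested pipeline: the helper name, the file names and the names file (new_names.txt, holding the new sample name, one per line) are hypothetical, the inputs are assumed to be bgzip-compressed single-sample VCFs, and the indexing step is included on the assumption that concat with --allow-overlaps needs indexed inputs. The function only builds the command lines so they can be inspected before anything is run.

```python
def build_concat_commands(vcf_paths, out_path, names_file="new_names.txt"):
    """Build (but do not run) reheader, index and concat command lines."""
    commands = []
    renamed = []
    for path in vcf_paths:
        target = path.replace(".vcf.gz", ".renamed.vcf.gz")
        # reheader -s takes a file of new sample names, one per line.
        commands.append(
            ["bcftools", "reheader", "-s", names_file, "-o", target, path]
        )
        # Assumption: concat --allow-overlaps needs indexed inputs.
        commands.append(["bcftools", "index", target])
        renamed.append(target)
    commands.append(
        ["bcftools", "concat", "--allow-overlaps", "--remove-duplicates",
         "-Oz", "-o", out_path, *renamed]
    )
    return commands


if __name__ == "__main__":
    inputs = ["chr1.vcf.gz", "chr2.vcf.gz"]  # hypothetical file names
    for cmd in build_concat_commands(inputs, "all_chr.vcf.gz"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to compare them with the bcftools manual; each list could then be passed to subprocess.run(cmd, check=True) once it looks right.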
Thanks, Kevin, though I had many problems doing it. First, I tried with a vcf.gz input like the console's example says but it didn't run right. Second, bcftools -h <file> is not an accepted command ("can't read any -h file to substitute header"). Finally, I ran bcftools -s <sample file> successfully. My sample file was only a text line with the first sample name: chr1.sorted.bam. I changed every chr.vcf and ran bcftools concat with uncompressed VCF inputs and -Oz -o output.vcf.gz. Thanks again.
Once again, use the code formatting option to format your posts. See the changes I made to your comment above to know how you can format your posts better.
I mean this in the best possible way, but do you have any idea what you're trying to do?
This sentence makes no sense. What does "console's example" mean? What does "it didn't run right" mean?
Where did you get the idea to run bcftools -h <file>? Did anyone suggest it?
Again, what are you doing? Are you just typing random characters with bcftools as the first word? I hope you understand how command line programs work.
@OP, just because a command completes without an error doesn't mean it ran correctly or successfully...
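One way to act on that is a quick check of the output file after the command finishes. The sketch below is only an illustration: the function name is hypothetical, and it assumes an uncompressed, tab-separated VCF read line by line. It counts records per chromosome and reports positions that go backwards within a chromosome, which is a cheap hint that a concatenation did not do what was intended.

```python
from collections import Counter


def summarize_records(vcf_lines):
    """Count records per chromosome and collect out-of-order positions."""
    counts = Counter()
    out_of_order = []
    last_pos = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t", 2)[:2]
        pos = int(pos)
        counts[chrom] += 1
        if pos < last_pos.get(chrom, 0):
            out_of_order.append((chrom, pos))
        last_pos[chrom] = pos
    return counts, out_of_order


example = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t100\t.\tA\tG\n",
    "1\t90\t.\tC\tT\n",
    "2\t5\t.\tG\tA\n",
]
counts, out_of_order = summarize_records(example)
print(dict(counts), out_of_order)
```

Comparing the per-chromosome counts of the output with those of the inputs shows whether records were lost or duplicated along the way.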