22 months ago
rheab1230 ▴ 140
I am trying to filter .vcf files so that they contain only single-point SNPs, with indels removed.
I have removed indels from the .vcf file using the following command:
grep -v $'\t-\t' in.vcf > no_indels.vcf
I was able to remove indels from the .vcf file this way.
But I also want to remove any variants such as CAT --> C and C --> CTATGG.
I want my .vcf file to contain only A-T, T-A, or G-C changes.
That is, single-point mutations only.
Can anyone please help me with this?
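One possible way to keep only single-base changes is to check the REF and ALT columns directly. This is a minimal sketch, assuming a tab-delimited, uncompressed VCF with REF in column 4 and ALT in column 5; the function name `keep_single_base_snvs` and the file names are made up for illustration. Note that it also drops multi-allelic records such as `A` --> `C,T`, which may or may not be what you want.

```shell
# Hypothetical helper: print header lines plus records whose REF and ALT
# are each exactly one of A, C, G, T.
keep_single_base_snvs() {
  awk -F'\t' '
    /^#/ { print; next }                          # keep all header lines
    $4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/ { print }  # single-base REF and ALT only
  ' "$1"
}

# Usage (placeholder file names):
# keep_single_base_snvs in.vcf > snvs_only.vcf
```

Records like CAT --> C or C --> CTATGG fail the one-base test on REF or ALT, so they are left out of the output.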
I have used bcftools as suggested to filter the SNPs from the .vcf files.
I got the output file, but I also received some messages after running the command.
These warnings can be ignored most of the time, but it's usually better to fix them. For example, your header seems to be improper. If you're working on uncompressed `.vcf` files directly, try running
after installing htslib. This will make sure your VCF header is proper.
Okay, I was able to do it. Thank you. Do you also know how to update the ID column in a .vcf file with rsIDs from dbSNP?
Your question on this topic was already answered by Wouter, I believe. Did that solution not work for you? Explore `bcftools annotate` if you'd like another option.
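For reference, a rough sketch of what the `bcftools annotate` route could look like, assuming you already have a bgzipped, tabix-indexed dbSNP VCF. The function name `annotate_ids` and all file names are placeholders, not something from this thread.

```shell
# Hypothetical helper: copy the ID column from an indexed annotation VCF
# into the matching records of the input VCF.
annotate_ids() {
  in_vcf=$1      # bgzipped input VCF
  dbsnp_vcf=$2   # bgzipped and tabix-indexed dbSNP VCF (assumed to exist)
  out_vcf=$3     # annotated VCF is written here
  bcftools annotate -a "$dbsnp_vcf" -c ID -o "$out_vcf" "$in_vcf"
}

# Usage (placeholder file names):
# annotate_ids in.vcf.gz dbsnp.vcf.gz annotated.vcf
```

Chromosome naming (`1` vs `chr1`) has to agree between the two files, otherwise nothing will match.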
No, it's not working. The output file is not in the format required by PrediXcan. I am exploring the `bcftools annotate` option, but I want to know how I should download the dbSNP142.vcf file. `bcftools annotate` requires a dbSNP.vcf file, but I have a dbSNP.txt file.
I actually used bcftools to remove indels from the .vcf files, as you mentioned. It worked, and now my .vcf files contain only SNPs. Then I thought of using the ANNOVAR software to annotate my .vcf file with dbSNP142. It is getting annotated and updated with rsIDs, but now indels are getting incorporated into my file.
That does not make sense - if you're trying to annotate a SNP-only input file, the annotation process cannot add new entries (which is what would need to happen for you to see indels again).
You can annotate using a text file as well. NCBI's FTP site will have the dbSNP VCF file, but you should be able to use a tab delimited file with
I am using bcftools. The code is:
The error is:
So then I created the file using bgzip.
The code is:
Then I ran the code again:
Now I am getting this error:
`tabix -p vcf` instead of just
Please see how I've combined and formatted your comments so it's easier to read - you can also use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (\`text\` becomes `text`), or select a chunk of text and use the highlighted button to format it as a code block. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted.
I now use `tabix -p GEUVADIS.chr1.genotype.vcf.gz` but I am still getting an error: unrecognized preset 'GEUVADIS.chr1.genotype.vcf.gz'
Please pay close attention to the command I wrote initially. The correct command is `tabix -p vcf GEUVADIS.chr1.genotype.vcf.gz`. The `-p vcf` asks tabix to use the vcf preset; the `vcf` part is not a stand-in for the actual VCF file.
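To make the roles of the arguments concrete, here is a small sketch of the compress-then-index step, assuming htslib's `bgzip` and `tabix` are on your PATH. The function name `index_vcf` and the file name in the usage line are placeholders.

```shell
# Hypothetical helper: block-compress a plain-text VCF and build its tabix index.
index_vcf() {
  in_vcf=$1
  bgzip -c "$in_vcf" > "$in_vcf.gz"   # writes in.vcf.gz and keeps the original file
  tabix -p vcf "$in_vcf.gz"           # "vcf" is the preset name; the file comes after it
}

# Usage (placeholder file name):
# index_vcf my_variants.vcf    # produces my_variants.vcf.gz and my_variants.vcf.gz.tbi
```

The value right after `-p` is always the format preset, so the file name has to be a separate, final argument.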
I didn't understand. I am sorry, but I am completely new to bioinformatics. Should I use the following command: `tabix -p GEUVADIS.chr1.genotype.vcf.gz GEUVADIS.genotype.vcf.gz` (all the combined VCF files)?
Type that exactly
I was able to run the above command without receiving any error.
Then, when I use the annotate function of bcftools, it shows an error. The command:
Those are warnings and should not kill the process, but they indicate there's something wrong with the VCF file. Can you check the VCF headers and see what's going on?
I'm afraid this thread is becoming unmanageable and quite niche, and I won't be able to help you much longer with this.