Question: Detection of SNPs
0
gravatar for kamel
5 months ago by
kamel0
kamel0 wrote:

Hi Biostars,

I have detected SNPs in whole genomes by GATK and I have .vcf files. my question is how I can extract SNPs from this gene (I do not know the name of this gene but I have a fasta file of 3kb which corresponds to my gene of interest). could you give me the vcftools command to extract the snps from this specific region. Thank you for your help

snp alignment gene genome • 315 views
ADD COMMENTlink modified 5 months ago by Alex Reynolds26k • written 5 months ago by kamel0

It is unclear which data you have available. Please elaborate and be precise.

ADD REPLYlink written 5 months ago by WouterDeCoster32k

it is done . I redid the question

ADD REPLYlink written 5 months ago by kamel0
1
gravatar for guillaume.rbt
5 months ago by
guillaume.rbt450
France
guillaume.rbt450 wrote:

If you want to detect SNPs on a specific region I advise you to do a SNP calling on all genomes (aligning reads with a mapper like bwa, then use a variant caller, as freebayes).

Then you can extract the SNPs of the specific region you are interested in with vcftools.

ADD COMMENTlink modified 5 months ago • written 5 months ago by guillaume.rbt450

HI guillaume.rbt I have detected SNPs in whole genomes by GATK and I have .vcf files. my question is how I can extract SNPs from this gene (I do not know the name of this gene but I have a fasta file of 3kb which corresponds to my gene of interest). could you give me the vcftools command to extract the snps from this specific region. Thank you for your help

ADD REPLYlink written 5 months ago by kamel0
2

That is important information which you should have provided in your first post. Please update your question and be as precise as possible. You just wasted 4 hours by not explaining your issue sufficiently.

ADD REPLYlink modified 5 months ago • written 5 months ago by WouterDeCoster32k

You need to get the position of your gene to extract the SNPs. For that you can do a blast with your fasta against your genome : https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch

Then with the positions you can use the --position parameters from vcftools to filter your vcf : http://vcftools.sourceforge.net/man_latest.html

ADD REPLYlink written 5 months ago by guillaume.rbt450

I recovered the position of my gene by blast, it is in chromosome 2 (2: 2241384-2244383). I search on vcftools to a commade that allows me to study the snp in this region but I have not found. can you help me

ADD REPLYlink modified 5 months ago • written 5 months ago by kamel0
1

second thought, bedtools intersect is even easier to use :

echo -e "2\t2241384\t2244383" > gene_position.bed
bedtools intersect -a ./your_vcf.vcf -b ./gene_position.bed > filtered_vcf.vcf
ADD REPLYlink written 5 months ago by guillaume.rbt450

Thanks for your help guillaume.rbt

ADD REPLYlink written 5 months ago by kamel0

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.
Upvote|Bookmark|Accept

ADD REPLYlink written 5 months ago by WouterDeCoster32k
1
gravatar for Alex Reynolds
5 months ago by
Alex Reynolds26k
Seattle, WA USA
Alex Reynolds26k wrote:

If your FASTA has metadata in its record header that points to its location on the genome, you can use that directly to map any SNPs to it via BEDOPS bedmap and vcf2bed:

$ echo -e 'chr2\t2241384\t2244383' | bedmap --echo --echo-map-id-uniq --delim '\t' - <(vcf2bed < snps.vcf) > answer.bed

Replace chr2 with 2, depending on the format of chromosome name in your snps.vcf file. This could either be UCSC (chr2) or Ensembl (2), most likely.

The file answer.bed will contain the 3k nt interval and a listing of all SNP ID values that map to — or associate with, or overlap — that interval.

If you don't have metadata in the FASTA header that tells you where you are, you could use a BLAST search on the sequence to get back the location of your sequence for your genome of interest.

Then you run the command above, again replacing what goes into echo with whatever region comes out of the BLAST search.

ADD COMMENTlink modified 5 months ago • written 5 months ago by Alex Reynolds26k
0
gravatar for H.Hasani
5 months ago by
H.Hasani610
Freiburg, Germany
H.Hasani610 wrote:

I would simply use GATK HaplotypeCaller with -L option

ADD COMMENTlink written 5 months ago by H.Hasani610

Hi H.Hasani. as I said to guillaume. I have detected snp of whole genomes (fungi). then that interests me to look directly snp of a gene of 3kb. I have a fasta file of this genes. I can add the opion -L followed by this fasta file of 3kb of this gene ???

ADD REPLYlink written 5 months ago by kamel0

No you cannot, the -L option takes the genomic coordinates of your region of interest. Since you already did SNP calling there is no point in repeating this.

ADD REPLYlink written 5 months ago by WouterDeCoster32k

Exactly, it wasn't quite clear in the question. I believe guillaume.rbt has given you the correct answer.

ADD REPLYlink written 5 months ago by H.Hasani610
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1835 users visited in the last hour