Question: Can gffread extract specific region sequences from vcf or fasta file?
0
gravatar for MatthewP
5 months ago by
MatthewP110
China
MatthewP110 wrote:

Hello, I have NGS sequencs result(vcf and fasta) file, I download reference sequence(GFF3 format) from NCBI, this link..
Here shows a few part of GFF3 file:

##species https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=45949
KX254564.1      Genbank region  1       18117   .       +       .       ID=KX254564.1:1..18117;Dbxref=taxon:45949;Name=MT;gbkey=Src;genome=mitochondrion;mol_type=genomic DNA
KX254564.1      Genbank gene    1       1614    .       +       .       ID=gene-COX1;Name=COX1;gbkey=Gene;gene=COX1;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1       1614    .       +       0       ID=cds-ANP25568.1;Parent=gene-COX1;Dbxref=NCBI_GP:ANP25568.1;Name=ANP25568.1;gbkey=CDS;gene=COX1;product=cytochrome coxidase subunit I;protein_id=ANP25568.1;transl_table=5
KX254564.1      Genbank gene    1673    2992    .       +       .       ID=gene-COX2;Name=COX2;gbkey=Gene;gene=COX2;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1673    2992    .       +       0       ID=cds-ANP25569.1;Parent=gene-COX2;Dbxref=NCBI_GP:ANP25569.1;Name=ANP25569.1;gbkey=CDS;gene=COX2;product=cytochrome coxidase subunit II;protein_id=ANP25569.1;transl_table=5
KX254564.1      Genbank gene    2998    3060    .       +       .       ID=gene-trnG(TCC);Name=trnG(TCC);gbkey=Gene;gene=trnG(TCC);gene_biotype=tRNA
KX254564.1      Genbank tRNA    2998    3060    .       +       .       ID=rna-trnG(TCC);Parent=gene-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank exon    2998    3060    .       +       .       ID=exon-trnG(TCC)-1;Parent=rna-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank gene    3063    3124    .       +       .       ID=gene-trnR(TCG);Name=trnR(TCG);gbkey=Gene;gene=trnR(TCG);gene_biotype=tRNA
KX254564.1      Genbank tRNA    3063    3124    .       +       .       ID=rna-trnR(TCG);Parent=gene-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg
KX254564.1      Genbank exon    3063    3124    .       +       .       ID=exon-trnR(TCG)-1;Parent=rna-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg

What I want: extract specific parts sequence from vcf or fasta file.
example: all tRNA, all gene, specific gene.

I google and find gffread tool and see -h option, it seems this tool can't handle my job? I hope I can get some help here, thanks.

gffread gff3 vcf • 279 views
ADD COMMENTlink modified 5 months ago by shubhra.bhattacharya120 • written 5 months ago by MatthewP110

I can write python script get those regions than get sequence from vcf file, but this needs much time, I hope there is tools already.

ADD REPLYlink modified 5 months ago • written 5 months ago by MatthewP110
1

awk -F "\t" '{if($3=="tRNA")print}' file.gff > new_file.gff

ADD REPLYlink written 5 months ago by shubhra.bhattacharya120

Hi, if you need the entire information for each of those, you could easily grep them out.

ADD REPLYlink written 5 months ago by vinayjrao120
1
gravatar for shubhra.bhattacharya
5 months ago by
shubhra.bhattacharya120 wrote:

I am not sure about gffread but extracting specific gene or all genes etc can be done using bedtools.

https://bedtools.readthedocs.io/en/latest/content/tools/getfasta.html

ADD COMMENTlink written 5 months ago by shubhra.bhattacharya120

It seems we can't get fasta file directly from NGS sequence VCF result? I actually get my fasta sequence from VCF by writting python script myself. Because we may get more than one variants at the same position in VCF.

ADD REPLYlink written 5 months ago by MatthewP110
1

We can get fasta from vcf file using vcftools consensus.

ADD REPLYlink written 5 months ago by shubhra.bhattacharya120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2172 users visited in the last hour