Question: Can gffread extract specific region sequences from vcf or fasta file?
gravatar for MatthewP
22 months ago by
MatthewP780 wrote:

Hello, I have NGS sequencs result(vcf and fasta) file, I download reference sequence(GFF3 format) from NCBI, this link..
Here shows a few part of GFF3 file:

KX254564.1      Genbank region  1       18117   .       +       .       ID=KX254564.1:1..18117;Dbxref=taxon:45949;Name=MT;gbkey=Src;genome=mitochondrion;mol_type=genomic DNA
KX254564.1      Genbank gene    1       1614    .       +       .       ID=gene-COX1;Name=COX1;gbkey=Gene;gene=COX1;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1       1614    .       +       0       ID=cds-ANP25568.1;Parent=gene-COX1;Dbxref=NCBI_GP:ANP25568.1;Name=ANP25568.1;gbkey=CDS;gene=COX1;product=cytochrome coxidase subunit I;protein_id=ANP25568.1;transl_table=5
KX254564.1      Genbank gene    1673    2992    .       +       .       ID=gene-COX2;Name=COX2;gbkey=Gene;gene=COX2;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1673    2992    .       +       0       ID=cds-ANP25569.1;Parent=gene-COX2;Dbxref=NCBI_GP:ANP25569.1;Name=ANP25569.1;gbkey=CDS;gene=COX2;product=cytochrome coxidase subunit II;protein_id=ANP25569.1;transl_table=5
KX254564.1      Genbank gene    2998    3060    .       +       .       ID=gene-trnG(TCC);Name=trnG(TCC);gbkey=Gene;gene=trnG(TCC);gene_biotype=tRNA
KX254564.1      Genbank tRNA    2998    3060    .       +       .       ID=rna-trnG(TCC);Parent=gene-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank exon    2998    3060    .       +       .       ID=exon-trnG(TCC)-1;Parent=rna-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank gene    3063    3124    .       +       .       ID=gene-trnR(TCG);Name=trnR(TCG);gbkey=Gene;gene=trnR(TCG);gene_biotype=tRNA
KX254564.1      Genbank tRNA    3063    3124    .       +       .       ID=rna-trnR(TCG);Parent=gene-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg
KX254564.1      Genbank exon    3063    3124    .       +       .       ID=exon-trnR(TCG)-1;Parent=rna-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg

What I want: extract specific parts sequence from vcf or fasta file.
example: all tRNA, all gene, specific gene.

I google and find gffread tool and see -h option, it seems this tool can't handle my job? I hope I can get some help here, thanks.

gffread gff3 vcf • 828 views
ADD COMMENTlink modified 22 months ago by shubhra.bhattacharya120 • written 22 months ago by MatthewP780

I can write python script get those regions than get sequence from vcf file, but this needs much time, I hope there is tools already.

ADD REPLYlink modified 22 months ago • written 22 months ago by MatthewP780

awk -F "\t" '{if($3=="tRNA")print}' file.gff > new_file.gff

ADD REPLYlink written 22 months ago by shubhra.bhattacharya120

Hi, if you need the entire information for each of those, you could easily grep them out.

ADD REPLYlink written 22 months ago by vinayjrao180
gravatar for shubhra.bhattacharya
22 months ago by
shubhra.bhattacharya120 wrote:

I am not sure about gffread but extracting specific gene or all genes etc can be done using bedtools.

ADD COMMENTlink written 22 months ago by shubhra.bhattacharya120

It seems we can't get fasta file directly from NGS sequence VCF result? I actually get my fasta sequence from VCF by writting python script myself. Because we may get more than one variants at the same position in VCF.

ADD REPLYlink written 22 months ago by MatthewP780

We can get fasta from vcf file using vcftools consensus.

ADD REPLYlink written 22 months ago by shubhra.bhattacharya120
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1616 users visited in the last hour