Question: Can gffread extract specific region sequences from vcf or fasta file?
gravatar for MatthewP
2.2 years ago by
MatthewP880 wrote:

Hello, I have NGS sequencs result(vcf and fasta) file, I download reference sequence(GFF3 format) from NCBI, this link..
Here shows a few part of GFF3 file:

KX254564.1      Genbank region  1       18117   .       +       .       ID=KX254564.1:1..18117;Dbxref=taxon:45949;Name=MT;gbkey=Src;genome=mitochondrion;mol_type=genomic DNA
KX254564.1      Genbank gene    1       1614    .       +       .       ID=gene-COX1;Name=COX1;gbkey=Gene;gene=COX1;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1       1614    .       +       0       ID=cds-ANP25568.1;Parent=gene-COX1;Dbxref=NCBI_GP:ANP25568.1;Name=ANP25568.1;gbkey=CDS;gene=COX1;product=cytochrome coxidase subunit I;protein_id=ANP25568.1;transl_table=5
KX254564.1      Genbank gene    1673    2992    .       +       .       ID=gene-COX2;Name=COX2;gbkey=Gene;gene=COX2;gene_biotype=protein_coding
KX254564.1      Genbank CDS     1673    2992    .       +       0       ID=cds-ANP25569.1;Parent=gene-COX2;Dbxref=NCBI_GP:ANP25569.1;Name=ANP25569.1;gbkey=CDS;gene=COX2;product=cytochrome coxidase subunit II;protein_id=ANP25569.1;transl_table=5
KX254564.1      Genbank gene    2998    3060    .       +       .       ID=gene-trnG(TCC);Name=trnG(TCC);gbkey=Gene;gene=trnG(TCC);gene_biotype=tRNA
KX254564.1      Genbank tRNA    2998    3060    .       +       .       ID=rna-trnG(TCC);Parent=gene-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank exon    2998    3060    .       +       .       ID=exon-trnG(TCC)-1;Parent=rna-trnG(TCC);gbkey=tRNA;gene=trnG(TCC);product=tRNA-Gly
KX254564.1      Genbank gene    3063    3124    .       +       .       ID=gene-trnR(TCG);Name=trnR(TCG);gbkey=Gene;gene=trnR(TCG);gene_biotype=tRNA
KX254564.1      Genbank tRNA    3063    3124    .       +       .       ID=rna-trnR(TCG);Parent=gene-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg
KX254564.1      Genbank exon    3063    3124    .       +       .       ID=exon-trnR(TCG)-1;Parent=rna-trnR(TCG);gbkey=tRNA;gene=trnR(TCG);product=tRNA-Arg

What I want: extract specific parts sequence from vcf or fasta file.
example: all tRNA, all gene, specific gene.

I google and find gffread tool and see -h option, it seems this tool can't handle my job? I hope I can get some help here, thanks.

gffread gff3 vcf • 965 views
ADD COMMENTlink modified 2.2 years ago by shubhra.bhattacharya120 • written 2.2 years ago by MatthewP880

I can write python script get those regions than get sequence from vcf file, but this needs much time, I hope there is tools already.

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by MatthewP880

awk -F "\t" '{if($3=="tRNA")print}' file.gff > new_file.gff

ADD REPLYlink written 2.2 years ago by shubhra.bhattacharya120

Hi, if you need the entire information for each of those, you could easily grep them out.

ADD REPLYlink written 2.2 years ago by vinayjrao200
gravatar for shubhra.bhattacharya
2.2 years ago by
shubhra.bhattacharya120 wrote:

I am not sure about gffread but extracting specific gene or all genes etc can be done using bedtools.

ADD COMMENTlink written 2.2 years ago by shubhra.bhattacharya120

It seems we can't get fasta file directly from NGS sequence VCF result? I actually get my fasta sequence from VCF by writting python script myself. Because we may get more than one variants at the same position in VCF.

ADD REPLYlink written 2.2 years ago by MatthewP880

We can get fasta from vcf file using vcftools consensus.

ADD REPLYlink written 2.2 years ago by shubhra.bhattacharya120
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1650 users visited in the last hour