bedtools getfasta concatenating sequences
2.2 years ago
asalimih ▴ 60

Hi, I have a BED file containing the exons of genes. The name field holds the gene ID (ENSG***). When I run bedtools getfasta, I get the sequence of each exon separately. Is there a standard way to concatenate the sequences that share the same gene name, or should I write a script to do this on the FASTA output?
The bedtools documentation describes a -split option, but it only applies to BED12 input, and my BED files are not BED12.
Thanks in advance.


You might try something like this with AGAT:

agat_convert_bed2gff.pl --bed file.bed -o file.gff
agat_sp_extract_sequences.pl --gff file.gff --fasta file.fasta -t exon --merge -o merged_exon.fa

This produced an empty file. I assume file.fasta is the genome. Here is a sample of my BED file:

GL000009.2      56139   58376   ENSG00000278704.1       1       -
GL000194.1      53589   55676   ENSG00000277400.1       1       -
GL000194.1      53593   54832   ENSG00000274847.1       1       -
GL000194.1      55445   55676   ENSG00000274847.1       1       -
GL000194.1      112791  112850  ENSG00000274847.1       1       -
GL000194.1      112791  112850  ENSG00000277400.1       1       -
GL000194.1      114985  115018  ENSG00000277400.1       1       -
GL000194.1      114985  115055  ENSG00000274847.1       1       -
GL000195.1      37433   37534   ENSG00000277428.1       1       -
GL000195.1      42938   44923   ENSG00000276256.1       1       -

OK, that is because the first command creates gene features only, and the second removes gene features that have no sub-feature (mRNA, transcript, exon, etc.). This should work instead:

# Convert the BED6 file to GFF (exon features only)
agat_convert_bed2gff.pl --bed file.bed --primary_tag exon -o file.gff
# Replace the Name attribute with a Parent attribute
sed 's/Name=/Parent=/' file.gff > file2.gff
# Create a clean GFF (optional step)
agat_convert_sp_gxf2gxf.pl --gff file2.gff -o file_clean.gff
# Extract the exon sequences
agat_sp_extract_sequences.pl --gff file_clean.gff --fasta file.fasta -t exon --merge -o merged_exon.fa

Yes, file.fasta is the genome from which the sequences will be extracted.
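
For the scripted route mentioned in the question, here is a minimal Python sketch of the grouping logic. It is not part of bedtools or AGAT, and it rests on several assumptions: the function name concat_exons_by_gene is made up for illustration, the genome is already loaded into a dict mapping sequence names to strings, every gene sits on a single sequence and strand, and exons of the same gene do not overlap (overlapping exons from different transcripts would be duplicated in the output). Exons are joined in genomic order, and the joined sequence is reverse-complemented for genes on the minus strand.

```python
from collections import defaultdict

# Translation table for complementing DNA (case is preserved, N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def concat_exons_by_gene(bed_lines, genome):
    """Concatenate exon sequences that share the same BED name field.

    bed_lines: iterable of BED6 lines (chrom, start, end, name, score, strand).
    genome:    dict mapping sequence name -> sequence string (assumed to be
               loaded beforehand, e.g. from the genome FASTA).
    Returns a dict mapping gene name -> concatenated exon sequence.
    """
    exons = defaultdict(list)
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, name, _score, strand = line.split()[:6]
        exons[name].append((int(start), int(end), chrom, strand))

    merged = {}
    for name, records in exons.items():
        records.sort()  # ascending genomic order
        # Assumption: all exons of a gene share one strand.
        strand = records[0][3]
        # BED coordinates are 0-based, half-open, like Python slices.
        seq = "".join(genome[chrom][s:e] for s, e, chrom, _ in records)
        merged[name] = reverse_complement(seq) if strand == "-" else seq
    return merged


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_genome = {"chrT": "AAACCCGGGTTT"}
    toy_bed = [
        "chrT\t0\t3\tgeneA\t1\t+",
        "chrT\t6\t9\tgeneA\t1\t+",
    ]
    for gene, seq in concat_exons_by_gene(toy_bed, toy_genome).items():
        print(f">{gene}\n{seq}")
```

Joining the exons in ascending order and reverse-complementing once gives the same result as reverse-complementing each exon and joining them in descending order, which is the transcript orientation for minus-strand genes such as those in the sample above.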

