bedtools getfasta concatenating sequences
2.6 years ago
asalimih ▴ 60

Hi, I have a BED file containing the exons of genes. The name field holds the gene name (e.g. ENSG***). When I run bedtools getfasta, I get the sequence of each exon separately. Is there a standard way to concatenate the sequences that share the same gene name, or should I write a script to do this manually on the FASTA files?
The bedtools documentation describes a -split switch, but it only applies to the BED12 format, and my BED files are not BED12.
Thanks in advance.


You might try something like this with AGAT:

agat_convert_bed2gff.pl --bed file.bed -o file.gff
agat_sp_extract_sequences.pl --gff file.gff --fasta file.fasta -t exon --merge -o merged_exon.fa

This produced an empty file. I assume file.fasta is the genome. Here is a sample of my BED file:

GL000009.2      56139   58376   ENSG00000278704.1       1       -
GL000194.1      53589   55676   ENSG00000277400.1       1       -
GL000194.1      53593   54832   ENSG00000274847.1       1       -
GL000194.1      55445   55676   ENSG00000274847.1       1       -
GL000194.1      112791  112850  ENSG00000274847.1       1       -
GL000194.1      112791  112850  ENSG00000277400.1       1       -
GL000194.1      114985  115018  ENSG00000277400.1       1       -
GL000194.1      114985  115055  ENSG00000274847.1       1       -
GL000195.1      37433   37534   ENSG00000277428.1       1       -
GL000195.1      42938   44923   ENSG00000276256.1       1       -
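
Since -split only works on BED12, one possible workaround for rows like these is to collapse all BED6 exons that share a name into a single BED12 record first. Below is a minimal Python sketch of that idea. The function name `bed6_to_bed12` is made up for illustration, and it assumes that exons with the same name lie on one sequence and one strand and do not overlap each other.

```python
from collections import defaultdict


def bed6_to_bed12(bed_lines):
    """Collapse BED6 exon rows sharing (chrom, name, strand) into BED12 lines."""
    groups = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank, header, or malformed lines
        chrom, start, end, name, score, strand = fields[:6]
        groups[(chrom, name, strand)].append((int(start), int(end), score))

    records = []
    for (chrom, name, strand), exons in groups.items():
        exons.sort()  # ascending by start coordinate
        # BED12 blocks must not overlap, so refuse input that violates this.
        for (_, prev_end, _), (next_start, _, _) in zip(exons, exons[1:]):
            if next_start < prev_end:
                raise ValueError(f"overlapping exons for {name}")
        chrom_start, chrom_end = exons[0][0], exons[-1][1]
        block_sizes = ",".join(str(e - s) for s, e, _ in exons)
        # Block starts are relative to chromStart, so the first one is 0.
        block_starts = ",".join(str(s - chrom_start) for s, _, _ in exons)
        records.append("\t".join(map(str, [
            chrom, chrom_start, chrom_end, name, exons[0][2], strand,
            chrom_start, chrom_end,  # thickStart / thickEnd (placeholder values)
            "0",                     # itemRgb
            len(exons), block_sizes, block_starts,
        ])))
    return records
```

The returned lines could be written to a file and passed to bedtools getfasta with -split (plus -s to respect the strand and -name to keep the gene name in the header). It is worth checking the result on a few genes first.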

OK, that is because the first command creates gene features only, and the second removes gene features that have no sub-feature (mRNA, transcript, exon, etc.). This should work instead:

# Convert the bed6 to gff (exon feature only)
agat_convert_bed2gff.pl --bed file.bed --primary_tag exon  -o file.gff
# replace Name attribute by Parent attribute
sed 's/Name=/Parent=/'   file.gff > file2.gff
# create a clean gff (optional step)
agat_convert_sp_gxf2gxf.pl --gff  file2.gff -o file_clean.gff
# Extract exon
agat_sp_extract_sequences.pl --gff file_clean.gff --fasta file.fasta -t exon --merge -o merged_exon.fa

Yes, file.fasta is the genome from which the sequences are extracted.
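
If you would rather write the script yourself, as the question suggests, here is a rough Python sketch that does the concatenation directly, without bedtools or AGAT. The function names (`read_fasta`, `concat_exons`, `reverse_complement`) are made up for illustration. It assumes the genome fits in memory, that all exons of a name are on the same strand, and that they do not overlap (overlapping exons would be duplicated in the output).

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the id is the first word of the header."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks)
    return genome


def concat_exons(bed_lines, genome):
    """Concatenate BED6 exon sequences per name, in transcript orientation."""
    exons = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, start, end, name, _score, strand = fields[:6]
        exons[name].append((chrom, int(start), int(end), strand))

    merged = {}
    for name, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # ascending genomic order
        # BED is 0-based and half-open, which matches Python slicing.
        seq = "".join(genome[chrom][start:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            # Reverse-complementing the joined plus-strand sequence also
            # puts the exons in 5'->3' order for minus-strand genes.
            seq = reverse_complement(seq)
        merged[name] = seq
    return merged


# Toy example: one plus-strand gene and one minus-strand gene.
toy_genome = read_fasta([">chr1 toy contig", "AAACCC", "GGGTTT"])
toy_bed = [
    "chr1\t0\t3\tgeneA\t1\t+",
    "chr1\t6\t9\tgeneA\t1\t+",
    "chr1\t0\t3\tgeneB\t1\t-",
    "chr1\t3\t6\tgeneB\t1\t-",
]
for gene, sequence in concat_exons(toy_bed, toy_genome).items():
    print(f">{gene}\n{sequence}")
```

For real data, you would open the genome FASTA and the BED file and pass the file handles to `read_fasta` and `concat_exons` in place of the toy lists.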
