Question

bedtools getfasta command


3.1 years ago

harry ▴ 30

Hi, I want to extract sequences for a set of coordinates, with FASTA headers that include both the transcript ID and the exon rank. So far I have used bedtools getfasta with the -name and -name+ options, but the exon rank is not included in the header. I need the header to contain the exon rank together with the transcript ID.

Chromosome  Exon region start   Exon region end Transcript stable ID    Exon rank in transcript
1   11869   12227   ENST00000456328 1
1   12613   12721   ENST00000456328 2
1   13221   14409   ENST00000456328 3
1   12010   12057   ENST00000450305 1
1   12179   12227   ENST00000450305 2
1   12613   12697   ENST00000450305 3
1   12975   13052   ENST00000450305 4
1   13221   13374   ENST00000450305 5
1   13453   13670   ENST00000450305 6
1   29534   29570   ENST00000488147 1
1   24738   24891   ENST00000488147 2
1   18268   18366   ENST00000488147 3
1   17915   18061   ENST00000488147 4
1   17606   17742   ENST00000488147 5
1   17233   17368   ENST00000488147 6
1   16858   17055   ENST00000488147 7
1   16607   16765   ENST00000488147 8
1   15796   15947   ENST00000488147 9
1   15005   15038   ENST00000488147 10
1   14404   14501   ENST00000488147 11

Thanks in advance

fasta • 1.1k views


score 2 · Answer 1 · 2021-03-27


3.1 years ago

vkkodali_ncbi ★ 3.7k

-name+ is deprecated according to the bedtools documentation. You should be able to join the last two columns (transcript ID and exon rank) into a single name column using sed, and then run bedtools getfasta with -name:

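# Join transcript ID and exon rank into one name field, e.g. ENST00000456328_1
sed -r 's/(ENST[0-9]*)\t([0-9]*)/\1_\2/' original.bed > modified.bed
# Use the name field (column 4) as the FASTA header
bedtools getfasta -fi genome.fa -bed modified.bed -name > transcripts.fa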
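
As a possible alternative to the sed step, here is a minimal awk sketch that builds the same name field. It rests on a few assumptions that are not confirmed in the thread: the table is tab-separated, it has exactly one header line (as in the question), and the coordinates are 1-based inclusive, which the Ensembl-style column names suggest. BED starts are 0-based, so the sketch subtracts 1 from the start; drop the "- 1" if your file is already in BED coordinates. The function name make_named_bed is made up for illustration.

```shell
# Build a 4-column BED (chrom, start, end, name) from the 5-column table,
# naming each interval <transcript_id>_<exon_rank>.
# Assumptions: tab-separated input with one header line; starts are
# 1-based, so they are shifted by -1 to match BED's 0-based starts.
make_named_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         NR > 1 { print $1, $2 - 1, $3, $4 "_" $5 }'
}

# Usage (file names are placeholders taken from the answer above):
#   make_named_bed < original.bed > modified.bed
#   bedtools getfasta -fi genome.fa -bed modified.bed -name > transcripts.fa
```

Also check that the chromosome names in the first column match the sequence names in genome.fa exactly (for example "1" versus "chr1"); otherwise getfasta will not find the intervals.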
