bedtools getfasta command
3.1 years ago
harry ▴ 30

Hi, I want to extract sequences for a set of coordinates, with FASTA headers that contain both the transcript ID and the exon rank. So far I have used bedtools getfasta with the -name and -name+ options, but neither puts the exon rank in the header. I need the exon rank in the header together with the transcript ID.

Chromosome  Exon region start   Exon region end Transcript stable ID    Exon rank in transcript
1   11869   12227   ENST00000456328 1
1   12613   12721   ENST00000456328 2
1   13221   14409   ENST00000456328 3
1   12010   12057   ENST00000450305 1
1   12179   12227   ENST00000450305 2
1   12613   12697   ENST00000450305 3
1   12975   13052   ENST00000450305 4
1   13221   13374   ENST00000450305 5
1   13453   13670   ENST00000450305 6
1   29534   29570   ENST00000488147 1
1   24738   24891   ENST00000488147 2
1   18268   18366   ENST00000488147 3
1   17915   18061   ENST00000488147 4
1   17606   17742   ENST00000488147 5
1   17233   17368   ENST00000488147 6
1   16858   17055   ENST00000488147 7
1   16607   16765   ENST00000488147 8
1   15796   15947   ENST00000488147 9
1   15005   15038   ENST00000488147 10
1   14404   14501   ENST00000488147 11

Thanks in advance

fasta • 1.1k views
3.1 years ago
vkkodali_ncbi ★ 3.7k

-name+ is deprecated according to the bedtools documentation. You should be able to concatenate the last two columns with sed and then run bedtools getfasta with -name:

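# Join transcript ID and exon rank into one name column, e.g. ENST00000456328_1 (\t in the pattern assumes GNU sed)
sed -r 's/(ENST[0-9]*)\t([0-9]*)/\1_\2/' original.bed > modified.bed
# Use that name column as the FASTA header
bedtools getfasta -fi genome.fa -bed modified.bed -name > transcripts.fa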
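
If the input is still the five-column table from the question, header line included, an awk one-liner may be easier to adapt than the sed substitution. This is only a minimal sketch, assuming the file is tab-separated; exons.tsv is a hypothetical file name, and the two data rows are copied from the question:

```shell
# Hypothetical input: the question's header line plus its first two rows, tab-separated.
printf 'Chromosome\tExon region start\tExon region end\tTranscript stable ID\tExon rank in transcript\n' > exons.tsv
printf '1\t11869\t12227\tENST00000456328\t1\n' >> exons.tsv
printf '1\t12613\t12721\tENST00000456328\t2\n' >> exons.tsv

# Skip the header line (NR > 1) and join transcript ID and exon rank
# into a single name column, giving a four-column BED-like file.
awk 'BEGIN { FS = OFS = "\t" } NR > 1 { print $1, $2, $3, $4 "_" $5 }' exons.tsv > modified.bed

cat modified.bed
```

modified.bed can then be passed to bedtools getfasta as above. If the coordinates are 1-based, as exports of this kind often are, the start column would probably need to be reduced by one (print $1, $2 - 1, $3, ...), because BED start positions are 0-based.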