Question

pybedtools change fasta headers

1

Entering edit mode

7.1 years ago

GR ▴ 400

I am using pybedtools to get promoter sequences for my genes of interest. I am almost done with the script. But I would also like to include gene names in the fasta headers. By default, it only include corrdinates in fasta headers.

Below is my script:

coords=Chr1 1000 2000 forward 1 +

a = pybedtools.BedTool(coords, from_string= True)

fastaseq = pybedtools.example_filename(genome_file)

a = a.sequence(fi=fastaseq, s=True)

promoterseq= open(a.seqfn).read()

output:

Chr1:1000-2000(+)

TTTGGGGTTATAAATTATTAGAAGTT......

Expected output:

genename:Chr1:1000-2000(+)

TTTGGGGTTATAAATTATTAGAAGTT......

I was wondering if there is a way to include the gene name in fasta header.

Thanks, R

pybedtools python • 1.7k views

ADD COMMENT • link 7.1 years ago by GR ▴ 400

0

Entering edit mode

Do you have a file or a dataset with gene names that correpond to these promoter coordinates? What nomenclature for gene names you want to use? Also tagging by transcript id (preferably RefSeq transcript) is a good idea from my experience because of extra data on something more stable relative to the assembly changes and it captures alternative TSSs better.

ADD REPLY • link 7.1 years ago by Petr Ponomarenko ★ 2.8k

0

Entering edit mode

And I personally would add distance from TSS or translation start site relative to the transcript and with specific reference used sotred together alignments of transcripts to that particular reference. But this is just me =)

ADD REPLY • link 7.1 years ago by Petr Ponomarenko ★ 2.8k