pybedtools change fasta headers
0
1
Entering edit mode
7.1 years ago
GR ▴ 400

I am using pybedtools to get promoter sequences for my genes of interest. I am almost done with the script. But I would also like to include gene names in the fasta headers. By default, it only include corrdinates in fasta headers.

Below is my script:

coords=Chr1 1000 2000 forward 1 +

a = pybedtools.BedTool(coords, from_string= True)

fastaseq = pybedtools.example_filename(genome_file)

a = a.sequence(fi=fastaseq, s=True)

promoterseq= open(a.seqfn).read()

output:

Chr1:1000-2000(+)

TTTGGGGTTATAAATTATTAGAAGTT......

Expected output:

genename:Chr1:1000-2000(+)

TTTGGGGTTATAAATTATTAGAAGTT......

I was wondering if there is a way to include the gene name in fasta header.

Thanks, R

pybedtools python • 1.7k views
ADD COMMENT
0
Entering edit mode

Do you have a file or a dataset with gene names that correpond to these promoter coordinates? What nomenclature for gene names you want to use? Also tagging by transcript id (preferably RefSeq transcript) is a good idea from my experience because of extra data on something more stable relative to the assembly changes and it captures alternative TSSs better.

ADD REPLY
0
Entering edit mode

And I personally would add distance from TSS or translation start site relative to the transcript and with specific reference used sotred together alignments of transcripts to that particular reference. But this is just me =)

ADD REPLY

Login before adding your answer.

Traffic: 2276 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6