7.6 years ago
alireza346
I am trying to get the sequence on both sides of the coding sequence (CDS) stop site for all genes, to use for an alignment: 40 nt upstream of the stop site (within the CDS) and 50 nt downstream of it (in the 3' UTR). Do you know how I can do that correctly?
It's unclear which data format you have.
I have RNA-seq data and want to align it to this part of the transcriptome.
That is important information you should have mentioned in your question. Why do you think this is a good idea?
Anyway, to get this done you would first need a GTF/GFF annotation for your organism, from which you isolate the stop sites. Then you would make a BED file with an interval running from -40 nt to +50 nt around each stop site. Finally, you could use bedtools getfasta to get the nucleotide sequence of those intervals from the reference genome.
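The BED-building step could be sketched in Python roughly as follows. This is only an illustration under several assumptions: the annotation is a GTF with explicit stop_codon features (GFF3 files often lack them), each stop_codon line is one contiguous 3 nt codon, and the window includes the stop codon itself (40 nt before its first base, 50 nt after its last base). The function name stop_windows and its parameters are made up for this example. Note that these are genomic windows, so they can contain intron sequence when the end of the CDS spans a splice junction.

```python
import re

def stop_windows(gtf_lines, upstream=40, downstream=50):
    """Yield BED6 tuples around each stop_codon feature (hypothetical helper).

    GTF coordinates are 1-based and inclusive; BED is 0-based and half-open.
    Assumption: the window includes the stop codon itself.
    """
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "stop_codon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        if strand == "+":
            # Upstream (CDS side) lies at lower genomic coordinates.
            bed_start = start - 1 - upstream
            bed_end = end + downstream
        elif strand == "-":
            # On the minus strand, upstream lies at higher coordinates.
            bed_start = start - 1 - downstream
            bed_end = end + upstream
        else:
            continue  # skip features without a defined strand
        bed_start = max(bed_start, 0)  # do not run off the contig start
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        name = match.group(1) if match else "."
        yield (chrom, bed_start, bed_end, name, 0, strand)

if __name__ == "__main__":
    example = [
        'chr1\tsrc\tstop_codon\t1001\t1003\t.\t+\t0\tgene_id "g1"; transcript_id "t1";',
        'chr1\tsrc\tstop_codon\t2001\t2003\t.\t-\t0\tgene_id "g2"; transcript_id "t2";',
    ]
    for row in stop_windows(example):
        print("\t".join(map(str, row)))
```

With the output saved as a BED file, running bedtools getfasta with the -s option makes it reverse-complement minus-strand intervals, so every sequence reads in the direction of the transcript. The window is not clipped at the end of a contig, so intervals near chromosome ends should be checked.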