Renaming fasta files with chromosomes
2.5 years ago
I have a .fasta file that I'm trying to use bedtools getfasta on. When I run it I get the error

which I think is because my .fasta headers look like this

>HWI-D00270:252:CB1D5ANXX:8:1303:19141:48584/1
ACAGCTGATTAGACACAATGTCAACAAAGTACTGAAGACCAGAGAAAAACACTTATTATACTC
TTTGTTTTCAGGTGTGGAATGTGCTTTCTACCACGGCTACAAATACTACAAAGGATGTAGTA


and not like this

>chrI
ACAGCTGATTAGACACAATGTCAACAAAGTACTGAAGACCAGAGAAAAACACTTATTATACTC
TTTGTTTTCAGGTGTGGAATGTGCTTTCTACCACGGCTACAAATACTACAAAGGATGTAGTA


Is there a way I can edit the headers so they contain only the chromosome name?

The first example looks like Illumina reads converted to fasta format. That does not appear to be chromosome data at all.
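If it helps to confirm that, here is a rough Python sketch that pulls the record names out of a fasta file and flags the ones that look like Illumina read IDs. The regular expression is my own assumption, based on the colon-separated layout of the header shown above (seven fields, optional /1 or /2 suffix). The helper names `fasta_names` and `looks_like_read` are made up for this example.

```python
import re

# Assumed pattern: seven colon-separated fields, optionally followed by /1 or /2,
# as in the header posted in the question. Other read-name layouts exist.
ILLUMINA_ID = re.compile(r"^[^:\s]+:\d+:[^:\s]+:\d+:\d+:\d+:\d+(/[12])?$")


def fasta_names(lines):
    """Yield the name (first word after '>') of every fasta record."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def looks_like_read(name):
    """Return True if the name matches the assumed Illumina read-ID layout."""
    return ILLUMINA_ID.match(name) is not None


# Toy input built from the two header styles in the question.
example = [
    ">HWI-D00270:252:CB1D5ANXX:8:1303:19141:48584/1",
    "ACAGCTGATTAGACACAATG",
    ">chrI",
    "ACAGCTGATTAGACACAATG",
]

for name in fasta_names(example):
    kind = "read ID" if looks_like_read(name) else "other"
    print(name, kind)
```

For a real file you would pass an open file handle instead of the `example` list.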

Are you using a bed file for the intervals? What does it look like?

I have a bed file I downloaded from the UCSC Genome Browser:

chrI 2653 2738 (TAG)n 322 +
chrI 2974 3011 AT_rich 30 +
chrI 3034 3069 AT_rich 21 +

Why are you using a file with Illumina read data in fasta format instead of using the genome sequence file from UCSC? I assume you want to retrieve the fasta sequence corresponding to those intervals?
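To show why the names have to match, here is a minimal Python sketch of interval extraction. It is not how bedtools is implemented, only an illustration of the idea: each bed line is looked up by the name in its first column, and bed coordinates are 0-based with an exclusive end. The functions `read_fasta` and `extract_intervals` and the toy sequences are hypothetical.

```python
def read_fasta(lines):
    """Parse fasta lines into a dict mapping record name -> sequence."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else None
            if name is not None:
                parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def extract_intervals(seqs, bed_lines):
    """Return (chrom, start, end, sequence) per bed line.

    The sequence is None when the bed chromosome name has no matching
    fasta record, which is the mismatch situation in this thread.
    """
    results = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        seq = seqs[chrom][start:end] if chrom in seqs else None
        results.append((chrom, start, end, seq))
    return results


# Toy data (made up): one chromosome-style record and one read-style record.
genome = read_fasta([">chrI", "ACGTACGTAC", "GGGTTT"])
reads = read_fasta([">HWI-D00270:252:CB1D5ANXX:8:1303:19141:48584/1", "ACGTACGT"])
bed = ["chrI 2 6 toy_feature 0 +"]

print(extract_intervals(genome, bed))  # name found, sequence returned
print(extract_intervals(reads, bed))   # name not found, sequence is None
```

With the read file, no record is named chrI, so nothing can be extracted for any of the intervals.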

Yeah, I'm trying to use the DNA my lab sequenced from a particular region to look for transposons.