Replacement of fasta file header
8.7 years ago
bnina9999 ▴ 30

Hi there,

I have FASTA files with headers like this:

@AS500187:87:J5LBGHRXX:2:11101:7742:1046 1:N:0:21

I want to replace

@AS500187:87:J5LBGHRXX:2 with @AS500187:87:J5LBGHRXX:1

I have been trying sed, but I am not getting the result I want.

Any help would be highly appreciated.

Thank you.

sed sequence • 3.0k views

Please tell us about the sed command that you tried. You may be really close to getting it done.


Thank you so much, Pierre and Wocka. sed worked well; I had been trying to match the whole pattern instead of the colon.

Wocka, I am not a programmer, but I will definitely try your script. My learning starts from here :)


All right! You can upvote and accept Pierre's solution; it is a good one! ;)

Ah nice! If you need some advice or have some questions, here we are!

8.7 years ago

Replace the 3rd occurrence of a colon followed by any single character with :1. In your header that 3rd occurrence is the ":2" you want to change:

sed 's/\:./\:1/3'
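
To make it easier to see what the trailing 3 in the sed expression does, here is a minimal Python sketch of the same idea: replace only the n-th match of a pattern in a line. The function name replace_nth_match is made up for this illustration; it is not part of sed or of Python's re module.

```python
import re


def replace_nth_match(line, pattern, replacement, n):
    """Replace only the n-th (1-based) match of pattern in line.

    Hypothetical helper written for this illustration.
    """
    seen = 0

    def substitute(match):
        nonlocal seen
        seen += 1
        # Leave every match untouched except the n-th one.
        return replacement if seen == n else match.group(0)

    return re.sub(pattern, substitute, line)


header = "@AS500187:87:J5LBGHRXX:2:11101:7742:1046 1:N:0:21"
# ":." matches a colon plus the single character after it, as in the sed expression.
print(replace_nth_match(header, r":.", ":1", 3))
```

Two things to keep in mind: the "." matches exactly one character, so a two-digit value in that field would only have its first digit replaced, and sed applies the expression to every line of the file, not only to the header lines, so any other line containing three or more colons would be changed too.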
8.7 years ago
glihm ▴ 660

Hi there,

You can use a little Python script to generate a new fasta file with modified headers.

#!/usr/bin/env python

# What you want to put in the 4th part of the header.
modification = "1"

# Open your file for reading, and the output file that will receive the modified fasta.
# The "with" block closes both files automatically at the end.
with open("your_file_name.fa", "r") as fasta_handle, open("output_file_name.fa", "w") as output_handle:
    # For each line of your file
    for line in fasta_handle:
        # The result of the split is a list of the different parts separated by a ":".
        header_split = line.rstrip("\n").split(":")

        # A header line starts with @ in your example (you can change the @ to > if you want)
        # and must have at least 4 parts. Note: in a FASTQ file a quality line can also start with @.
        if line.startswith("@") and len(header_split) > 3:
            # Keep the first 3 parts, change the 4th, and keep everything after it.
            header_split[3] = modification
            new_header = ":".join(header_split)

            # We write the result in the output file, with '\n' to end the line.
            output_handle.write(new_header + "\n")
        # If the line is part of the sequence, we write it without modifications.
        else:
            output_handle.write(line)

Here is the code. If you are not fluent in programming, just read the comments to understand what the script is doing. To run it, copy the code into a new file and save it as rename_header.py, for instance. Replace your_file_name.fa with the name of your input file (the one with the headers to modify) and output_file_name.fa with the name you want for the output. That file will contain the fasta file with the new headers. If you want to replace the field with something other than 1 (123084, for example), you just have to change the variable modification.

Then, in a terminal, you have to enter: python rename_header.py

I hope this will help you.

EDIT : ok... SED way is the best!
