How to make modifications to a reference genome?
4.1 years ago

What is an efficient way to delete or insert a piece of sequence in a genome? For example, I might want to delete the region chr3:1000000-1000100 or insert a 10000 bp sequence at chr11:12345656. How can I do that? If programming is necessary, a Python solution is preferred.

sequence genome • 748 views

"If programming is necessary, Python solution is preferred."

Nope. (Almost) any text editor of your choice will do. Use Emacs if you want to go completely crazy, Vim for a normal level of bioinformatic insanity, Notepad++ for beginners.

Trust me, I'm a self-confessed specialist in solving one-off problems without writing code, and I have taken the blame for that.


If the reference file has lines of, say, 60 characters, each ending with a line break, the asker can't just delete or insert text in an editor without breaking the line wrapping.
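One way around the line-wrapping problem, as a minimal sketch in plain Python: join the wrapped lines of each record into one string, edit by coordinate, and re-wrap on output. The function names (`read_fasta`, `delete_region`, `insert_after`, `write_fasta`, `edit_demo`) and the toy record are made up for illustration, and coordinates are assumed to be 1-based and inclusive.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def delete_region(seq, start, end):
    """Remove positions start..end (1-based, inclusive)."""
    return seq[:start - 1] + seq[end:]


def insert_after(seq, pos, fragment):
    """Insert fragment after 1-based position pos (pos=0 prepends)."""
    return seq[:pos] + fragment + seq[pos:]


def write_fasta(records, handle, width=60):
    """Write records, wrapping every sequence at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


def edit_demo(records):
    """Toy edit: on chr3, delete positions 3..6, then insert NNN after position 2."""
    for header, seq in records:
        if header.split(" ", 1)[0] == "chr3":
            seq = delete_region(seq, 3, 6)
            seq = insert_after(seq, 2, "NNN")
        yield header, seq


# Tiny in-memory FASTA standing in for a real genome file.
demo = io.StringIO(">chr3\nACGTACGTAC\nGTACGT\n")
out = io.StringIO()
write_fasta(edit_demo(read_fasta(demo)), out, width=10)
print(out.getvalue())
```

Every edit shifts the coordinates downstream of it, so when several edits hit the same chromosome it is safer to apply them from the highest coordinate to the lowest. The sketch holds one chromosome in memory at a time, which should be manageable for typical genomes but is not optimized.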


Respectfully, I disagree. Assuming 60 characters per line, chr3:1000100 is column 20 of the 16669th sequence line of the chr3 record (16668 full lines come before it).

I'm not saying it's the best solution, quite the contrary. It's error-prone, it's tedious to get the coordinates right, text editors struggle with large files, etc.
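The arithmetic behind that coordinate can be checked with a few lines of Python. This is only a sketch: the helper name `fasta_line_and_column` is made up, and it assumes a fixed line width of 60 and 1-based positions. Counting sequence lines from 1, chr3:1000100 lands on line 16669 (index 16668 when counting from zero), column 20.

```python
def fasta_line_and_column(pos, width=60):
    """Map a 1-based sequence position to (line, column), both 1-based.

    `line` counts sequence lines only; add 1 to account for the
    header line of the record.
    """
    if pos < 1:
        raise ValueError("pos must be >= 1")
    line_index, col_index = divmod(pos - 1, width)
    return line_index + 1, col_index + 1


line, col = fasta_line_and_column(1_000_100)
print(line, col)  # prints: 16669 20
```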

Anyway, my comment was only half serious and most certainly not meant to trigger any sort of discussion on best practices on how, when or why to use or not use text editors with fasta files.


Sorry to say, but your question lacks vital information needed to receive valuable answers. For instance, you might want to mention the species you're working with. What have you tried so far to get to a solution?

Please go through "Please read before posting a question" and "How To Ask A Good Question", and then consider editing your question.

Why specifically Python? It sounds like an assignment.


Thanks for your feedback. I will make some changes, but I am still confused about something. First, why does using Python sound like an assignment? Which programming language should I name so that it does not sound like an assignment? If I say Java, Perl, R or C, will you have the same question? As for the species, I don't see the need to mention it, because I just want software that takes a FASTA file as input, finds a particular position and makes some changes. Does the species affect how this kind of software works? Do you mean there might be a species whose genome file does not let me use a coordinate to locate a particular position and modify something, that is, delete or insert a string?


Getting more serious, I recommend starting with Biopython's SeqIO module (Bio.SeqIO). I never tried to manipulate a genome the way you describe (I used BioPerl back then), but at the bottom of the page you might find some useful information in the "Random subsequences" section. I bet you can manipulate the object similarly.

Manipulating a reference genome is not the most common task in bioinformatics, but it is a relatively simple one. The lack of context makes it read like an assignment. Your objection is valid: neither species nor purpose is necessary to solve the problem. Generally speaking, however, a bit of background is appreciated; otherwise you risk giving the impression that you are waiting for someone else to do the work.
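A rough sketch of what that could look like with Biopython (a third-party package, installed with `pip install biopython`). `SeqIO.parse`, `SeqIO.write` and slicing of `Seq` objects are real Biopython features; the helper `edit_records`, its parameters and the toy records are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
from io import StringIO

from Bio import SeqIO  # third-party: Biopython


def edit_records(records, chrom, start, end, fragment=""):
    """Replace positions start..end (1-based, inclusive) on `chrom` with `fragment`.

    An empty fragment gives a pure deletion; start == end + 1 gives a
    pure insertion placed just before position `start`.
    """
    for record in records:
        if record.id == chrom:
            record.seq = record.seq[:start - 1] + fragment + record.seq[end:]
        yield record


# Toy in-memory FASTA standing in for a real genome file.
fasta_in = StringIO(">chr3 demo\nACGTACGTAC\n>chr11 demo\nGGGGCCCC\n")
fasta_out = StringIO()

# Delete positions 3..6 of chr3; every other record passes through unchanged.
edited = edit_records(SeqIO.parse(fasta_in, "fasta"), "chr3", 3, 6)
SeqIO.write(edited, fasta_out, "fasta")
print(fasta_out.getvalue())
```

For a real genome, file names can be passed to `SeqIO.parse` and `SeqIO.write` in place of the `StringIO` handles. The FASTA writer re-wraps the sequence lines on output, so line lengths do not need to be handled by hand.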
