Biostar

11,355 results • Page 1 of 228

Hello! I'm building a database of a certain gene family. I downloaded the fastas from UniProt, concatenated the resulting fastas using `cat`, and the fasta headers of each sequence have the following...that the gene information (the `GN=` part) is the first string after the first pipe sign (|) on each fasta header. Is there a way to do that using awk or R string manipulation? I want that all my…

fasta R bash string

updated 23 months ago • v.berriosfarias
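A minimal Python sketch for this, assuming the goal is to copy the `GN=` value so it becomes the first field after the first `|` of each UniProt-style header, with the rest of the header kept as it is. Headers without a `GN=` tag are left unchanged. The function names are illustrative, not from the thread.

```python
import re

# Assumption: the gene name is the whitespace-free token after "GN=", e.g. "GN=ABC1" -> "ABC1".
GN_RE = re.compile(r"\bGN=(\S+)")


def move_gene_name(header: str) -> str:
    """Insert the GN= value right after the first '|' of a UniProt-style header.

    Headers with no GN= tag, or with no '|', are returned unchanged.
    """
    match = GN_RE.search(header)
    if match is None or "|" not in header:
        return header
    prefix, rest = header.split("|", 1)
    return f"{prefix}|{match.group(1)}|{rest}"


def rewrite_lines(lines):
    """Rewrite header lines and pass sequence lines through untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield move_gene_name(line) if line.startswith(">") else line
```

Because `rewrite_lines` accepts any iterable of lines, an open file handle can be passed straight in and the results printed or written to a new file.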

I have so many bacterial RefSeq fasta files and want to parse the headers in the fasta files to see if there is any word 'chromosome' in the headers. As a side note...there are some sequences in the FASTA files that start with '>', so I want to parse all the lines starting with '>'. I know I have files that do not have a word like 'chromosome...I would like to separate the files with header '…

FASTA sequence header Parse

updated 5.6 years ago • Shelle
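A minimal Python sketch, assuming "contains chromosome" means a case-insensitive substring match anywhere in at least one header line of a file. Only lines starting with `>` are checked, so sequence lines are never scanned.

```python
def has_keyword(lines, keyword: str = "chromosome") -> bool:
    """True if any header line (starting with '>') contains keyword, ignoring case."""
    needle = keyword.lower()
    return any(line.startswith(">") and needle in line.lower() for line in lines)


def split_by_keyword(paths, keyword: str = "chromosome"):
    """Sort FASTA files into (files with the keyword in a header, files without it)."""
    # Assumption: every path is a plain-text (not gzipped) FASTA file.
    with_kw, without_kw = [], []
    for path in paths:
        with open(path) as handle:
            (with_kw if has_keyword(handle, keyword) else without_kw).append(path)
    return with_kw, without_kw
```

For example, `split_by_keyword(glob.glob("refseq/*.fna"))` (after `import glob`) would return the two lists; the directory and extension there are placeholders.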

Hi all, I'm looking for a simple solution for renaming fasta headers. I have this fasta header >trpE___AA_HMM___6fa05435949258489b608db9e58e5ba38821f2f26fffe5755daff43abin_id

fasta regex

updated 17 months ago • Diego

Hi all, I have fasta files containing sequences from different loci (one fasta file per individual) and would like to change the headers...to then merge everything into one big fasta file. This is the format of the first two lines of one of my fasta files (called individual1-allele1.fa): >lcl|contig_13517...contig_22604 AAGGATTAAAAATGAAAACTATGCAAAACTATGAGGAATAAAACTTCTTACATCTGAACT …

fasta

updated 13 months ago • Zoe

Hello, I have a fasta file with amino acid sequences, which was translated from a nucleotide sequence by transeq. >CE99543_15407_1 MAV..... >CE99641_51257_1...MSQ...... I want to delete `_1` at the end of the fasta header because it was somehow added by transeq. >CE99543_15407 MAV..... >CE99641_51257 MSQ...... Just deleting `_1` would not work...because there are hea…

fasta

updated 4.7 years ago • ysas
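A minimal Python sketch that removes `_1` only when it is the very end of the sequence ID (the first word of the header), so a `_1` elsewhere in an ID is not touched. It assumes the suffix to drop is always exactly `_1`; any other suffix is left alone.

```python
import re

# Assumption: only a final "_1" on the ID should go. The lazy \S+? cannot cross
# whitespace, so the "_1" must be the last two characters of the first word.
TRAILING_SUFFIX = re.compile(r"^(>\S+?)_1(?=\s|$)")


def strip_frame_suffix(line: str) -> str:
    """Drop a trailing '_1' from the ID of a header line; other lines pass through."""
    if line.startswith(">"):
        return TRAILING_SUFFIX.sub(r"\1", line, count=1)
    return line
```

With this pattern, `>ABC_1_1` becomes `>ABC_1` and `>ABC_11` is left as it is.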

I am trying to open the fasta file with sequences. My BioPerl script opens the sequence but not the fasta header. How can I get the fasta header with bio...perl -w use Bio::SeqIO; $seqio_obj = Bio::SeqIO->new(-file=> "no_plasmid.fasta", -format=>"fasta"); $seq_obj = $seqio_obj->next_seq; $acc = $so->accession_number; while($seq_obj = $seqio_obj->next_s…

RNA-Seq perl bio perl

updated 7.7 years ago • bandanaschapagain

Hi all, can someone help me rename the following fasta headers >M42.PS.NODE_36229_length_665_cov_0.0565371:i:367638.gene_3804 >M43.PS.NODE_26456_length_662_cov_0.0603908...>M42_gene_3804 >M43_gene_16048 >M52_gene_14948 >M53_gene_3089 The fasta file has 80,0000 such headers! I am totally new :-) So I would really appreciate a way using…

fasta

updated 6.7 years ago • davkam30
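A minimal Python sketch, assuming every header follows the pattern shown: a sample tag before the first dot (e.g. `M42`) and a `gene_<number>` field at the very end. Lines that do not match (including sequence lines) are returned unchanged, so nothing is silently dropped.

```python
import re

# Assumption: sample tag = text before the first '.', gene tag = "gene_<digits>" at the end.
HEADER_RE = re.compile(r"^>([^.\s]+)\..*?(gene_\d+)\s*$")


def shorten_header(line: str) -> str:
    """Turn '>M42.PS.NODE_..._cov_...:i:N.gene_3804' into '>M42_gene_3804'."""
    match = HEADER_RE.match(line)
    if match is None:
        return line.rstrip("\n")
    return f">{match.group(1)}_{match.group(2)}"
```

Since it works one line at a time, the number of headers in the file does not matter for memory use.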

Hi, I have a fasta file with 300 protein sequences. I intend to construct a phylogenetic tree with it. I want only the accession number...and the organism name in the fasta header and to remove the rest of the information. Can anybody suggest how to do this? I have a Linux-based system with Perl...and Python installed. For example, I want to convert a header like this: >gi|685204…

sequence edit

updated 8.0 years ago • bkvijay.jayaraman

I have a fasta file with headers that I want to compare with two other text files and if it's present in the first file, put it first on the...to compare the text files to the fasta header and if there's a match, organize/reorder the fasta header file so that that match name is the first entry on the header...look like this. (for the first one since fly is present in file2, place it as the first …

fasta python

updated 17 months ago • jnora0625

Hi, I have 10 fasta files (each file with 20 gene sequences from each of the 10 samples). I would like to create 20 files, specific to each gene...from 10 samples. I proceeded as follows to extract genes with the file_name in header: pyfasta extract --header --fasta test.fasta gene_name1 | awk '/^>/ {$0=$0 "_sample1"}1' > gene_name1.fasta Output: >gene_na…

pyfasta header fasta bash gene

updated 6.7 years ago • bioinfo8

I have a fasta file named `119XCA.fasta`, as shown below, >cellulase ATGCTA >gyrase TGATGCT >16s TAGTATG I need to remove all the...fasta headers, keep the sequences one by one, and write the file name as the fasta header. The expected outcome is shown below...TAGTATG I have used the following script `sed '/^>/d' foo.fa > out.fa` which re…

gene sequence genome alignment next-gen

updated 3.5 years ago • Kumar
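A minimal Python sketch, assuming the expected output is one record per file: a single header built from the file name without its extension (`119XCA`), followed by the original sequences in their original order, one per line.

```python
from pathlib import Path


def merge_under_filename(path: str, lines) -> list[str]:
    """Drop every header and put one '>stem' header (file name minus extension) on top."""
    # Assumption: blank lines can be discarded and sequence lines are kept verbatim.
    out = [f">{Path(path).stem}"]
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith(">"):
            out.append(line)
    return out
```

Calling `merge_under_filename(p, open(p))` for each path and writing `"\n".join(...)` to a new file would handle a whole directory.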

I have strange fasta headers like this for a good number of sequences: >gi|61221638|sp|P0A366.1| >gi|61221640|sp|P0A368.1|CR1AA_BACTE...I would like to replace the other (`>gi`) in the fasta header with a blank or `;`. Can anyone suggest how to do it? I have many such sequences in a big fasta file.

awk perl sed unix python

updated 4.7 years ago • empyrean999

Hi, please let me know how I can edit the fasta file header. >LR99555.1 Avo, chromosome: 1 I want this header to look like this: >LR99555.1

linux

updated 2.0 years ago • p
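A minimal Python sketch: here the accession `LR99555.1` is everything before the first space, so keeping only the first whitespace-separated word of each header is enough. Sequence lines are passed through untouched.

```python
def keep_id_only(line: str) -> str:
    """Trim a header to its first word; leave sequence lines as they are."""
    line = line.rstrip("\n")
    # Assumption: the ID never contains spaces, so the first word is the whole ID.
    if line.startswith(">") and line[1:].strip():
        return ">" + line[1:].split(maxsplit=1)[0]
    return line
```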

if this has been asked before, but I have a genome assembly file that I just converted from .bam to fasta format in order to start annotation. I would like to run CEGMA on this assembly, because I have concerns about the quality...but the problem is that the default header format when the fasta was created is not acceptable. This is because in the current format there are 5237924 sequences...with …

Assembly

updated 22 months ago • zgayk

I have a large fasta file. As you can see below, a > sign is present inside some fasta headers, like >exon2_ENST00000218032|>exon2_ENST00000218032...>exon17_ENST00000253024|>exon17_ENST00000253024 I want to remove the inner > sign from the header; after removal, the header should look like this: >exon2_ENST00000218032|exon2_ENST00000218032 …

fasta

updated 3.1 years ago • harry
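A minimal Python sketch, assuming the only problem is extra `>` characters after the first one: the leading `>` that marks a header is kept and every other `>` on header lines is deleted.

```python
def drop_inner_gt(line: str) -> str:
    """Keep the leading '>' of a header and delete any other '>' characters in it."""
    line = line.rstrip("\n")
    if line.startswith(">"):
        return ">" + line[1:].replace(">", "")
    return line
```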

Hello, I want to change the fasta headers of this input file: >M04631:312:000000000-C6V6K:1:2107:11495:1734 1:N:0:ACTGAGCG+TTATGCGA >M04631:312:000000000...C6V6K:1:2107:13059:1785 1:N:0:ACTGAGCG+TTATGCGA into these fasta headers: >adjH001 >adjH002 >adjH.... >adjH099 >adjH100 What script do I have…

next-gen

updated 5.2 years ago • kari_vo3
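A minimal Python sketch of sequential renaming, assuming the new names are just `adjH` plus a three-digit running number in file order. The `prefix` and `width` parameters are illustrative; with `width=3`, numbers above 999 simply gain a fourth digit.

```python
def renumber(lines, prefix: str = "adjH", width: int = 3):
    """Replace each header with prefix + a zero-padded running number, in file order."""
    count = 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            count += 1
            # e.g. width=3 -> format spec "03d" -> 1 becomes "001"
            yield f">{prefix}{count:0{width}d}"
        else:
            yield line
```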

I have downloaded a reference UniProtKB FASTA file. How can I extract only the FASTA headers of each gene (row-wise) into a CSV file using R?

updated 13 months ago • WUSCHEL

Hi community, I am not an expert with sed, but I want to edit the headers of each sequence in a fasta file. I want to keep only the gene ID **>NODE_39_length_59461_cov_85.505003_1** The header

edition sed headers command fasta

updated 2.2 years ago • Candela

Hi, I would like to modify the fasta headers from a file. I would like to change: >A0A0F2M4U6|A0A0F2M4U6_SPOSC Endoplasmic reticulum chaperone BiP OS

format header fasta

updated 2.4 years ago • marcus.teixeira

NCBI, the filename will be, for example, GCF_006351845.1_ASM635184v1_genomic.fna and the corresponding fasta header >NZ_CP040904.1 Enterococcus faecium strain N56454 chromosome, complete genome After some formatting, all...my fasta headers are like this, for example: >NZ_CP040904.1_Ef I would like to rename my filename like this Ef_GCF_006351845.1_ASM635184v1_genomic.fna..…

sequence

updated 3.7 years ago • genomes_and_MGEs

I'm having some problems trying to change the headers of hundreds of fasta files. Each fasta file is a gene sequence for several species, but for some species the header is different for each gene, for example: >EOG7B0…

fasta header

updated 2.2 years ago • Oscar

Hello Community! I know several similar questions have been asked but they all seem to want to rename their fasta headers entirely using a new variable name, or they have a separate text file with names that they would like to use to replace their sequence headers. In my case, I just want to rename the fasta headers in my very large fasta file using a chunk from the header already present. Her…

fasta

updated 4.6 years ago • FreshBio

Hello, I have a lot of sequences in a FASTA file, and I want to extract a specific sequence knowing the header ID. For example, the header of a sequence is: NODE_19_length_5758_cluster_19_candidate_1...I know that with `grep` I can extract the header, but I want the sequence lines below it to appear on stdout too. How can I do this in bash?

fasta bash

updated 3.3 years ago • v.berriosfarias

In this example fasta file, you can see that some fasta sequences are repeated many times, for example: exon19_ENST00000194900|exon21_ENST00000194900...exon18_ENST00000194900|exon21_ENST00000194900 So I want to remove all fasta sequences that share the same header in the fasta file and keep only one fasta sequence. I want to remove fasta sequences on...the basis of the header, not the sequence. Thanks in…

fasta header

updated 3.1 years ago • harry
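A minimal Python sketch that keeps only the first record seen for each header and drops later records with the same header, whatever their sequence. It assumes the whole header line is the key; if only the ID should count, `header.split()[0]` could be used as the key instead.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; the header excludes '>' and wrapped lines are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def dedupe_by_header(lines):
    """Keep the first record for each header and skip later records with the same header."""
    # Assumption: "same header" means the full header text is identical.
    seen = set()
    for header, seq in read_fasta(lines):
        if header not in seen:
            seen.add(header)
            yield header, seq
```

Each kept pair can be written back out as `f">{header}\n{seq}"`.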

Hey guys, I have a multi-fasta protein file like this >SF_hydrolase MKG... >LH_reductase MKI... >SM_hydrolase MSN... Basically, I would like to extract...only the fasta headers that have the word "reductase". I know how to extract headers that have the same headers as the ones present on...a list, but I don't know how to extract fasta headers solely based on o…

Assembly sequencing

updated 5.1 years ago • genomes_and_MGEs
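A minimal Python sketch, assuming the whole record (header plus sequence) is wanted for every header that contains the word, matched case-insensitively as a plain substring. Keeping only the lines that start with `>` from the output gives just the headers.

```python
def records_matching(lines, word: str):
    """Yield the header and sequence lines of every record whose header contains `word`.

    Matching is a case-insensitive substring test on the header line only.
    """
    needle = word.lower()
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            keep = needle in line.lower()
        if keep:
            yield line
```

A substring test is used instead of a regex word boundary because `_` counts as a word character, so `\breductase` would not match `LH_reductase`.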

Hello everyone, can anyone guide me in editing the fasta header? My fasta header is shown below: >NP_006556.1 transcriptional repressor CTCF isoform 1 [Homo sapiens

gene

updated 3.6 years ago • bioinformatics.queries

I have about 100 multiple fasta files (e.g., file.faa), which I have to rename with the species name mentioned in the fasta header. The fasta headers of these

awk sed

updated 3.9 years ago • KG

I have large fasta files containing all the sequences of some large families of receptors, each sequence is currently indicated by the...ensembl ID. I would like to change each ensembl ID header to be the gene ID. I have a list of all the corresponding IDs (ensembl - gene), is there a way to change these headers? For example...I want to change this type of ID header: >ENSMUSG000000…

fasta rename headers

updated 7.7 years ago • pendragon
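A minimal Python sketch, assuming the ID list is a two-column, whitespace-separated text file (`old_id new_id` per line) and that the ID to replace is the first word of each header. IDs missing from the list are kept as they are, so unmapped sequences are easy to spot afterwards.

```python
def load_mapping(lines) -> dict:
    """Read 'old_id new_id' pairs (whitespace-separated) into a dict; short lines are skipped."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def rename_headers(fasta_lines, mapping: dict):
    """Replace the first word of each header with its mapped name; unmapped IDs stay as they are."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumption: the ID and any description are separated by a single space.
            old_id, sep, rest = line[1:].partition(" ")
            line = ">" + mapping.get(old_id, old_id) + sep + rest
        yield line
```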

So I have a directory full of fasta files and I want to replace the fasta header in each one with the name of its corresponding fasta file. For example: HC1993.fa...> X58834 CCTGCATCTGCAA HC1993.fa > HC1993 CCTGCATCTGCAA I have about 50 fasta files like that in a directory that I want to do the same thing to. I've been using this sed command for one file that works...sed '…

sequence fasta bash loop sed unix

updated 3.8 years ago • tpaisie

Hello everybody! I have a fasta file I'm looking to work with in qiime. Unfortunately, it doesn't currently meet their formatting requirements. I need...to change headers like this: >3180275|DCO_MAC_Bv6--LI09_3|40099 XXXXXXXXXXXXXXXXXXXXXX >13488354|DCO_MAC_Bv6--LD09_2_3|2 XXXXXXXXXXXXXXXXXXXXXX...>333430241|DCO_MAC_Bv6--LO13_8|1 XXXXXXXXXXXXXXXXXXXXXX To…

awk fasta headers sed

updated 11 months ago • Dani

I have 5000 FASTA sequences with UniProt IDs. Now, I want to add a unique identifier at the beginning of each FASTA header. An example will...And so on. I want to add ABC0001 to ABC5000 at the beginning of the fasta header, and the corresponding gene name from my txt file. gopA ABC0001 A12345 gopD ABC0002 B57384 ........................ fotR ABC5000 C12345...And so on As I understand, I …

fasta perl awk

updated 10.4 years ago • bioinfo

Hi all, I have a large fasta file in which most sequences have an identical header (they differ in length). I usually extracted the sequences of interest...requires the Biopython library" sys.exit(0) try: fasta_file = sys.argv[1] # Input fasta file wanted_file = sys.argv[2] # Input wanted file, one gene name per line result_file = sys.argv[3] # Output fasta file …

fasta extracting identical header

updated 7.1 years ago • seta

Hello, I need to process fasta headers by matching the fasta description (not the fasta ID) with the first column of another file that has two columns, and print the second...column of that file onto the fasta header. Here are examples and what I have till now. file1.txt (list file) group_1 gene 1 group_2 gene 2 group_3 gene 3 group_4...my $input; close $infile; { local $/ = undef; …

perl unix fasta

updated 7.8 years ago • empyrean999

Hello everyone, I am trying to replace the headers A of a FASTA file (file.fasta) with headers B. For this, I have a list that matches the header names. >A_1 >B_1 >A_2 >B_2 >A_3 >B_3 etc... I am using this loop to replace the headers: cat list | while read f ; do echo $f > temp_file A=$(awk '{print $1}…

fasta FASTA sed loop

updated 3.5 years ago • Begonia_pavonina

Hi everyone! I need help with something. I am very new to bioinformatics. I have a fasta file with 32K reference sequences for an X gene. The headers are the Accession numbers, but I need to change them for the...So I think I already did the hardest part) but now I need to combine this information and change the headers of my fasta to the GI of each sequence. I've tried with this script: ``` …

header fasta

updated 20 months ago • marcelavillegasp

I have a fasta file with the following format: >BNY.1.2.t17987.mrna1 CDS=1-1065 seq... How can I remove everything after ".mrna1" from...the headers

fasta RNA-Seq RNA transcriptome

updated 4.2 years ago • 2822462298

Hello, I have a list of headers and I need to extract their sequences from the fasta file. How can I do it? Kindly let me know. The header file looks like this: ...>TRINITY_DN74659_c0_g1_i1 >TRINITY_DN74659_c0_g1_i1 >TRINITY_DN74698_c0_g1_i1 The fasta file looks like this: >TRINITY_DN74697_c0_g1_i1 len=243 path=[221:0-242] [-1, 221, -2] GTATGTCCCACCAGACAC…

Fasta

updated 22 months ago • Princy
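A minimal Python sketch, assuming the list file holds one ID per line (with or without a leading `>`) and that an ID should match the first word of a FASTA header, so `TRINITY_DN74697_c0_g1_i1 len=243 ...` is matched on `TRINITY_DN74697_c0_g1_i1`. Storing the IDs in a set keeps each lookup cheap even for large lists.

```python
def load_ids(lines) -> set:
    """Collect one ID per line; a leading '>' and anything after the first word are ignored."""
    ids = set()
    for line in lines:
        fields = line.strip().lstrip(">").split()
        if fields:
            ids.add(fields[0])
    return ids


def extract_records(fasta_lines, wanted: set):
    """Yield header and sequence lines of records whose first header word is in `wanted`."""
    # Assumption: IDs must match exactly (case-sensitive, no version stripping).
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in wanted
        if keep:
            yield line
```

Passing a one-element set such as `{"NODE_19_length_5758_cluster_19_candidate_1"}` pulls out a single record the same way.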

of interest. I am almost done with the script, but I would also like to include gene names in the fasta headers. By default, it only includes coordinates in the fasta headers. Below is my script: >coords=Chr1 1000 2000 forward...>TTTGGGGTTATAAATTATTAGAAGTT...... I was wondering if there is a way to include the gene name in the fasta header. Thanks, R

pybedtools python

updated 7.1 years ago • RT

I am a newbie to Linux... I would like to modify the headers of a fasta file. **My header is like: >100123_00010T gene=100123_00010** **And, I would like to have headers like "100123_00010

fastfile modification

updated 12 months ago • hellokwmin

Hi, I have a fasta file which has some duplicate headers, like below. They have different sequences but the same header. How can I merge them, or what...should I do? I want to run orthoMCL but it requires unique headers. ``` >c12358_g1_i9 >c12358_g1_i9

genome sequence

updated 21 months ago • Mehmet
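A minimal Python sketch, assuming renaming is acceptable (rather than merging): the first copy keeps its header and later copies get `_2`, `_3`, ... appended. Each new name is checked against every name already used, so the result stays unique even if a header such as `c12358_g1_i9_2` already exists in the file.

```python
def uniquify_headers(lines):
    """Rename repeated headers to header_2, header_3, ... so every header is unique."""
    # Assumption: the whole header text (after '>') is what must be unique.
    used = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            base = line[1:]
            name, n = base, 1
            while name in used:
                n += 1
                name = f"{base}_{n}"
            used.add(name)
            line = ">" + name
        yield line
```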

Hello, I am trying to convert my vcf files to fasta. However, after aligning to reference, vcf ID from the header disappears, and bcftools/vcftools are writing only reference...seq name in file header. Like > NC_xxxx.1 Any ideas? I run consensus script like for file in $inpath/*.vcf ; do echo $file bname=$(basename $file) echo...base name is …

consensus

updated 3.5 years ago • storm1907

Does anyone have a handy method for making a fasta header comply with the UniProt header specifications? http://www.uniprot.org/help/fasta-headers In particular, I would

sequence

updated 7.2 years ago • nickp60

Hi, I have multiple fasta files and I want to change the headers; for this I am using - awk '/^>/{print ">C1_" ++i; next}{print}' C1_pandaseq.fasta > C1_pandaseq_new.fasta...input fasta- >M03419:60:656544:1:1101:25150:3877:1 CCTACGGGTGGCTGCAGTGGGGAATTTTGGACAA >M03419:60:656544:1:1101:8498:4267:1...>C1_3 CCTACGGGTGGCAGCAGTGGGGAATATTGGACAATGC…

genome next-gen sequencing sequence

updated 7.2 years ago • bioinformaticssrm2011

Hi, I have a protein fasta file whose headers look like '>evm.model.chr.9.52'. There are almost 30k+ proteins. I have performed functional annotations...Now, I am performing some analysis and I want to add at least the protein name or even a GO term to the fasta header, as it would make things a lot easier for me. I want something like: >evm.model.chr.9.52 GO:1234678 Can I do it with

protein fasta functional-annotation header

updated 15 months ago • ahmadjoyyia

Hello All, I have a multi fasta file with millions of sequences. I want to duplicate a part of the header and join it to the header itself with a pipe, while...another part (of the header) should be deleted. Let's say I have a fasta file, "input.fasta," which looks like this: >Gene1 wbdfwbf ATGCCGATGCAGTGACG...f 1 < input.fasta > out1.fasta` for deleting spa…

fasta headers duplicate

updated 2.1 years ago • bionix

reference genome sequence using the BWA software, and it gave me a .sam file. I used samtools SAM to FASTA to convert the aligned reads to fasta file. I want to look at assembly statistics and also evaluate completeness with...BUSCO. I received the following error: **The character "/" is present in the fasta header >A00600:204:HFMJ3DSX3:3:1101:3640:1125/1, which will crash Reader. Please…

Fasta BWA BUSCO

updated 18 months ago • hpalk42
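A minimal Python sketch, assuming it is acceptable to replace the `/` with `_` in every header (any other safe character would work the same way). Because `/1` becomes `_1` and `/2` becomes `_2`, mate reads that shared a base name stay distinct.

```python
def sanitize_header(line: str, bad: str = "/", replacement: str = "_") -> str:
    """Replace every `bad` character in a header line; sequence lines pass through."""
    # Assumption: the replacement character does not itself upset downstream tools.
    line = line.rstrip("\n")
    if line.startswith(">"):
        return ">" + line[1:].replace(bad, replacement)
    return line
```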

Hello, I have a text file with thousands of unique sequences in fasta format. Each read has a header in the following format: 122391_Tcount2352_Acount2352_Bcount0_length293 It's obvious...was used at some point in the pipeline. I'm curious to see if anyone here has encountered this header format before and can tell me which part of the sequence header represents the count of reads. Thanks …

alignment

updated 5.3 years ago • genya35

I have a fasta file with hundreds of sequences and their respective headers. The headers (all of them) are in the format >ABCD [id_123] (gene_XYZ) [protein_ijk] [protein_id=qqq] [123..899] .......sequence............ >…

sequence

updated 7.2 years ago • leo1985.arnab

Hello I have a fasta file with sequence headers written as ``` >0|quiver|1..2075|- >0|quiver|2210..3058|- >0|quiver|3112..4169|- ``` and so on till around

sequence fasta

updated 20 months ago • utkarsh.sood

Hi, I would like to create a new fasta file from the original genome fasta and a vcf file. The fasta file will only have full gene sequences included. I can use...o sample_SNV.fasta -V sample_SNV_selected.vcf -L ref_gene.bed But I would like the output fasta to have the gene names as the header. For instance the current fasta output from gatk is: >1 chr01:2350 AGAAAGGACAGAAAAA…

bed fasta gatk fastaalternatereferencemaker header

updated 7.8 years ago • mosquitoes
