3.6 years ago
harry
I have a cDNA.fa file and I want to remove everything from each header except the ENST ID. How can I do that? I don't want to lose any cDNA sequence. Thanks in advance.
>ENST00000390567.1 cdna chromosome:GRCh38:14:105881034:105881053:-1 gene:ENSG00000211907.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD1-26 description:immunoglobulin heavy diversity 1-26 [Source:HGNC Symbol;Acc:HGNC:5485]
GGTATAGTGGGAGCTACTAC
>ENST00000452198.1 cdna chromosome:GRCh38:14:105881539:105881556:-1 gene:ENSG00000225825.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD6-25 description:immunoglobulin heavy diversity 6-25 [Source:HGNC Symbol;Acc:HGNC:5516]
GGGTATAGCAGCGGCTAC
>ENST00000390569.1 cdna chromosome:GRCh38:14:105883903:105883922:-1 gene:ENSG00000211909.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD5-24 description:immunoglobulin heavy diversity 5-24 (non-functional) [Source:HGNC Symbol;Acc:HGNC:5510]
GTAGAGATGGCTACAATTAC
>ENST00000437320.1 cdna chromosome:GRCh38:14:105884870:105884888:-1 gene:ENSG00000227196.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD4-23 description:immunoglobulin heavy diversity 4-23 (non-functional) [Source:HGNC Symbol;Acc:HGNC:5504]
TGACTACGGTGGTAACTCC
>ENST00000390571.1 cdna chromosome:GRCh38:14:105886031:105886061:-1 gene:ENSG00000211911.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:IGHD3-22 description:immunoglobulin heavy diversity 3-22 [Source:HGNC Symbol;Acc:HGNC:5497]
GTATTACTATGATAGTAGTGGTTATTACTAC
I want my file to look like this:
>ENST00000390567.1
GGTATAGTGGGAGCTACTAC
>ENST00000452198.1
GGGTATAGCAGCGGCTAC
If each header is on a single line and the sequence lines contain no spaces, try this:
awk -F " " '{print $1}' test.fa
or
sed '/^>/ s/\s.*//g' test.fa
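If you are not sure the sequence lines are free of spaces, a slightly more defensive variant is to trim only the lines that start with ">" and print everything else untouched. This is just a sketch: the function name strip_fasta_headers is made up for illustration, and it still assumes each header sits on a single line.

```shell
# Hypothetical helper (name is an assumption, not a standard tool):
# on header lines keep only the first whitespace-delimited field,
# e.g. ">ENST00000390567.1"; print every other line unchanged.
strip_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Example use (file names are placeholders):
#   strip_fasta_headers cDNA.fa > cDNA.ids.fa
```

Because it only uses plain awk field splitting, it should not depend on GNU-specific extensions such as \s in sed.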