Hey everyone,
I want to add a custom sequence to the FASTA file of my reference genome, so I concatenated the two files:
cat Homo_sapiens.GRCh38.dna.primary_assembly.fa Gene_mod.fa > HSapiens_Ensembl111mod.fa
In Gene_mod.fa, the sequence header follows the same format as the headers in the reference FASTA:
>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:2913:1 REF
Afterwards, I tried to subset the file with samtools faidx to keep only chromosome 3 and AddedSeq, using the command:
samtools faidx HSapiens_Ensembl111mod.fa 3 AddedSeq > HSapiens.GRCh38_Chr3_AddedSeq.fa
but it says it failed to retrieve AddedSeq.
Is there any problem with my code? I have used this fasta file for other purposes (like STAR) and it gave me no issues or problems.
Cheers
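A quick way to see which names a tool like samtools faidx would look up is to list each header's first whitespace-delimited token, since that token (not the description after it) is what faidx treats as the sequence name. The sketch below runs on a small toy file; the file name and the 8-base sequences are made up for illustration and stand in for HSapiens_Ensembl111mod.fa:

```shell
# Toy FASTA standing in for the concatenated reference (hypothetical file and sequences).
tmpdir=$(mktemp -d)
printf '>3 dna:chromosome chromosome:GRCh38:3:1:8:1 REF\nACGTACGT\n' > "$tmpdir/toy.fa"
printf '>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:8:1 REF\nTTGGCCAA\n' >> "$tmpdir/toy.fa"

# For every header line, drop the leading '>' and print only the first field,
# i.e. the name up to the first whitespace.
names=$(awk '/^>/ {sub(/^>/, "", $1); print $1}' "$tmpdir/toy.fa")
echo "$names"
```

If AddedSeq does not show up in that list for the real file, or shows up with extra characters attached, the header is probably not being parsed the way it looks on screen. Once the file has been indexed, the first column of the .fai file should list the same names.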
I have removed all of that text and it solved the problem, thanks!
The structure of the header was similar to the one in the
dna.primary_assembly.fa I got from Ensembl. In addition, when using this structure with a matching .gtf file, it seemed to work fine...
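For anyone landing here later, one possible way to do the header clean-up mentioned above, assuming "that text" means the description after the sequence name, is to keep only the first token of each header line. This is a minimal sketch on a toy file (the file names and the 8-base sequence are hypothetical), not necessarily the exact edit that was made:

```shell
# Toy version of Gene_mod.fa (hypothetical file name and sequence).
tmpdir=$(mktemp -d)
printf '>AddedSeq dna:scaffold scaffold:GRCh38:AddedSeq:1:8:1 REF\nTTGGCCAA\n' > "$tmpdir/Gene_mod_toy.fa"

# On header lines print only the first field (">AddedSeq");
# sequence lines are passed through unchanged.
awk '/^>/ {print $1; next} {print}' "$tmpdir/Gene_mod_toy.fa" > "$tmpdir/Gene_mod_toy.clean.fa"

cat "$tmpdir/Gene_mod_toy.clean.fa"
```

After a change like this, the combined FASTA needs to be concatenated again and any old .fai index deleted, so that samtools faidx builds a fresh index from the new headers.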