Hello All,
I have multiple fasta files. I want to make one output file per chromosome, combining that chromosome from all the fasta files. For example, the output file all_ch1.fasta should contain the ch1 sequences from all the fasta files, and so on. I tried:
samtools faidx *fasta.gz ch1 > all_ch1.fasta
But I am getting this error:
[W::fai_get_val] Reference sample2.fasta.gz not found in FASTA file, returning empty sequence
[faidx] Failed to fetch sequence in sample2.fasta.gz
I checked the sample2.fasta.gz file, and it is not empty. Thank you for any help!
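The warning comes from how `samtools faidx` parses its arguments: it takes a single FASTA file as the first argument and treats every following argument as a region (sequence name). The glob `*fasta.gz` therefore makes sample2.fasta.gz, and any later files, get looked up as sequence names inside the first file. Separately, a region called `ch1` would not match headers such as `sample1_rhg1.0ch1`. A minimal sketch that runs `samtools faidx` once per file instead, assuming the inputs are bgzip-compressed (faidx cannot index plain gzip files) and that each file's ch1 header follows the `<sample>_rhg1.0<ch>` pattern; the function name `extract_ch_samtools` is hypothetical:

```shell
# Hypothetical helper: print one chromosome from every input file to stdout.
# Assumptions: inputs are bgzip-compressed and named <sample>.fasta.gz, and the
# chromosome header in each file is <sample>_rhg1.0<ch> (e.g. sample1_rhg1.0ch1).
extract_ch_samtools() {
  local ch="$1"
  shift
  local f sample
  for f in "$@"; do
    sample=$(basename "$f" .fasta.gz)
    # One FASTA file per samtools faidx call; the region is the full sequence name.
    # faidx builds the .fai/.gzi index on first use if it is missing.
    samtools faidx "$f" "${sample}_rhg1.0${ch}"
  done
}
```

Usage: `extract_ch_samtools ch1 *.fasta.gz > all_ch1.fasta`. If your files were compressed with plain gzip, recompress them with `bgzip` first.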
If this format is correct, i.e. ch1 is `sample1_rhg1.0ch1` in sample1.fasta.gz and `sample2_rhg1.0ch1` in sample2.fasta.gz, try:

With seqkit (dry run):
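A sketch of what such a seqkit command could look like. The `--id-regexp` pattern is an assumption, not taken from the original post: it expects every header to end in `chN` with no description after it. All inputs are decompressed and piped in, so seqkit reads from stdin; `-i` groups records by the ID captured by the regex and `-d` makes it a dry run. The wrapper name `split_by_ch_seqkit` is hypothetical:

```shell
# Hypothetical wrapper: dry-run split of all records into one file per chN.
# Assumption: every header ends in chN with nothing after it (e.g. >sample1_rhg1.0ch1),
# so the regex '(ch\d+)$' captures ch1, ch2, ... as the ID used by -i.
split_by_ch_seqkit() {
  # Decompress and concatenate all inputs; seqkit then reads from stdin,
  # which is why the output directory is stdin.split.
  # -d: dry run (no files written); -i: split by (regex-captured) sequence ID.
  gzip -dc "$@" | seqkit split -di --id-regexp '(ch\d+)$'
}
```

Usage: `split_by_ch_seqkit *.fasta.gz`. Dropping `d` from `-di` writes the files.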
New files would be written to the `stdin.split` directory, named `stdin.id_ch1.fasta`, `stdin.id_ch2.fasta`, etc., one per `ch`. Each FASTA sequence name is kept exactly as it is in the input, e.g. `>sample1_rhg1.0ch1` and `>sample2_rhg1.0ch1` in the file for `ch1`. Download seqkit from https://bioinf.shenwei.me/seqkit/download/. Remove `d` from `-di` once you are happy with the dry-run output.

Without seqkit (assuming that sequences are flattened, i.e. each sequence is on a single line, and that all files have an equal number of `ch` entries), the files would be named `ch1.fasta`, `ch2.fasta`, etc.
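One possible seqkit-free approach is sketched below with awk. It is a swapped-in technique, not necessarily the command the answer originally used. All decompressed records stream through a single awk process, and each record is written to `chN.fasta` according to the `chN` at the end of its name, so it does not need flattened sequences or equal record counts per file. The function name `split_by_ch` is hypothetical, and the header pattern is assumed from the example names `sample1_rhg1.0ch1` and `sample2_rhg1.0ch1`:

```shell
# Hypothetical helper: write every record of the given .fasta.gz files into
# chN.fasta in the current directory, based on the chN at the end of the name.
# Assumption: sequence names end in chN (e.g. >sample1_rhg1.0ch1); records whose
# name does not end in chN are skipped. Existing chN.fasta files are overwritten.
split_by_ch() {
  gzip -dc "$@" | awk '
    /^>/ {
      name = substr($1, 2)               # sequence name without ">" or description
      if (match(name, /ch[0-9]+$/)) {
        out = substr(name, RSTART) ".fasta"
      } else {
        out = ""
      }
    }
    # Header and sequence lines (single- or multi-line) go to the current file.
    out != "" { print > out }
  '
}
```

Usage: `split_by_ch *.fasta.gz`. Because one awk process handles every input, each `chN.fasta` is opened once and records from later files are appended after those from earlier files.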