I have a list of 504 genes and I want to convert it to a FASTA file, because I want to align some NGS data against only these genes. How can I do that?
You can always extract the reads aligning to particular genes or genomic positions (e.g. with samtools, bedtools, featureCounts, deepTools, and many others) once all the reads have been aligned to the whole genome. If you take away a read's chance to align to its best position in the genome, it will either be discarded or forced to align to a position where it doesn't belong. This introduces a lot of noise into your data.
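To make the "align to everything, then subset" idea concrete, here is a minimal sketch in plain Python of the overlap filter that those tools perform. It is an illustration only, not how any of those tools is implemented: the function names, gene names, and coordinates are all made up, and intervals are assumed to be half-open as in BED files.

```python
# Hypothetical sketch: after whole-genome alignment, keep only the reads
# that overlap one of the genes of interest. All names and coordinates
# below are invented for illustration.

def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def reads_in_genes(reads, genes):
    """Pair each read with every gene it overlaps.

    reads: list of (read_id, chrom, start, end)
    genes: list of (chrom, start, end, gene_name)
    Returns a list of (read_id, gene_name) tuples in read order.
    """
    # Group gene intervals by chromosome so each read is only compared
    # against genes on its own chromosome.
    by_chrom = {}
    for chrom, start, end, name in genes:
        by_chrom.setdefault(chrom, []).append((start, end, name))

    hits = []
    for read_id, chrom, start, end in reads:
        for g_start, g_end, name in by_chrom.get(chrom, []):
            if overlaps(start, end, g_start, g_end):
                hits.append((read_id, name))
    return hits


genes = [("chr1", 100, 200, "GENE_A"), ("chr2", 500, 800, "GENE_B")]
reads = [
    ("r1", "chr1", 150, 250),  # overlaps GENE_A
    ("r2", "chr1", 300, 400),  # intergenic on chr1
    ("r3", "chr2", 790, 840),  # overlaps the end of GENE_B
    ("r4", "chr3", 10, 60),    # chromosome with no gene of interest
]
selected = reads_in_genes(reads, genes)
print(selected)
```

Reads r2 and r4 found their place elsewhere in the genome and are simply dropped here. Had the reference contained only the genes of interest, they could have been forced onto one of those genes instead.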
I'm really curious to know what you would do in downstream analysis after aligning to only these 504 genes. Would you go for differential analysis with this set?
504 genes from a single organism? Post a few examples if you want further help.
Yes, these genes are from human.
This doesn't sound like a good idea. Why would you want to align to just these genes?
Because these genes are connected to a specific phenotype.
I don't think that reason justifies aligning to just those genes. If you align to a subset of the genome, you will get false-positive alignments. Always align NGS reads to the full genome.
Assuming you already have the sequences and can use R, you can write them out with the write.fasta() function from the seqinr package.
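If you are not working in R, the same step is easy to sketch in plain Python. This is not seqinr, just a rough equivalent under the same assumption that you already have the sequences in hand. The function names, gene names, and sequences below are placeholders.

```python
# Hypothetical sketch: write gene sequences to FASTA format using only
# the standard library. Gene names and sequences are placeholders.

def format_fasta(records, width=60):
    """Return FASTA text for a dict of {name: sequence}.

    Each record gets a '>' header line, followed by the sequence
    wrapped to `width` characters per line.
    """
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def write_fasta(records, path, width=60):
    """Write the records to `path` in FASTA format."""
    with open(path, "w") as handle:
        handle.write(format_fasta(records, width))


records = {
    "GENE_A": "ATGC" * 5,  # 20 bases, wrapped over two lines at width=10
    "GENE_B": "GGCC",
}
fasta_text = format_fasta(records, width=10)
print(fasta_text, end="")
```

Calling write_fasta(records, "genes.fa") would save the same text to a file; the file name is only an example.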