Question: Good Methodology To Name RNA-Seq Files
5.8 years ago by
Belgium, Brussels
Nicolas Rosewick wrote:


I work with a lot of RNA-Seq samples and I wonder what the best way is to name the RNA-Seq files (raw FASTQ, BAM files, ...). I have samples from several species, different library types (stranded, unstranded), different read lengths (50, 76, 100, ...) and, of course, different run dates.

Do you think a name like this works:

fastq : SampleName_date_species_library_readLength.fastq

bam : SampleName_date_species_library_readLength_alignment.bam


Or do you think there is a better way to name the files?



5.8 years ago by
Cambridge, UK
Jelena Aleksic wrote:

I would also add the genome release. One of the most common sources of bioinformatics errors is assuming you're working from one release when the data actually comes from another. Also add the annotation release where relevant.
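
As a rough illustration, here is a minimal Python sketch that builds and parses names for aligned files with a genome release field appended. The field order, the `SampleName` type, both helper functions and the example values are assumptions made for illustration, not an established convention. The sketch also assumes that no field contains an underscore or a dot, so a release written like GRCh37.p13 would need to be rewritten first.

```python
from typing import NamedTuple


class SampleName(NamedTuple):
    """Hypothetical field layout for an aligned file name."""
    sample: str
    date: str         # run date, e.g. "20130514"
    species: str
    library: str      # e.g. "stranded" or "unstranded"
    read_length: str  # e.g. "100"
    genome: str       # genome release, e.g. "hg19"


def build_name(fields: SampleName, ext: str) -> str:
    """Join the fields with underscores and append the extension."""
    for value in fields:
        # Underscores and dots are reserved as separators in this sketch.
        if "_" in value or "." in value:
            raise ValueError(f"field {value!r} contains a reserved character")
    return "_".join(fields) + "." + ext


def parse_name(filename: str) -> SampleName:
    """Recover the fields from a name produced by build_name."""
    stem = filename.split(".", 1)[0]
    parts = stem.split("_")
    if len(parts) != len(SampleName._fields):
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return SampleName(*parts)


name = build_name(
    SampleName("S01", "20130514", "human", "stranded", "100", "hg19"), "bam"
)
print(name)
print(parse_name(name).genome)
```

Rejecting separators inside fields keeps the name unambiguous to parse, which is the main thing a convention like this has to guarantee.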

5.8 years ago by
Vienna, Austria
André Rendeiro wrote:

I think naming conventions are always useful, and they are essential when multiple people are working on the same data. That said, the convention itself just needs to be one that everybody agrees to, provided that it is clear enough.

In my previous group we sequenced RNA but also did a lot of work on chromatin modifications (ChIP-seq). Our naming convention was similar to your example, but we'd put the library right after the sample number, for more immediate recognition. We mostly used the same organism, so we didn't add that tag to the file name, but we did add the developmental stage at which the run was performed. Read length was also mostly the same, so we left it out at first and only later decided to add it for clarity. Transformations on the original file were generally appended dot-separated, like this:

  • sampleNumber_library_stage_(readLength)_date.fastq
  • sampleNumber_library_stage_(readLength)_date.aligner.bam
  • sampleNumber_library_stage_(readLength)_date.aligner.q30.bam
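
The dot-separated suffixes above could be generated by a small helper rather than typed by hand. This is only a sketch: `add_step`, `list_steps`, the aligner label `bwa` and the example file name are hypothetical, and it assumes the underscore-separated part of the name contains no dots.

```python
from typing import List, Optional


def add_step(filename: str, step: str, ext: Optional[str] = None) -> str:
    """Insert a processing step before the extension, optionally changing it."""
    base, _, old_ext = filename.rpartition(".")
    if not base:
        raise ValueError(f"{filename!r} has no extension")
    return f"{base}.{step}.{ext or old_ext}"


def list_steps(filename: str) -> List[str]:
    """Return the processing steps recorded between the stem and the extension."""
    return filename.split(".")[1:-1]


raw = "12_H3K4me3_stage5_36_20130514.fastq"   # hypothetical sample
aligned = add_step(raw, "bwa", ext="bam")      # aligner label is an example
filtered = add_step(aligned, "q30")
print(aligned)
print(filtered)
print(list_steps(filtered))
```

Because each step is appended in order, the file name doubles as a short record of how the file was produced.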

Anyway, sometimes you work so much on a project that almost everyone knows what each sample is just from its sample number.

