Question: Good Methodology To Name RNA-Seq Files
5.8 years ago by
Belgium, Brussels
Nicolas Rosewick7.6k wrote:

Hi,

I work on a lot of RNA-Seq samples and I wonder what the best way to name the RNA-Seq files is (raw fastq, bam files, ...). I have samples from several different species, different types of libraries (stranded, unstranded), different read lengths (50, 76, 100, ...) and of course the date of the run.

Do you think a naming scheme like this would work:

fastq : SampleName_date_species_library_readLength.fastq

bam : SampleName_date_species_library_readLength_alignment.bam

...

or do you think there is a better way to name the files?

Thanks

N.

5.8 years ago by
Cambridge, UK
Jelena Aleksic900 wrote:

I would also add the genome release. One of the most common sources of bioinformatics errors is assuming you're working from one release when the data actually came from another. Also add the annotation release where relevant.

5.8 years ago by
Vienna, Austria
André Rendeiro50 wrote:

I think naming conventions are always useful, and essential when several people are working on the same data. That said, the convention itself just needs to be one that everybody agrees on, provided that it is clear enough.

In my previous group we sequenced RNA but also did a lot of work on chromatin modifications (ChIP-seq). Our naming convention was similar to your example, but we'd put the library right after the sample number, for more immediate recognition. We mostly used the same organism, so we wouldn't add that tag to the file name, but we would add the developmental stage of the sample. Read length was also mostly the same, until at some point we decided to add it for clarity. Transformations of the original file were generally appended dot-separated, like this:

  • sampleNumber_library_stage_(readLength)_date.fastq
  • sampleNumber_library_stage_(readLength)_date.aligner.bam
  • sampleNumber_library_stage_(readLength)_date.aligner.q30.bam
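A scheme like this is easy to build and parse programmatically, which helps keep everyone consistent. Below is a minimal Python sketch, not the code we actually used: the class name `SeqFileName`, the function `parse_name`, and the example values (`S012`, `E14`, `bowtie2`, the date format) are all made up for illustration. It assumes that the individual fields never contain `_` or `.`, and that the read length is the optional fourth field.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeqFileName:
    """Hypothetical holder for the fields of one sequencing file name."""
    sample: str
    library: str
    stage: str
    date: str
    read_length: Optional[int] = None   # optional, as in "(readLength)"
    steps: Tuple[str, ...] = ()         # dot-separated transformations
    ext: str = "fastq"

    def render(self) -> str:
        # Underscore-separated metadata fields, in the agreed order.
        fields = [self.sample, self.library, self.stage]
        if self.read_length is not None:
            fields.append(str(self.read_length))
        fields.append(self.date)
        # Refuse values that would make the name ambiguous to parse.
        for value in [*fields, *self.steps, self.ext]:
            if not value or "_" in value or "." in value:
                raise ValueError(f"invalid field value: {value!r}")
        return ".".join(["_".join(fields), *self.steps, self.ext])


def parse_name(name: str) -> SeqFileName:
    """Split a file name back into its fields (inverse of render)."""
    stem, *rest = name.split(".")
    if not rest:
        raise ValueError(f"no extension in {name!r}")
    *steps, ext = rest
    fields = stem.split("_")
    if len(fields) == 5:
        sample, library, stage, read_length, date = fields
        length: Optional[int] = int(read_length)
    elif len(fields) == 4:
        sample, library, stage, date = fields
        length = None
    else:
        raise ValueError(f"expected 4 or 5 fields in {stem!r}")
    return SeqFileName(sample, library, stage, date, length, tuple(steps), ext)


raw = SeqFileName("S012", "stranded", "E14", "20130521", read_length=76)
filtered = SeqFileName("S012", "stranded", "E14", "20130521", read_length=76,
                       steps=("bowtie2", "q30"), ext="bam")
print(raw.render())
print(filtered.render())
```

Rejecting `_` and `.` inside field values is what makes the round trip unambiguous; if your sample names already contain underscores you would need a different separator.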

Anyway, sometimes you work so much on a project that almost everyone knows which sample is which just by the sample number.

Hope this was helpful.
