6.4 years ago
misterie
I have 20 amplicons (~13,000 bp in total) generated for 350 individuals and distributed across three 96-well plates for a variant calling analysis. So I have 3 directories, each containing 96 x 2 FASTQ files (paired-end), i.e. 3 x 96 x 2 FASTQ files in total.
How can I identify my samples? I have run quality control (FastQC), but I think I should align the reads for every individual separately.
Can you help me describe my data set? I have never worked with a data set organized in well plates.
Samples have been sequenced using Nextera XT.
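To align each individual separately, the R1 and R2 files first have to be paired up per sample. Here is a minimal sketch of that step. It assumes the files follow the usual bcl2fastq-style naming (`<sample>_S<number>_L<lane>_R<1|2>_001.fastq.gz`), which the question does not confirm; the helper `group_fastq_pairs` and the pattern are made up for illustration and would need adjusting to the real file names.

```python
import re
from collections import defaultdict

# ASSUMPTION: bcl2fastq-style names, e.g. A1_S1_L001_R1_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_R(?P<read>[12])_001\.fastq(\.gz)?$"
)

def group_fastq_pairs(filenames):
    """Group FASTQ file names by sample, keyed by read direction (R1/R2)."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # skip anything that is not a recognised FASTQ name
        pairs[match.group("sample")]["R" + match.group("read")] = name
    return dict(pairs)

def incomplete_samples(pairs):
    """Return the samples that are missing either R1 or R2."""
    return sorted(s for s, reads in pairs.items() if set(reads) != {"R1", "R2"})
```

Running `group_fastq_pairs` on the contents of one plate directory (for example the names from `os.listdir`) should give 96 samples with two files each if the plate is complete; `incomplete_samples` lists any sample where one mate is missing.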
Do you have no information about which index pair corresponds to which sample?
I have a SampleSheet.csv file in the Plate1 directory with the columns Sample ID, Sample Name, Sample Plate, Sample Well, i7 index ID, index, i5 index ID, and index. But it has only 96 rows, not 350...
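A sample sheet like this can be read into a table that maps each sample to its well and index pair. The sketch below assumes the common Illumina layout, where the per-sample table sits under a `[Data]` line and uses headers such as `Sample_ID`, `Sample_Well`, `index` and `index2`; the exact headers in this particular file may differ, and the function name `read_sample_sheet_data` is hypothetical.

```python
import csv
import io

def read_sample_sheet_data(text):
    """Return the rows of the [Data] section of a sample sheet as dicts.

    ASSUMPTION: the per-sample table follows a line starting with "[Data]".
    If no such line exists, the whole text is treated as the table.
    """
    lines = text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if line.split(",")[0].strip().lower() == "[data]":
            start = i + 1
            break
    table = []
    for line in lines[start:]:
        if line.startswith("["):
            break  # the next section begins here
        if line.strip(",").strip():
            table.append(line)  # keep non-empty rows only
    return list(csv.DictReader(io.StringIO("\n".join(table))))
```

With the rows loaded, something like `{row["Sample_ID"]: (row["index"], row["index2"]) for row in rows}` gives the sample-to-index mapping, and `len(rows)` shows how many samples the sheet actually describes.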
You say that you have a sample sheet in the Plate1 directory, but shouldn't there also be a sample sheet in the Plate2 and Plate3 directories?
I have only one sample sheet... Maybe the FASTQ files still need demultiplexing and I have to separate something, but I do not know how...
If possible, you should ask the lab. You can demultiplex with cutadapt or sabre... Usually, demultiplexing is done during basecalling, based on the Illumina index tags. But how many files do you have? In your question you say
So that means you have all the samples, right?
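One way to settle how many files there are is to count them per plate directory. This is only a sketch: it assumes the three plate directories sit under one parent folder and that the reads end in `.fastq`, `.fastq.gz`, `.fq` or `.fq.gz`; the function name `count_fastq_per_plate` is invented for the example.

```python
from pathlib import Path

# ASSUMPTION: reads use one of these common FASTQ extensions
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")

def count_fastq_per_plate(root):
    """Count FASTQ files in each immediate subdirectory of `root`."""
    counts = {}
    for plate_dir in sorted(Path(root).iterdir()):
        if not plate_dir.is_dir():
            continue
        counts[plate_dir.name] = sum(
            1 for p in plate_dir.iterdir() if p.name.endswith(FASTQ_SUFFIXES)
        )
    return counts
```

If every plate reports 192 files (96 x 2), the data look already demultiplexed into one pair per well. Note that three full plates hold at most 3 x 96 = 288 wells, so it is still worth asking the lab how the 350 individuals were assigned to them.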