I am trying to generate gene counts using featureCounts v2.0.1. The syntax I used is a simple one, as shown below:
featureCounts -T 2 -a MtrunA17r5.0-ANR-EGN-r1.7.gtf -F GTF -o A17_T00_1_BASIC1_fC.txt A17_T00_1.bam
The summary file reads as follows:
cat A17_T00_1_BASIC1_fC.txt.summary Status A17_T00_1.bam Assigned 13377184 Unassigned_Unmapped 0 Unassigned_Read_Type 0 Unassigned_Singleton 0 Unassigned_MappingQuality 0 Unassigned_Chimera 0 Unassigned_FragmentLength 0 Unassigned_Duplicate 0 Unassigned_MultiMapping 1625691 Unassigned_Secondary 0 Unassigned_NonSplit 0 Unassigned_NoFeatures 334969 Unassigned_Overlapping_Length 0 Unassigned_Ambiguity 2112828
When I peeked into the actual counts file, I noticed some weird entries in column 1 instead of gene_IDs, as shown below. Only 49 out of 52724 lines contain such strange entries.
gene:MtrunA17Chr5g0436531 0 63135360-95f3-44e9-8358-c7975fb56306 92 gene:MtrunA17Chr5g0436551 0 . . gene:MtrunA17Chr6g0458631 0 64238042-c1c7-40e1-b07d-e5050f9b567e 0 gene:MtrunA17Chr6g0458661 0 . . gene:MtrunA17Chr7g0236431 0 25191532-34dd-4669-9d7b-a20c59e60855 1 gene:MtrunA17Chr7g0236441 0
So I went back to the BAM input file and attempted it's validation with Picard's ValidateSamFile, on a SLURM-based HPCC, as shown below. As visible, it gave me 1 ERROR, and also 49 warnings. Only warning #1 and warning #49 are shown below, rest being similar.
srun --partition=high --time=1:00:00 --mem=24000 java -jar picard.jar ValidateSamFile I=A17_T00_1.bam MODE=VERBOSE srun: job 22816732 queued and waiting for resources srun: job 22816732 has been allocated resources [Thu Jun 11 20:16:35 PDT 2020] picard.sam.ValidateSamFile INPUT=A17_T00_1.bam MODE=VERBOSE MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json [Thu Jun 11 20:16:35 PDT 2020] Executing as aksrao@c8-62 on Linux 4.15.0-99-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_20-b26; Picard version: 2.7.1-SNAPSHOT ERROR: Read groups is empty WARNING: Read name HWI-ST797:117:D091UACXX:4:2203:11139:169433, A record is missing a read group WARNING: Record 1, Read name HWI-ST797:117:D091UACXX:4:2203:11139:169433, NM tag (nucleotide differences) is missing . . WARNING: Record 49, Read name HWI-ST797:117:D091UACXX:4:1107:17611:90721, NM tag (nucleotide differences) is missing WARNING: Read name HWI-ST797:117:D091UACXX:4:2102:6277:14310, A record is missing a read group Maximum output of  errors reached. [Thu Jun 11 20:16:35 PDT 2020] picard.sam.ValidateSamFile done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=1012924416 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp srun: error: c8-62: task 0: Exited with exit code 1
Here are questions that I have for forum members. Thanks in advance!
1. Is the
srun: error: c8-62: task 0: Exited with exit code 1 message indicative that PicardTools'
ValidateSamFile execution is not proper? From this GATK discussion board post, and the corresponding explanation for return codes, it does not appear
exit code 1 is for PicardTools execution per se, but I'm not sure?!
2. Regardless of any SLURM or installation problems, how do I fix my BAM file so error and warnings in
ValidateSamFile do not recur, and the weird entries in
featureCounts gene counts also do not recur. Looks like they are both related, i.e. those 49 warnings in
ValidateSamFile for my BAM input, probably each correspond to the 49 weird entries in my
featureCounts gene counts output ?
edit: I suppose it should suffice to follow instructions at this Broad GitHub Picard page?
3. If it is relevant, my BAM file was generated by
STAR mapping, and I know my FASTQ input files were all OK, based on
FastQValidator results. An additional detail is that I ran pass 1 to collect new Splice Junctions for 100+ libraries, concatenated them all, and used them again in pass 2 to obtain final BAM file to use with
edit - Should I look at the STDERR files captured for 1st and 2nd pass of my STAR mapping runs, for anything specific causing header errors in the BAM output files?
4. I've looked into only one of these 100+ BAM files, and only after pass 2 (as reported in this post). I've not looked at any other BAM files from pass 2, or any at all from pass 1. This is because I'm not sure if my
picard.jar ValidateSamFile is even executing OK.
Please let me know if I should provide any more information to help you help me. Thanks again.