Channel: vqsr — GATK-Forum

VQSR on ~500 genomes


Hi,

I am working on the VQSR step (using GATK 2.8.1) for variants called by UnifiedGenotyper (UG) from ~500 cattle whole genomes. I run VariantRecalibrator as follows:

${JAVA} ${GATK}/GenomeAnalysisTK.jar -T VariantRecalibrator \
-R ${REF} -input ${OUTPUT}/GATK-502-sorted.full.vcf.gz \
-resource:HD,known=false,training=true,truth=true,prior=15.0  HD_bosTau6.vcf \
-resource:JH_F1,known=false,training=true,truth=false,prior=10.0  F1_uni_idra_pp_trusted_only_LMQFS_bosTau6.vcf \
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0  BosTau6_dbSNP138_NCBI.vcf \
-an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an DP -an HaplotypeScore \
-mode SNP \
-recalFile ${OUTPUT}/gatk_502_sorted_fixed.recal \
-tranchesFile ${OUTPUT}/gatk_502_sorted_fixed.tranches \
-rscriptFile ${OUTPUT}/gatk_502_sorted_fixed.plots.R

HD_bosTau6.vcf : ~770k markers on Illumina bovine high-density chip array

F1_uni_idra_pp_trusted_only_LMQFS_bosTau6.vcf : ~5.4M SNPs

The tranches PDF I got looks really weird; please see the attached file.
[Attached image: tranches plot]

Then I tried varying the 'prior' score of the training VCFs, and also supplied an additional VCF file from another project as a training set. I still got a tranches plot similar to the one above, e.g.:

-resource:HD,known=false,training=true,truth=true,prior=15.0  HD_bosTau6.vcf 
-resource:JH_F1,known=false,training=true,truth=false,prior=12.0  F1_uni_idra_pp_trusted_only_LMQFS_bosTau6.vcf 
-resource:DN,known=false,training=true,truth=false,prior=12.0  HC-Plat-FB.3in3.vcf.gz 
-resource:dbsnp,known=true,training=false,truth=false,prior=6.0  BosTau6_dbSNP138_NCBI.vcf 

HC-Plat-FB.3in3.vcf.gz : ~14M markers
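In case it helps anyone reproduce the numbers, the marker counts quoted for the resource files, and how many of those sites are actually present in the call set, can be checked with a rough sketch like the one below. The function names `count_sites` and `count_shared` are made up for illustration and are not part of GATK. The sketch assumes GNU gzip and a POSIX awk, and it matches sites on CHROM and POS only, ignoring alleles, so treat the overlap as an approximation.

```shell
# Hypothetical helpers (not part of GATK). Both accept plain or
# gzip-compressed VCFs: "gzip -cdf" passes uncompressed input through as-is.

# Count the variant records in a VCF, skipping header lines.
count_sites() {
    gzip -cdf -- "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# Count records in the second VCF whose CHROM and POS also occur in the first.
# A sentinel line separates the two streams so the script stays POSIX sh.
count_shared() {
    {
        gzip -cdf -- "$1"
        printf '#END_OF_FIRST_FILE\n'
        gzip -cdf -- "$2"
    } | awk -F'\t' '
        /^#END_OF_FIRST_FILE$/ { second = 1; next }  # switch to the second file
        /^#/                   { next }              # skip VCF header lines
        !second                { seen[$1 FS $2] = 1; next }
        ($1 FS $2) in seen     { n++ }
        END                    { print n + 0 }
    '
}

# Example calls, using the file names from this post:
#   count_sites  HD_bosTau6.vcf
#   count_shared HD_bosTau6.vcf "${OUTPUT}/GATK-502-sorted.full.vcf.gz"
```

The second call reports how many call-set records fall on a position that is also in the HD training resource. A site listed more than once in the call set is counted each time it appears.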

It is worth mentioning that I very recently ran VariantRecalibrator with the same parameters and training sets on another 50 whole genomes, and it worked fine. I had actually also run VariantRecalibrator on the 500 animals before, when I accidentally used an unfiltered VCF called by UG as a training set. Surprisingly, I got a good tranches plot that time, similar to the one shown in the GATK Best Practices. Do you have any suggestions for me?

Thanks,

