VQSR for single sample exome/targeted regions
Hi, I read on the best practices slides that I should not use VQSR if the cohort is small. I only have one sample for a single individual. I was wondering how useful it is to perform VQSR on this sample by...
VQSR error - no MQ annotation detected
Hi, I am trying to run VQSR and an error occurred. Here are my commands java -Xmx240g -jar GenomeAnalysisTK.jar \ -T VariantRecalibrator \ -R /ref/ucsc.hg19.fasta \ -input input_raw.vcf \ -recalFile...
No false positives in VQSR tranche plot
I'm doing a large variant calling project on a cohort of ~10,000 exomes. I've run into an issue with VQSR. Everything appears to be working normally except for my output tranche plot (attached), where...
How to interpret Gaussian mixture model plots
Hi, I ran VQSR with 150+ whole genome samples. Attached is one of the Gaussian mixture model plots. I have read the VQSR guide on how to interpret the plots but I do not quite understand it. Referring to...
VQSR resource format: which elements of the VCF are needed?
Dear GATK team, I am working with maize aDNA and would like to find SNPs called in aDNA samples that are at least as good as those in HapMap. Am I right to assume that variant recalibration is the...
Having a problem with the VQSR step
Hi there. I'm Lynn, doing whole genome sequencing for human. I'm currently using GATK for variant calling and am now having problems with the VQSR step. Here is the command I entered:...
VariantRecalibrator SNP and INDEL failure rate
Hi I've been struggling with some issues we have been having with the VariantRecalibrator. Here's the story. We run VariantRecalibrator on our new sample in combination with the previous ones (gVCF...
VQSR failed with "No data found" with whole genome variant calls, but...
Hi, I ran VQSR on the SNP calls from a WGS sample mapped to b37+decoy reference, it failed at the training step with error message "No data found". I then removed the SNP calls on the small contigs...
VQSR with missing annotation fields
Hi, I am calling variants (non-model organism) following the best practice workflow. After HaplotypeCaller (with GVCF) and GenotypeGVCFs, I want to perform VQSR (separately for SNPs and INDELs) to the...
Question related to VQSR
Hi Everyone, It might be very basic but I just want some clarification on what I understand after applying VQSR steps to WGS sequencing data. For SNPs I've set tranches as described in the best...
Do strand flips in the dbSNP or training file cause problems for BQSR, Mutect...
Our group is working on putting together a file of known germline variants in a non-model organism. While we have a large set of known variants, my colleague has noted that some number of these are...
CombineGVCFs subsampling questions
Hi GATK! I want to merge ~3000 HC outputs into one large cohort. However, even if I run it directly by scattering over 30M genome chunks, it would still take a long time to compute. So I think I should first...
Warning message from VariantDataManager
Hello, I notice the following warning messages during the first step of VQSR: ------------------------------------------------------------------------------------------ Done. There were 3 WARN...
How VQSR deals with multiallelic SNPs and indels
Hi, May I know how VQSR deals with multiallelic SNPs and indels? How are they classified as pass or fail?
VQSR - ./. genotypes retained after VQSR filtering
We have generated a set of variant calls based on the GRCh38 pipeline described in GATK using GATK version 3.3. We observed that many calls made on the ALT contigs had the genotype call "./." . On...
Variant recalibration (VQSR) after pooled calling
Hi - we have whole genome samples consisting of non-barcoded DNA from 50 diploid individuals with 80-120x coverage. We use version 3.5-0-g36282e. We align, then use BQSR. For variant calling, we...
VQSR --maxGaussians parameter
Hi, I am performing VQSR (GATK 3.7) using the SNP model on individual chromosomes on hundreds of WGS data. However, for some chromosomes, it ran without a problem using the --maxGaussians default,...
VQSR Resources for Indels
Hi, I noticed that on the page for setting the right arguments for VQSR you mentioned that Mills and dbSNP should be used as resources for INDEL variant recalibration. However, at the bottom of the...
I have 50 exome samples belonging to 25 families. Do I run GenotypeGVCFs on...
We have exome sequencing data for 50 samples in total for a cardiac disease, but they have been sequenced in different batches. Some of the batches are even 2 years old. We have relationship...
Low coverage loci - GATK pipeline
Hi GATK team, I am posting this question for everyone's benefit as it will shed more light on how HaplotypeCaller and other GATK programs deal with low coverage positions. For the sake of this example,...
Suggestions for WGS 5X Sequences
Hi Geraldine or Sheila, I am in the process of customizing a GATK pipeline for processing aDNA. I have processed a couple of 3000-year-old WGS sequences so far using GATK best practices, and although...
What is the purpose of multiple True Sites in VQSR
I have 3 questions: 1- What is the exact purpose of having both HapMap and Omni True Sites in VQSR, vs just one; 2- If I want to restrict the variant calling to my custom list of positions. Which of...
My VQSR tranches-plot shows cumulative variants in tranches 0-90, 90-99, 99-99.9
Dear GATK-Team, My VQSR tranches-plot (exome data) shows cumulative variants in tranches 0-90, 90-99, 99-99.9. To my understanding it should be the other way round (like in your article link). My tranche...
VQSR error
I used to run VQSR using the following command. It worked very well for approximately 400 samples. But for the first time I am getting an error while doing VQSR after adding a few more samples to the old ones....
VariantRecalibrator error
Dear GATK Team, I'm trying to perform variant recalibration on 3 WGS sequencing data. I am following the pipeline described in Best Practices. I have generated individual g.vcf files for each patient...
VQSR on specific genomic region
Dear GATK Team, I have exome-data of many individuals (>2000) called with the HaplotypeCaller, but only of a specific set of genes from the genome. I would like to apply the VQSR-tool to recalibrate...
Can't use VQSR on non-model organism or small dataset
The problem: Our preferred method for filtering variants after the calling step is to use VQSR, a.k.a. recalibration. However, it requires well-curated training/truth resources, which are typically not...
VQSR and VariantAnnotator on Samtools VCFs
Hi everyone! My goal is to run VQSR on VCFs generated with samtools mpileup. According to GATK best practices, I first have to run VariantAnnotator on each of my VCFs in order to do that. Here's the...
WGS+WES combined discovery/genotyping
Hi GATK team, Hope you had great holidays! We're analyzing small families where some individuals have been sequenced by WGS (HiSeqX) and others by WES (HiSeq4000). Could you please advise on the best...
VariantAnnotator is not annotating variants with InbreedingCoeff
Hi, I am using GATK VariantAnnotator to annotate my VCF with the InbreedingCoeff but when I check the output VCF I see that no variant was annotated with the InbreedingCoeff. I've used a pedigree file...
VQSR / CNN filtering for small (~100) gene panels
Hello, I'm trying to perform germline variant calling on a panel with ~100 genes. I was wondering what the bare minimum (in terms of sample size) would be for variant filtering via VQSR. If the sample...
Too many (?) variants detected by joint genotyping of 8232 exomes
Hello, I am about to finish analyzing 8232 exome samples. I have used GATK 3.8 and 3.6 throughout my workflow, and followed the best practices guideline. After performing variant calling by running...
GenotypeGVCFs call confidence and emit thresholds
We have ~10,000 WES samples and generated gVCFs for each using HC. To generate a multi-sample consensus VCF, we performed joint genotyping using GenotypeGVCFs. Subsequently we performed VQSR analysis...
Error with VariantRecalibrator java.lang.IllegalArgumentException: No data...
Dear GATK Team, I have one whole genome data called with the HaplotypeCaller. I would like to apply the VariantRecalibrator to recalibrate my variant set, but I get back an error as follows: INFO...
Variant Quality Score Recalibration (VQSR)
VQSR stands for Variant Quality Score Recalibration. In a nutshell, it is a sophisticated filtering technique applied on the variant callset that uses machine learning to model the technical profile of...
Do you need to do Variant Quality Score Recalibration when calling somatic...
Hi, I am currently working to call somatic variants from tumour samples with matched normal pairs from the same patient. I have carried out all of the steps in this tutorial:...
VQSR --recal-file not recognized
In GATK 4.0.6.0, is the --recal-file option not required or has it been changed? ApplyVQSR uses this recal-file, but will that throw an error? Thank you very much. gatk VariantRecalibrator \ -R...
Applying VQSR to the Raw VCF vs Filtered VCF
Hi, I am working on a germline WES dataset with ~450 samples, all the variants are called following an adapted version of GATK Best Practices, using GATK 4.0.3. My question is about at which step we...
Variant recalibration tranche plot gives a very high number of novel variants...
Hi, My question concerns the VQSR step. I am using GATK version 3.7 on 350 human WES samples. After the calling with HaplotypeCaller I have used the VariantRecalibrator function with the following...
Error occurs in VariantRecalibrator: Malformed floating point valueprior
The problem is about VQSR. I have no idea how to fix this error: A USER ERROR has occurred: Unknown file is malformed: Malformed floating point valueprior The input vcf file is generated by...
Recommendations for calling on and filtering ~100 low coverage samples
I have about 80-100 population-specific WGS samples with coverage around 5-10X. What modifications would you recommend in GATK best practices to suit variant calling (and VQSR)? Also, what are the...
Seeking help debugging a VQSR tranche plot
Dear GATK Team, I have received data from ~ 35k whole exome sequences from a collaborator and while proceeding with variant filtering I noticed that the VQSR tranche plot looked abnormal. Contrary to...
Split multiallelic variants before VQSR and CNNScoreVariants, GATK team opinion
Hi, if I remember well, I saw a user on the forum suggest splitting the multiallelic variants before VQSR. I think that is logical (I have never done it before), but I would like to know the opinion...
ExcessHet filtering in cohorts with family members
This post mentions that the first step in Best Practice VQSR filtering involves hard filtering on ExcessHet. The post also states that "ExcessHet filtering applies only to callsets with a large number...
What is the difference between --truth-sensitivity-tranche and...
I'm using GATK v4.0.3.0. I'm wanting to use the recommended ApplyVQSR --ts-filter-level values, as specified at the end of GATK's document #1259 (albeit this document was written for GATK3, but I...
A way to come up with "truth set" to use VQSR
Dear GATK Team, I have a question regarding finding cutoffs for hard filtering. I am working with yeast for which we do not have a good true variation set. I am following the best practices and have...
gnomAD in VQSR
Hi, I am not sure why no one asked this before but I need help please as I couldn't find sufficient info: Is it recommended to use gnomAD variants database as known, training and truth set with...
VQSR filtering and dbSNP
Hi, For VQSR filtering, it's assumed that all calls made by HaplotypeCaller or MuTect2 are put into VQSR? Polymorphic sites should only be culled AFTER the vcf is annotated with VQSLOD scores, and a...
Weird VQSR filtering pattern
Hi I performed HC joint calling against 300 normal tissue samples. I did every step according to the best practice and got a VQSR plot very similar to this one. Most variants have the same MQ and FS...
Are there issues with using reads coming from different technologies and...
Hello! We are analyzing WGS data from 60 samples (6 groups, 10 samples/group) produced by HiSeq4000. The mean coverage per sample is 25x (lowest sample is 15x). Now we realized we need to sequence more...