Package picard.vcf
Class LiftoverVcf
- java.lang.Object
-
- picard.cmdline.CommandLineProgram
-
- picard.vcf.LiftoverVcf
-
@DocumentedFeature public class LiftoverVcf extends CommandLineProgram
Summary
Tool for "lifting over" a VCF from one genome build to another, producing a properly headered, sorted and indexed VCF in one go.
Details
This tool adjusts the coordinates of variants within a VCF file to match a new reference. The output file will be sorted and indexed using the target reference build. To be clear, CommandLineProgram.REFERENCE_SEQUENCE should be the target reference build (that is, the "new" one). The tool is based on the UCSC LiftOver tool and uses a UCSC chain file to guide its operation.
For each variant, the tool will look for the target coordinate, reverse-complement and left-align the variant if needed, and, in the case that the reference and alternate alleles of a SNP have been swapped in the new genome build, it will adjust the SNP, and correct AF-like INFO fields and the relevant genotypes.
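To make the reverse-complement step concrete, here is a minimal sketch in plain Java. It is an illustration only, not Picard's implementation: the class name ReverseComplementSketch and its method are hypothetical, and it assumes alleles contain only the bases A, C, G, T and N.

```java
/** Illustrative helper, not part of Picard: reverse-complements an allele string. */
final class ReverseComplementSketch {

    /** Returns the complement of a single base (assumption: only A, C, G, T, N are valid). */
    static char complement(final char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            case 'N': return 'N';
            default: throw new IllegalArgumentException("Unsupported base: " + base);
        }
    }

    /** Walks the allele from its last base to its first, complementing each base. */
    static String reverseComplement(final String bases) {
        final StringBuilder out = new StringBuilder(bases.length());
        for (int i = bases.length() - 1; i >= 0; i--) {
            out.append(complement(bases.charAt(i)));
        }
        return out.toString();
    }
}
```

A variant whose chain interval maps to the negative strand would need each of its alleles passed through a transformation like this before being compared with the target reference.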
Example
java -jar picard.jar LiftoverVcf \
     I=input.vcf \
     O=lifted_over.vcf \
     CHAIN=b37tohg38.chain \
     REJECT=rejected_variants.vcf \
     R=reference_sequence.fasta
Caveats
Rejected Records
Records may be rejected because they cannot be lifted over or because of sequence incompatibilities between the source and target reference genomes. Rejected records will be emitted to the REJECT file using the source genome build coordinates. The reason for the rejection will be stated in the FILTER field, and more detail may be placed in the INFO field.
Memory Use
LiftoverVcf sorts the output using a SortingCollection, which relies on CommandLineProgram.MAX_RECORDS_IN_RAM to specify how many (VCF) records to hold in memory before "spilling" to disk. The default value is reasonable when sorting SAM files, but not for VCFs, as there is no good default due to the dependence on the number of samples and amount of information in the INFO and FORMAT fields. Consider lowering to 100,000 or even less if you have many genotypes.
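One rough way to reason about a value for MAX_RECORDS_IN_RAM is to divide a memory budget by an estimated per-record footprint that grows with the number of samples. The sketch below is a back-of-the-envelope heuristic only: the class MaxRecordsInRamSketch, its parameters, and the linear cost model are assumptions made for illustration, not something Picard computes or documents. Every byte figure has to come from your own measurements.

```java
/** Hypothetical sizing helper; all inputs are the caller's own estimates. */
final class MaxRecordsInRamSketch {

    /**
     * Suggests how many records fit in a memory budget, assuming each record costs
     * a fixed base size plus a per-genotype size for every sample.
     */
    static long suggestMaxRecords(final long heapBudgetBytes,
                                  final long baseBytesPerRecord,
                                  final long bytesPerGenotype,
                                  final int sampleCount) {
        if (heapBudgetBytes <= 0 || baseBytesPerRecord <= 0
                || bytesPerGenotype < 0 || sampleCount < 0) {
            throw new IllegalArgumentException("Sizes must be positive and counts non-negative");
        }
        final long bytesPerRecord = baseBytesPerRecord + bytesPerGenotype * sampleCount;
        // Always allow at least one record so that a sort can make progress.
        return Math.max(1L, heapBudgetBytes / bytesPerRecord);
    }
}
```

The point of the model is only that the suggested value falls as the sample count rises, which matches the advice above to lower the setting for files with many genotypes.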
-
-
Field Summary
Fields
Modifier and Type   Field                                   Description
boolean             ALLOW_MISSING_FIELDS_IN_HEADER
static String       ATTEMPTED_ALLELES                       Attribute used to store the alleles of the failed variant on the target contig prior to finding out that alleles do not match.
static String       ATTEMPTED_LOCUS                         Attribute used to store the position of the failed variant on the target contig prior to finding out that alleles do not match.
File                CHAIN
boolean             DISABLE_SORT
static int          EXIT_CODE_WHEN_CONTIG_NOT_IN_REFERENCE
static String       FILTER_CANNOT_LIFTOVER_REV_COMP         Filter name to use when a target cannot be lifted over.
static String       FILTER_INDEL_STRADDLES_TWO_INTERVALS    Filter name to use when an indel cannot be lifted over because it straddles two intervals in a chain, which makes it unclear which alleles should be used.
static String       FILTER_MISMATCHING_REF_ALLELE           Filter name to use when a target is lifted over, but the reference allele doesn't match the new reference.
static String       FILTER_NO_TARGET                        Filter name to use when a target cannot be lifted over.
File                INPUT
double              LIFTOVER_MIN_MATCH
boolean             LOG_FAILED_INTERVALS
static String       ORIGINAL_ALLELES                        Attribute used to store the list of original alleles (including REF), in the original order prior to liftover.
static String       ORIGINAL_CONTIG                         Attribute used to store the name of the source contig/chromosome prior to liftover.
static String       ORIGINAL_START                          Attribute used to store the position of the variant on the source contig prior to liftover.
File                OUTPUT
boolean             RECOVER_SWAPPED_REF_ALT
File                REJECT
Collection<String>  TAGS_TO_DROP
Collection<String>  TAGS_TO_REVERSE
boolean             WARN_ON_MISSING_CONTIG
boolean             WRITE_ORIGINAL_ALLELES
boolean             WRITE_ORIGINAL_POSITION
-
Fields inherited from class picard.cmdline.CommandLineProgram
COMPRESSION_LEVEL, CREATE_INDEX, CREATE_MD5_FILE, GA4GH_CLIENT_SECRETS, MAX_ALLOWABLE_ONE_LINE_SUMMARY_LENGTH, MAX_RECORDS_IN_RAM, QUIET, REFERENCE_SEQUENCE, referenceSequence, specialArgumentsCollection, TMP_DIR, USE_JDK_DEFLATER, USE_JDK_INFLATER, VALIDATION_STRINGENCY, VERBOSITY
-
-
Constructor Summary
Constructors
Constructor     Description
LiftoverVcf()
-
Method Summary
All Methods   Instance Methods   Concrete Methods
Modifier and Type                       Method                              Description
protected int                           doWork()                            Do the work after command line has been parsed.
protected ReferenceArgumentCollection   makeReferenceArgumentCollection()
-
Methods inherited from class picard.cmdline.CommandLineProgram
checkRInstallation, customCommandLineValidation, getCommandLine, getCommandLineParser, getCommandLineParserForArgs, getDefaultHeaders, getFaqLink, getMetricsFile, getPGRecord, getStandardUsagePreamble, getStandardUsagePreamble, getVersion, hasWebDocumentation, instanceMain, instanceMainWithExit, parseArgs, requiresReference, setDefaultHeaders, useLegacyParser
-
-
-
-
Field Detail
-
INPUT
@Argument(shortName="I", doc="The input VCF/BCF file to be lifted over.") public File INPUT
-
OUTPUT
@Argument(shortName="O", doc="The output location for the lifted over VCF/BCF.") public File OUTPUT
-
CHAIN
@Argument(shortName="C", doc="The liftover chain file. See https://genome.ucsc.edu/goldenPath/help/chain.html for a description of chain files. See http://hgdownload.soe.ucsc.edu/downloads.html#terms for where to download chain files.") public File CHAIN
-
REJECT
@Argument(doc="File to which to write rejected records.") public File REJECT
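Because the rejection reason is stated in the FILTER field of each rejected record, a quick tally of FILTER values gives an overview of why records failed. The sketch below uses only the Java standard library and assumes an uncompressed, tab-separated VCF already read into memory. The class RejectReasonTally is a hypothetical helper, not part of Picard, and the filter names used with it in any test data are placeholders rather than the real constant values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts FILTER values in the text lines of a VCF. */
final class RejectReasonTally {

    /** FILTER is the 7th tab-separated column of a VCF data line. */
    private static final int FILTER_COLUMN = 6;

    static Map<String, Integer> countFilters(final List<String> vcfLines) {
        final Map<String, Integer> counts = new TreeMap<>();
        for (final String line : vcfLines) {
            // Skip blank lines and header lines, which start with '#'.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            final String[] columns = line.split("\t", FILTER_COLUMN + 2);
            if (columns.length <= FILTER_COLUMN) {
                continue; // too few columns to contain a FILTER value
            }
            // A record may carry several filters separated by ';'.
            for (final String filter : columns[FILTER_COLUMN].split(";")) {
                counts.merge(filter, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For a plain-text reject file, the lines could be obtained with java.nio.file.Files.readAllLines; a compressed or BCF reject file would need a proper VCF reader instead.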
-
WARN_ON_MISSING_CONTIG
@Argument(shortName="WMC", doc="Warn on missing contig.", optional=true) public boolean WARN_ON_MISSING_CONTIG
-
LOG_FAILED_INTERVALS
@Argument(shortName="LFI", doc="If true, intervals failing due to match below LIFTOVER_MIN_MATCH will be logged as a warning to the console.", optional=true) public boolean LOG_FAILED_INTERVALS
-
WRITE_ORIGINAL_POSITION
@Argument(doc="Write the original contig/position for lifted variants to the INFO field.", optional=true) public boolean WRITE_ORIGINAL_POSITION
-
WRITE_ORIGINAL_ALLELES
@Argument(doc="Write the original alleles for lifted variants to the INFO field. If the alleles are identical, this attribute will be omitted.", optional=true) public boolean WRITE_ORIGINAL_ALLELES
-
LIFTOVER_MIN_MATCH
@Argument(doc="The minimum percent match required for a variant to be lifted.", optional=true) public double LIFTOVER_MIN_MATCH
-
ALLOW_MISSING_FIELDS_IN_HEADER
@Argument(doc="Allow INFO and FORMAT in the records that are not found in the header", optional=true) public boolean ALLOW_MISSING_FIELDS_IN_HEADER
-
RECOVER_SWAPPED_REF_ALT
@Argument(doc="If the REF allele of the lifted site does not match the target genome, that variant is normally rejected. For bi-allelic SNPs, if this is set to true and the ALT allele equals the new REF allele, the REF and ALT alleles will be swapped. This can rescue some variants; however, do this carefully as some annotations may become invalid, such as any that are allele-specific. See also TAGS_TO_REVERSE and TAGS_TO_DROP.", optional=true) public boolean RECOVER_SWAPPED_REF_ALT
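The swap rule described for RECOVER_SWAPPED_REF_ALT can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Picard's code: SwapRefAltSketch and its nested Snp class are hypothetical stand-ins for a real variant record, and the sketch handles only the alleles, leaving genotypes and INFO annotations aside.

```java
import java.util.Optional;

/** Hypothetical illustration of recovering a bi-allelic SNP whose REF and ALT are swapped. */
final class SwapRefAltSketch {

    /** Minimal stand-in for a bi-allelic variant; not an htsjdk or Picard type. */
    static final class Snp {
        final String ref;
        final String alt;

        Snp(final String ref, final String alt) {
            this.ref = ref;
            this.alt = alt;
        }
    }

    /**
     * Returns the variant to emit, or an empty Optional if it should be rejected.
     * targetRefBase is the base found in the target reference at the lifted position.
     */
    static Optional<Snp> recover(final Snp lifted, final String targetRefBase) {
        // Only single-base REF and ALT alleles (SNPs) are eligible for a swap.
        if (lifted.ref.length() != 1 || lifted.alt.length() != 1) {
            return Optional.empty();
        }
        if (lifted.ref.equalsIgnoreCase(targetRefBase)) {
            return Optional.of(lifted); // REF already matches the target genome
        }
        if (lifted.alt.equalsIgnoreCase(targetRefBase)) {
            return Optional.of(new Snp(lifted.alt, lifted.ref)); // swap REF and ALT
        }
        return Optional.empty(); // neither allele matches the target reference
    }
}
```

After a swap like this, allele-specific annotations no longer describe the same allele, which is why the documentation points to TAGS_TO_REVERSE and TAGS_TO_DROP.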
-
TAGS_TO_REVERSE
@Argument(doc="INFO field annotations that behave like an Allele Frequency and should be transformed with x->1-x when swapping reference with variant alleles.", optional=true) public Collection<String> TAGS_TO_REVERSE
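The x->1-x transformation mentioned for TAGS_TO_REVERSE is simple arithmetic: once the reference and alternate alleles of a bi-allelic site trade places, a frequency that described the old ALT allele becomes one minus itself. The following is a minimal sketch assuming the values have already been parsed into doubles in the range [0, 1]; ReverseAfSketch is a hypothetical name, and real INFO parsing is out of scope here.

```java
/** Hypothetical illustration of reversing allele-frequency-like values after a REF/ALT swap. */
final class ReverseAfSketch {

    /** Returns a new array in which each frequency x is replaced by 1 - x. */
    static double[] reverse(final double[] values) {
        final double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0.0 || values[i] > 1.0) {
                throw new IllegalArgumentException("Not a frequency in [0, 1]: " + values[i]);
            }
            out[i] = 1.0 - values[i];
        }
        return out;
    }
}
```

For example, a frequency of 0.25 for the old ALT allele becomes 0.75 once that allele is treated as the reference.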
-
TAGS_TO_DROP
@Argument(doc="INFO field annotations that should be deleted when swapping reference with variant alleles.", optional=true) public Collection<String> TAGS_TO_DROP
-
DISABLE_SORT
@Argument(doc="Output VCF file will be written on the fly but it won't be sorted and indexed.", optional=true) public boolean DISABLE_SORT
-
EXIT_CODE_WHEN_CONTIG_NOT_IN_REFERENCE
public static int EXIT_CODE_WHEN_CONTIG_NOT_IN_REFERENCE
-
FILTER_CANNOT_LIFTOVER_REV_COMP
public static final String FILTER_CANNOT_LIFTOVER_REV_COMP
Filter name to use when a target cannot be lifted over.
- See Also:
- Constant Field Values
-
FILTER_NO_TARGET
public static final String FILTER_NO_TARGET
Filter name to use when a target cannot be lifted over.
- See Also:
- Constant Field Values
-
FILTER_MISMATCHING_REF_ALLELE
public static final String FILTER_MISMATCHING_REF_ALLELE
Filter name to use when a target is lifted over, but the reference allele doesn't match the new reference.
- See Also:
- Constant Field Values
-
FILTER_INDEL_STRADDLES_TWO_INTERVALS
public static final String FILTER_INDEL_STRADDLES_TWO_INTERVALS
Filter name to use when an indel cannot be lifted over because it straddles two intervals in a chain, which makes it unclear which alleles should be used.
- See Also:
- Constant Field Values
-
ORIGINAL_CONTIG
public static final String ORIGINAL_CONTIG
Attribute used to store the name of the source contig/chromosome prior to liftover.
- See Also:
- Constant Field Values
-
ORIGINAL_START
public static final String ORIGINAL_START
Attribute used to store the position of the variant on the source contig prior to liftover.
- See Also:
- Constant Field Values
-
ORIGINAL_ALLELES
public static final String ORIGINAL_ALLELES
Attribute used to store the list of original alleles (including REF), in the original order prior to liftover.
- See Also:
- Constant Field Values
-
ATTEMPTED_LOCUS
public static final String ATTEMPTED_LOCUS
Attribute used to store the position of the failed variant on the target contig prior to finding out that alleles do not match.
- See Also:
- Constant Field Values
-
ATTEMPTED_ALLELES
public static final String ATTEMPTED_ALLELES
Attribute used to store the alleles of the failed variant on the target contig prior to finding out that alleles do not match.
- See Also:
- Constant Field Values
-
-
Method Detail
-
makeReferenceArgumentCollection
protected ReferenceArgumentCollection makeReferenceArgumentCollection()
- Overrides:
makeReferenceArgumentCollection in class CommandLineProgram
-
doWork
protected int doWork()
Description copied from class: CommandLineProgram
Do the work after command line has been parsed. RuntimeExceptions may be thrown by this method and are reported appropriately.
- Specified by:
doWork in class CommandLineProgram
- Returns:
- program exit status.
-
-