ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data
ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and many others). Given a list of variants with chromosome, start position, end position, reference nucleotide and observed nucleotides, ANNOVAR can perform:
SUMMARIZE_ANNOVAR is a script within the ANNOVAR package that is very popular among users. Given a list of variants from whole-exome or whole-genome sequencing, it will generate an Excel-compatible file with gene annotation, amino acid change annotation, SIFT scores, PolyPhen scores, LRT scores, MutationTaster scores, PhyloP conservation scores, GERP++ conservation scores, dbSNP identifiers, 1000 Genomes Project allele frequencies, NHLBI-ESP 5400 exome project allele frequencies and other information.
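The variant list these tools take as input is described above as five fields per variant: chromosome, start position, end position, reference nucleotide and observed nucleotides. The following is a minimal Python sketch of reading one such record. It assumes the fields are whitespace-delimited and appear in that order; the `Variant` type, the `parse_variant` helper and the example line are hypothetical and are not part of the ANNOVAR package.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One input variant, using the five fields named in the text."""
    chrom: str
    start: int
    end: int
    ref: str
    obs: str


def parse_variant(line: str) -> Variant:
    """Parse one whitespace-delimited line; columns after the fifth are ignored."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 fields, got {len(fields)}: {line!r}")
    chrom, start, end, ref, obs = fields[:5]
    start_pos, end_pos = int(start), int(end)
    if start_pos > end_pos:
        raise ValueError(f"start {start_pos} is greater than end {end_pos}")
    return Variant(chrom, start_pos, end_pos, ref, obs)


# Illustrative record only (assumed layout, made-up usage).
example = parse_variant("1 948921 948921 T C")
print(example)
```

Keeping the positions as integers makes simple sanity checks (such as start not exceeding end) possible before a file is handed to an annotation run.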
On a modern desktop computer (3 GHz Intel Xeon CPU, 8 GB memory), for 4.7 million variants, ANNOVAR requires ~4 minutes to perform gene-based functional annotation, or ~15 minutes to perform the stepwise "variants reduction" procedure, making it practical to handle hundreds of human genomes in a day.
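The timing figures above can be turned into rough throughput numbers. The sketch below is back-of-the-envelope arithmetic only. It assumes that one genome corresponds to about 4.7 million variants, that runs execute back to back on a single machine, and that there is no overhead between runs; none of these assumptions is stated explicitly in the text.

```python
VARIANTS_PER_GENOME = 4_700_000   # figure quoted in the text
MINUTES_PER_DAY = 24 * 60


def variants_per_second(variants: int, minutes: float) -> float:
    """Average annotation rate for a run of the given duration."""
    return variants / (minutes * 60)


def runs_per_day(minutes_per_run: int) -> int:
    """Number of complete back-to-back runs that fit into 24 hours."""
    return MINUTES_PER_DAY // minutes_per_run


print(round(variants_per_second(VARIANTS_PER_GENOME, 4)))  # 19583 variants/s
print(runs_per_day(4))    # 360 gene-based annotation runs
print(runs_per_day(15))   # 96 variants-reduction runs
```

Under these assumptions, gene-based annotation alone reaches the "hundreds of genomes in a day" range, while the stepwise reduction procedure comes to roughly 96 runs per day on one machine.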
BIOBASE is responsible for the exclusive worldwide marketing and distribution of the ANNOVAR tool to commercial users. ANNOVAR will be distributed as a standalone product, and as a complement to Genome Trax™, which includes data from HGMD® and TRANSFAC®. With ANNOVAR and Genome Trax™ combined, users can identify and annotate known disease-causing inherited mutations in whole-genome or whole-exome data sets.
: 2013May08: New ANNOVAR version is available. The most important change is the replacement of summarize_annovar by table_annovar (instruction here), which allows better flexibility for users to specify annotation tasks.
: 2013Apr08: COSMIC64 is uploaded; use -downdb cosmic64 to download it.
: 2013Mar07: COSMIC63 is uploaded; use -downdb cosmic63 to download it. It includes both coding and non-coding variants, and is double the size of version 61.
: 2013Feb21: New ANNOVAR version is available, which fixes a bug where exonic variants at an exon end were annotated as splicing when -exonicsplicing is not set. Registered users will get an email notification on Feb 27, as an email server issue has caused this delay. But as usual, whenever you run "annotate_variation.pl -downdb null ." you will know if a new version is available.
: 2013Feb11: New ANNOVAR version is available. Registered users will get an email with download links soon. Changes include: the mitochondrial genome is now supported; the -zerostart argument is no longer supported; better handling of GFF3 files with undefined scores; added the -gff3attr argument so that the attribute field from a GFF3 file can be printed in the output; changed summarize_annovar.pl to take the -alltranscript argument to print out all isoforms for exonic variants; summarize_annovar.pl now takes esp6500si and snp137NonFlagged as databases; exonic variants near an intron/exon boundary are no longer reported as splicing unless -exonicsplicing is set; fixed a minor issue in finding the tar program in BSD-derived operating systems; convert2annovar.pl now handles *.gz files and accepts stdin as the input file name; convert2annovar.pl accepts the -comment argument to keep comment lines from a VCF4 file in the output.
: 2013Jan24: The updated summarize_annovar.pl can take arguments such as "-verdbsnp 137NonFlagged -veresp 6500si".
: 2013Jan22: The ESP6500si database is updated to fix a bug in annotating insertions (previously there was a one-bp error in position for insertions when the reference allele is a single base).
: 2013Jan07: dbSNP version 137 is available from ANNOVAR now! Use keyword snp137 to download and annotate. COSMIC version 61 is available from ANNOVAR now! It helps cancer researchers identify whether their somatic mutations have been previously observed, how many times they were observed, and in which cancer tissues. Use keyword cosmic61 to download and annotate by filter-based annotation.
: 2012Nov04: The NHLBI 6500 Exome data set with indels and chrY calls is available from ANNOVAR now! Use keywords esp6500si_ea, esp6500si_aa and esp6500si_all to download and annotate.
: 2012Oct23: New ANNOVAR version is available. Registered users will get an email with download links. I also updated large portions of the website to provide updated information to ANNOVAR beginners. The major changes include: added the -veresp argument to summarize_annovar.pl to support the esp6500 data set; added the -aamatrixfile argument to print out amino acid substitution scores such as Grantham scores; changed UCSC download from FTP to HTTP to help users with firewall settings; fixed a problem handling the genericdb file when the chr prefix is present for chromosomes; fixed a problem downloading the index for gerp++gt2 files; added the variants_reduction.pl program. Updated Oct25: the previous program cannot handle the -veresp argument correctly; please download again from the same URL link. Updated Nov01: I updated summarize_annovar.pl to take the -alltranscript argument to print out all isoforms for exonic variants and to fix slight problems in variants_reduction.pl. Please download again with the same URL link.
: 2012Jun24: The NHLBI 6500 Exome data set has been re-uploaded, as the previous version (2012Jun21) had only chr22 data. Please download again.
: 2012Jun21: The NHLBI 6500 Exome data sets are available to download now. Use commands like "annotate_variation.pl -downdb esp6500_ea humandb -webfrom annovar -buildver hg19". You can change hg19 to hg18 or change "ea" to "aa" or "all". The whole-genome GERP++ scores are available to download now, but I only include those with RS>=2! Use commands like "annotate_variation.pl -downdb gerp++gt2 humandb/ -webfrom annovar -buildver hg19" to download and use "annotate_variation.pl -filter inputfile humandb/ -dbtype gerp++gt2 -buildver hg19" to annotate your inputfile. See download page.
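The download and filter commands quoted in this entry follow a fixed pattern, so a script that handles several databases or genome builds can assemble them programmatically. The sketch below is one hedged way to do that in Python. The helper names `downdb_command` and `filter_command` are hypothetical, and only the flags and argument order that appear verbatim in the quoted commands are used.

```python
import shlex
from typing import List


def downdb_command(dbname: str, dbdir: str = "humandb",
                   buildver: str = "hg19") -> List[str]:
    """Build the argument list for a database download, as quoted in the entry."""
    return ["annotate_variation.pl", "-downdb", dbname, dbdir,
            "-webfrom", "annovar", "-buildver", buildver]


def filter_command(inputfile: str, dbtype: str, dbdir: str = "humandb/",
                   buildver: str = "hg19") -> List[str]:
    """Build the argument list for filter-based annotation, as quoted in the entry."""
    return ["annotate_variation.pl", "-filter", inputfile, dbdir,
            "-dbtype", dbtype, "-buildver", buildver]


# Print the commands instead of running them, so the sketch works
# on a machine where ANNOVAR is not installed.
for cmd in (downdb_command("esp6500_ea"),
            downdb_command("gerp++gt2", "humandb/"),
            filter_command("inputfile", "gerp++gt2")):
    print(shlex.join(cmd))
```

To execute a command rather than print it, the same list could be passed to `subprocess.run(cmd, check=True)`, assuming annotate_variation.pl is on the PATH.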
: 2012Jun21: A slight bug fix to convert2annovar.pl is available to download.
: 2012May25: The 1000 Genomes Project 2012 April data sets are available to download (this is based on phase 1 release v3 called from 20101123 alignment). The populations include ALL, AMR, AFR, ASN and EUR. Use the latest version of ANNOVAR and "-downdb 1000g2012apr" to download and "-filter -dbtype 1000g2012apr_eur" and so on to annotate. Additionally, 9 NonFlagged dbSNP data sets are available to download. See download page for details.
: 2012May25: A new version of ANNOVAR is available. Existing users will receive an email with a link to download. The -seq_padding and -indel_splicing_threshold arguments were added, and a bug that reported the beginning/end of a transcript as splicing variants was fixed, thanks to Jamie Teer @ NIH. The dbtype of 1000g2012apr is now supported with five populations (based on files from here), thanks to Mehdi Pirooznia @ Hopkins.
: 2012Apr17: New mRNA FASTA files were uploaded for hg18 and hg19 (refseq, knowngene, ensgene), given recent updates in gene annotations. Users can always generate the latest files themselves using retrieve_seq_from_fasta.pl. Updated hg18/hg19 SNP130/131/132/135 index files are uploaded, as the previous version had a minor issue that may miss a tiny fraction of SNPs during filter-based operation.
: 2012Mar08: New ANNOVAR is available with minor feature enhancements. The variation database 1000g2012feb is now available for ANNOVAR users (for 1000 Genomes Project Feb 2012 variant call release, with 38 million SNPs and 3.8 million indels).
: 2012Feb23: New ANNOVAR is available with cumulative bug fixes and many function enhancements. All indexes for ANNOVAR annotation databases have been updated to further improve speed for whole-exome sequencing data, see here for details. New summarize_annovar generates more informative results.
: 2011Dec20: Whole-exome GERP++ scores can be downloaded and annotated by ANNOVAR now for both hg18/hg19. Additionally, allele frequency data for the 5400 exomes from NHLBI (for European Americans, African Americans and all ethnicities) can be downloaded and annotated by ANNOVAR now for both hg18/hg19.
: 2011Dec20: A new generation of variant annotator called ANNOVAR++ is being developed and will be tested by certain avid users. Most known limitations in ANNOVAR will be solved by using this fundamentally new framework for annotation. Users will be able to specify their own customized workflow (summarize_annovar, auto_annovar, index_annovar, etc) in the future.
: 2011Nov20: New version of ANNOVAR is released. Major changes include: mRNA FASTA sequences without complete ORF annotation will no longer be used in exonic annotation, retrieve_seq_from_fasta.pl now reports transcripts whose ORFs have a premature stop codon, fixed the hg18_cg69 and hg19_cg69 allele frequency error and others. See the download page.
: 2011Oct02: The last version of ANNOVAR introduced some bugs related to ncRNA annotation, which subsequently affect exonic/splicing annotation. An updated version is released. Please report bugs to me if you still see problems.
: 2011Sep11: New version of ANNOVAR is released with significant speedup of the filter operation for certain databases (dbSNP, SIFT, PolyPhen, 1000G, etc), thanks to Ion Flux for the speed improvements. In the previous version of ANNOVAR, filter-based annotation for ex1.human (12 variants) required ~10 minutes for snp132, sift or polyphen. In the new version, it takes only 1 second! Performance improvements for larger query files will be less apparent. To use the new version, it is necessary to re-download the databases by -downdb. See details here. (Updated 2011Sep14: Users reported that the previously uploaded program could not download index files correctly; this has been fixed. Please download the ANNOVAR program again).
: 2011Jun18: New version of ANNOVAR is released with some function enhancements. New mRNA FASTA files were uploaded for hg18 and hg19 (refseq, knowngene, ensgene), given recent updates in gene annotations.
: 2011Jun18: The 1000g2010nov file was updated to include indel calls. Now it has 26.1 million SNPs (released by 1000G in Nov 2010 based on Aug 2010 alignments) and 3.7 million indels (released by 1000G in Feb 2011 based on Aug 2010 alignments). A new 1000g2011may file was provided with 39 million SNPs. Read details here.
: 2011May06: New version of ANNOVAR is released with minor bug fixes and feature enhancements. Whole-exome pre-computed PolyPhen v2, MutationTaster, LRT and PhyloP scores are available as ANNOVAR annotation databases to give more detailed annotation of non-synonymous mutations in humans, in addition to SIFT. Use "-downdb ljb_pp2 -webfrom annovar", "-downdb ljb_lrt -webfrom annovar", "-downdb ljb_mt -webfrom annovar", "-downdb ljb_phylop -webfrom annovar" to download them. Add "-buildver hg19" to download them in hg19 coordinates. The annotation database name ljb refers to the Liu, Jian and Boerwinkle paper in Human Mutation (PubMed ID 21520341). Cite this paper if you use the scores; higher scores (0-1) represent functionally more deleterious predictions. (2011May11: There was a bug in the hg18_lrt_pp2 file which has been fixed now; if you downloaded before this date, please download the file again. Please report other bugs).
: 2011May03: Forty-six whole genomes (variant calls and allele frequency information) from Complete Genomics are now available as an ANNOVAR annotation database. Users need to use "-downdb cg46 -webfrom annovar" (with either '-buildver hg18' or '-buildver hg19') to download the file. For filter-based annotation, use "-dbtype generic -genericdbfile hg18_cg46.txt" for annotation. The -score_threshold argument can be used to apply a MAF threshold.
: 2011Apr18: New mRNA FASTA files were uploaded for hg18 and hg19 (refseq, knowngene, ensgene), given recent updates in gene annotations. Users can always generate the latest files themselves using retrieve_seq_from_fasta.pl.
: 2011Mar25: dbSNP version 132 in hg19 coordinates, with >30 million SNPs (more than double the size of dbSNP131), is now available. Download the files from the download page, or use "-downdb -webfrom annovar" in ANNOVAR to download directly (as the file is from ANNOVAR not UCSC).
: 2011Mar18: dbSNP version 131 and 132 in hg18 coordinates! There is huge community demand to have the latest dbSNP in hg18 (NCBI 36), but unfortunately dbSNP elected to work on hg19 only. Dr. Leparc lifted over the latest dbSNP files and provided the dbSNP131 and dbSNP132 files in hg18 coordinates for use in ANNOVAR. Download the files from the download page, or use "-downdb -webfrom annovar" in ANNOVAR to download directly (-webfrom is required as the file is from the ANNOVAR website).
: 2011Mar01: Small update to AVSIFT database based on updated annotations at http://sift-dna.org/.
If you have questions, comments or concerns, contact