Quick Start-up Guide

For impatient users, here is a quick start-up guide that shows what ANNOVAR can do. Assuming that you are using a Linux computer connected to the Internet, go to the ANNOVAR directory and run the following commands one by one. (If you have already added the ANNOVAR directory to your system executable path, you can type "annotate_variation.pl" instead of "perl annotate_variation.pl".) An Internet connection is required for the commands that involve the -downdb operation.

perl annotate_variation.pl -geneanno example/ex1.human humandb/

perl annotate_variation.pl -downdb band humandb/
perl annotate_variation.pl -regionanno -dbtype band example/ex1.human humandb/

perl annotate_variation.pl -downdb mce44way humandb/
perl annotate_variation.pl -regionanno -dbtype mce44way example/ex1.human humandb/

perl annotate_variation.pl -downdb 1000g humandb/
perl annotate_variation.pl -filter -dbtype 1000g_ceu example/ex1.human humandb/

perl annotate_variation.pl -downdb avsift humandb/
perl annotate_variation.pl -filter -dbtype avsift example/ex1.human humandb/

Hopefully these exercises give you a rough idea of what ANNOVAR can do.

The first command annotates the 12 variants in the ex1.human file and classifies them as intergenic, intronic, non-synonymous SNP, frameshift deletion, large-scale duplication, etc. Examine the ex1.human file to see the simple text format, with one variant per line. The annotation procedure should take a few seconds on a typical modern computer. Two output files are generated: ex1.human.variant_function and ex1.human.exonic_variant_function. Examine these two files in the example/ directory to see what they contain. Note that the ANNOVAR package already contains a humandb/ directory with RefSeq library files (so that users can start running gene-based annotation immediately without running "-downdb gene" first); the other commands below require a -downdb step to download the database files first.
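As a quick way to inspect the gene-based output, the categories can be tallied with a few lines of shell. This is a minimal sketch, not part of ANNOVAR: the helper name tally_annotations is made up for illustration, and it assumes that the first tab-delimited column of the .variant_function file holds the functional category.

```shell
# Tally how many variants fall into each functional category.
# Assumption: column 1 of the tab-delimited .variant_function file holds
# the category (intergenic, intronic, exonic, ...).
tally_annotations() {
    awk -F'\t' '{ count[$1]++ } END { for (k in count) print k "\t" count[k] }' "$1" | sort
}

# Run only if the first quick-start command has already produced its output.
if [ -f example/ex1.human.variant_function ]; then
    tally_annotations example/ex1.human.variant_function
fi
```

The function prints one line per category, followed by the number of variants assigned to it, sorted by category name.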

Next, the program downloads the cytogenetic band annotation database from the UCSC Genome Browser and saves it to the humandb/ directory as the hg18_cytoBand.txt file, then annotates the variants in the ex1.human file and identifies the cytogenetic band for each of them. The annotation procedure should take a few seconds. Examine the output file ex1.human.hg18_cytoBand to see what it contains.

Next, the program downloads the 44-way alignment Most Conserved Elements database (users need to be patient, but once the file is downloaded it can be reused without downloading again), then annotates the variants in the ex1.human file and identifies the subset of variants located in conserved regions (as well as their normalized conservation scores). The annotation procedure should take a few minutes and requires <2GB memory. Examine the ex1.human.hg18_phastConsElements44way output file to see what it contains.

Next, the program downloads the 1000 Genomes Project allele frequency annotations (users need to be patient again, as the file is large), then identifies the subset of variants in ex1.human that are not observed in the 1000G CEU population (saved in the ex1.human.hg18_1000g_ceu_filtered file) and those that are observed, together with their allele frequencies (saved in the ex1.human.hg18_1000g_ceu_dropped file). Using the latest ANNOVAR (September 2011 version or later), the annotation procedure should take less than one second.

Finally, the program downloads the AVSIFT database, which contains pre-computed SIFT scores for all possible whole-genome non-synonymous mutations (again, be patient with the download; it takes a while but it is worth it). The program then identifies the "benign" (SIFT>0.05) non-synonymous variants in the input file and saves them to the ex1.human.hg18_avsift_dropped file; the remaining variants (SIFT<0.05 or without an annotated SIFT score) are written to the ex1.human.hg18_avsift_filtered file. Using the latest ANNOVAR (September 2011 version or later), the annotation procedure should take less than one second.
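Because each filter-based run writes every variant to either a *_dropped or a *_filtered file, the line counts of the two output files should add up to the number of input variants. The sketch below checks this; the helper names count_lines and check_partition are hypothetical, and the check assumes that every input line is a valid variant and that each output file has one line per variant.

```shell
# Sanity check for a filter-based run: filtered + dropped should equal input.
# Assumption: every input line is a valid variant and each output file
# contains exactly one line per variant.
count_lines() {
    # Print the number of lines in a file, or 0 if the file is missing.
    if [ -f "$1" ]; then awk 'END { print NR }' "$1"; else echo 0; fi
}

check_partition() {
    # $1 = input file, $2 = *_filtered file, $3 = *_dropped file
    total=$(count_lines "$1")
    kept=$(count_lines "$2")
    dropped=$(count_lines "$3")
    echo "input=$total filtered=$kept dropped=$dropped"
    # Succeed only when the two output files account for every input line.
    [ "$total" -eq $((kept + dropped)) ]
}
```

After running the AVSIFT commands, it could be called as: check_partition example/ex1.human example/ex1.human.hg18_avsift_filtered example/ex1.human.hg18_avsift_dropped. A non-zero exit status means the counts do not add up and the output files deserve a closer look.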

Note: By default, all the above commands work on variant files in hg18 (human genome NCBI build 36) coordinates. If your file is in hg19 coordinates, add "-buildver hg19" to every command that you use. Similarly, if your variant file is from fly, add "-buildver dm3" to every command; if your variant file is from mouse, add "-buildver mm9" to every command ...
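To avoid forgetting the build argument on one of the commands, it can be added in a single place. The following is only a sketch under stated assumptions: the wrapper function annovar, the BUILDVER and DRY_RUN variables, and the input file name my_variants.hg19.txt are all hypothetical and not part of ANNOVAR. With DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
# Build version passed to every command; hg19 is used here as the example.
BUILDVER="${BUILDVER:-hg19}"
# Hypothetical input file in hg19 coordinates.
INPUT="${INPUT:-my_variants.hg19.txt}"
# DRY_RUN=1 (the default in this sketch) only prints each command.
DRY_RUN="${DRY_RUN:-1}"

annovar() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "perl annotate_variation.pl -buildver $BUILDVER $*"
    else
        perl annotate_variation.pl -buildver "$BUILDVER" "$@"
    fi
}

# Gene-based annotation: download the gene database for this build first.
annovar -downdb gene humandb/
annovar -geneanno "$INPUT" humandb/

# Region-based annotation with cytogenetic bands.
annovar -downdb band humandb/
annovar -regionanno -dbtype band "$INPUT" humandb/
```

Setting DRY_RUN=0 runs the commands from the ANNOVAR directory instead of printing them; setting BUILDVER=mm9 or BUILDVER=dm3 switches every command to mouse or fly.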

These are a few examples of how ANNOVAR can help researchers annotate genetic variants generated from high-throughput sequencing data. Of course, ANNOVAR can do much more than that, so if you are interested, see the other sections of the documentation to learn the input formats and what else ANNOVAR can do for your research.