PennCNV

Download PennCNV

PennCNV is a software package originally developed for CNV detection from Illumina whole-genome SNP genotyping arrays. It has since been extended to handle candidate gene SNP arrays, recent high-density arrays with non-polymorphic markers (so-called CN markers), and Affymetrix genome-wide arrays.

PennCNV-Affy is a collection of data pre-processing protocols and helper scripts that convert Affymetrix CEL files into a data format suitable for CNV detection by PennCNV.

1 PennCNV main package
2 PennCNV example data sets
3 PennCNV-Affy protocols
4 Genome Browser tracks for commonly used SNP arrays
5 Archives of older versions of PennCNV

 

PennCNV main package

The link to the latest version (2011Jun16) of the PennCNV main package is given below. The package includes both source code and pre-compiled executables for several commonly used system architectures, including 32-bit Windows. (Because several unzipping programs under Windows have problems handling *.tar.gz files, Windows users should download the zip file instead. The contents of the two files are identical.) Files were updated 2011Jun28 to fix a small packaging issue. DO NOT STOP HERE, READ THE PARAGRAPHS BELOW TO DOWNLOAD UPDATED FILES.

User-supplied PFB files: Users can use the compile_pfb.pl program included in PennCNV to generate a PFB file for their specific arrays.
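To illustrate what a PFB value represents, here is a minimal Python sketch that averages B Allele Frequency (BAF) values per marker across samples. It assumes PFB stands for "population frequency of B allele" and that a per-marker mean of BAF is a reasonable estimate; it is not the compile_pfb.pl implementation. The function name and the in-memory input layout are hypothetical.

```python
from statistics import mean


def estimate_pfb(baf_by_sample):
    """Estimate a per-marker PFB as the mean BAF across samples.

    baf_by_sample: list of dicts, one per sample, mapping
    marker name -> BAF (float, or None when missing).
    Hypothetical helper for illustration only.
    """
    values = {}
    for sample in baf_by_sample:
        for marker, baf in sample.items():
            # Skip missing values (None) and NaN (NaN != NaN).
            if baf is None or baf != baf:
                continue
            values.setdefault(marker, []).append(baf)
    # Round to four decimals to keep the output compact.
    return {marker: round(mean(bafs), 4) for marker, bafs in values.items()}
```

For real data, use compile_pfb.pl, which works directly on signal intensity files.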

The hc12v1.hg18.pfb.gz and hc12v1.hg18.gcmodel.gz files are provided specifically for the HumanCytoSNP12 V1 array, which has 300K markers (based on Caucasian populations).

The ho1v1.hg18.pfb.gz and ho1v1.hg18.gcmodel.gz files are provided specifically for the HumanOmni1 QuadV1 array (based on Caucasian populations).

Updated 2011Apr25: A user (Denise Kay) submitted an updated ho1v1.hg18.pfb.gz file in which 32,290 markers are flagged with PFB=2. Illumina originally designed these as SNP markers, but their quality is too low, so it is best to treat them as intensity-only markers by flagging PFB=2.
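The same kind of flagging can be applied to other PFB files. The Python sketch below sets the PFB column to 2 for a given list of marker names. It assumes a tab-delimited layout with the columns Name, Chr, Position and PFB and an optional header line starting with "Name"; check this against your own file before use. The function name is hypothetical.

```python
def flag_intensity_only(pfb_lines, low_quality_markers, flag="2"):
    """Return PFB lines with the PFB column set to `flag` for chosen markers.

    pfb_lines: iterable of tab-delimited lines (Name, Chr, Position, PFB);
    this column layout is an assumption, not taken from the page above.
    low_quality_markers: marker names to treat as intensity-only.
    """
    flagged = set(low_quality_markers)
    out = []
    for line in pfb_lines:
        fields = line.rstrip("\n").split("\t")
        is_header = fields[0] == "Name"
        # Only touch well-formed data rows for the listed markers.
        if not is_header and len(fields) >= 4 and fields[0] in flagged:
            fields[3] = flag
        out.append("\t".join(fields))
    return out
```

Rows for markers that are not in the list are returned unchanged, so the output can be written back as a replacement PFB file.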

Updated 2011Mar18: Dr. Luis M. Franco from Baylor College of Medicine provided a PFB file for the Illumina OmniExpress array, compiled from 96 unrelated Caucasian individuals. The file currently contains 703,965 SNPs (pseudoautosomal and Y-chromosome SNPs are excluded).

Updated 2011Aug16: Dr. Colm O'Dushlaine lifted over the PFB file for the Illumina OmniExpress array to hg19 coordinates.

Updated 2013Nov06: Dr. Lijian Yu from Harvard University provided Affymetrix PFB files in hg19 coordinates, using version na33 annotations from Affymetrix. The file was updated on 2014Jul23 to fix a white space problem.

Updated 2014Aug16: Han Fang from Cold Spring Harbor Laboratory provided a PFB file for the Illumina Omni 2.5M array with 2,379,855 markers, generated from 600 controls from the Simons Simplex Collection.

User contributed programs (Uploaded 2010Dec01): The penncnv_to_plink.pl program converts PennCNV output into PLINK input format. Run it with the -h argument to read the documentation. The program was written and provided by Matthew Gillman at the Wellcome Trust Sanger Institute.
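For readers curious what such a conversion involves, the Python sketch below parses one PennCNV rawcnv-style line and emits a row with the PLINK CNV columns FID, IID, CHR, BP1, BP2, TYPE, SCORE and SITES. It is not the logic of penncnv_to_plink.pl. The rawcnv line layout, the reuse of the sample file name as both FID and IID, and the score of 0 when no conf= field is present are all assumptions made for illustration.

```python
import re

# Assumed rawcnv layout, for example:
# chr3:37957465-37961253 numsnp=3 length=3,789 state2,cn=1 father.txt startsnp=... endsnp=...
RAWCNV = re.compile(
    r"chr(?P<chrom>\w+):(?P<start>\d+)-(?P<end>\d+)\s+"
    r"numsnp=(?P<numsnp>\d+)\s+length=[\d,]+\s+"
    r"state\d+,cn=(?P<cn>\d+)\s+(?P<sample>\S+)"
    r"(?:.*?conf=(?P<conf>[\d.]+))?"
)


def rawcnv_to_plink_row(line):
    """Convert one rawcnv-style line to a space-separated PLINK CNV row.

    Returns None when the line does not match the assumed layout.
    """
    m = RAWCNV.match(line.strip())
    if m is None:
        return None
    sample = m["sample"]        # used as both FID and IID (assumption)
    score = m["conf"] or "0"    # no confidence score available -> 0
    return " ".join(
        [sample, sample, m["chrom"], m["start"], m["end"],
         m["cn"], score, m["numsnp"]]
    )
```

For real analyses, use penncnv_to_plink.pl and its -h documentation, which describe the actual options and output.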

User contributed programs (Uploaded 2011Feb27): The plot_raw_PennCNV.R program plots a PennCNV rawcnv file on screen or to a high-resolution PNG file. Example screen shot 1, 2, 3 and 4. This program was written and provided by Dr. Bowang Chen. See the updated script below.

User contributed programs (Uploaded 2012Jul16): The plot_raw_PennCNV2PDF.R program plots a PennCNV rawcnv file to a high-resolution PDF file. This new script exports all plots to a single PDF file, by default 70 samples per page (users can change this). It is also much faster: it plots a rawcnv file of 50,000+ lines to a PDF of 400+ pages in about 2 minutes. Example output is here. Example command line:
R CMD BATCH --no-save --no-restore "--args filename=\"$filename\" PDFname=\"$PDFname\"" $path_R/plot_raw_PennCNV2PDF.R
This program was written and provided by Dr. Bowang Chen at the German Cancer Research Center.
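The nested quoting in the example command is easy to get wrong in a shell. One way around it, sketched here in Python, is to build the argument list directly so that the whole "--args ..." string is passed to R as a single argument. The function name is hypothetical, and the sketch only builds the command; it assumes R is on the PATH when the command is eventually run.

```python
def build_plot_command(rawcnv_file, pdf_name, script_path):
    """Build the R CMD BATCH argument list for plot_raw_PennCNV2PDF.R.

    The filename= and PDFname= variables come from the example command
    line; everything else here is an illustrative helper.
    """
    # R receives this whole string as one argument, so no shell escaping
    # of the inner double quotes is needed.
    r_args = f'--args filename="{rawcnv_file}" PDFname="{pdf_name}"'
    return ["R", "CMD", "BATCH", "--no-save", "--no-restore", r_args, script_path]


# To run it, pass the list to subprocess without a shell, for example:
#   import subprocess
#   subprocess.run(build_plot_command("sample.rawcnv", "sample.pdf",
#                                     "plot_raw_PennCNV2PDF.R"), check=True)
```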

 

 PennCNV example data sets

This example data set contains genotyping data for a father-mother-offspring trio, genotyped on the Illumina HumanHap550 array. The file is in BeadStudio project format and can be opened with the Illumina BeadStudio software. The file is used in the tutorial on this website.

This file is an exported text file from the Illumina BeadStudio software using the above project file. It is useful for users without access to BeadStudio to test PennCNV.

This “serial dilution” data set contains genotyping data for one sample genotyped five times, each time with a ~2-fold dilution. For more details, please see the Diskin et al. paper. The file is in BeadStudio project format and can be opened with the Illumina BeadStudio software.

 

PennCNV-Affy protocols

This package contains PennCNV-Affy protocols and helper scripts for handling Affymetrix Mapping 500K, genome-wide 5.0 and genome-wide 6.0 data as raw CEL files. For instructions on how to use the PennCNV-Affy package, please refer to the PennCNV-Affy link in the menu bar to the left of the page. The package below was updated on 2009Aug27 with very minor bug fixes only. It should produce results identical to those of the previous version.

 

Genome Browser tracks for commonly used SNP arrays

Clicking any of the links below loads the corresponding track in the UCSC Genome Browser in hg18 coordinates (the 2006 human genome assembly). Each track may take several minutes to load. Users can view the marker coverage of each array and compare the arrays with each other. To hide the marker names in the browser, select the “dense” option in the drop-down menu on the browser page; each marker is then displayed as a red vertical bar.

All the genome coordinates are also annotated in the PFB files supplied with the PennCNV package. The Illumina data are based on the annotation supplied in Illumina's sample files. The Affymetrix data are based on the na26 (July 2008) annotation from the Affymetrix website.

Please report annotation mistakes or bugs, or request additional arrays.

 

Archives of older versions of PennCNV

The older versions of PennCNV are listed below, each with a short description of changes. It is highly recommended to use only the latest version. A more detailed change log is recorded in the program code itself.

  • Latest version (2011Jun16): cumulative bug fixes and function enhancements. Added the clean_cnv.pl and cal_gc_snp.pl scripts to clean CNV calls and to calculate GC content for user-supplied genome builds. Additionally, pre-compiled executables are included for 32-bit Perl 5.8.9, 5.10.1, 5.12.3 and 5.14.0 in Windows XP, Vista and 7.
  • 2010May01 version: reduced memory usage for PennCNV such that the Affy6 array requires <2GB memory for CNV calling. Added the compile_pfb.pl program to generate users' own PFB files given a list of signal intensity files. Added functionality to plot the signal intensity values for each CNV call for visual validation of the reliability of CNV calls. Rewrote the genomic wave adjustment procedure to solve compatibility issues in certain system architectures during compilation.
  • 2009Aug27 version: minor bug fix; added the --lastchr argument to detect_cnv.pl to handle non-human arrays. Added the infer_snp_allele.pl program to infer CNV-based SNP genotypes, or to validate de novo CNVs and assign P-values. Reorganized the kext/ directory structure to accommodate different Perl versions. Fixed the missing "-" before "minsnp" problem in the BeadStudio/GenomeStudio plug-in. Enhanced the functionality of convert_cnv.pl, which now handles XML files exported from BeadStudio/GenomeStudio. Added the -reciprocal argument to the compare_cnv.pl program to fine-tune the -minoverlap argument. Updated the scan_region.pl program for better functionality and more accurate control of overlapping criteria. See the "PennCNV main package" section above for the link to download files.
  • 2008Nov19 version: added functionality to the filter_cnv.pl and compare_cnv.pl programs.
  • 2008Jun26 version: GC-model signal pre-processing to handle low-quality samples; family-based CNV calls by joint-calling algorithm; other enhancements
  • 2008Mar11 version: better compatibility with BeadStudio, beta-version of CNV visualization
  • 2008Mar03 version: beta-version for using PennCNV with BeadStudio plug-in, beta-version for signal adjustment for wavy samples
  • 2008Feb15 version: minor changes with more QC summary
  • 2007Dec14 version: supports 64-bit systems and Cygwin, prints sample quality measures, produces confidence scores with the --conf argument (experimental feature)
  • 2007Nov13 version: cumulative minor bug fixes and function enhancements
  • 2007Oct31 version: fix bugs for chrX processing; fix bugs in kcolumn.pl for splitting huge files
  • 2007Sep28 version: re-implement calling algorithm to process each chromosome separately; change output format