PigGIS Update to v2
Update Date: 2010.03.03

General Questions

User's Guide

What can I find in the Pig Analysis Database?

The Pig Analysis Database presents accurate pig gene annotations in all sequenced genomic regions. It integrates the available pig sequence data, including 3.84 million whole-genome shotgun (WGS) reads and 0.7 million Expressed Sequence Tags (ESTs) generated by the Sino-Danish Pig Genome Project, and 1 million miscellaneous GenBank records. The Pig Analysis Database covers nearly 50% of the whole pig genome and over 70% of the coding sequences (CDS), and aims to provide the most complete pig gene set to date.

In addition to gene annotations, the Pig Analysis Database also presents expression information from 98 EST libraries, SNPs detected from both WGS reads and ESTs, oligos that can be used in microarray design, and relevant evolutionary data.

Various views can be found here.

How did you make the pig annotations?

Human proteins were downloaded from Ensembl v32 and aligned against the ESTs and WGS reads with BLASTX. Each sequence, whether a read or an EST, was arbitrarily anchored to its best-aligned human gene, but it was discarded as a repeat if its best match was similar to its second-best match. After all the sequences were anchored, PHRAP was applied to assemble the collected sequences for each exon. The resulting contigs were then aligned to the corresponding exon with FASTY in order to fix potential frameshifts. Contigs with less than 80% identity at the protein level were discarded, and only the single best-aligned contig was retained.
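The anchoring and contig-selection rules above can be sketched in a few lines of Python. This is only an illustration, not the code used to build the database: the function names, the (gene, score) hit format, and the repeat_ratio threshold are assumptions, since the text does not say how "similar" the best and second-best matches must be, and "best-aligned" is taken here to mean highest protein identity.

```python
def anchor_sequence(hits, repeat_ratio=0.95):
    """Anchor one read or EST to a human gene.

    hits: list of (gene_id, score) BLASTX hits for this sequence.
    Returns the best gene_id, or None if the sequence has no hit or
    is treated as a repeat.
    repeat_ratio is a hypothetical threshold, not taken from the source.
    """
    if not hits:
        return None
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best_gene, best_score = ranked[0]
    if len(ranked) > 1:
        second_score = ranked[1][1]
        # Assumed similarity test: second-best score close to the best one.
        if second_score >= repeat_ratio * best_score:
            return None
    return best_gene


def select_contig(contigs, min_identity=0.80):
    """Pick the contig to keep for one exon.

    contigs: list of (contig_id, protein_identity) after FASTY alignment.
    Contigs below 80% identity are dropped; the best remaining one is kept.
    """
    kept = [c for c in contigs if c[1] >= min_identity]
    if not kept:
        return None
    return max(kept, key=lambda c: c[1])[0]


print(anchor_sequence([("G1", 100.0), ("G2", 50.0)]))   # G1
print(anchor_sequence([("G1", 100.0), ("G2", 99.0)]))   # None (repeat)
print(select_contig([("c1", 0.75), ("c2", 0.90), ("c3", 0.85)]))  # c2
```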

How are SNPs detected?

After the assembly step, each human exon is aligned to zero or one pig contig. The SNP detection pipeline was then applied. In this pipeline, a high-quality base is a base that satisfies two conditions: a) its quality is not lower than 25, and b) the qualities of the bases in its 5-bp flanking sequences are not lower than 20. If high-quality bases at the same position disagreed with each other, a SNP was called.
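The high-quality base rule lends itself to a small example. The sketch below is a simplified reading of it, not the actual pipeline: it assumes that "5-bp flanking sequences" means 5 bases on each side, that a base without full flanks is not high quality, and that all sequences are already aligned to the same coordinates without gaps. The function names and the input format are hypothetical.

```python
def is_high_quality(quals, i, min_q=25, flank=5, min_flank_q=20):
    """True if base i has quality >= 25 and all bases in its
    5-bp flanks (assumed: 5 on each side) have quality >= 20."""
    if quals[i] < min_q:
        return False
    lo, hi = i - flank, i + flank
    if lo < 0 or hi >= len(quals):
        # Assumption: bases too close to a sequence end are skipped.
        return False
    return all(quals[j] >= min_flank_q for j in range(lo, hi + 1) if j != i)


def detect_snps(aligned):
    """aligned: list of (bases, quals) pairs of equal length, already
    placed on the same coordinates. Returns (position, alleles) for
    every position where high-quality bases disagree."""
    snps = []
    length = len(aligned[0][0])
    for pos in range(length):
        alleles = {bases[pos] for bases, quals in aligned
                   if is_high_quality(quals, pos)}
        if len(alleles) > 1:
            snps.append((pos, sorted(alleles)))
    return snps


reads = [
    ("AAAAAGAAAAA", [30] * 11),
    ("AAAAATAAAAA", [30] * 11),
]
print(detect_snps(reads))  # [(5, ['G', 'T'])]
```

Lowering the quality of either base at position 5 below 25, or of any flanking base below 20, removes that base from the comparison, and no SNP is reported.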

How many `views' are there in this database?

There are seven views: Transcript View, which shows the consensus of the pig coding sequences; Gene View, which presents the various splicing forms of human genes; Exon View, which gives the alternative sequences for a given human exon; Cluster View, which depicts the detailed assembly of contigs; SNP View, which displays every detail of a pig SNP; Trace View, which displays the graphics of trace files; and Sequence View, which presents the complete information of raw sequences.

What are pig consensus sequences?

One pig consensus sequence represents one transcript. It is made by concatenating the coding regions of the best-aligned contigs that are anchored to one human transcript. Noncoding regions were trimmed, and unsequenced regions of the pig were filled with the gap character `-'. In the Pig Analysis Database, each pig consensus corresponds to just one human transcript, but NOT all the human transcripts can be recovered, due to the existence of species-specific regions and the incompleteness of the pig genome.
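As a rough illustration of how such a consensus is put together, the Python sketch below concatenates per-exon coding sequences and pads missing exons with `-'. It is a simplification under stated assumptions: the input format and function name are hypothetical, each exon is either fully covered or fully missing, and the gap run for a missing exon is assumed to have the length of the human exon's coding region.

```python
def build_consensus(exons):
    """exons: list of (coding_length, pig_coding_seq_or_None), in
    transcript order. Missing exons are filled with '-' characters
    (assumed: one per base of the human coding region)."""
    parts = []
    for length, coding_seq in exons:
        if coding_seq is None:
            parts.append("-" * length)  # unsequenced in pig
        else:
            parts.append(coding_seq)    # coding region of the best contig
    return "".join(parts)


print(build_consensus([(3, "ATG"), (4, None), (3, "TAA")]))  # ATG----TAA
```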

Why do I see an EST present in `ESTs only' assembly but absent from `WGS reads+ESTs' assembly?

WGS reads and ESTs were first assembled separately, and then merged at the exon level, also with PHRAP. However, the merged assembly is not guaranteed to contain all the original sequences. The additional mismatches introduced by merging may make PHRAP behave differently.