Data reliability in protein databases

About 30% of the proteins in the databases have erroneous sequences due to:
- Exons missed during translation of the DNA sequence.
- Introns mistakenly translated as coding sequence.

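Another common problem is the assignment of functions to "new" proteins based on sequence similarity. Similar sequences do not always share the same function, and an annotation transferred this way can be copied on to further entries.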
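To make the risk concrete, here is a minimal sketch of similarity-based annotation transfer. It is an illustration only, not how any particular database assigns functions: the function names, the 40% identity threshold, and the example sequences are all hypothetical, and the two sequences are assumed to be already aligned (gaps written as "-").

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def transfer_annotation(query, reference, reference_function, threshold=40.0):
    """Copy the reference's function to the query if identity >= threshold.

    The threshold is an arbitrary, hypothetical cut-off. Nothing here checks
    that the reference annotation is itself correct, which is how an error
    in one entry can spread to others.
    """
    identity = percent_identity(query, reference)
    if identity >= threshold:
        return f"putative {reference_function} (inferred from {identity:.0f}% identity)"
    return None


# Hypothetical aligned fragments; "kinase" is a made-up reference annotation.
label = transfer_annotation("MKV-LA", "MKVQLS", "kinase")
print(label)
```

The label is inferred, not measured: if the reference sequence or its annotation is wrong, the new entry inherits the error.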
