In 1999, the LICR and FAPESP (the State of São Paulo Research Foundation) initiated the multimillion-dollar Human Cancer Genome Project (HCGP) to identify which genes are expressed in cancer by sequencing and characterizing expressed sequence tags (ESTs) from tumor samples and normal tissues. By 2003, the HCGP had generated just over one million RNA transcripts and submitted them to a publicly available database, allowing the international scientific community to mine the data without restriction.
The HCGP effort, led by LICR scientists at the São Paulo Branch, developed and used a new cDNA generation technique called ORESTES (open reading frame expressed sequence tags), which generates ESTs from the central portion of the mRNA. The only other large-scale public EST sequencing effort, the Cancer Genome Anatomy Project, was carried out by the US National Cancer Institute (NCI) and used techniques that sequenced ESTs from one end (the 3′ end) of the genes.
In 2003, the LICR and the NCI combined their data to create the largest publicly available EST database, the Cancer Genome Anatomy Project (CGAP) (http://cgap.nci.nih.gov/).