Capturing the Data and Making It Useful

 
Author: Aaron Hall
 

Redesigning GDB and GSDB

The explosive growth of information and the challenges of acquiring, representing, and providing access to data pose new and monumental tasks for the large public databases. Ken Fasman [Genome Database (GDB)] and Gifford Keen [Genome Sequence Data Base (GSDB)] discussed the restructuring of GDB and GSDB to handle the flood of data and make it useful for downstream biology.

GDB

Observing that one can't scroll or BLAST through 3 billion base pairs in a meaningful way, Fasman defined GDB's future role as the coordination site for the complete electronic description of the human genome. The map, he asserted, provides an ideal framework for jumping into the sequence (http://www.gdb.org/).

Fasman described the extensive changes made to GDB over the last 2 years that have culminated in the enhanced representation of genomic maps and gene information in GDB V6.0, which was released early this year [HGN 7(3-4), 13-14 and 7(5), 15].

The redesigned database schema and front-end interfaces now provide true graphical representation of genetic and physical maps; direct community editing and curation, including third-party annotation; and an improved model for gene information that includes links to databases describing function, structure, products, expression, and associated phenotypes. A user can create a link from any GDB object to any other entity on the Internet. GDB plans to become the focal point for accessing information about the human genome.

Under the Hood

New technologies used in developing V6.0 include an object-oriented data model, an object broker, a data-driven WWW interface, and graphical interfaces for the most popular computer platforms. The new GDB architecture depends heavily on the Object-Protocol Model (OPM) developed by Victor Markowitz and colleagues at LBNL (see "GDB-LBNL"). The GDB 6.0 data representation is captured in a schema file that drives all other pieces of software. This architecture will enable GDB to adapt more quickly to changes in biological knowledge and in the representation of maps, genes, and other structures.
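The idea of a single schema file driving the rest of the software can be illustrated with a small sketch. The article does not describe the format of the GDB schema file or the OPM tools, so the SCHEMA dictionary, the class names, and the field names below are all hypothetical; the sketch only shows how one description might generate both the object classes and the relational table definitions.

```python
from dataclasses import make_dataclass

# Hypothetical schema: class name -> list of (field name, Python type).
# This is NOT the real GDB schema format, which the article does not show.
SCHEMA = {
    "Gene": [("symbol", str), ("chromosome", str)],
    "Marker": [("name", str), ("position_cm", float)],
}

# Assumed mapping from Python types to generic SQL column types.
SQL_TYPES = {str: "TEXT", float: "REAL", int: "INTEGER"}


def build_classes(schema):
    """Create one Python class per schema entry."""
    return {name: make_dataclass(name, fields) for name, fields in schema.items()}


def build_ddl(schema):
    """Create one CREATE TABLE statement per schema entry."""
    statements = []
    for name, fields in schema.items():
        columns = ", ".join(f"{fname} {SQL_TYPES[ftype]}" for fname, ftype in fields)
        statements.append(f"CREATE TABLE {name} ({columns})")
    return statements


classes = build_classes(SCHEMA)
ddl = build_ddl(SCHEMA)

# Illustrative record; any object built this way follows the schema.
gene = classes["Gene"](symbol="CFTR", chromosome="7")
```

Under this kind of design, adding a field to the schema changes the generated classes and table definitions together, which is one way an architecture could adapt quickly to new kinds of biological data.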

At the heart of the system is a Sybase database server that communicates in SQL, the relational query language. Everything from that point forward deals in complex objects, rather than in the rows and tables of a relational database.
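A minimal sketch of this layering follows. It uses Python's built-in sqlite3 module as a stand-in for the Sybase server, and the table names, column names, and the load_gene helper are hypothetical rather than taken from GDB. The point is only that callers receive one complex object while the SQL and the rows stay hidden underneath.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Gene:
    """A complex object assembled from rows in more than one table."""
    symbol: str
    links: list = field(default_factory=list)


def load_gene(conn, symbol):
    """Build one Gene object from the hypothetical gene and gene_link tables."""
    row = conn.execute(
        "SELECT symbol FROM gene WHERE symbol = ?", (symbol,)
    ).fetchone()
    if row is None:
        return None
    links = [
        url
        for (url,) in conn.execute(
            "SELECT url FROM gene_link WHERE symbol = ? ORDER BY url", (symbol,)
        )
    ]
    return Gene(symbol=row[0], links=links)


# In-memory SQLite database standing in for the relational server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (symbol TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE gene_link (symbol TEXT, url TEXT)")
conn.execute("INSERT INTO gene VALUES ('CFTR')")
conn.executemany(
    "INSERT INTO gene_link VALUES (?, ?)",
    [
        ("CFTR", "http://example.org/function"),
        ("CFTR", "http://example.org/expression"),
    ],
)

gene = load_gene(conn, "CFTR")
```

Code that calls load_gene never sees a table or a row, only a Gene object with its list of links.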

Goals

Future enhancements will include improved map editing, an integrated editing environment, improved polymorphism and mutation representation, and integration with the specialized GSDB Sequence Annotator and Mouse Genome Database interfaces. To tie GDB to the evolving sequence databases, an interface is being developed to represent gene structure maps (maps of introns, exons, and regulatory regions associated with genes).

GSDB

Keen identified data acquisition, representation, and access as major issues for sequence databases.

Capturing and Annotating Data

Data acquisition is a two-part challenge, he said. Vast quantities of sequence data will be captured with custom software for bulk submission; future plans include database-to-database communication so that data can be downloaded directly from laboratories into GSDB. The more difficult task, he noted, is capturing the follow-on sequence annotation, which is usually published in print journals and subsequently "lost." This data will be crucial for studying gene expression, variation, and function. GSDB Annotator, a graphical browser and editor, is being developed to facilitate community annotation of the database. Researchers are also working to provide access to such common analysis algorithms as BLAST and GRAIL.

Data Representation: Building Whole Chromosomes

In addition to captured sequences and annotations, information needs to be generated about relationships between sequences. The data must be maintained in a form capable of supporting complex, ad hoc queries. In the near future, GSDB aims to model the human genome as 24 representative sequences, one for each chromosome (22 autosomes plus X and Y). As data comes in, it will be aligned to the representative sequence, which initially will have many gaps. Keen likened GSDB to a community laboratory information-management system supporting what is essentially a multiyear, multilaboratory, multiorganism shotgun-assembly process. Feature accession numbers will enable annotation to be stored separately from sequences.
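The notion of a representative sequence that starts out mostly as gaps and fills in as data arrives can be sketched as follows. This is a simplification under stated assumptions: the article does not describe how GSDB aligns incoming data, so the sketch assumes each fragment arrives with a known start coordinate, and the names GAP, place_fragment, and gap_count are hypothetical.

```python
GAP = "N"  # Assumed placeholder for a base that has not been sequenced yet.


def place_fragment(representative, start, fragment):
    """Return a copy of the representative with the fragment written at start.

    Raises ValueError if the fragment falls outside the representative or
    disagrees with bases that are already present.
    """
    end = start + len(fragment)
    if start < 0 or end > len(representative):
        raise ValueError("fragment falls outside the representative sequence")
    for old, new in zip(representative[start:end], fragment):
        if old != GAP and old != new:
            raise ValueError("fragment conflicts with existing sequence")
    return representative[:start] + fragment + representative[end:]


def gap_count(representative):
    """Count positions that are still unknown."""
    return representative.count(GAP)


# A toy 12-base "chromosome" that begins as one long gap.
chrom = GAP * 12
chrom = place_fragment(chrom, 2, "ACGT")   # first laboratory's fragment
chrom = place_fragment(chrom, 4, "GTTA")   # overlapping fragment from another
```

The overlap check is where data quality matters: a fragment that contradicts bases already placed is rejected rather than silently overwriting them.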

Data Access

Although GSDB has the tools and the structure (normalized and atomized data) to answer complex queries, such as those about relationships among annotations, problems with data quality and consistency currently prevent it from answering them well. GSDB is now mounting a major effort to develop software for rationalizing the data stream as it enters the database.

GSDB has also developed an object-oriented access library that sits on top of the database. Almost all GSDB applications and the software that imports data from other databases work through this object layer. GSDB will make the object libraries and an application programming interface available to the public. Programmatic access will be through assigned accounts, and the database can be accessed either through the object libraries or directly on the table, row, and column level.

Availability

The new GSDB schema is complete and should be operational later this year. After fairly extensive alpha and beta testing, GSDB Annotator should be released at the same time on Mac and Sun, with Windows to follow. Software will be available via FTP from NCGR's Web site.

 
 
 
