IEEE

Downloading the Sky (Continued)
By Jonathan C. McDowell

First Published August 2004

Astronomers are drowning in data. To take one example, the ambitious Sloan Digital Sky Survey is using ground-based telescopes equipped with digital cameras to record a quarter of the sky with unprecedented accuracy and depth; its latest release of images and related data totaled six terabytes. The newest catalog of star positions, published by the U.S. Naval Observatory in Washington, D.C., contains more than a billion stars, while a single observation from the Hubble Space Telescope can easily swallow several gigabytes. In the next few years, as the archives from these and other instruments continue to swell and a number of new large-scale projects come online, the total amount of data is expected to double every year or two. Not surprisingly, astronomers are already worried about how to find what they need. Similarly crushing tides of data await lots of other people in other fields, whether they work in multinational corporations or in large, geographically dispersed research projects.

This abundance of astronomy data is actually fairly new. Decades ago, observations were made for a project and then thrown away—if you wanted to ask another question about the same star or galaxy, you went back to the telescope. The launch of the first space telescopes in the 1970s, such as the Einstein Observatory and the International Ultraviolet Explorer, changed that. Their high cost convinced researchers, and funders, that the data were too precious to lose.

In the process, archival astronomy was born. Researchers quickly learned that data gathered for one star could be reused to study other stars; hundreds of papers were generated in this way. Nowadays, astronomers, more than other scientists, tend to share their data. Those who observe on a NASA space facility, for example, get exclusive access to their data for only one year—after that, anyone can download it and look for discoveries that the original observer may have missed. [For a glimpse of how modern astronomy is done, and how the VO might help, see box, "An Astronomer's Life."]

Taking the next step, and making any data set available to any astronomer anywhere in the world, will mean solving a number of major challenges. These include resolving differences in data format, defining a query language for accessing the VO, creating the computational infrastructure, figuring out how to keep the VO up to date as new data sets are created, and, of course, getting all the players to agree on the many software standards and protocols.

Astronomical data take many forms, depending on the instrument that collected them and the format and medium they were stored in. Some records aren't digitized; a lot of radio astronomy data, for example, are still on analog nine-track tape. Some smaller observatories don't even archive their data; instead, researchers take home whatever raw data they collect. VO collaborators hope that as the virtual observatory comes online and begins to prove its worth, the data laggards will devote the resources necessary to create or upgrade their databases.

Further complicating things is that different archives can refer to the same object by different names. The International Astronomical Union, based in Paris, oversees the naming of celestial objects (and no, you can't pay to have a star named after you), but that doesn't prevent other unsanctioned designations from popping up. So when comparing astronomical catalogs that cover two different wavelength bands, researchers typically must also check an object's position to confirm that two entries refer to the same thing. Such double-checking fails, however, if the object in question is visible (and therefore recorded) at only one of the wavelengths.
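The positional double-check described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular archive: the catalog entries, their names, and the 2-arcsecond tolerance are all hypothetical, and a real cross-match would also weigh each catalog's positional uncertainty.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # The haversine form stays numerically stable for very small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cross_match(catalog_a, catalog_b, tolerance_arcsec=2.0):
    """Pair each catalog_a source with the nearest catalog_b source within tolerance.

    Sources with no counterpart are paired with None, which is exactly the
    case where the object was recorded at only one wavelength.
    """
    tol_deg = tolerance_arcsec / 3600.0
    matches = []
    for name_a, ra_a, dec_a in catalog_a:
        best, best_sep = None, tol_deg
        for name_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= best_sep:
                best, best_sep = name_b, sep
        matches.append((name_a, best))
    return matches

# Hypothetical entries: (name, right ascension in degrees, declination in degrees).
optical = [("OPT-1", 212.3496, 26.3059), ("OPT-2", 10.6847, 41.2690)]
xray = [("XR-17", 212.3497, 26.3060)]

matches = cross_match(optical, xray)
print(matches)
```

Here OPT-1 and XR-17 lie well under an arcsecond apart and are paired, while OPT-2 has no X-ray counterpart and comes back unmatched.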

Tracking down data sets, which can take weeks or even months, became somewhat more straightforward in 1996, with the creation of the online SIMBAD service. Run by the Stellar Data Center, SIMBAD lets researchers call up a list of papers that cite a celestial object, plus its other names, its position, and a few other numbers. What SIMBAD doesn't tell you is where the data are archived; nor does it return actual data with which you can do actual science.

For that, you need the VO. With it, a user in Chicago will be able to sit at her computer, type in a data request—say, all the brightness information at all different colors of the spectrum for quasar PG1407+265—and then wait for the data to come in. Behind the scenes, her query may be processed by a Web portal at Caltech, which in turn searches several archives, including one based in Strasbourg that lists star locations and another in Cambridge, Mass., that knows the stars' X-ray intensities. After the searches, the Web portal gathers up all the results and replies to the user.
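To make the behind-the-scenes flow concrete, here is a rough Python sketch of a portal that forwards one request to several archives at once and gathers the replies. Everything in it is assumed for illustration: the archive functions are local stand-ins for what would really be network requests, their names and placeholder replies are invented, and the sketch says nothing about how the actual VO portals are built.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote archives; a real portal would call out
# over the network here instead of returning placeholder records.
def positions_archive(target):
    return [{"target": target, "kind": "position", "value": "placeholder"}]

def xray_archive(target):
    return [{"target": target, "kind": "x-ray intensity", "value": "placeholder"}]

def offline_archive(target):
    raise ConnectionError("archive unreachable")

ARCHIVES = {
    "positions": positions_archive,
    "xray": xray_archive,
    "offline": offline_archive,
}

def portal_query(target, archives):
    """Send the same request to every archive concurrently and merge the replies."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=len(archives)) as pool:
        futures = {name: pool.submit(search, target) for name, search in archives.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=5)
            except Exception as exc:
                # One unreachable archive should not sink the whole request.
                errors[name] = str(exc)
    return {"target": target, "results": results, "errors": errors}

reply = portal_query("PG1407+265", ARCHIVES)
print(sorted(reply["results"]))
print(reply["errors"])
```

The design choice worth noting is that the portal reports partial results: the user still gets the position and X-ray records even though one archive failed to answer.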

As this scenario suggests, the Virtual Observatory is a distributed system, much like the Internet itself. To link its disparate parts—to "federate" them, as astronomers say—the VO is being built around registries. A registry is basically an online catalog of what is in each archive, indexing the virtual sky by position and wave band; it is continually updated to incorporate new data and new archives. Functionally, a VO registry is like the domain name servers that point to things on the Internet. Prototype VO registries are already running at Johns Hopkins, the University of Illinois, and Caltech. The Data Inventory Service, created by Thomas McGlynn and colleagues at NASA Goddard Space Flight Center, calls on these registries (and eventually others) to locate data based on an object's position or name (see http://heasarc.gsfc.nasa.gov/vo/).
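A registry of this kind can be pictured as a lookup table keyed by sky coverage and wave band. The Python sketch below is only a toy under stated assumptions: the archive names and coverage boxes are made up, coverage is reduced to simple rectangles in right ascension and declination, and regions that wrap around 0 degrees of right ascension are not handled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RegistryEntry:
    archive: str      # hypothetical archive identifier
    band: str         # wave band, e.g. "optical", "x-ray", "radio"
    ra_range: tuple   # (min, max) right ascension in degrees
    dec_range: tuple  # (min, max) declination in degrees

class Registry:
    """Toy catalog of which archive holds data for which part of the sky."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        # New archives and new data sets are added as they come online.
        self._entries.append(entry)

    def locate(self, ra, dec, band=None):
        """Return the archives covering a position, optionally in one band."""
        hits = []
        for e in self._entries:
            in_ra = e.ra_range[0] <= ra <= e.ra_range[1]
            in_dec = e.dec_range[0] <= dec <= e.dec_range[1]
            if in_ra and in_dec and (band is None or e.band == band):
                hits.append(e.archive)
        return hits

registry = Registry()
registry.register(RegistryEntry("optical_survey", "optical", (100.0, 270.0), (-5.0, 70.0)))
registry.register(RegistryEntry("xray_archive", "x-ray", (0.0, 360.0), (-90.0, 90.0)))
registry.register(RegistryEntry("radio_survey", "radio", (0.0, 360.0), (-40.0, 90.0)))

print(registry.locate(212.0, 26.5))
print(registry.locate(212.0, 26.5, band="x-ray"))
```

As with a domain name server, the registry returns only pointers; fetching the data from the archives it names is a separate step.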

The VO registries will also point to Web-based programs, known as Web services, which will allow data from those archives to be processed. Astronomers already use various software tools for analyzing and filtering their data, but such programs are designed to run on local workstations, using locally stored data. A Web service, by contrast, is accessed through the Internet, and the user may not even know it is running. The VOStat service, for example, lets users run many types of statistical routines on their data; the user doesn't need to worry about having the latest statistical software.

Sifting through these disparate databases is eased by past attempts at data standardization, such as the Flexible Image Transport System (FITS) format. Invented by radio astronomers back in the 1970s to exchange data on magnetic tape, FITS has since been widely adopted by other astronomers, and FITS files can now be read by almost all astronomical software.

But FITS typically can't be read by mainstream software. The VO team therefore plans to supplement FITS files with eXtensible Markup Language (XML) descriptions of the data. Although XML is fast becoming the common text format for exchanging a wide variety of data on the Web and elsewhere, astronomers have been relatively late to embrace it. The VO's first use of XML is the VOTable format, developed by groups at the Stellar Data Center and Caltech, for exchanging tables and star catalogs.
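The appeal of an XML table is that ordinary, non-astronomical software can read it. The Python sketch below parses a pared-down table laid out in the VOTable style (FIELD elements describing the columns, then rows of TR and TD cells) using nothing but the standard library. The table contents are invented, and the XML namespace declaration that real VOTable files usually carry is omitted to keep the example short, so treat this as an outline of the idea rather than a complete reader.

```python
import xml.etree.ElementTree as ET

# A simplified, hypothetical table in the VOTable layout (no namespace).
VOTABLE_XML = """
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="example_sources">
      <FIELD name="name" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>SRC-A</TD><TD>10.5</TD><TD>-3.25</TD></TR>
          <TR><TD>SRC-B</TD><TD>200.0</TD><TD>45.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

NUMERIC_TYPES = {"double", "float"}

def read_votable(xml_text):
    """Return (column names, rows as dicts) from the first table in the document."""
    root = ET.fromstring(xml_text.strip())
    table = root.find("./RESOURCE/TABLE")
    fields = table.findall("FIELD")
    names = [f.get("name") for f in fields]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        row = {}
        for field, td in zip(fields, tr.findall("TD")):
            text = td.text or ""
            # Convert cells to numbers when the column metadata says they are numeric.
            is_numeric = field.get("datatype") in NUMERIC_TYPES
            row[field.get("name")] = float(text) if is_numeric else text
        rows.append(row)
    return names, rows

names, rows = read_votable(VOTABLE_XML)
print(names)
print(rows[0])
```

Because the column descriptions travel with the data, the reader needs no prior knowledge of the table to interpret each cell.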

Another problem with FITS is that it allows each group to make up the keywords that describe what the file contains; uninitiated astronomers have no way of deciphering these custom keywords. What's needed is a precise and universal vocabulary. For example, if I ask the VO for data about photon frequencies, I don't want data about stellar pulsation frequencies. The UCD (or Unified Content Descriptor), invented by the star-catalog experts at Strasbourg, is a first cut at defining such an unambiguous vocabulary. Initially, the VO will use UCDs to augment FITS keywords; eventually, they could become the sole means of describing a file's contents.
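The frequency example can be illustrated with a small Python sketch. Three hypothetical archives label their columns with locally chosen keywords, two of which collide; tagging each column with a content descriptor lets a query pick out only the photon-frequency columns. The archive names and keywords are invented, and the descriptor strings are written in the style of the UCD vocabulary purely for illustration, so they should be checked against the current official list before being relied on.

```python
# Hypothetical column metadata: the same local keyword "FREQ" means
# different things in different archives.
COLUMNS = [
    {"archive": "archive_a", "keyword": "FREQ", "ucd": "em.freq"},
    {"archive": "archive_b", "keyword": "FREQ", "ucd": "src.var.pulse"},
    {"archive": "archive_c", "keyword": "NU_OBS", "ucd": "em.freq"},
]

def columns_with_keyword(columns, keyword):
    """Naive search on the local keyword, which mixes unrelated quantities."""
    return [(c["archive"], c["keyword"]) for c in columns if c["keyword"] == keyword]

def columns_with_ucd(columns, ucd):
    """Search on the shared descriptor, which ignores local naming."""
    return [(c["archive"], c["keyword"]) for c in columns if c["ucd"] == ucd]

by_keyword = columns_with_keyword(COLUMNS, "FREQ")
by_ucd = columns_with_ucd(COLUMNS, "em.freq")
print(by_keyword)
print(by_ucd)
```

The keyword search wrongly returns the pulsation column and misses the column named NU_OBS; the descriptor search returns exactly the two photon-frequency columns.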

Many of the VO astronomers pride themselves on their computer savvy. Even so, they can be overwhelmed by the latest software techniques and jargon. VO computer scientists are equally lost in the zoo of celestial fauna, which includes such exotica as exoplanets, magnetars, and superclusters, to say nothing of the arcana of astronomical instrumentation. An "object" to astronomers may be an enormous physical thing in outer space, but to computer scientists familiar with "object-oriented design" it is an abstraction describing a concept in software.

A more serious ongoing debate revolves around how much and what kinds of new computing techniques to incorporate into the VO; the computer scientists lean toward the most cutting-edge technologies, whereas the astronomers worry whether the new ways will be useful and stable in the long run.

One such argument involves the use of so-called virtual data. At present, most astronomy archives store their data as calibrated images; the calibration takes into account deviations introduced by the instrumentation, such as hot pixels on the charge-coupled device (CCD) camera chip. With a virtual data system, archives would store only raw, uncalibrated data; each time a user asked for a particular image, the data would be processed and calibrated, and the image created on the fly. Virtual data have the advantage of taking up less storage space and being easier to archive. On the other hand, such a system is fragile: a hardware change or failure may render the software unable to process the data, leaving the user with no means to generate images at all.
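The trade-off is easier to see in miniature. In the Python sketch below, an archive keeps only a raw frame and a list of known hot pixels, and produces the calibrated image each time it is requested. The class and function names are hypothetical, and the calibration itself (replacing each hot pixel with the average of its good neighbors) is a deliberately crude stand-in for a real instrument pipeline.

```python
def calibrate(raw, hot_pixels):
    """Return a calibrated copy of raw, patching each hot pixel from its good 4-neighbors."""
    rows, cols = len(raw), len(raw[0])
    out = [[float(v) for v in row] for row in raw]
    for r, c in hot_pixels:
        neighbours = [
            raw[rr][cc]
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in hot_pixels
        ]
        if neighbours:
            out[r][c] = sum(neighbours) / len(neighbours)
    return out

class VirtualImage:
    """Stores only raw data; the calibrated image exists only when asked for."""

    def __init__(self, raw, hot_pixels, calibrator=calibrate):
        self._raw = raw
        self._hot_pixels = set(hot_pixels)
        self._calibrator = calibrator

    def pixels(self):
        # Recomputed on every request. If the calibrator can no longer run,
        # no image can be produced at all, which is the fragility at issue.
        return self._calibrator(self._raw, self._hot_pixels)

# Hypothetical 3x3 raw frame with one hot pixel in the center.
raw_frame = [[10, 10, 10], [10, 999, 10], [10, 10, 10]]
image = VirtualImage(raw_frame, hot_pixels=[(1, 1)])
calibrated = image.pixels()
print(calibrated[1][1])
```

Only the raw frame is ever stored, so an improved calibrator benefits every future request, but every request also depends on that software still working.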

