"Extreme performance data warehouse" refers to handling of “big data" or the processing of tera, peta, exa, zetta, yotta bytes of information and the ability to gleen information from these huge data sources. For those of you who are unfamiliar with these terms, tera is 10 to the power of 12 or 1000 to the power of 4 bytes (or 2 to the power of 40 bytes or 1024 to the power of 4), peta is 10 to the power of 15 or 1000 to the power of 5 bytes (or 2 to the power of 50 bytes or 1024 to the power of 5), exa is 10 to the power of 18 or 1000 to the power of 6 bytes (or 2 to the power of 60 bytes or 1024 to the power of 6), zetta is 10 to the power of 24 or 1000 to the power of 7 bytes (or 2 to the power of 70 bytes or 1024 to the power of 7) and yotta is 1000 to the power of 8 bytes!
In other words, if a megabyte (1,000,000 bytes, or 1,048,576 bytes in the binary convention) is a tablespoon of sand, then a gigabyte is a patch of sand that is 9 inches square and 1 foot deep, a terabyte is a sand box that is 24 feet square and 1 foot deep, a petabyte is a mile-long beach that is 100 feet wide and 1 foot deep, an exabyte is a beach of the same width and depth running from Maine to North Carolina, a zettabyte is the same beach covering the whole US coastline, and a yottabyte is enough sand to bury the entire US in 295 feet of sand! For those who are curious, after yottabyte come the informally named brontobyte, then saganbyte and lastly pijabyte, none of which are official SI prefixes…
Traditionally, corporations would try to process these volumes with parallelized loading, processing and storage on very large database (VLDB) engines (such as Apache Hadoop, MapReduce, etc.). However, VLDBs are often not linearly scalable and require specialized expertise to manage and maintain.
There are a number of difficulties when dealing with these enormous data volumes, including capture, storage, search, sharing, analytics and visualization. Industry experts continue to push the limits because working with larger and larger files gives analysts the ability to find out "why are these things happening, what will happen next and how can I optimize the outcomes."
The size that qualifies as "big data" is a matter of discussion and debate and varies depending on who you speak to, but current limits are on the order of terabytes and petabytes.
Fields that regularly encounter this problem include astronomy and meteorology, genomics, internet keystroke capture, biological research, finance and business informatics.
Data sets also grow in size because they are increasingly machine generated or gathered by ubiquitous information-sensing devices, software logs, video and audio capture devices, sensor networks (RFID), control systems, mobile devices and so on.
One current issue with "big data" is the difficulty of working with it in relational databases and desktop statistics/visualization packages; it requires instead massively parallel processing (MPP) software running on tens, hundreds or even thousands of servers (usually configured as a grid).
A new player in the market, Greenplum, offers a lower-cost way to go massively parallel at a price point that is hard to beat. Its technology combines linearly scalable, massively parallel processing with storage that automatically places the information across the drives so that MPP queries are optimized. Greenplum also offers integrated toolsets for analyzing information, similar to those seen in higher-powered analytic solutions (such as Teradata Value Analyzer and Warehouse Miner, Hyperion Essbase, etc.).
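To make the idea of spreading data across drives for parallel queries concrete, here is a minimal Python sketch of hash-based distribution followed by a parallel aggregate. It is not Greenplum's implementation: the segment count, the function names and the sample rows are all hypothetical, and the threads only stand in for what would be separate servers in a real MPP system.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segments (servers or drives)


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment by hashing the distribution key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Place each (key, amount) row on the segment chosen by its key."""
    segments = defaultdict(list)
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def local_sum(segment_rows):
    """Work done independently on one segment: a partial aggregate."""
    return sum(amount for _, amount in segment_rows)


def parallel_total(segments):
    """Run the partial aggregates concurrently, then combine them."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(local_sum, segments.values()))


if __name__ == "__main__":
    # Hypothetical sample data: 1000 rows keyed by customer.
    rows = [(f"customer-{i}", i) for i in range(1000)]
    segments = distribute(rows)
    print({seg: len(data) for seg, data in sorted(segments.items())})
    print(parallel_total(segments))
```

The point of the sketch is the shape of the work: each segment computes its own partial result without looking at the others, so adding segments shrinks the amount of data each one has to scan.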