Publisher's Insight: Bus-Tech, Inc.

Moving large amounts of data for data mining from mainframes to RISC parallel systems running UNIX is a daunting task, especially since the typical window available for data movement keeps shrinking. May & Speh is a leader in providing direct database marketing services to both mid-size and large corporations. The company continually downloads databases of 0.3 to 0.5 terabytes to its high-end UNIX parallel processing systems. Its solution for moving such large amounts of data from the mainframe was DataBlaster 2, an innovative IBM-channel-to-SCSI data transfer system.

Ron Powell
Publisher, DM Review

DataBlaster 2: The Catalyst for Converting Data to Knowledge

by Anura Gurugé

There is a world of difference between data and knowledge. Data is a rough, uncut gem, while knowledge is a mesmerizing, multifaceted diamond. It is thus not surprising that the arduous process of extracting vital knowledge from raw data is referred to as "data mining." Data mining enables interrelationships buried within gigantic databases to be exposed and exploited.

Data mining provides corporations with the knowledge base that is imperative for today's highly competitive, highly targeted direct-marketing, product-introduction or customer-retention initiatives. With data mining, corporations can obtain lists of people who fit a set of very specific criteria.

Data mining, which relies on the minute scrutiny and intense analysis of huge amounts of data, is extremely processor-intensive. Consequently, data mining is invariably performed on very powerful parallel processing systems that harness the computing power of multiple RISC processors. The Sun Ultra Enterprise 10000 and the HP 9000 Enterprise Server are good examples of such systems. However, the raw data to be mined is rarely kept or managed on these RISC parallel systems running UNIX. Instead, IBM (or compatible) mainframes are the inevitable repositories of the "unmined" data. A major challenge faced by corporations that work with data mining, such as May & Speh (Downers Grove, Illinois), is continually downloading 0.3 to 0.5 terabyte databases from the mainframe to the high-end UNIX parallel processing systems. Conventional mainframe-to-UNIX interconnect approaches, including FDDI, invariably cannot download databases of this size within acceptable time limits.

Bus-Tech, Inc.'s DataBlaster 2 has repeatedly proved to be the optimum solution for rapidly transferring data between mainframes and UNIX RISC systems for data mining applications. DataBlaster 2 is an innovative and highly acclaimed IBM-channel-to-SCSI data transfer system. It is an ultra high-performance, mission-critical and cost-effective solution for very high-speed bulk data transfer between IBM (or compatible) mainframes and any system with a SCSI interface, at speeds approaching 32MB/sec (i.e., ~256Mbps).

DataBlaster 2 can support two mainframe channel attachments and up to four Ultrawide SCSI interfaces. The mainframe channel interfaces can be either 17MB/sec (i.e., 136Mbps) ESCON or 4.5MB/sec (i.e., 36Mbps) Bus-and-Tag. On the Ultrawide SCSI side, DataBlaster 2 works at 40MB/sec (i.e., 320Mbps). Though mainly used with high-end UNIX platforms for data mining applications, DataBlaster 2 can also profitably be used with any other PC, workstation, minicomputer or RISC-based "server." In addition to UNIX, DataBlaster 2 supports PCs and workstations running Windows NT, Windows 95, DOS and OS/2.
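A quick back-of-the-envelope check of these port figures shows which side of the box sets the ceiling. This sketch simply multiplies the nominal per-port rates quoted above by the number of ports; it assumes every port runs at its rated speed and ignores protocol overhead. Under those assumptions, two ESCON channels (34MB/sec) fall well short of four Ultrawide SCSI ports (160MB/sec), so the mainframe channel side is the limit, which is consistent with the "approaching 32MB/sec" figure.

```python
# Nominal per-port rates quoted in the article, in megabytes per second.
ESCON_MB_S = 17.0
BUS_AND_TAG_MB_S = 4.5
ULTRAWIDE_SCSI_MB_S = 40.0


def mb_to_mbit(mb_per_s: float) -> float:
    """Convert MB/sec to Mbps (8 bits per byte, no protocol overhead)."""
    return mb_per_s * 8


def bottleneck_mb_s(channel_rate: float, channels: int,
                    scsi_rate: float, scsi_ports: int) -> float:
    """Aggregate rate is capped by whichever side has less total bandwidth."""
    return min(channel_rate * channels, scsi_rate * scsi_ports)


if __name__ == "__main__":
    for name, rate in [("ESCON", ESCON_MB_S),
                       ("Bus-and-Tag", BUS_AND_TAG_MB_S),
                       ("Ultrawide SCSI", ULTRAWIDE_SCSI_MB_S)]:
        print(f"{name}: {rate} MB/sec = {mb_to_mbit(rate):.0f} Mbps")
    # Maximum configuration: two ESCON channels feeding four SCSI ports.
    cap = bottleneck_mb_s(ESCON_MB_S, 2, ULTRAWIDE_SCSI_MB_S, 4)
    print(f"Channel-limited ceiling: {cap} MB/sec = {mb_to_mbit(cap):.0f} Mbps")
```

Swapping in `BUS_AND_TAG_MB_S` for the channel rate shows why ESCON is the natural choice for bulk transfers: two Bus-and-Tag channels cap out at 9MB/sec.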

With a DataBlaster 2 system, corporations involved in data mining or other knowledge-based ventures can now move, load, back up and restore gigabytes of data between mainframes and servers in minutes rather than hours (and, in the case of May & Speh, in hours instead of days!). The "ruggedized," rack-mounted DataBlaster 2 systems allow customers to create a separate data highway for mission-critical data movement that is totally independent of their heavily used LANs.

May & Speh's group leader for UNIX, Doug O'Leary, had high praise for the tangible impact DataBlaster 2 has made on the company's data center operations: "When populating our on-line marketing databases, which we refer to as data marts, it is not unusual for us to migrate 300GB to 500GB of data from our mainframes to our UNIX platforms. By utilizing DataBlaster 2 as the transport vehicle and four simultaneous data streams, we can accomplish this download in between five and eight hours depending on tape speed. Best estimate for our FDDI network was five days!" He continued, "We are expecting to reach terabyte-level databases within one to two years, and we already have multiple 300-500GB databases that must be loaded simultaneously. We literally would not be able to load these databases without DataBlaster 2."
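As a sanity check on the quoted five-to-eight hours, the ideal time to move 300GB to 500GB at the rated ~32MB/sec can be computed directly. This is a minimal sketch that assumes decimal units (1GB = 1,000MB) and a perfectly sustained rate, so it gives a lower bound only; real downloads also depend on the tape drives feeding the mainframe, as Mr. O'Leary notes.

```python
def ideal_hours(gigabytes: float, rate_mb_per_s: float) -> float:
    """Hours to move `gigabytes` at a constant `rate_mb_per_s`.

    Assumes decimal units (1GB = 1,000MB) and a perfectly sustained rate,
    so the result is a lower bound rather than a prediction.
    """
    return gigabytes * 1000 / rate_mb_per_s / 3600


if __name__ == "__main__":
    for size_gb in (300, 500):
        print(f"{size_gb}GB at 32MB/sec: at least "
              f"{ideal_hours(size_gb, 32):.1f} h")
```

The 500GB case works out to a little over four hours, so a real-world figure of five to eight hours is in line with the rated bandwidth once tape speed is taken into account.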

What is truly innovative and unique about DataBlaster 2 is its ability to sustain a near-continuous, virtually error- and retransmission-free, extremely low-overhead data flow between mainframes and systems with Ultrawide SCSI interfaces. The additional bandwidth of ESCON (i.e., 136Mbps) and Ultrawide SCSI is not the only reason that mainframe-to-SCSI data transfer through DataBlaster 2 is considerably faster than LAN-oriented schemes, even when the LANs are 100Mbps Fast Ethernet or FDDI.

For a start, DataBlaster 2 uses "data streaming" at both the mainframe and the SCSI platform sides. Data streaming is a technique whereby very large (e.g., 1MB range) blocks of data are transferred with an absolute minimum of delay between the transmission of the individual blocks. The goal of data streaming is to deliver a near continuous stream of data to the recipient at the maximum bandwidth of the I/O interface being used. With two ESCON interfaces, the total data-streaming I/O throughput of a single DataBlaster 2 is around 272Mbps. With this level of sustained throughput, it is not surprising that DataBlaster 2 is referred to by many as the "Ultimate Data Mover."
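The benefit of data streaming can be illustrated with a simple model: each block of B bytes spends B/R seconds on a link of rate R, followed by a fixed idle gap d before the next block. Delivered throughput is then B / (B/R + d), so large blocks amortize the gap and small blocks are dominated by it. The 1ms gap below is a hypothetical value chosen for illustration, not a measured DataBlaster 2, ESCON or FDDI figure.

```python
def effective_rate(block_bytes: int, line_rate_bps: float, gap_s: float) -> float:
    """Bytes/sec actually delivered when every block is followed by a fixed gap.

    Each block occupies block_bytes / line_rate_bps seconds on the link,
    plus gap_s seconds of idle time before the next block starts.
    """
    transmit_s = block_bytes / line_rate_bps
    return block_bytes / (transmit_s + gap_s)


if __name__ == "__main__":
    LINE_RATE = 34e6  # two ESCON channels at 17MB/sec each, in bytes/sec
    GAP = 0.001       # hypothetical 1 ms per-block delay, for illustration only
    for size in (4096, 64 * 1024, 1024 * 1024):
        rate = effective_rate(size, LINE_RATE, GAP)
        print(f"{size:>8} B blocks: {rate / 1e6:5.1f} MB/sec "
              f"({rate / LINE_RATE:.0%} of line rate)")
```

With these assumed numbers, 1MB blocks keep the link above 95% busy while 4,096-byte blocks deliver only about a tenth of the line rate; the exact percentages depend entirely on the gap chosen.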

LANs cannot work in data-streaming mode, nor can they support very large block sizes. The typical maximum block size for FDDI is around 4,096 bytes! In addition, all LAN schemes enforce a mandatory "inter-frame" gap between the transmission of consecutive frames. Moreover, there can be a "media-access" delay before each transmission while the transmitting station ensures that it has full use of the LAN's physical layer, either by capturing the "Token," in the case of FDDI and Token-Ring, or by checking for frame collisions caused by media-access contention, in the case of Fast Ethernet. On top of all these performance-sapping factors are the relatively large headers, trailers and "preambles" that must be appended to each data frame sent over a LAN. The combined effect of the small block sizes, the inter-frame gaps, the media-access delays and the header/trailer overhead is such that a 100Mbps LAN (e.g., FDDI) is unlikely to deliver an actual data transfer rate of even 65Mbps when measured over the duration of a large data transfer operation.
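The scale of the small-block penalty becomes clear when counting how many times the per-frame costs (gap, media access, headers) are paid over a 500GB transfer. This sketch counts blocks for 4,096-byte frames versus 1MB streaming blocks and multiplies by a fixed per-block cost; the 50-microsecond cost is a hypothetical placeholder, not a measured LAN parameter, and decimal gigabytes are assumed.

```python
def block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of blocks (frames) needed, rounding the final partial block up."""
    return -(-total_bytes // block_bytes)


def per_block_overhead_s(total_bytes: int, block_bytes: int,
                         cost_per_block_s: float) -> float:
    """Total time spent on fixed per-block costs (gaps, access delay, headers)."""
    return block_count(total_bytes, block_bytes) * cost_per_block_s


if __name__ == "__main__":
    TOTAL = 500 * 10**9  # a 500GB transfer, decimal units assumed
    COST = 50e-6         # hypothetical 50 microseconds of overhead per block
    for label, size in (("4,096-byte LAN frames", 4096),
                        ("1MB streaming blocks", 1024 * 1024)):
        n = block_count(TOTAL, size)
        hours = per_block_overhead_s(TOTAL, size, COST) / 3600
        print(f"{label}: {n:,} blocks, ~{hours:.2f} h of per-block overhead")
```

Whatever the true per-frame cost, the 4,096-byte case pays it roughly 256 times as often as the 1MB case (about 122 million frames versus about 477,000 blocks).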

DataBlaster 2 transfers data between a mainframe and the SCSI-based platform (e.g., Sun or HP) by appearing to be a high-speed tape drive to each of the two systems. To the mainframe, DataBlaster 2 appears as a high-speed IBM tape subsystem, à la the IBM 3490 Cartridge Tape Drive. To the SCSI-based platform, DataBlaster 2 looks like a high-speed tape drive with a 320Mbps Ultrawide SCSI interface. No software changes are required at either end.
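Because DataBlaster 2 looks like an ordinary tape drive to the UNIX host, data can be read from it the same way as from any tape device. The following is a minimal sketch under stated assumptions: the function name `copy_tape_stream`, the 256KB read size and the Solaris-style device path `/dev/rmt/0n` in the usage comment are all hypothetical, not taken from DataBlaster 2 documentation. It uses unbuffered reads so that each `read()` maps to a single read(2) call, which on a UNIX tape device in variable-block mode typically returns one tape block.

```python
import sys

# Hypothetical default; it should be at least as large as the block size
# written by the mainframe job, or tape drivers may reject the read.
DEFAULT_BLOCK_SIZE = 256 * 1024


def copy_tape_stream(src_path: str, dst_path: str,
                     block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Copy data from a tape-style device (or any file) to dst_path.

    The source is opened unbuffered so each read() issues one system call.
    An empty read signals end of data (a filemark on a tape device, EOF on
    a regular file). Returns the number of bytes copied.
    """
    copied = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    # Example (hypothetical paths): python copy_tape.py /dev/rmt/0n /data/extract.dat
    n = copy_tape_stream(sys.argv[1], sys.argv[2])
    print(f"copied {n:,} bytes")
```

Since the function only relies on standard file I/O, it can be exercised against a regular file before being pointed at a real device.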

On the mainframe side, DataBlaster 2 is attached to mainframe ESCON channels using standard ESCON fiber cables, or to a Bus-and-Tag channel using the standard, heavy-duty, twin bus-and-tag copper cables. On the SCSI-platform side, DataBlaster 2 is connected to the SCSI system via a standard Ultrawide SCSI port with heavy-duty Ultrawide SCSI cables.

DataBlaster 2 uses Bus-Tech's highly proven, award-winning and renowned channel-attachment hardware and software. DataBlaster 2's SCSI interface is built around Adaptec's Ultrawide SCSI adapter. DataBlaster 2, which comes as a rack-mount device, has a built-in RS-232 "service port" to facilitate "dial-in" remote configuration and diagnostics. It supports redundant configurations for ultra high-availability, fail-safe applications. There are eight models of DataBlaster 2, differing in the mainframe and SCSI port configurations supported.

May & Speh is a leading provider of technology-based direct and database marketing services. May & Speh's acknowledged specialty is taking diverse sources of customer and prospect data and converting this raw data, through the innovative and concerted application of leading-edge computer technology, into valuable knowledge databases. DataBlaster 2 is now an invaluable asset in this knowledge extraction process and an integral part of May & Speh's $60+ million data center, which boasts a processing capacity in excess of 2,400 MIPS.

Gainfully harnessing the respective strengths of mainframes and high-end UNIX platforms is pivotal to the value-added, integrated data management services provided by May & Speh. As the data files transferred from its mainframes to the UNIX platforms started to approach the 0.5TB range, the key issue the company had to solve was completing these file transfers in a realistic timeframe. With FDDI, it was looking at download times quoted in days! This was neither acceptable nor viable. Fortunately, DataBlaster 2 came to the rescue. DataBlaster 2 systems have enabled May & Speh to pulverize its mainframe-to-UNIX file transfer delays. File downloads that were estimated to take five days over FDDI are now completed within five to eight hours, a staggering 15- to 24-fold reduction in download time.
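The size of the improvement follows directly from the figures quoted: five days is 120 hours, and dividing by the five-to-eight-hour range gives a 15-fold to 24-fold reduction, with 24-fold being the best case. A trivial sketch of that arithmetic, using only the numbers stated in the article:

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new transfer is than the old estimate."""
    return before_hours / after_hours


FDDI_ESTIMATE_H = 5 * 24      # "five days" over FDDI
DATABLASTER_RANGE_H = (5, 8)  # "five to eight hours" with DataBlaster 2

if __name__ == "__main__":
    best = speedup(FDDI_ESTIMATE_H, min(DATABLASTER_RANGE_H))
    worst = speedup(FDDI_ESTIMATE_H, max(DATABLASTER_RANGE_H))
    print(f"{worst:.0f}x to {best:.0f}x faster")
```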
