Big data in data processing systems arise either at their source (Big Source Data) or as a result of processing (Big Processable Data).
The amount of Big Source Data depends on two factors: the source of the data (in business, a business process) and the information required (for example, for business management). Both factors are determined by the requirements for a particular data processing system; they do not depend on how the data are processed.
The amount of Big Processable Data depends on how the processing of the source data is organized. As more source data accumulate for later processing, the accumulated volume grows and becomes Big Processable Data. The amount of Big Processable Data therefore depends primarily on how long the source data are accumulated before being processed.
For example, if the source data arrive at an average rate of 1,000 transactions per second and are accumulated for one hour before processing, then the amount of data accumulated during that hour, and then processed, will be:
1,000 transactions/second * 3,600 seconds/hour = 3,600,000 transactions/hour
With further accumulation, the amount of data accumulated for one processing cycle grows:
- 3,600,000 transactions/hour * 24 hours/day = 86,400,000 transactions/day
- 86,400,000 transactions/day * 30 days/month = 2,592,000,000 transactions/month
- 86,400,000 transactions/day * 365 days/year = 31,536,000,000 transactions/year
This means that something like a monthly report can grow to about 2.6 billion transactions. Such an amount can already be considered relatively Big Processable Data, especially compared with the amount of source data that appears per second: 1 thousand transactions.
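The arithmetic above can be reproduced with a short Python sketch. The 1,000 transactions per second rate and the 30-day month and 365-day year come from the example; the names `batch_size` and `SECONDS_PER_CYCLE` are illustrative only.

```python
RATE_PER_SECOND = 1_000  # average source rate taken from the example above

# Length of one accumulation cycle, in seconds (30-day month, 365-day year).
SECONDS_PER_CYCLE = {
    "second": 1,
    "hour": 3_600,
    "day": 24 * 3_600,
    "month": 30 * 24 * 3_600,
    "year": 365 * 24 * 3_600,
}


def batch_size(rate_per_second: int, cycle_seconds: int) -> int:
    """Number of transactions accumulated during one processing cycle."""
    return rate_per_second * cycle_seconds


batch_sizes = {
    cycle: batch_size(RATE_PER_SECOND, seconds)
    for cycle, seconds in SECONDS_PER_CYCLE.items()
}

for cycle, size in batch_sizes.items():
    # Ratio of this batch to the amount that appears in a single second.
    ratio = size // batch_sizes["second"]
    print(f"{cycle:>6}: {size:>14,} transactions ({ratio:,} x one second of data)")
```

The ratio in the last column is how many times larger each batch is than the 1,000 transactions that appear in one second.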
The conclusion is this: if we increase the frequency of data processing until it reaches real-time mode, the amount of data processed in one cycle is reduced, in our example, by a factor of millions (from 2,592,000,000 transactions in a monthly cycle to 1,000 in a one-second cycle). To achieve this, it is necessary to build a real-time data process that can handle 1,000 transactions per second, for example by using the distributed and parallel processing capabilities of modern hardware and software.
Clearly, raw data arriving at a high frequency quickly accumulate to astronomical quantities. The clear solution is to implement real-time data processing.
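To make the contrast concrete, here is a minimal Python sketch of the two styles for a simple per-account total. It is an illustration under assumptions, not a description of any particular system: the `Transaction` record, the `batch_report` function, and the `RunningReport` class are hypothetical names, and a real report would usually involve more than a sum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    account: str
    amount: int  # in cents, so the arithmetic stays exact


def batch_report(accumulated):
    """Accumulate-then-process: scan the whole stored batch at period end."""
    totals = defaultdict(int)
    for tx in accumulated:
        totals[tx.account] += tx.amount
    return dict(totals)


class RunningReport:
    """Real-time style: fold each transaction into the report as it arrives."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.count = 0

    def process(self, tx):
        # Only the running totals are kept; the transaction is not stored
        # for a later processing cycle.
        self.totals[tx.account] += tx.amount
        self.count += 1

    def snapshot(self):
        """The report as of the latest processed transaction."""
        return dict(self.totals)


transactions = [
    Transaction("A", 500),
    Transaction("B", 250),
    Transaction("A", 125),
]

report = RunningReport()
for tx in transactions:
    report.process(tx)

# Both styles produce the same report; only the size of each cycle differs.
assert report.snapshot() == batch_report(transactions)
print(report.snapshot())
```

In this sketch the monthly figures are available at any moment from `snapshot()`, and each processing step touches one transaction instead of the whole accumulated batch.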
A natural question arises: “Why have real-time data processing systems not appeared in general IT practice?” There are two main reasons: historical habit and the absence of business requirements. Let us consider both.
The first data processing systems in which data were accumulated for later processing appeared when the first computers processed punched cards, which were accumulated in long boxes. This accumulation then moved to magnetic tapes and, later, to disks. The style became a tradition in data processing systems, and the business community absorbed that tradition as the natural way to process data.
Ever since, businesses have assumed that to get a monthly report it is necessary to accumulate data during the month and process it all at once. This tradition is supported from both sides: the business community and the IT community. Even real-time data services have not changed the situation; in most cases IT practice stays the same, and more and more source data are accumulated for later processing. While there are some exceptions, this inefficient approach predominates in most businesses.
How can we overcome the standstill in the adoption of systems that process source data in real time? There are at least three directions of work toward a switch to real-time data processing systems:
- Propagation of the real-time data processing approach
- Development of real-time data processing techniques
- Creation of real-time data processing systems
The propagation should explain to the business community, and first of all to business management, that real-time data processing systems will give them the ability to get information in real time and at lower cost, because there will be no expense for data accumulation. When the business community begins to realize that the new approach is worth trying, requests for real-time data processing systems will appear, and this will let IT professionals begin to build them.
Ways of implementing real-time data processing already exist. Examples include transferring data between databases with one COMMIT, or using a service interface. Nevertheless, as more real-time data processing systems are created, new techniques may be needed.
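One possible reading of "transfer of data between databases with one COMMIT" is sketched below using Python's standard `sqlite3` module. Everything specific here is an assumption made for illustration: the two in-memory databases, the table names `transactions` and `account_totals`, and the `record_transaction` helper are hypothetical, and a production system with separate database servers would need whatever transaction support those servers provide.

```python
import sqlite3

# Hypothetical operational database, with a hypothetical reporting
# database attached to the same connection.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS reporting")
conn.execute(
    "CREATE TABLE main.transactions ("
    "account TEXT NOT NULL, amount INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE reporting.account_totals ("
    "account TEXT PRIMARY KEY, total INTEGER NOT NULL)"
)


def record_transaction(account, amount):
    """Store the source row and update the report in one unit of work."""
    with conn:  # one COMMIT on success, ROLLBACK if any statement fails
        conn.execute(
            "INSERT OR IGNORE INTO reporting.account_totals (account, total) "
            "VALUES (?, 0)",
            (account,),
        )
        conn.execute(
            "UPDATE reporting.account_totals SET total = total + ? "
            "WHERE account = ?",
            (amount, account),
        )
        conn.execute(
            "INSERT INTO main.transactions (account, amount) VALUES (?, ?)",
            (account, amount),
        )


for account, amount in [("A", 500), ("B", 250), ("A", 125)]:
    record_transaction(account, amount)

try:
    record_transaction("C", None)  # invalid amount: nothing is committed
except sqlite3.IntegrityError:
    pass

totals = conn.execute(
    "SELECT account, total FROM reporting.account_totals ORDER BY account"
).fetchall()
stored = conn.execute("SELECT COUNT(*) FROM main.transactions").fetchone()[0]
print(totals, stored)
```

The point of the sketch is that the report is brought up to date in the same committed step that stores the source transaction, so no separate accumulation and batch run is needed for it.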
Working real-time data processing systems will, without doubt, be the strongest argument in support of the idea of creating such systems.
When I asked different groups in the IT community, “Why are real-time data processing systems not created?”, I heard that business does not ask for such systems. For that reason, propagation should come first in the effort to create real-time systems.
I believe that the joint efforts of the IT and business communities will make real-time data processing systems a common practice.
About the author: Consultant Alex Treyster started working in IT as a technician. Alex later worked as a business analyst, an IT developer, a database administrator, a data modeler, and a data / information / data warehouse architect. He currently works as a solution architect, has a master's degree in data processing, and has been published more than a dozen times, including publications in editions of the Ukrainian Academy of Science. Contact Alex at email@example.com.