This is the point where data processing changes data center operation.
Across businesses, we are moving from specialized, proprietary, and typically expensive supercomputers to clusters of commodity machines connected by a low-cost network.
The Total Cost of Ownership (TCO) determines the fate, quality, and size of a data center. If the business is small, the data center should be small; as business demand changes, the data center grows or shrinks accordingly.
Currently, one common practice is to create a dedicated cluster for each technology: a Spark cluster, a Kafka cluster, a Storm cluster, a Cassandra cluster, and so on. As a result, the overall TCO tends to increase.
Modern organizations adopt open source to avoid two old and annoying dependencies: vendor lock-in and reliance on an external entity for bug fixes.
In the past, the rules were dictated by the classic large high-tech enterprises or monopolies. Today, the rules come from the people, for the people: transparency is ensured through community-defined APIs and bodies such as the Apache Software Foundation or the Eclipse Foundation, which provide guidelines, infrastructure, and tooling for the sustainable and fair advancement of these technologies.
There is no such thing as a free lunch. In the past, larger enterprises used to hire big companies so that they had someone to blame and sue in the case of failure. Modern industries should instead take the risk and invest in training their people in open technologies.
The dominant, seemingly omnipotent era of the relational database is being challenged by the proliferation of NoSQL stores.
You have to deal with the consequences: determining the system of record, synchronizing different stores, and selecting the right data store for each use case.
Data gravity refers to the overall cost associated with data transfer, in terms of both volume and tooling: consider, for example, trying to restore hundreds of terabytes in a disaster recovery scenario.
Data locality is the idea of bringing the computation to the data rather than the data to the computation. As a rule of thumb, the more of your services you can run on the same node as the data they process, the less data you need to move and the better prepared you are.
DevOps refers to the best practices for collaboration between the software development and operational sides of a company.
The development team should test locally in the same environment that is used in production. Spark, for example, lets you run the same application in local mode for testing and then submit it, unchanged, to a cluster.
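As a minimal sketch of that workflow, the same Spark application can go from local testing to cluster execution by changing only the `--master` flag of `spark-submit`; the script name and cluster URL below are hypothetical placeholders:

```shell
# Run the application locally for testing, using all available cores.
spark-submit --master "local[*]" my_pipeline.py

# Submit the identical, unchanged application to a cluster
# (hypothetical standalone-mode master URL).
spark-submit --master spark://master-host:7077 my_pipeline.py
```

Because the application code is untouched between the two invocations, whatever passes locally is what actually ships to the cluster.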
The tendency is to containerize the entire production pipeline.
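A minimal sketch of that tendency, assuming Docker and a hypothetical image name, is to package each pipeline component as an image and run that same image in every environment:

```shell
# Build an image for one pipeline component
# (hypothetical Dockerfile in the current directory, hypothetical tag).
docker build -t my-org/stream-processor:1.0 .

# Run the same image locally, in staging, or in production,
# so every environment executes identical bits.
docker run --rm my-org/stream-processor:1.0
```

The point of the design is that the image, not the host, defines the runtime environment, which closes the gap between developer laptops and production nodes.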