What is Big Data
The term Big Data describes the large volume of data that now floods companies, healthcare, macroeconomics and, more generally, every aspect of our society and the way we relate to each other.
When we talk about Big Data, we mean huge amounts of data whose size, complexity and growth rate make them extremely difficult to capture, manage, process and analyze with conventional tools.
The minimum size at which a data set counts as Big Data is not precisely defined and, in fact, can change over time. A commonly cited reference point is data sets in the range of 40-50 terabytes.
Big Data characteristics
Volume
Volume refers to the huge amount of data generated and available around us. In a Big Data project the amount of information generated is immense and, above all, never stops being produced. As databases grow in size, so do the applications and architecture built to collect and store that data.
Velocity
Velocity is the speed with which data is created, stored and processed. In some processes, time is essential and a delay in processing would be fatal: if the data is not received, stored and processed in real time, it becomes obsolete and loses its usefulness.
Variety
The data received in a Big Data project is generally very diverse. It can come from various sources and arrive in different formats. For this reason, different technologies and applications have to be combined to organize, process and integrate the data before we can draw effective conclusions or identify useful patterns.
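As a rough illustration of integrating sources that arrive in different formats, the following sketch reads the same kind of record from a CSV source and a JSON source and maps both onto one common shape. The field names (`customer_id`, `amount`, `id`, `total`) and the sample data are hypothetical, chosen only for this example.

```python
import csv
import io
import json

# Hypothetical sample inputs: the same kind of record arriving in two formats.
CSV_SOURCE = "customer_id,amount\n101,25.50\n102,8.00\n"
JSON_SOURCE = '[{"id": 103, "total": "12.75"}, {"id": 104, "total": "3.10"}]'


def from_csv(text):
    """Parse CSV rows into the common record shape."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"customer_id": int(row["customer_id"]), "amount": float(row["amount"])}
        for row in reader
    ]


def from_json(text):
    """Parse JSON objects, renaming their fields to the common record shape."""
    return [
        {"customer_id": int(item["id"]), "amount": float(item["total"])}
        for item in json.loads(text)
    ]


def integrate(csv_text, json_text):
    """Combine both sources into one list with a single schema."""
    return from_csv(csv_text) + from_json(json_text)


records = integrate(CSV_SOURCE, JSON_SOURCE)
total_amount = sum(r["amount"] for r in records)
```

Once every source is mapped to the same shape, later steps such as totals or pattern searches no longer need to know where each record came from.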
Veracity
This last characteristic refers to the reliability of the information collected. The quality of the data is essential for reaching sound conclusions and can even become a competitive advantage in the business world. Companies have to invest in applications capable of identifying and eliminating data that is unpredictable or that introduces uncertainty.
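A minimal sketch of the kind of quality check described above, assuming temperature readings from a sensor. The valid range, the field names and the sample readings are assumptions made for illustration; a real project would define its own rules.

```python
# Hypothetical plausible range for an outdoor temperature sensor, in Celsius.
MIN_VALID = -50.0
MAX_VALID = 60.0


def is_reliable(reading):
    """Return True when a reading has a numeric value inside the valid range."""
    value = reading.get("celsius")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False  # missing or non-numeric value
    return MIN_VALID <= value <= MAX_VALID


def split_by_quality(readings):
    """Separate readings into (kept, rejected) lists."""
    kept = [r for r in readings if is_reliable(r)]
    rejected = [r for r in readings if not is_reliable(r)]
    return kept, rejected


# Hypothetical sample data containing a gap, a sensor glitch and a bad type.
readings = [
    {"sensor": "a1", "celsius": 21.4},
    {"sensor": "a1", "celsius": None},
    {"sensor": "a2", "celsius": 999.0},
    {"sensor": "a2", "celsius": "n/a"},
    {"sensor": "a3", "celsius": -3.2},
]

kept, rejected = split_by_quality(readings)
```

Keeping the rejected readings in their own list, rather than silently dropping them, makes it possible to review how much data was discarded and why.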
In short, Big Data lets us analyze vast amounts of data to answer questions, solve problems and improve production processes. Collecting this information and searching it for trends and patterns allows us to identify opportunities and create value in several ways: reducing production costs, designing products and services that are in demand, improving the efficiency of the company's decision-making, or creating advertising campaigns better targeted to the needs of each customer.
Types of data in Big Data
The data processed in a Big Data project can be classified according to different criteria. One of the most widespread classifications is the one proposed by IBM:
Big Transaction Data
This is the data that comes from the call records, messaging and billing that a communications operator can record: telecommunications, card usage, types of payment, etc.
Web and Social Media
This is all the data generated by browsing the Internet, web pages and social networks. It is very useful information that lets companies learn the preferences and tastes of consumers.
Machine-to-Machine (M2M)
This category covers the technologies that connect to devices (with sensors, for example) to collect large amounts of data. The sensors can be of many kinds and collect all sorts of data: transport, thermometers, automatic irrigation, electricity meters, water pumps, marine buoys, etc.
Biometrics
This category includes fingerprint and retina scans, facial recognition, genetic recognition, voice recognition, etc. In short, all the data that helps to identify an individual unequivocally.
Human Generated
As human beings, we also generate information every day through our behavior: phone calls, emails, voice notes, mobile messaging, web applications, etc.
But this is not the only way to classify the data in a Big Data project. It can also be classified by its format or structure:
Structured Data
This is data arranged according to a defined pattern, which allows it to be stored in tables and processed quickly and efficiently.
Unstructured Data
This probably makes up the vast majority of data. It has no predefined definition (length, format, shape, and so on) and is collected in its original form, without processing. We are surrounded by it in our day-to-day lives: text documents, images, audio recordings, etc.
Semi-Structured Data
This is data with some prior organization but without a fully defined structure, for example HTML files (HTML is the markup language used to build web pages).
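To make the three formats concrete, here is a small sketch that handles one example of each with Python's standard library: a CSV table (structured), an HTML fragment (semi-structured) and free text (unstructured). The sample strings and the `LinkCollector` class name are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Structured: rows and columns follow a fixed pattern, so they map onto a table.
STRUCTURED = "name,age\nAna,34\nLuis,29\n"

# Semi-structured: tags give some organization, but there is no fixed schema.
SEMI_STRUCTURED = '<p>See <a href="/docs">docs</a> and <a href="/faq">FAQ</a></p>'

# Unstructured: raw text with no predefined fields.
UNSTRUCTURED = "Call me tomorrow about the invoice, thanks."


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")


# Structured data can be read directly into rows keyed by column name.
rows = list(csv.DictReader(io.StringIO(STRUCTURED)))

# Semi-structured data needs a parser that follows its markup.
collector = LinkCollector()
collector.feed(SEMI_STRUCTURED)
links = collector.links

# Without fields to query, the simplest thing we can do is split into words.
words = UNSTRUCTURED.split()
```

The amount of work needed before the data can be queried grows from the first case to the last, which is one reason the format of the data matters when planning a project.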