With the spread of the digital economy, a ‘Big Data’ environment is emerging in which an immeasurable amount of information and data is produced around us. Big data refers to data that is far larger in scale than the data generated in the analog environment of the past, has a short generation cycle, and includes not only numerical data but also text and image data.
1. Definition of Big Data
As the use of PCs, the Internet, and mobile devices has become commonplace, the footprints (data) that people leave everywhere are increasing exponentially (Jung Yong-chan, 2012a). Take shopping as an example.
From a data point of view, in the past data was recorded only when a customer made a purchase in a store. In an Internet shopping mall, by contrast, the record of a visit is automatically saved as data even if the visitor buys nothing. The mall can tell which products the visitor was interested in and how long the visitor stayed. Beyond shopping, people now spend much of their time on PCs and the Internet for financial transactions such as banking and securities trading, education and learning, leisure activities, information searches, and e-mail.
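As a minimal sketch of how such visit records can show interest and time spent, the Python example below estimates how long a visitor looked at each product. The log layout (`visitor_id`, product, timestamp), the sample rows, and the `"(exit)"` marker are all hypothetical, and the rule that the time between two consecutive page views belongs to the earlier page is an assumption made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical page-view log: (visitor_id, product_viewed, timestamp).
# No purchase is recorded, yet every visit still leaves a row.
page_views = [
    ("v1", "running shoes", "2012-03-01 10:00:00"),
    ("v1", "rain jacket",   "2012-03-01 10:04:00"),
    ("v1", "running shoes", "2012-03-01 10:05:00"),
    ("v1", "(exit)",        "2012-03-01 10:12:00"),
]

def seconds_per_product(views):
    """Attribute the time between consecutive page views to the earlier page."""
    by_visitor = defaultdict(list)
    for visitor, product, ts in views:
        when = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor].append((when, product))

    totals = defaultdict(float)
    for visitor, events in by_visitor.items():
        events.sort()  # order each visitor's events by time
        for (start, product), (end, _next_product) in zip(events, events[1:]):
            totals[(visitor, product)] += (end - start).total_seconds()
    return dict(totals)

dwell = seconds_per_product(page_views)
for (visitor, product), seconds in sorted(dwell.items()):
    print(visitor, product, seconds)
```

The last event of a visit has no following event, so it receives no time; a real system would need some rule for closing a session.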
The proliferation of machine-to-machine (M2M) communication, in which information is exchanged between people and machines and between machines themselves, is another reason for the explosive increase in digital information.
Video content, including user-created content (UCC), and text generated on mobile phones and social network services (SNS) differ from the data of the past not only in how fast they grow but also in form and quality.
In particular, text circulated on blogs or SNS can be analyzed to reveal not only the tendencies of the person who wrote it but also that person's connections with the people they communicate with. In addition, viewing photos and video on a PC has already become commonplace, and broadcast programs are also watched on a PC or smartphone rather than on a TV set.
Twitter alone generates an average of 155 million tweets per day, and YouTube averages 4 billion video views per day. The volume of global data is predicted to grow to 2.7 zettabytes in 2012 and 7.9 zettabytes in 2015 (IDC, 2011). One zettabyte is 1,000 exabytes, and one exabyte is 100,000 times the amount of information in the printed collection of the U.S. Library of Congress (Lyman, P., & Varian, H., 2003).
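To make these units concrete, the short Python sketch below converts the forecast volumes into the number of Library of Congress print collections they would correspond to. It uses only the two ratios quoted above; the constant and function names are chosen here for illustration.

```python
# Ratios quoted in the text.
EXABYTES_PER_ZETTABYTE = 1000
LOC_PRINT_COLLECTIONS_PER_EXABYTE = 100_000

def zettabytes_to_loc_collections(zettabytes):
    """Express a volume in zettabytes as Library of Congress print collections."""
    exabytes = zettabytes * EXABYTES_PER_ZETTABYTE
    return exabytes * LOC_PRINT_COLLECTIONS_PER_EXABYTE

# Forecast volumes cited in the text (IDC, 2011).
for year, zettabytes in [(2012, 2.7), (2015, 7.9)]:
    collections = zettabytes_to_loc_collections(zettabytes)
    print(f"{year}: {zettabytes} ZB is about {collections:,.0f} print collections")
```

On these figures, 2.7 zettabytes corresponds to roughly 270 million such collections and 7.9 zettabytes to roughly 790 million.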
An enormous amount of video is also captured by CCTV cameras installed not only on major roads and in public buildings but even inside apartment elevators. In other words, every action in daily life is stored as data.
<Figure 1> The emergence of internet companies and the scale of global digital data
Source: Jeong Yong-chan (2012a), p. 4.
Not only the private sector but also the public sector is mass-producing data. Data is produced through various social surveys, including the census, and in areas such as national tax records, medical insurance, and pensions. The full-fledged implementation of smart work is also expected to accelerate data growth (Korea Communications Commission, 2011).
2. Characteristics and Meaning of Big Data
The characteristics of big data are commonly summarized as the 3 Vs: the amount of data (Volume), the speed at which data is generated (Velocity), and the variety of its forms (Variety) (O’Reilly Radar Team, 2012). More recently, Value or Complexity is sometimes added.
This diverse and vast body of data is attracting attention because it can serve as an important resource that determines future competitiveness. Attempts to find meaningful information by analyzing large-scale data are not new. However, the current big data environment represents a paradigm shift in the quantity, quality, and variety of data compared to the past. From this point of view, big data is regarded as an important resource for innovation, stronger competitiveness, and higher productivity in the era of the IT and smart revolution, much as coal was during the industrial revolution (McKinsey, 2011).
Businesses began customer relationship management (CRM) in the 1990s to strengthen their marketing by using the customer data they hold. CRM refers to marketing activities, such as retaining customers and preventing churn, that rely on a data warehouse integrating the data a company owns and on analysis of customer data (data mining). A company’s CRM activities draw not only on its own customer data but also on data from affiliates, in the form of affiliate marketing. Recently, the combination of purchase history, web-log analysis, and location-based services (GPS) has laid the technological foundation for offering the services consumers want at the right place and the right time.
This kind of customer analysis is reaching a turning point in the era of big data. Big data technologies such as distributed processing make it possible to analyze, quickly, customer information on a scale incomparable to the past. A company can analyze the search terms and comments about it that appear on Twitter and elsewhere on the Internet, identify customer reactions to its products and services in real time, and take immediate action.
By using the open-source Hadoop framework, the R analysis package, distributed parallel processing technology, and cloud computing for software and hardware, a company can operate its systems efficiently even without building a high-cost data warehouse on the expensive storage and databases of the past.
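The idea behind distributed parallel processing can be illustrated with a small map, shuffle, and reduce word count over customer comments. The Python sketch below is not Hadoop code and does not use any Hadoop API: it runs in a single process and only mimics the programming model, and the sample comments, the two "chunks" standing in for cluster nodes, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical customer comments, split into chunks as if each chunk
# were stored on a different node of a cluster.
chunks = [
    ["love the new phone", "battery life is poor"],
    ["phone screen is great", "poor battery again"],
]

def map_chunk(comments):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for comment in comments for word in comment.lower().split()]

def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

# Each chunk is mapped independently, which is what lets a real cluster
# run the map step on many machines at once.
mapped = chain.from_iterable(map_chunk(chunk) for chunk in chunks)
word_counts = reduce_counts(shuffle(mapped))

for word in ("phone", "battery", "poor"):
    print(word, word_counts[word])
```

Because the map step touches only its own chunk and the reduce step touches only one key at a time, both can be spread across inexpensive machines, which is the property the paragraph above attributes to this class of technology.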