Cloud Computing Assignment on Big Data Security Issues
Question
Task: Conduct research for a cloud computing assignment and prepare a report discussing the Big Data security issues in cloud computing. Propose a framework to overcome the identified issues.
Answer
Introduction
In today’s globalised world, most business organisations need effective and efficient solutions to combine and analyse enormous sets of information. Cloud computing acts as an enabler, offering scalable resources and economic advantages in the form of lower operational costs. In recent years, Big Data and cloud computing have become two major topics, because together they allow computing resources to be offered as IT services with greater effectiveness and efficiency. Big Data refers to datasets whose size is beyond the ability of conventional software tools and techniques to process within an acceptable time.
The protection and security of information are major issues for cloud computing, mainly because of its open environment and the limited control available on the user side. This is also a challenge for Big Data. In the coming years, a large proportion of data is expected to be accessed via cloud computing, which will provide a base for additional storage, computation, and distributed capacity for processing Big Data (Gholami and Laure, 2016).
This paper investigates the reasons and factors behind the security problems associated with the use of Big Data in cloud computing. The primary benefit of cloud computing is that it lets a business organisation provision server resources on demand, including bandwidth, memory, CPUs and storage, to analyse Big Data quickly. The cloud computing market is dominated by a small number of large providers, including Amazon Web Services, and competition among these rivals continues to increase.
A methodology section is also included in this proposal to explain the research process and the methods used. It is important to acknowledge that cloud platforms take various forms and are often integrated with conventional architectures, which can lead to poor decisions in Big Data projects (Ahmed and Saeed, 2014). This raises the question of whether cloud computing is the most appropriate choice for a project's computing requirements, especially when the project involves Big Data.
Large Big Data sets have become significantly important and are being generated in every field of activity, from e-commerce to healthcare. The varying nature of Big Data requires elasticity and flexible infrastructure; hence, cloud bursting is made available through hybrid environments so that peak workloads can be handled. Most cloud computing infrastructures consist of dependable services distributed across data centres, achieving a high level of availability through redundancy (Khan, 2016).
This paper also sets out the research goals and objectives, together with the research constraints and limitations. It discusses security and confidentiality factors that affect cloud providers' activities in relation to the lawful processing of consumer data. Additional sources on cloud computing are referred to in order to give readers an understanding of how Big Data security risks can be reduced, along with the different security methods and algorithms. Several researchers have devised cloud application systems covering the protocol-related parameters for data gathering.
Need/Aim of the research
The research aims to understand the security issues associated with the use of Big Data in Cloud Computing.
Problem Statement
Handling Big Data in Cloud Computing raises technical and security issues such as honeypot detection, encryption, and logging. In addition, many companies that use Cloud Computing in their operations face security challenges such as loss of confidential data, malware attacks, malicious insiders, and denial of service. Other challenges in the cloud computing environment include lack of effective training, insufficient user control, the difficulty of regulatory compliance, litigation, and constraints on the flow of data across borders (Fernandes et al., 2014). Apart from this, encrypting enormous volumes of data in full is often impractical, because encrypting and decrypting such Big Data consumes a considerable amount of time. Besides, multitenancy in the cloud creates security issues related to data breaches and privacy.
Some researchers have described the security challenges of genomic information in the cloud, including the terms of service of large cloud providers being applied without users' approval, data security and monitoring, and accountability. Further, the study describes similar research problems along with some findings on enhancing security through coordination between companies. If the problem remains unsolved, it could cause heavy losses for companies dealing with Big Data: the confidentiality of sensitive client data could be harmed and the data could be misused. Thus, it is of utmost importance to ensure that effective control systems are in place to increase the safety of Big Data held in cloud storage.
Background and Literature Review
Background
Big Data is defined as a large set of data that can be either structured or unstructured, depending on its nature (Zhang, 2018). These data are analysed to reach conclusions in various kinds of research. The point of concern is not the quantity of data but how it is used, as such a huge amount of data can serve many purposes for different organisations. The data is characterised by its volume, velocity, and variety, of which variety is the most determining factor. In today's world, Big Data plays a very important role in every organisation, as market research depends on the results derived from analysing these data. The data helps in analysing the market and planning marketing strategies according to customers' requirements.
Cheng et al. (2018), on the other hand, state that Cloud Computing is the delivery of various technical services over the internet. Databases are accessed through a network, and software can be used to derive results, which also saves a lot of local space because the main database is cloud-based. Stergiou et al. (2018) stated that it is difficult to store such a huge quantity of data on hard drives, and it is not viable to carry such drives every time a service needs to be performed; in such cases, cloud storage allows the data to be accessed quickly and easily.
According to Althagafy and Qureshi (2017), in the past couple of years organisations have encountered severe challenges in managing exponentially growing data, while the capacity of databases has also been increasing exponentially. The explosion of Big Data is produced by numerous sources such as business processes, online transactions, social networking, client data, and so on. Processing such an enormous amount of data in order to extract meaningful information is therefore an extremely challenging task. The basic cloud computing deployment models are public, hybrid, and private clouds. The main reasons for using cloud computing to deal with Big Data are reduced hardware costs, lower processing costs, and the potential to assess the value of the Big Data. AWS, Hive, and the MapReduce framework play a vital role in addressing such issues efficiently.
On the other hand, the MetaCloudDataStorage security architecture is used to address computers and their related elements, including network systems, storage space, backup power units and redundant networks. In modern corporate settings, cloud computing has given rise to various security threats, including malicious actors, denial of service, and data loss and breaches, which mainly stem from issues such as loss of information, multi-tenancy, and trust. Security issues such as the trust chain in clouds, multi-tenancy and loss of control affect the cloud computing system and ultimately lead to various concerns. Additionally, business organisations that combine and store data in clouds need to confirm that the protection and security of the data are properly maintained, as this forms the basis for lawful access to personal information.
Figure 1: Use of Big Data and Cloud Computing for Business
(Source: Stergiou et al., 2018)
The Roles & Relationship between Big Data & Cloud Computing
Cloud Computing is a network-based service for processing data that makes the work easier for customers and data scientists, so that the whole process of market research becomes easier to control and coordinate. According to Simsek et al. (2019), a console commands the systems at each user location so that the user can be guided in performing a specific service. Prominent Cloud Computing products include database management systems, machine learning, cloud-based virtual machines and identity management systems.
Big Data is generated over networks, since the quantity is huge and collection becomes tedious if it is not monitored through networks. As stated above, it is either in a standard format or in a non-standard format, and for non-standard data the artificial intelligence and machine learning features of Cloud Computing can be used to standardise it. Cloud Computing also allows the data to be processed as it is received, which saves time and effort. It can take in enormous chunks of data and process them at the same time, and data analysis is also much less time-consuming. Similarly, Cheng et al. (2018) opined that Big Data consists of huge quantities of data that are not easy to manage manually; hence Cloud Computing must be used alongside it to analyse the data in real time and reach results in the minimum period.
Varghese and Buyya (2018) likewise stated that Big Data and Cloud Computing are a perfect match, as the two go hand in hand to deliver the best results. Cloud Computing applications are booming in today's market largely because they can analyse Big Data in a fraction of a second, and Big Data can be worked on only because cloud applications support its processing and analysis. Each depends on the other, and either one matters far less without the other. A main reason for collecting Big Data is that cloud applications can access, process, and analyse it in real time and generate accurate results.
Big Data Vs Cloud Computing
Big Data and Cloud Computing are two terms with which everyone is now familiar, as they are driving factors in business and research. Big Data is a game-changer in the market, because with the help of Cloud Computing industries can have real-time access to customer requirements and sources of dissatisfaction (Subramanian and Jeyaraj, 2018). Customer satisfaction is of utmost priority for any business, and these two technologies make that work possible and easier. Although they are related in practice, there are some differences between the two that are important to note when working with them. Some of the differences are mentioned below:
- Concept: In Cloud Computing, huge chunks of data can be stored, accessed at any time, and processed with various features to generate results. Big Data, on the other hand, is a huge amount of data that becomes useful only once it has been processed and the end information is obtained.
- Characteristics: The services provided by Cloud Computing are SaaS, PaaS and IaaS, whereas Big Data, as mentioned above, has the three features of velocity, variety and volume.
- Accessibility: Cloud Computing provides universal access, but operations on it are carried out by developers. In the case of Big Data, only the end information can be accessed and used; otherwise, it is just a chunk of data that has not yet been analysed.
- Usage: A business or a customer shifts to Cloud Computing when results are needed in real time and are important for business structure and marketing. It is cost-effective, which encourages its use further, and huge chunks of data can be analysed easily only with Cloud Computing. Big Data, on the other hand, can solve specific problems related to the task in hand (Yang et al., 2017).
- Budget: Big Data is a costly process, as large amounts of data are recorded, which consumes both time and money. Data entry requires a lot of manpower, and even recording responses is a tedious job. Collecting a huge quantity of data and accumulating it in one place is therefore costly. Cloud Computing, by contrast, is very cost-effective, as it does not require regular maintenance intervention; its features can be accessed and implemented by developers when they need them.
- Trends: The basic types of Cloud Computing are private, public, hybrid, and community. In the case of Big Data, the most important trends are MapReduce, HDFS and Hadoop.
- People specific: In Cloud Computing, frontend and backend software developers are responsible for the functioning of AI and ML; these are the full-stack developers who control such functions in an organisation. In Big Data, data analysts and data entry staff collect the data, put it together on the network and then analyse it for information.
Data security and privacy issues in Cloud Computing
As dependency on technology increases, so does the risk of privacy issues. Big Data is collected from people, and someone in the business then has access to it, which is risky (Stergiou and Psannis, 2017). On top of that, when this data is processed over the network via Cloud Computing, many safety measures must be considered so that the data is not hacked or used illegally. Some security and privacy issues are stated below:
- Ransomware and IoT: Cyber-crime is increasing, and one way criminals commit it is by encrypting individuals' data and demanding a ransom for the decryption key. The Information Security Forum has seen a rise in such cases in recent times, in which businesses risk having to pay a ransom to recover the key so that their customers' data is not used in illegal forums. These kinds of activity generally happen on IoT platforms (Satapathy, Moharana, and Ojha, 2016).
- Cyber Warfare: Cyber-crime affects not only businesses and individuals; it also has the power to harm a country and can even be a catalyst for war. Government data is also held on networked platforms, and if such a network is hacked by another country, that country gains information it can use against the nation. It could even damage a nation financially by hacking the country's banking networks and removing money from accounts. Though such actions would not start a war immediately, they give the opposing nation strength and the upper hand. Cyber-attacks bring the risk of espionage, manipulation, cyber bombs or, in the worst case, an entire network being accessed in minutes (Abbasi and Shah, 2017).
- AI-powered attacks: Artificial intelligence is so called because it learns from data and adjusts its behaviour without being explicitly programmed for each task. Though this makes technology smarter, reduces the workload of many people and has made life faster, it also carries risks. A threat that cannot be avoided anywhere today is that people may hack IoT platforms, so that the same artificial intelligence that is now a boon to mankind becomes the main weapon in an attacker's hands for a digital attack. Voice-powered digital assistants have entered personal spaces in homes and offices, which is worrying because, if a digital attack is launched, a person's private space is no longer safe. Such an attack makes it possible to control and monitor IoT devices such as smartphones, routers and digital assistants, which can give away personal information that could then be used for illegal purposes (Gai et al., 2016).
- Untrusted mappers: When Big Data is gathered, it is generally handled in parallel, using a method known as MapReduce. When the information is received, the mapper divides it into several chunks and directs them to different storage locations. If the mapper code is compromised by an outsider during this step, he or she can tamper with the information, change the location where it is meant to be stored, or even put alien data in place of the actual information. This can ruin the system and give the outsider access to sensitive data (Li et al., 2017).
- Hybrid cloud: Hybrid clouds are powerful because they let a private cloud combine with public clouds, which increases communication and is profitable for the business. However, this is also risky, because there is a greater chance of the public cloud being hacked and of information being accessed and used for illegal purposes.
Figure 2: Challenges of CSA’s Big Data Working Group
(Source: cloudsecurityalliance.org, 2020)
Conclusion
The above discussion shows that Big Data and cloud computing have become a way of life in today's business world, and nearly everyone planning to launch a new business or grow an existing one should be aware of them. New ideas that previously could not get off the ground because of limited resources can now do so with the help of Big Data and Cloud Computing. Businesses can run the data they have collected through these systems to obtain information on customer behaviour. More and more start-ups are using these IT services to compete in the market and emerge as dominant players.
Literature Gap
In general terms, a literature gap is a topic of importance that existing research has explored less or has not discussed in detail; it reflects a potential sub-topic that has not been fully explored. For this research, the gap is that no concrete security measures are being developed for securing Big Data in Cloud Computing.
Research Plan
Scope
Within the proposal requirements and the limits of feasibility, the research will address the problem statement concerning the security challenges associated with the use of Big Data in the cloud computing of international organisations. Different methods will be used to gather relevant information on the topic and then to conduct a critical assessment that addresses the requirements and identified aims of the research (Ahmed and Hossain, 2014). In this proposal, a framework for privacy will be developed using AWS and Apache Hive.
Research Questions
The first research question deals with how Big Data can be processed efficiently and securely by integrating Amazon Cloud, the MapReduce framework and Hive, whereas the second deals with the effectiveness of the MetaCloudDataStorage framework in processing and securing the data.
- How can Big Data be processed faster and securely by integrating Amazon Cloud, the MapReduce framework and Hive?
- How effective is the MetaCloudDataStorage framework in processing and securing the data?
Methodology
Because the research is based on finding the security challenges faced when using Big Data in Cloud Computing, the experimental method has been deemed appropriate for this study. The chosen research method also offers certain benefits. Firstly, it is cost-effective, as it would use AWS (Amazon Web Services) for categorising the data into three levels. Furthermore, the MetaCloudDataStorage security architecture is proposed for protecting Big Data in Cloud Computing. The data is classified as Normal, Critical or Sensitive, and each category would be stored in a different data centre. The MetaCloudDataStorage interface redirects each user request to the appropriate data centre in the cloud, which may be offered by different vendors.
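The classification and redirection described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the data-centre identifiers, the field names and the classification rule are all hypothetical, since the proposal does not specify how a record is assigned to the Normal, Critical or Sensitive level.

```python
from enum import Enum


class Sensitivity(Enum):
    NORMAL = "normal"
    CRITICAL = "critical"
    SENSITIVE = "sensitive"


# Hypothetical data-centre identifiers, one per sensitivity level.
DATACENTRES = {
    Sensitivity.NORMAL: "dc-normal",
    Sensitivity.CRITICAL: "dc-critical",
    Sensitivity.SENSITIVE: "dc-sensitive",
}


def classify(record: dict) -> Sensitivity:
    """Toy rule (an assumption): classify by which fields a record carries."""
    if "account_number" in record or "password" in record:
        return Sensitivity.SENSITIVE
    if "email" in record:
        return Sensitivity.CRITICAL
    return Sensitivity.NORMAL


def route(record: dict) -> str:
    """Return the data centre that should store this record."""
    return DATACENTRES[classify(record)]
```

For example, `route({"account_number": "123"})` selects the sensitive data centre, while a record with only a page name is routed to the normal one. In a real deployment the rule would come from the organisation's data-classification policy rather than from field names.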
To process the log files, AWS CloudTrail has been incorporated in the proposed methodology, integrated with the AWS Key Management Service (KMS). CloudTrail delivers the log files to an Amazon S3 bucket. Through its API, CloudTrail can be integrated with any kind of application. AWS CloudTrail also records the time of each API call and the IP address of the caller. In this methodology, the datacentres are divided into a sequence of n parts, where each part is denoted part k (k ∈ (1, n)), and the parts are stored at m different storage providers, where each provider is identified as provider l (l ∈ (1, m)). Furthermore, m (the number of providers) is always far smaller than n (the number of parts), and the providers belong to organisations such as Google, Amazon, and Salesforce. Storing Big Data in this way forms a unique storage path: Mapping Storage_Path = {Data((P1(M1, M2 ... Mr)) (P2(M1, M2 ... Ms)) ... (Pn(M1, M2 ... Mt)))}, where
P- storage provider
M- physical storage media
Because of the large size of Big Data, encrypting all of it is impractical, and hence the proposed methodology uses a cryptographic value known as the cryptographic virtual mapping of the Big Data. The proposed research therefore protects the mapping of the various data elements to each provider using the MetaCloudDataStorage interface, instead of securing the Big Data itself.
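The idea of protecting the storage-path mapping rather than the data can be sketched in Python as follows. This is an illustration under several assumptions, not the MetaCloudDataStorage implementation: the n parts are taken to be chunks of the data, parts are assigned to providers and media round-robin, and the mapping is protected with an HMAC-SHA256 tag from the standard library. An HMAC only detects tampering; keeping the mapping confidential would additionally require encrypting it, for example with a key held in a key management service. All function names are hypothetical.

```python
import hashlib
import hmac
import json


def split_into_parts(data: bytes, n: int) -> list[bytes]:
    """Split data into n contiguous parts (the last parts may be shorter)."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(n)]


def build_storage_path(num_parts: int, providers: dict[str, list[str]]) -> list[dict]:
    """Assign part k to a provider P and one of its media M, round-robin."""
    names = sorted(providers)
    mapping = []
    for k in range(num_parts):
        provider = names[k % len(names)]
        media = providers[provider]
        medium = media[(k // len(names)) % len(media)]
        mapping.append({"part": k + 1, "provider": provider, "medium": medium})
    return mapping


def seal_mapping(mapping: list[dict], key: bytes) -> dict:
    """Serialise the mapping and attach an integrity tag (not encryption)."""
    payload = json.dumps(mapping, sort_keys=True)
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_mapping(sealed: dict, key: bytes) -> list[dict]:
    """Return the mapping if its tag is valid, otherwise raise ValueError."""
    expected = hmac.new(key, sealed["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        raise ValueError("storage-path mapping was modified")
    return json.loads(sealed["payload"])
```

With providers `{"P1": ["M1", "M2"], "P2": ["M1"]}` and four parts, the third part lands on medium M2 of provider P1. The mapping is small compared with the data it describes, which is why protecting it is far cheaper than encrypting every part.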
Figure 3: End User Accessing Applications and Data in a Distributed Cloud
(Source:
Figure 4: Security Architecture for Meta Cloud DataStorage in Cloud
(Source:
MapReduce is a programming framework that processes tasks in parallel across a large number of systems. With the help of the Map function, the large input data is split into <key, value> pairs.
Mapper Function
public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter)
    for each term ∈ value do
        Emit(term, count 1)
Reducer Function
public void reduce(Text key, Iterator values, OutputCollector output, Reporter reporter)
    sum ← 0
    for each v ∈ values do
        sum ← sum + v
    Emit(key, sum)
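The mapper and reducer pseudocode above can be simulated on a single machine with a few lines of Python. This is a sketch of the programming model only, not of Hadoop itself: the function names are illustrative, and the in-memory grouping step stands in for the shuffle phase that the framework performs across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Emit (term, 1) for every whitespace-separated term in the line."""
    for term in line.split():
        yield term, 1


def reduce_fn(key: str, values: Iterable[int]) -> tuple[str, int]:
    """Sum all counts emitted for one key."""
    total = 0
    for v in values:
        total += v
    return key, total


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce each group."""
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())
```

Calling `run_job(["alice bob", "alice"])` counts two occurrences of `alice` and one of `bob`. Applied to login records keyed by user name, the same pattern would give the number of log entries per user.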
The Big Data collected through the Map function would be analysed and processed using Apache Hive on Amazon Web Services. Apache Hive is open-source software that runs on top of Hadoop in Amazon EMR. Hive is used to process the log files stored in Amazon S3, such as:
05:05:2020,56,address
05:05:2020,67,index
06:05:2020,47,sponsored
The data from AWS would be fetched using the following Hive command:
hive> select count(*) from bankdetails where Time >= 40;
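To make the effect of this query concrete, the following Python sketch applies the same filter to the three sample log lines. It assumes that the second comma-separated field corresponds to the Time column of the bankdetails table; the text does not state the table schema, so this mapping is a guess made for illustration.

```python
import csv
import io

# The three sample log lines from the text.
SAMPLE_LOG = """05:05:2020,56,address
05:05:2020,67,index
06:05:2020,47,sponsored
"""


def count_rows(log_text: str, min_time: int) -> int:
    """Count rows whose second field (assumed to be Time) is >= min_time."""
    reader = csv.reader(io.StringIO(log_text))
    return sum(1 for row in reader if row and int(row[1]) >= min_time)
```

With a threshold of 40, as in the Hive command, all three sample rows qualify; raising the threshold to 50 leaves two.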
Research Constraints
The time available for completing the research was limited, and hence it could not be conducted as thoroughly as intended. Furthermore, there was a lack of data availability and accessibility. There were also issues with the elasticity and scalability of the data, and it is yet to be determined whether the proposed framework would work for all sectors.
Time Plan and Milestones
Task | Duration (days) | Start | Finish
Conduct the introduction for the study | 5 | 15-10-2020 | 20-10-2020
Literature review | 8 | 20-10-2020 | 28-10-2020
Draft the scope of the study | 3 | 28-10-2020 | 31-10-2020
Determine the methodology | 5 | 31-10-2020 | 05-11-2020
Use the Amazon Web Services for Data Categorisation | 5 | 05-11-2020 | 10-11-2020
Develop the MetaCloudDataStorage Architecture | 10 | 10-11-2020 | 20-11-2020
Classify the data into Normal, Critical and Sensitive | 7 | 20-11-2020 | 27-11-2020
Process the log files and develop the AWS CloudTrail | 4 | 27-11-2020 | 01-12-2020
Integrate the AWS Key Management Service (KMS) | 5 | 01-12-2020 | 06-12-2020
Use the Amazon S3 bucket | 5 | 06-12-2020 | 11-12-2020
Use Mapper Function | 8 | 11-12-2020 | 19-12-2020
Use Reducer Function | 4 | 19-12-2020 | 23-12-2020
Use the AWS Apache Hive | 3 | 23-12-2020 | 26-12-2020
First draft | 9 | 26-12-2020 | 04-01-2021
Review with lecturer | 3 | 04-01-2021 | 07-01-2021
Second draft | 5 | 07-01-2021 | 12-01-2021
Submission | 1 | 12-01-2021 | 13-01-2021
Conclusion and Further Work
This study proposed the MetaCloudDataStorage security architecture for securing Big Data in Cloud Computing. The MapReduce framework has been used to obtain information on the number of users logged on to the cloud data centre. The study suggests protecting the mapping of the various data elements to each provider using the MetaCloudDataStorage security interface. Future work is to extend the proposed MetaCloudDataStorage security architecture to real-time processing of streaming data.
Acknowledgements and References
"Cloud Security Alliance", Cloud Security Alliance, 2020. [Online]. Available: https://cloudsecurityalliance.org/articles/csa-releases-the-expanded-top-ten-big-data-security-privacy-challenges/. [Accessed: 11- Oct- 2020].
Abbasi, B.Z. and Shah, M.A., Fog computing: Security issues, solutions and robust practices. In 2017 23rd International Conference on Automation and Computing (ICAC) (pp. 1-6). IEEE, September 2017.
Ahmed, E.S.A. and Saeed, R.A., A survey of Big Data cloud computing security. International Journal of Computer Science and Software Engineering (IJCSSE), 3(1), pp.78-85, 2014.
Ahmed, M. and Hossain, M.A., Cloud computing and security issues in the cloud. International Journal of Network Security & Its Applications, 6(1), p.25, 2014.
Althagafy, E. and Qureshi, M.R.J., Novel cloud architecture to decrease problems related to big data. International Journal of Computer Network and Information Security, 9(2), p.53, 2017.
Cheng, N., Lyu, F., Chen, J., Xu, W., Zhou, H., Zhang, S. and Shen, X., Big Data driven vehicular networks. IEEE Network, 32(6), pp.160-167, 2018.
Fernandes, D.A., Soares, L.F., Gomes, J.V., Freire, M.M. and Inácio, P.R., Security issues in cloud environments: a survey. International Journal of Information Security, 13(2), pp.113-170, 2014.
Gai, K., Qiu, M., Zhao, H. and Xiong, J., Privacy-aware adaptive data encryption strategy of Big Data in Cloud Computing. In 2016 IEEE 3rd International Conference on Cyber Security and Cloud Computing (CSCloud) (pp. 273-278). IEEE, June 2016.
Gholami, A. and Laure, E., Big Data security and privacy issues in the cloud. International Journal of Network Security & Its Applications (IJNSA), Issue January, 2016.
Khan, M.A., A survey of security issues for cloud computing. Journal of Network and Computer Applications, 71, pp.11-29, 2016.
Li, Y., Gai, K., Qiu, L., Qiu, M. and Zhao, H., Intelligent cryptography approach for secure distributed Big Data storage in Cloud Computing. Information Sciences, 387, pp.103-115, 2017.
Satapathy, S.K., Moharana, S.K. and Ojha, A.K., Implication of Security Issues Associated With Big Data In Cloud Computing. International Journal of Recent Trends in Engineering & Research (IJRTER), 2(04), pp.2455-1457, 2016.
Simsek, Z., Vaara, E., Paruchuri, S., Nadkarni, S. and Shaw, J.D., New ways of seeing Big Data, 2019.
Stergiou, C. and Psannis, K.E., Efficient and secure Big Data delivery in Cloud Computing. Multimedia Tools and Applications, 76(21), pp.22803-22822, 2017.
Stergiou, C., Psannis, K.E., Kim, B.G. and Gupta, B., Secure integration of IoT and Cloud Computing. Future Generation Computer Systems, 78, pp.964-975, 2018.
Subramanian, N. and Jeyaraj, A., Recent security challenges in Cloud Computing. Computers & Electrical Engineering, 71, pp.28-42, 2018.
Varghese, B. and Buyya, R., Next generation Cloud Computing: New trends and research directions. Future Generation Computer Systems, 79, pp.849-861, 2018.
Yang, C., Huang, Q., Li, Z., Liu, K. and Hu, F., Big Data and Cloud Computing: innovation opportunities and challenges. International Journal of Digital Earth, 10(1), pp.13-53, 2017.
Zhang, D., Big Data security and privacy protection. In 8th International Conference on Management and Computer Science (ICMCS 2018). Atlantis Press, October 2018.