Jul 11, 2016

Big Data Insights: Tale of IT Investments and Returns

Once again, this post brings the audience a predictive analytics insight drawn from huge volumes of information technology security data belonging to two Fortune 500 companies with broadly similar characteristics. As quick background, the analytical interest was to understand how each organization invested in its IT security over a period of time, and what return on investment (ROI) it saw.

My earlier Big Data Insight post drew many queries about the data, so this time I am publishing the data used for plotting, for quick play in R. As mentioned above, the volumes were huge, and all initial volumes were processed on the Apache Spark stack in a cloud environment. As usual, the analysis below was carried out with R components, viz. R-3.3.1, RStudio (my favorite IDE), and the ggplot2 package for plotting.

Now, let's understand the plot below. The x-axis shows the year, ranging from 1999 to 2015; the y-axis shows the number of major threats and the number of IT security employees observed at both organizations (Org). Starting at the year 2000, it is evident that Org A has more threats than Org B, while both organizations had around 10 IT security employees (Org A has only a few more than Org B; in the earlier year 1999, Org B actually has one more employee than Org A). Over the next 2-3 years, however, Org A increased its IT security employees to about 20, whereas Org B kept more or less the same number of employees for the next 10 years. As a result, Org B reached a stage where its number of major threats exploded and went beyond the existing team's control, whereas Org A's initial investment in employees worked out better: its number of major threats stayed more or less stable or decreased over time (don't forget, achieving zero is impossible given the new technologies and applications arriving every year).

Data employed for the plot:
# output of dput(IT_threats_returns); assigning it recreates the data frame
IT_threats_returns <- structure(list(Year = c(1999, 1999, 1999, 1999, 2000, 2000, 2000,
2000, 2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002, 2003, 2003, 
2003, 2003, 2004, 2004, 2004, 2004, 2005, 2005, 2005, 2005, 2006, 
2006, 2006, 2006, 2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008, 
2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011, 2011, 
2011, 2012, 2012, 2012, 2012, 2013, 2013, 2013, 2013, 2014, 2014, 
2014, 2014, 2015, 2015, 2015, 2015), Numeric_Value = c(28, 11, 
9, 10, 36, 26, 13, 7, 28, 26, 17, 9, 26, 29, 21, 10, 32, 21, 
19, 9, 25, 34, 19, 10, 30, 35, 20, 10, 22, 27, 19, 10, 31, 42, 
19, 11, 29, 47, 19, 11, 28, 45, 22, 11, 25, 55, 23, 13, 30, 51, 
21, 14, 25, 49, 22, 13, 32, 60, 22, 19, 25, 53, 25, 24, 19, 49, 
25, 29), Desc = c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps", 
"Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", 
"Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps"
)), .Names = c("Year", "Numeric_Value", "Desc"), row.names = c(NA, 
68L), class = "data.frame")

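# code used for plotting
library(ggplot2)

# One dashed line per series, coloured by Desc; the legend is hidden
# because each line is labelled directly on the plot.
# Note: in ggplot2 >= 3.4.0, use linewidth = 1 instead of size = 1 for lines.
p <- ggplot(IT_threats_returns, aes(x = Year, y = Numeric_Value, col = Desc)) +
  geom_line(linetype = 5, size = 1) +
  theme_light() +
  theme(legend.position = "none") +
  ylab("") +
  xlab("")

# Text labels placed by hand, each in the default ggplot2 colour of its line
p + annotate("text",
             x = c(2012, 2012, 2004.5, 2012.5),
             y = c(47, 34, 18, 10.5),
             label = c("   `Org_B` : No_of_Major_Threats",
                       "   `Org_A` : No_of_Major_Threats",
                       "   `Org_A` : No_of_IT_Security_Emps",
                       "   `Org_B` : No_of_IT_Security_Emps"),
             col = c("#C77CFF", "#7CAE00", "#F8766D", "#00BFC4"))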
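
For readers who want to check the reading of the plot numerically, here is a minimal sketch in base R. It rebuilds the published values compactly (same rows and order as the dput output), reshapes them to one column per series, and computes major threats per IT security employee for each organization. The "threats per employee" ratio and the names `series`, `values`, `wide` and `ratio` are my own illustrative choices, not part of the original analysis.

```r
# The four series, in the order they repeat within each year of the dput output
series <- c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
            "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps")

# The 68 published values: 17 years (1999-2015) x 4 series
values <- c(28, 11,  9, 10,  36, 26, 13,  7,  28, 26, 17,  9,  26, 29, 21, 10,
            32, 21, 19,  9,  25, 34, 19, 10,  30, 35, 20, 10,  22, 27, 19, 10,
            31, 42, 19, 11,  29, 47, 19, 11,  28, 45, 22, 11,  25, 55, 23, 13,
            30, 51, 21, 14,  25, 49, 22, 13,  32, 60, 22, 19,  25, 53, 25, 24,
            19, 49, 25, 29)

IT_threats_returns <- data.frame(
  Year = rep(1999:2015, each = 4),
  Numeric_Value = values,
  Desc = rep(series, times = 17),
  stringsAsFactors = FALSE
)

# Long to wide: one row per year, one column per series
# (each Year/Desc cell holds exactly one value, so sum() just returns it)
wide <- with(IT_threats_returns, tapply(Numeric_Value, list(Year, Desc), sum))

# Illustrative metric (an assumption of this sketch): major threats per employee
ratio <- data.frame(
  Year  = as.integer(rownames(wide)),
  Org_A = round(wide[, "Org_A _ No_of_Major_Threats"] /
                wide[, "Org_A _ No_of_IT_Security_Emps"], 2),
  Org_B = round(wide[, "Org_B _ No_of_Major_Threats"] /
                wide[, "Org_B _ No_of_IT_Security_Emps"], 2)
)

print(ratio[ratio$Year %in% c(2000, 2010, 2015), ], row.names = FALSE)
```

With the published values, threats per employee in 2010 come to about 1.09 for Org A and 4.23 for Org B, which is in line with the reading of the plot given above.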

2 comments:

James Walmsley said...

Great article Pradeep!

I am learning R now as part of a Data Science Specialization and articles like this one are giving me some insight into how the skills in R that I've been learning can be used in the real world.

Thanks

James Walmsley

Pradeep Mavuluri said...

James, Thanks and Welcome to Data Science World.