After my earlier Big Data Insight post, I received many queries about the data, so here I am publishing the data used for plotting, for quick play in R. As mentioned, the original volumes were huge, and all of the initial volumes were processed on an Apache Spark stack in a cloud environment. As usual, the analysis below was carried out with R Programming Language components, namely R 3.3.1, RStudio (my favorite IDE), and the ggplot2 package for plotting.
Now, let's understand the plot below. The x-axis shows the year, ranging from 1999 to 2015, and the y-axis shows the numbers observed for major threats and IT Security employees at the two organizations (Org). Starting at the year 2000, it is evident that Org A has more threats than Org B (36 versus 26), while both organizations have roughly 10 IT Security employees (Org A has a few more than Org B, 13 versus 7, although in 1999 Org B had one more employee than Org A). Over the next 2-3 years, however, Org A increased its IT Security staff to about 20, whereas Org B kept more or less the same number of employees for the next 10 years. As a result, Org B reached a stage where its number of major threats exploded and went beyond the existing team's control, whereas Org A's initial investment in employees worked out better: its number of major threats stayed more or less stable or decreased over time (don't forget that achieving zero is impossible here, given the new technologies and applications arriving every year).
# output of dput(IT_threats_returns), assigned to a name so it can be pasted straight into R
IT_threats_returns <- structure(list(
  Year = c(1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000,
           2001, 2001, 2001, 2001, 2002, 2002, 2002, 2002,
           2003, 2003, 2003, 2003, 2004, 2004, 2004, 2004,
           2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006,
           2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008,
           2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010,
           2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012,
           2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014,
           2015, 2015, 2015, 2015),
  Numeric_Value = c(28, 11, 9, 10, 36, 26, 13, 7,
                    28, 26, 17, 9, 26, 29, 21, 10,
                    32, 21, 19, 9, 25, 34, 19, 10,
                    30, 35, 20, 10, 22, 27, 19, 10,
                    31, 42, 19, 11, 29, 47, 19, 11,
                    28, 45, 22, 11, 25, 55, 23, 13,
                    30, 51, 21, 14, 25, 49, 22, 13,
                    32, 60, 22, 19, 25, 53, 25, 24,
                    19, 49, 25, 29),
  Desc = c(
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps",
    "Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats", "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps"
  )), .Names = c("Year", "Numeric_Value", "Desc"), row.names = c(NA, 68L), class = "data.frame")
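The reading of the plot given earlier can also be checked directly against the numbers. What follows is only a minimal sketch in base R, not part of the original analysis: it rebuilds the same 68 rows compactly with `rep()` so that it runs on its own (assuming the row order of the dput output, four series per year), and the object name `wide` and the short column names are made up for illustration.

```r
# Rebuild the published data compactly so this snippet is self-contained.
# Assumed row order, as in the dput() output: for each year, Org A threats,
# Org B threats, Org A employees, Org B employees.
series <- c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
            "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps")

IT_threats_returns <- data.frame(
  Year = rep(1999:2015, each = 4),
  Numeric_Value = c(28, 11, 9, 10, 36, 26, 13, 7, 28, 26, 17, 9,
                    26, 29, 21, 10, 32, 21, 19, 9, 25, 34, 19, 10,
                    30, 35, 20, 10, 22, 27, 19, 10, 31, 42, 19, 11,
                    29, 47, 19, 11, 28, 45, 22, 11, 25, 55, 23, 13,
                    30, 51, 21, 14, 25, 49, 22, 13, 32, 60, 22, 19,
                    25, 53, 25, 24, 19, 49, 25, 29),
  Desc = rep(series, times = 17),
  stringsAsFactors = FALSE
)

# Year-by-series matrix: one row per year, one column per series.
# Each cell holds exactly one value, so sum() just passes it through.
wide <- tapply(IT_threats_returns$Numeric_Value,
               list(IT_threats_returns$Year, IT_threats_returns$Desc), sum)

# Put the columns in a known order and give them short (hypothetical) names.
wide <- wide[, series]
colnames(wide) <- c("A_threats", "B_threats", "A_emps", "B_emps")

# Year 2000: threats and staff at both organizations.
print(wide["2000", ])

# 1999: how many more employees Org B had than Org A.
print(wide["1999", "B_emps"] - wide["1999", "A_emps"])

# Org A staffing over the following years, and Org B staffing 2000-2010.
print(wide[c("2001", "2002", "2003"), "A_emps"])
print(range(wide[as.character(2000:2010), "B_emps"]))

# Peak of Org B's major threats and the year it occurred.
print(max(wide[, "B_threats"]))
print(rownames(wide)[which.max(wide[, "B_threats"])])
```

Printing `wide` in full gives a compact year-by-year table that is easier to scan than the long format used for plotting.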
# code used for plotting
library(ggplot2)

p <- ggplot(IT_threats_returns, aes(x=Year, y=Numeric_Value, col=Desc)) +
  geom_line(linetype=5, size=1) +
  theme_light() +
  theme(legend.position="none") +
  ylab("") + xlab("")

p + annotate("text",
             x=c(2012, 2012, 2004.5, 2012.5),
             y=c(47, 34, 18, 10.5),
             label=c(" `Org_B` : No_of_Major_Threats",
                     " `Org_A` : No_of_Major_Threats",
                     " `Org_A` : No_of_IT_Security_Emps",
                     " `Org_B` : No_of_IT_Security_Emps"),
             col=c("#C77CFF", "#7CAE00", "#F8766D", "#00BFC4"))
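The hex codes passed to `annotate()` look like ggplot2's default discrete colours for four groups, which is why each label matches its line even though the legend is hidden. As a hedged sketch (assuming the `scales` package, which is installed as a dependency of ggplot2, and assuming the default colour scale has not been changed), the same codes can be derived rather than typed by hand; the name `default_cols` is made up for illustration.

```r
library(scales)  # provides hue_pal(), the palette behind ggplot2's default discrete colours

series <- c("Org_A _ No_of_Major_Threats", "Org_B _ No_of_Major_Threats",
            "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps")

# ggplot2 assigns default colours to the levels of the mapped variable in
# order; for a character column such as Desc, the levels are the sorted
# unique values.
default_cols <- setNames(hue_pal()(length(series)), sort(series))
print(default_cols)

# Colours in the same order as the labels used in annotate() above:
# Org B threats, Org A threats, Org A employees, Org B employees.
label_order <- c("Org_B _ No_of_Major_Threats", "Org_A _ No_of_Major_Threats",
                 "Org_A _ No_of_IT_Security_Emps", "Org_B _ No_of_IT_Security_Emps")
print(unname(default_cols[label_order]))
```

Looking the colours up by series name keeps the labels in sync with the lines if a series is added or renamed later.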
2 comments:
Great article Pradeep!
I am learning R now as part of a Data Science Specialization and articles like this one are giving me some insight into how the skills in R that I've been learning can be used in the real world.
Thanks
James Walmsley
James, Thanks and Welcome to Data Science World.