Dec 16, 2015

Big Data Insights - IT Support Log Analysis

This post offers readers a few (strictly limited) glimpses of the insights obtained in a case where predictive analytics helped a Fortune 1000 client unlock the value in the huge log files of its IT support system. By way of quick background, a large organization wanted value-added (actionable) insights from thousands of records logged in the past, because it had seen expenses increase with no gain in productivity.

As most of us know, in these business scenarios end users are most interested in unexpected, strange and unusual things that regular reports do not capture. Hence the data scientist's job does not end at finding non-routine insights: it also requires digging deeper for their root cause and suggesting the best possible actions for immediate remedy (knowledge of the domain and of industry best practices helps a lot). As mentioned earlier, only a few of those insights are shown and discussed here. All the analysis was carried out using R Programming Language components, viz. R-3.2.2, RStudio (favorite IDE) and the ggplot2 package for plotting.

The first graph is a time-series calendar heat map, adopted from Paul Bleicher, that shows the number of tickets raised each day over every week of each month for the last year (green and its lighter shades represent lower counts, whereas red and its shades represent higher counts).
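The original heat map was built in R with ggplot2. As a rough illustration of the data preparation behind such a plot, here is a minimal Python sketch that turns raw ticket dates into calendar cells keyed by year, month, week of month and weekday. It assumes the log has already been reduced to a list of `datetime.date` values (one per ticket); the function names `week_of_month` and `calendar_grid` and the Monday-start week convention are illustrative assumptions, not the author's code.

```python
from collections import Counter
from datetime import date


def week_of_month(d: date) -> int:
    """1-based week row within the month, assuming weeks start on Monday."""
    first = d.replace(day=1)
    # Shift by the weekday of the 1st so that each Monday starts a new row.
    return (d.day + first.weekday() - 1) // 7 + 1


def calendar_grid(ticket_dates):
    """Map (year, month, week_of_month, weekday) -> tickets raised that day.

    weekday uses Python's convention: 0 = Monday ... 6 = Sunday.
    Each key is one cell of a calendar heat map; the value drives its color.
    """
    counts = Counter(ticket_dates)
    return {
        (d.year, d.month, week_of_month(d), d.weekday()): n
        for d, n in counts.items()
    }
```

Each resulting cell can then be colored on a green-to-red scale, as in the heat map described above.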



If one carefully observes the calendar heat map, it is evident that, except for April and December, every month shows a sudden increase in the number of tickets raised on the last Saturdays and Sundays; this is even more clearly visible at the quarter ends of March, June and September (and also in November, which is not a quarter end). This is unusual behavior, because the numbers rise on non-working days. Before going into further details, let us look at one more graph, which depicts the solved duration in minutes on the x-axis, with the time taken by each ticket category drawn as a horizontal timeline plot.
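The month-end weekend pattern above can also be checked numerically rather than only by eye. The following minimal sketch (Python, whereas the original work used R) computes, per month, the fraction of tickets raised on that month's last Saturday or last Sunday. It assumes tickets are available as `datetime.date` values; the helper names `is_last_weekend` and `last_weekend_share` are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def is_last_weekend(d: date) -> bool:
    """True if d is the last Saturday or the last Sunday of its month."""
    # Saturday = 5, Sunday = 6; it is the last one if a week later is next month.
    return d.weekday() >= 5 and (d + timedelta(days=7)).month != d.month


def last_weekend_share(ticket_dates):
    """Per (year, month): fraction of that month's tickets raised on its last weekend days."""
    totals = defaultdict(int)
    on_last_weekend = defaultdict(int)
    for d in ticket_dates:
        key = (d.year, d.month)
        totals[key] += 1
        if is_last_weekend(d):
            on_last_weekend[key] += 1
    return {key: on_last_weekend[key] / totals[key] for key in totals}
```

Months whose share stands well above the rest (for example, quarter-end months) would be the candidates for a root-cause dig.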

The solved-duration plot shows that, of all records analyzed, 71.87% belong to the "Request for Information" category, and these were solved within a few minutes of the ticket being raised (which is why no line is visible for this category compared to the others). What actually happened here was a kind of spoofing, made possible by a lack of automation in their systems. In simple words, it was found that there was no proper documentation or guidance for many of the applications they were using; this situation was exploited to increase the number of tickets (i.e., pushing for more tickets, even for basic information, at month ends and quarter ends, which produced month-end openings that in turn had to be closed immediately). The case discussed here is one of many that were presented together with immediate, easily actionable remedies.
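The 71.87% figure is a category share of all tickets, and the near-invisible line reflects a very short resolution time. A minimal Python sketch of that summary follows, assuming each ticket has been reduced to a `(category, solved_minutes)` pair; the function name `category_summary` and the use of the median as the "typical" duration are assumptions for illustration, not the method used in the original R analysis.

```python
from collections import defaultdict
from statistics import median


def category_summary(tickets):
    """tickets: iterable of (category, solved_minutes) pairs.

    Returns {category: (share_percent, median_minutes)},
    with the share rounded to two decimals.
    """
    durations = defaultdict(list)
    for category, minutes in tickets:
        durations[category].append(minutes)
    total = sum(len(v) for v in durations.values())
    return {
        category: (round(100 * len(v) / total, 2), median(v))
        for category, v in durations.items()
    }
```

For example, made-up input of three "Request for Information" tickets solved in 2, 4 and 3 minutes plus one hypothetical "Incident" ticket taking 600 minutes yields a 75.0% share with a 3-minute median for the first category.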

Visual Summarization:





Dec 1, 2015

Average Expenses for TV across States of the USA

This post attempts to depict the average amounts spent across states on TV channel expenses in a large country (the USA). Although it was developed using sample data belonging to a particular service provider, its interest lies in the regional differences in average spending on this service across the country. I would like to point out that some US states, because of their economic importance, are notably better connected with multiple service providers, and that geographical location and population density also vary; hence the results and insights may be specific to this sample data. All the analysis was carried out using R Programming Language components (R-3.2.2, RStudio (favorite IDE), and ggplot2 for mapping).

Average Amount Spent ($) on TV by State in 2015 (till Nov):

The accompanying figure shows a map of 48 US states (those for which data was available), shaded by average TV expense for 2015 using data available through the end of November, with five colors representing five intervals of average spending.


As is evident from the map, in the given sample the North-East states have the highest average spending on TV. The next-highest averages (orange) appear in the Pacific region and in a few Central and Eastern states. As mentioned earlier, this may be due to economic importance or to the service provider's geographical spread, which the sample data does not capture in depth.
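The map rests on two steps: averaging spend per state, then assigning each average to one of five color intervals. The post does not say how the interval breaks were chosen, so the equal-width intervals below are an assumption (quantile breaks would be an equally plausible choice). This minimal Python sketch assumes records arrive as `(state, amount_spent)` pairs; the helpers `state_averages`, `equal_width_bins` and `bin_index` are hypothetical names, not the author's R code.

```python
from collections import defaultdict


def state_averages(records):
    """records: iterable of (state, amount_spent) pairs -> {state: average amount}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for state, amount in records:
        sums[state] += amount
        counts[state] += 1
    return {s: sums[s] / counts[s] for s in sums}


def equal_width_bins(values, k=5):
    """Return k + 1 edges splitting [min, max] into k equal-width intervals (assumed scheme)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    # Force the last edge to the exact maximum to avoid float drift.
    return [lo + i * step for i in range(k)] + [hi]


def bin_index(value, edges):
    """0-based interval index for value; the maximum falls into the last interval."""
    for i in range(len(edges) - 2):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2
```

Each state's interval index then selects one of the five map colors.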

The author has undertaken several projects and programs in data science; the views expressed here come from his industry experience. He can be reached at mavuluri.pradeep@gmail for more details.

Sep 15, 2015

"R" in Top 20 of TIOBE Index

Dear R programmers,


This May (2015), our favorite "R" almost reached 12th position in the popular TIOBE Programming Community Index (TIOBE Index); however, it has experienced some volatility since then and could not move further into the top 10. Currently it holds 19th rank (for this month). I wish it retains its position in the top 20 through the rest of the year and hope it moves quickly into the top 10. Also find below my compilation of what R is for analytical solutions (a System for Statistical Computation & Graphics); this does not mean I ignore the latest buzzword "machine learning", which I treat as also part of our statistical computations.



The author of this post (mavuluri.pradeep@gmail.com or pmavuluri@analyticaltis.com) has been using R for complete analytical solutions and for educational purposes for a long time. Some third-party copyrighted materials have been reused here under fair use for educational purposes. Hence, use of this blog post is restricted to educational and informational purposes and falls under the usual copyright terms.

Enjoy "R" Programming!