Mar 23, 2015

Imparting Data Sciences - Industry Practices - Part 2

In continuation of my last post (Part 1), today I would like to share a few more observations from the industry. They revolve around one question: are data scientists supposed to run algorithms, or are they meant to solve business problems?

  • Firstly, hardly any checks exist on whether what data scientists applied to mine the data (small or big) actually helps in providing solutions that fit real-world understanding and needs (reasoning, or strategic reasoning).
  • Secondly, client requirements are not examined deeply, and data scientists present the technologies they are comfortable with; that cannot be the heart and soul of data sciences.
  • Finally, it has become a habit to search for a better algorithm once and then use it for years, ignoring the new data dimensions that are added day by day; when it later fails, the blame falls on data sciences as a whole.

The author has undertaken several projects and programs in data sciences; the views expressed here are from his industry experience. He can be reached at mavuluri.pradeep@gmail for more details.

Feb 24, 2015

Imparting Data Sciences - Industry Practices - Part 1

This is what I have observed happening in the industry over a period of time: freshers and juniors are forced to become more proficient in either a programming language or statistics/ML theory, which leads them to “trying all possible models with the hope that at least one will fit the data well”. This can also lead to misuse of a model or algorithm, because domain knowledge, which is critical and helps in the ways listed below, is ignored completely while imparting data sciences:

  • Rightly defining or appropriately refining the business problem; e.g. economic reasoning about the relationship between price and sales helps to accurately determine the model or modeling tasks for the analysis.
  • Guiding and providing the right direction, since with an ad hoc model it may be difficult to justify both the analysis and the potential of the results obtained.
  • More importantly, providing past knowledge and reducing some of the trial and error involved in choosing the "right" model for an empirical analysis that provides actionable analytical insights.

The author has undertaken several programs for data sciences talent development; the views expressed here are from his industry experience and personal observations. He can be reached at mavuluri.pradeep@gmail for more details.

Nov 18, 2014

Big Data HR Analytical Insights – Large Organization Performance and Growth Over a Period – Part 2

In continuation of last week's post, BigDataHR_Part1, today I would like to highlight further insights on what happened when the organization moved to penalizing average performers and rewarding high performers. This time the data is from one of its divisions with average revenue, observed over a period of time; here too, the organization expected growth to be exponential. That held only for a short period (the policy built some momentum and pushed the division's growth to a good number). What happened later, and whether the growth momentum continued, I have tried to summarize in the graph below.

If the growth momentum observed initially had continued, the organization's business growth (cycle) would have followed the green line of the business growth curve, since the heavy tyre explained in the last post, BigDataHR_Part1, which could drag its pace down, no longer existed. However, it followed the red line, which shows an initial pick-up in growth followed by a flat line thereafter. One of the main reasons, very clear from observation, was that after a certain period high performers alone could not drive growth without the support of average performers.

Nov 14, 2014

Big Data HR Analytical Insights – Large Organization Performance and Growth Over a Period – Part 1

The graph below summarizes the analytical insights obtained from a large organization's performance data with respect to growth over a period of time. The results presented are from one of its high-revenue divisions, where high performers and average performers were first identified. The organization rewarded its high performers promptly with larger benefits, expecting that growth would be exponential. To its surprise, the next year it did not observe the expected growth in the division. Ignoring this, it continued the same policy for the following year, yet still did not see the expected growth. While continuing the same policy, the organization decided to take a data-driven look at what was happening.

When the division's performance and growth data were examined over a period of time, the following insights emerged, which I have tried to summarize in the graph above. The organization was expecting an exponential growth curve year after year, represented by the green line of business growth (cycle), driven by rewarding high performers in a timely manner. However, the organization had a large number of average performers. Although they were not rewarded as well as the high performers, the resources the organization spent on them weighed so heavily that high performers alone were unable to drive the growth cycle, and the entire growth cycle tilted down into a slow-paced curvilinear growth curve, represented by the red line.

Watch out for other Big Data HR Analytical Insights in coming posts.

The author has worked extensively in HR analytics and can be reached at mavuluri.pradeep@gmail for related discussions/projects.

Oct 31, 2014

Is data mining more about fitting data well? - Exercise Results

Today, I am going to share the results of an exercise that I carried out recently for a start-up. The intention of the study was to extract the major attributes that generally drive less experienced, inexperienced, or re-skilled data miners towards a given objective, and to understand where they fall short. The twist is that the majority of them gave the same conclusion or explanation for the given objective. The results highlight those important aspects of the practice that most of them failed to recognize for the sake of a quick answer or solution.

Sample Observed:
All members of the sample had experience with both R and data mining solutions, either through course projects (free, paid, or part of a curriculum) or through industry experience; industry experience in the sample was limited to between 1 and 3 years, in whatever domain. Details of the sample are as below:
  1. 17  - Freshers from various engineering backgrounds (both graduates and post-graduates)
  2. 12  - Freshers from various quantitative backgrounds (Maths, Stats, MBAs, Econometrics, etc.)
  3. 18  - Experienced people from different industry backgrounds (data management related, programming, consulting, etc.)
  4. All members of the sample belong to two major cities of India.

About Test Data:
Bank data of customers belonging to a particular city branch, with around 17,000 observations for a period of one month. It has information about each customer's age, a few demographics, the number of transactions they made in that month, whether they visited the branch in that month, etc.; 12 variables in total.

Infrastructure Provided:
A computing machine with 8 GB RAM and an Intel Core i7 processor, with the latest R (3.1.1) and RStudio pre-installed.

Objective Given:
“Comment about the variables ‘visiting branch’ and ‘age’ relationship”.

Time Limit:
A time limit of 20 minutes was given, which was almost two and a half times the average time that experienced people took to give their comments.

Highlights from the Exercise:
  • As mentioned earlier, almost all of them, except a few, gave the same inference: that the ‘number of visits to the branch’ has a positive relationship with the ‘age’ of the customers. In other words, as age increases, customers prefer to visit the branch. Interestingly, most of them were comfortable with R programming apart from a few typos; kudos to all the developers making it more user-friendly.
  • Astonishingly, only 21% of the sample did some data understanding after reading the data, i.e. looked at descriptive statistics through summary functions or plots before moving to the modeling part. Of this 21%, not a single member was from an engineering background (in saying this I am neither generalizing nor against engineering backgrounds, only commenting from the sample's perspective). Also, another 15% came back to data understanding after fitting at least one or two models.
  • Another surprise was that the types of techniques employed by participants went as far as deep learning methods. The average number of models applied across all participants was close to 3, though a few participants did not fit even a single technique or model.
  • Only 15% of the sample clearly mentioned that the result may be spurious, or declined to comment on the relationship due to noise in the data; however, only half of them came up with explanations for it.
  • A notable fact from our exercise is that many of them directly applied the techniques they were aware of (a few directly fitted neural networks, and then came back to machine learning classification techniques because they needed to comment on the relationship). More than half of the sample first tested a variant of the Generalized Linear Model and then moved on to other techniques because they found the explanatory power of the model was low; they kept trying data mining techniques until the time limit ended.

What was wrong in the data?
When this data was originally received, I observed that, due to a machine or human error, the column ‘age of the customer’ had been recorded additively: for instance, if a customer had visited the branch twice in the month and his original age was 25, it appeared as 50. Hence the positive relationship with increasing age; this was no longer the case after the noise was removed.
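The effect of such an additive error can be reproduced on synthetic data. The sketch below is written in Python rather than the R used in the exercise, and it uses made-up data, not the bank data. It assumes that the recorded age equals the true age multiplied by the number of visits, and that customers with zero visits keep their true age; the source states only the two-visit case, so this rule and the helper names `corrupt_age` and `repair_age` are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def corrupt_age(age, visits):
    # Assumed error model: the age is added once per visit (25 with 2 visits -> 50).
    # Assumption: customers with zero visits keep their true age.
    return age * max(visits, 1)


def repair_age(recorded_age, visits):
    # Inverse of the assumed error model above.
    return recorded_age // max(visits, 1)


rng = random.Random(42)  # fixed seed so the run is repeatable
n_customers = 17000      # same order of magnitude as the exercise data

# Synthetic data: age and visits are drawn independently of each other.
true_ages = [rng.randint(18, 80) for _ in range(n_customers)]
visits = [rng.randint(0, 4) for _ in range(n_customers)]

recorded_ages = [corrupt_age(a, v) for a, v in zip(true_ages, visits)]
repaired_ages = [repair_age(r, v) for r, v in zip(recorded_ages, visits)]

r_corrupted = pearson(recorded_ages, visits)
r_repaired = pearson(repaired_ages, visits)

print(f"correlation(recorded age, visits) = {r_corrupted:.3f}")
print(f"correlation(repaired age, visits) = {r_repaired:.3f}")
```

Because age and visits are independent by construction, any strong positive correlation in the recorded column comes purely from the recording error, which is the trap most participants fell into.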

Data mining is a process of many stages, as depicted in CRISP-DM, and data understanding is key among them. I always suggest processing your data incrementally if you want an efficient analytical solution; ignoring this and simply employing whatever fits the data well may not work in all situations.
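As a minimal illustration of the data understanding step, the sketch below computes a few descriptive statistics and flags values outside a plausible range before any model is fitted. It is in Python rather than R (where `summary()` and plots would serve the same purpose). The helper names `describe` and `out_of_range`, the sample values, and the 18 to 100 plausibility range are all assumptions made for the example.

```python
import statistics


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    ordered = sorted(values)
    return {
        "n": len(ordered),
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
    }


def out_of_range(values, low, high):
    """Return the values that fall outside the plausible range [low, high]."""
    return [v for v in values if not (low <= v <= high)]


# Hypothetical 'age' column; the values are made up for illustration.
ages = [25, 50, 31, 93, 47, 160, 62, 38]

summary = describe(ages)
suspicious = out_of_range(ages, low=18, high=100)  # assumed plausible age range

print(summary)
print("values to investigate before modeling:", suspicious)
```

A check like this takes far less than the 20 minutes allowed in the exercise, and an implausible maximum is usually enough to prompt a closer look at how a column was recorded.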

The author thanks the management of the start-up for allowing him to publish the exercise highlights. He has undertaken several programs for analytical talent development; the views expressed here are from his industry experience. He can be reached at mavuluri.pradeep@gmail for more details.

Oct 14, 2014

Lynchpins for Analytical Skill Development

As businesses adopt more and more data-driven strategies (analytics) in their day-to-day operations, I keep hearing from leadership and other concerned people that the training provided for it is not having the anticipated impact. Here, the pragmatic confession would be to be content with the thought that 'it is not a pure science', or else to appreciate the concepts and different relationships involved in its success:

The author has developed and undertaken several programs for analytical talent development; the views expressed here are from his industry experience, which led him to design analytical trainings as fun concepts, with games having clues. He can be reached at mavuluri.pradeep@gmail for more details.

Oct 11, 2014

Adoption of in-memory computing, a better choice for SMEs analytical capabilities

Delivering analytical solutions using in-memory computing can be a better choice for small and medium data enterprises (SMEs) if a few good practices are followed:

The author has worked on and implemented in-memory analytical solutions; the views expressed here are from his industry experience. He can be reached at mavuluri.pradeep@gmail for more details.

Sep 4, 2014

Big Data Analytical Services Environment (Success Struggles)


These observations are the author's personal views after observing the big data space over a period of time; he can be reached at mavuluri.pradeep@gmail for further discussion on this topic.