This blog discusses the empirical aspects of business analytics and addresses them through Data Science, Machine Learning and Deep Learning solutions using open-source tools, viz. R, Spark and Python.
Showing posts with label wrong practices in analytics.
Apr 9, 2020
Oct 31, 2014
Is data mining more about fitting data well? - Exercise Results
Today, I am going to share the results of an exercise that I recently carried out for a start-up. The intention of
the study was to identify the major habits that drive less experienced, inexperienced
or re-skilled data miners when they work towards a given objective, and to
understand where they fall short. The twist is that the majority of them
gave the same conclusion or explanation for the given objective. The results highlight
those important aspects of the practice that most of them failed
to recognize for the sake of a quick answer or solution.
Sample Observed:
All members of the sample had experience
with both R and data mining solutions, either through course projects
(free, paid or part of a curriculum) or through industry experience. Industry
experience in the sample ranged from a minimum of 1 year to a maximum of 3 years,
in any domain. Details of the sample are as below:
- 17 - Freshers from various engineering backgrounds (both graduates and post-graduates)
- 12 - Freshers from various quantitative backgrounds (Maths, Stats, MBA, Econometrics, etc.)
- 18 - Experienced people from different industry backgrounds (data management, programming, consulting, etc.)
- All members of the sample belonged to two major cities of India.
About Test Data:
Bank data on customers belonging
to a particular city branch, with around 17000 observations for a period of one
month. It has information about each customer's age, a few demographics, the number of transactions
they made in that month, whether they visited the branch in that month, etc., for a total
of 12 variables.
Infrastructure Provided:
A computer with 8GB RAM and an Intel Core i7 processor, with the latest R (3.1.1)
and RStudio pre-installed.
Objective:
“Comment about the variables ‘visiting branch’ and ‘age’ relationship”.
Time Limit:
A time limit of 20 minutes was given, which was almost two
and a half times the average time that experienced people took to give their
comments.
Highlights from the Exercise:
- As mentioned earlier, almost all participants, except a few, gave the same inference: that the ‘number of visits to the branch’ has a positive relationship with the ‘age’ of the customers. In other words, as age increases, customers prefer to visit the branch. Interestingly, most of them were comfortable with R programming apart from a few typos; kudos to all the developers who are making it more user friendly.
- Astonishingly, only 21% of the sample did some data understanding after reading the data, i.e. looked at descriptive statistics through summary functions or plots before moving on to the modeling part. Among these 21%, not a single member was from an engineering background (I am not generalizing from this, nor am I against an engineering background; this is a comment on the sample only). Another 15% came back to data understanding after fitting at least one or two models.
- One more surprise was that the techniques employed by participants went as far as deep learning methods. The average number of models applied per participant was close to 3, although a few participants did not fit even a single technique or model.
- Only 15% of the sample clearly mentioned that the result may be spurious, or declined to comment on the relationship because of noise in the data; however, only half of them came up with an explanation for it.
- A notable fact from our exercise is that many participants directly applied the techniques they were aware of (a few directly fitted neural networks and then fell back to machine learning classification techniques, as they needed to comment on the relationship). More than half of the sample first tested a variant of the Generalized Linear Model and then moved on to other techniques when they found that the explanatory power of the model was low; they kept chasing one data mining technique after another until the time limit ended.
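The data understanding step that most participants skipped can be sketched in a few lines. The participants worked in R (for example with summary functions or plots); the sketch below uses Python instead, and the toy records, field names and helper functions are all hypothetical, not the bank data from the exercise. It only illustrates how a grouped summary can flag implausible values before any model is fitted.

```python
from statistics import mean, median


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "median": median(values),
        "mean": round(mean(values), 2),
        "max": max(values),
    }


def describe_by_group(rows, group_key, value_key):
    """Summarize value_key separately for each distinct value of group_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {group: describe(vals) for group, vals in sorted(groups.items())}


# Hypothetical toy records (assumption: not the real bank data).
customers = [
    {"age": 25, "visits": 0},
    {"age": 41, "visits": 0},
    {"age": 33, "visits": 1},
    {"age": 58, "visits": 1},
    {"age": 50, "visits": 2},
    {"age": 124, "visits": 2},
    {"age": 135, "visits": 3},
]

overall = describe([c["age"] for c in customers])
by_visits = describe_by_group(customers, "visits", "age")

print("All customers:", overall)
for visits, stats in by_visits.items():
    print("visits =", visits, stats)
```

In this toy sample, ages above 100 show up only among customers with two or more visits, which is the kind of oddity that should prompt a closer look at the data before any model is fitted.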
What was wrong in the data?
When this data was originally received, I observed that, due
to a machine or human error, the column ‘age of the customer’ had been
recorded in an additive way: for instance, if a customer had
visited the branch twice in the month and his actual age was 25, it appeared
as 50. Hence the positive relationship between age and visits; however, this was not the
case after the noise was removed.
Summary:
Data mining is a process of many stages, as depicted in CRISP-DM [1],
and data understanding is key among them. I always suggest processing your data
incrementally if you want an efficient analytical solution; ignoring this and adopting
a "whatever fits the data well" practice may not work in all situations.
The author thanks the management of the start-up for allowing him to publish the exercise highlights. He has undertaken several programs towards analytical talent development; the views expressed here are from his industry experience. He can be reached at mavuluri.pradeep@gmail for more details.
Oct 14, 2014
Lynchpins for Analytical Skill Development
As businesses adopt more and more data-driven strategies (analytics) in their day-to-day operations, I keep hearing from leadership and other concerned people that the training provided for it is not having the anticipated impact. Here, the pragmatic confession would be to settle for the thought that 'it is not a pure science', or else let's appreciate the concepts and the different relationships involved in their success:
The author has developed and undertaken several programs towards
analytical talent development; the views expressed here are from his industry experience,
which led him to design analytical trainings as fun concepts with
games that have clues. He can be reached at mavuluri.pradeep@gmail for more
details.
Oct 11, 2014
Adoption of in-memory computing, a better choice for SMEs analytical capabilities
Delivering analytical solutions using in-memory computing can be a better choice for small and medium data enterprises (SMEs) if a few good practices are followed:
The author has worked on and implemented in-memory analytical solutions, and the views expressed here are from his industry experience. He can be reached at mavuluri.pradeep@gmail for more details.
May 2, 2013
Step Towards Making Data Analysts More Productive in Shorter Time?
Today, many reports have been published, and more are expected, highlighting the shortage of analytical skills. However, few of them address how the shortage can be met in the short term, given that in the long term supply and demand will match. Below are a few things I have implemented in the past that helped data analysts become more productive within a short stint of less than three months:
1) Focus on business problems and help analysts define them appropriately. This worked superbly, since questioning what analysis is required helps them quickly grasp the complexity.
2) Move away from the ETL process.
3) Make them learn, and become comfortable with, automating repetitive tasks during the analytical cycle.
4) Make them map business questions to analytical solutions using several mind map games (either in a classroom environment or in an offline/online group environment).
5) Match them with appropriate analytical tools.
Please feel free to reach me at mavuluri.pradeep@gmail.com to know more about the process.
Mar 30, 2009
Analytics companies should avoid the following practices for better growth
Hi,
Welcome back after a gap; yes, I was busy collecting detailed information on analytics practices in Indian companies.
I found the following 'common wrong practices' at several companies that say they offer analytics services.
1) Engaging a person from a non-analytical background as the manager or head of an analytical team:
a) This first creates confusion about where exactly the analytics practice fits into the business solutions proposed to clients.
b) Mainly regarding the nature of the work to be offered or proposed, most people from a non-analytical background get confused and, most of the time, offer work related to data warehousing or data management.
2) One thing all companies and managers should note is that analytics starts with data, not with data management.
Other common wrong practices will be continued in the next posts.