Jul 4, 2018

Data Summary in One Go

Data Description R Code

This function has long been pending publication on my side. I expect to release it as a package soon for quick usage; before that, I thought of releasing it for feedback.

The function below provides R code for getting data description details such as missing, distinct, min, max, mean, median and mode in one go, ready to use and quick to interpret.

It provides the regular summary statistics needed (as shown in the image below) in *.csv format, which can be copied and pasted into Excel as per your needs.

To use it, follow the syntax:

source("https://raw.githubusercontent.com/pradeepmav/data_description_function/master/data_description.R")
data_description("datasetname")
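The function itself is written in R and its source is not reproduced here. As a rough illustration of the statistics listed above, here is a minimal Python sketch using only the standard library. The names `describe_column`, `describe_to_csv` and the sample data are hypothetical; this is not a port of `data_description`, and its output layout is only an assumption.

```python
import csv
import io
import statistics

# Column order of the summary (assumed; mirrors the statistics named in the post).
STATS = ["missing", "distinct", "min", "max", "mean", "median", "mode"]


def describe_column(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    summary = {
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if present:  # the remaining statistics are undefined for an all-missing column
        summary.update(
            min=min(present),
            max=max(present),
            mean=statistics.mean(present),
            median=statistics.median(present),
            mode=statistics.mode(present),
        )
    return summary


def describe_to_csv(data):
    """Return a CSV string with one summary row per column of `data`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["column"] + STATS, restval="")
    writer.writeheader()
    for name, values in data.items():
        writer.writerow({"column": name, **describe_column(values)})
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample = {
    "age": [34, 29, None, 41, 29],
    "score": [7.5, 8.0, 7.5, None, None],
}
report = describe_to_csv(sample)
print(report)
```

The CSV text can be written to a file or pasted into a spreadsheet, which is the same workflow the post describes for the R function.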

 
Happy R Programming!


Author trains & develops Machine Learning (AI) applications, and can be reached at info@tatvaai.com or besteconometrician@gmail.com for more details.
Find more about author at http://in.linkedin.com/in/pradeepmavuluri

Jun 27, 2018

Commit to Memory: What you cannot expect with TensorFlow for Automated Machine Learning?

Views expressed here are from author's industry experience. Author trains on Machine Learning applications and can be reached at info@tatvaai.com or besteconometrician@gmail.com for more details. Find more about author at http://in.linkedin.com/in/pradeepmavuluri

Jun 21, 2018

AI/ML Talent Availability Against Market Expectations


Mar 26, 2018

Why does Record Linkage need scalable computing power?


“Record linkage” is a popular term among statisticians and epidemiologists for “the problem of matching/joining records from one data source to another which describe the same entity”. It has received attention ever since data collection gained momentum (1960s), and it continues to gain attention as new collection methods, formats and stacks of data are added to the existing ones. Other popular terms for the same problem are deduplication, data matching, entity/name resolution, record matching, etc. Please refer to the following paper, https://homes.cs.washington.edu/~pedrod/papers/icdm06.pdf, for one of the good works in this field. Also, one can look at the Google Trends graph below for the attention this field has received from 2014 to the present.



The purpose of this post is to show why record linkage needs scalable computing power, for which I present my observations with a simple example, as shown below:
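One common reason, sketched here as an assumption rather than as the author's own example, is that naive linkage compares every record in one source against every record in the other, so the number of comparisons grows with the product of the two source sizes. The function `naive_link`, the 0.85 threshold and the sample names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever similarity measure a real linkage tool would use.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def naive_link(source_a, source_b, threshold=0.85):
    """Compare every record in source_a with every record in source_b.

    Returns the pairs at or above the threshold and the number of
    comparisons performed, which is always len(source_a) * len(source_b).
    """
    matches = []
    comparisons = 0
    for a, b in product(source_a, source_b):
        comparisons += 1
        if similarity(a, b) >= threshold:
            matches.append((a, b))
    return matches, comparisons


# Hypothetical records, for illustration only.
source_a = ["Asha Rao", "John Smith", "Maria Garcia"]
source_b = ["asha rao", "Jon Smith", "Wei Chen", "Maria Garcia"]

matches, comparisons = naive_link(source_a, source_b)
print(matches)
print(comparisons)  # 3 * 4 = 12 comparisons for only 7 records

# The same all-pairs strategy on larger sources of equal size n:
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>12,} records per source -> {n * n:,} comparisons")
```

Doubling the size of both sources quadruples the work, which is why larger linkage jobs typically need either more computing power or a way to avoid comparing all pairs.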



Views expressed here are from the author's industry experience. He can be reached at mavuluri.pradeep@gmail or besteconometrician@gmail.com for more details.


Feb 15, 2018

Python package maintenance GUIs like the R ones



People quite often approach me to ask whether Python has a good Graphical User Interface (GUI) for package maintenance, like the ones available for R:




The answer is that the few that exist, such as “pips”, are still under development, and I am still using “pip-upgrader” for maintenance.

Dec 19, 2017

Read and write using fst & feather for large data files.




For the past few years, feather was my favorite option for writing and reading data in R (one reason being its cross-platform compatibility across Julia, Python and R). Recently, however, I observed that its read and write times were not at all efficient with large files of size > 5 GB. I found the fst format to be good for both reading and writing large files in R; its only disadvantage is that it is not cross-platform compatible like feather. In particular, fst with compression at 100% is good for storing large files, and it is more efficient at reading them back into the R environment. Also, I see no point in re-testing and giving benchmarks, as they are already available at the fst site.


Happy R programming! Let me know your experiences.