Mar 31, 2019

Tips for Moving Seamlessly Between R and Python

When we at TATVA AI visit clients, both data scientists and senior management often ask us how we handle Python and R simultaneously for client requests, since there is no universal preference among clients.


Although there is no straightforward solution, I suggest exploiting libraries that share a common grammar for quick deployments, such as dfply (Python) and dplyr (R). Below is a quick example:


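## R Code
library(dplyr)
testdata <- data.frame(col.1 = c(5, 10, 15))  # toy data for illustration
testdata %>% filter(col.1 == 10)

## Python Code
import pandas as pd
from dfply import *
testdata = pd.DataFrame({"col.1": [5, 10, 15]})  # toy data for illustration
# X.col.1 is not valid Python syntax; use item access for dotted column names
testdata >> filter_by(X["col.1"] == 10)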


Those who have not yet tried this can try it now; let me know about your experience at either info@tatvaai.com or mavuluri.pradeep@gmail.com

Happy R and Python Programming!

Mar 27, 2019

Record Linkage, Entity Matching, Data De-Duplication: When Free Tools Exist, Why Pay for Services?

Duplicate records that refer to the same values or entities across different databases are a concern in the modern data age: such records are not ready for the next level of analysis and are blamed for introducing noise into models. The task of finding and cleaning such records, known as record linkage, data matching, entity resolution, etc., has gained importance over the past few decades.
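To make the matching task concrete, here is a minimal de-duplication sketch using only the Python standard library. It is an illustration under stated assumptions, not any particular product's method: the `name` and `city` fields, the sample records, and the 0.85 threshold are all hypothetical, and `difflib.SequenceMatcher` stands in for the richer string comparators that dedicated tools use.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lower-case, drop punctuation, and collapse repeated whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def similarity(a, b, fields=("name", "city")):
    """Average SequenceMatcher ratio over the (hypothetical) comparison fields."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_duplicates(records, threshold=0.85):
    """Compare every pair of records; return index pairs at or above the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if similarity(a, b) >= threshold
    ]


# Invented sample data: records 0 and 1 differ only in case, spacing and punctuation.
records = [
    {"name": "John Smith", "city": "New York"},
    {"name": "john  smith.", "city": "New York"},
    {"name": "Jane Doe", "city": "Boston"},
]
print(find_duplicates(records))  # [(0, 1)]
```

Comparing every pair grows quadratically with the number of records, which is one reason scalability matters here; open source linkage tools typically add "blocking" so that only records sharing some key are compared.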

One can find here a list of several matching tools, with both commercial and open source options. Still, TATVA AI finds that its clients have valid concerns about data security, cost, scalability and accuracy.

Why does this happen? Each domain or database has its own conventions for data storage and identifiers. These databases were not designed to serve business problem solving through data science or machine learning algorithms directly; thus, one type of solution will not fit data that has been designed differently.
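As a small illustration of why per-source conventions matter, the sketch below maps rows from two hypothetical sources (a "CRM" table with one `cust_name` field and a "billing" table with separate `first`/`last` fields, each formatting phone numbers differently) into one canonical form before linking them. All field names and records are invented for the example, and exact matching on a digits-only phone number is just one assumed linkage key, not a general recommendation.

```python
import re


def canonical_phone(raw):
    """Keep digits only, so '(555) 123-4567' and '555.123.4567' compare equal."""
    return re.sub(r"\D", "", raw)


def from_crm(row):
    """Map a row from hypothetical source A (single 'cust_name' field)."""
    return {"name": row["cust_name"].strip().lower(), "phone": canonical_phone(row["tel"])}


def from_billing(row):
    """Map a row from hypothetical source B (separate first/last name fields)."""
    full_name = f"{row['first']} {row['last']}"
    return {"name": full_name.strip().lower(), "phone": canonical_phone(row["phone_no"])}


def link_on_phone(left, right):
    """Pair canonical records sharing a phone number, via a dict index on the right side."""
    index = {}
    for rec in right:
        index.setdefault(rec["phone"], []).append(rec)
    return [(a, b) for a in left for b in index.get(a["phone"], [])]


crm = [{"cust_name": " Jane Doe ", "tel": "(555) 123-4567"}]
billing = [
    {"first": "Jane", "last": "Doe", "phone_no": "555.123.4567"},
    {"first": "Ravi", "last": "Kumar", "phone_no": "555-987-6543"},
]
pairs = link_on_phone([from_crm(r) for r in crm], [from_billing(r) for r in billing])
print(len(pairs))  # 1
```

The per-source mapping functions are where the custom, domain-specific work lives; the linkage step itself stays generic once both sources share a schema.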

Hence, TATVA AI suggests that clients exploit open source technologies to build custom solutions for their own needs.

Reach out to us at info@tatvaai.com or mavuluri.pradeep@gmail.com for more information.