This blog discusses the empirical aspects of business analytics and addresses them through Data Science, Machine Learning and Deep Learning solutions via open source tools, viz. R/Spark/Python.
Showing posts with label Big Data. Show all posts
Dec 18, 2018
High Frequency Forecasting - Pragmatic Confessions
Views expressed here are from the author's industry experience. The author also trains on Machine (Deep) Learning applications; for more details, he is available at info@tatvaai.com or mavuluri.pradeep@gmail.com.
Find more about the author at http://in.linkedin.com/in/pradeepmavuluri
Mar 26, 2018
Why does Record Linkage need scalable computing power?
“Record Linkage” is a popular term among statisticians and epidemiologists for
“the problem of matching/joining records from one data source to another which
describe the same entity”. It has received attention ever since data collection
gained momentum (1960s), and it continues to gain attention as new methods of
collection, formats and stacks of data are added to the existing ones. Other
popular terms for the same problem are deduplication, data matching, entity/name
resolution, record matching, etc. Please refer to the following paper,
https://homes.cs.washington.edu/~pedrod/papers/icdm06.pdf, for one of the good
works in this field. Also, one can look at the Google Trends graph below for the
attention this field has received from 2014 to the present.
The purpose of this blog is to bring
forth why record linkage needs scalable computing power, for which I present
my observations with a simple example as shown below:
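The author's own example is not reproduced on this page. As a stand-in, here is a minimal Python sketch of naive record linkage, where every record in one source is compared with every record in the other. The names `similarity` and `link_records`, the 0.8 threshold and the toy records are all assumptions made for illustration; the string comparison uses `difflib.SequenceMatcher` from the standard library, not any particular record linkage package.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(source_a, source_b, threshold=0.8):
    """Naively compare every record in source_a with every record in source_b.

    Returns the pairs scoring at or above the (assumed) threshold, together
    with the number of comparisons that were needed to find them.
    """
    comparisons = 0
    matches = []
    for a, b in product(source_a, source_b):
        comparisons += 1
        if similarity(a, b) >= threshold:
            matches.append((a, b))
    return matches, comparisons


# Hypothetical toy data, only for illustration.
source_a = ["John Smith", "Maria Garcia", "Wei Chen"]
source_b = ["Jon Smith", "M. Garcia", "Chen Wei", "Anna Kowalska"]

matches, comparisons = link_records(source_a, source_b)
print(comparisons)  # 3 records x 4 records = 12 comparisons
print(matches)

# The comparison count is the product of the two source sizes, so it grows
# quadratically when both sources grow together.
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} x {n:>9,} records -> {n * n:,} comparisons")
```

With two sources of a million records each, the naive approach already implies 10^12 pairwise comparisons, which is one way to see why the problem calls for scalable computing (or for techniques that cut down the candidate pairs before comparing).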
Views expressed here are from the author's industry experience. He can be reached at mavuluri.pradeep@gmail or
besteconometrician@gmail.com for more details.
Dec 19, 2017
Read and write using fst & feather for large data files.
For the past few years, I was using feather as my favorite data writing and reading option in R (one reason was its cross-platform compatibility across Julia, Python and R). Recently, however, I observed that its read and write times were not at all effective with large files of size > 5 GB. I found the fst format to be good for both reading and writing large files in R; its only disadvantage is that it is not cross-platform compatible like feather. In particular, fst with 100% compression is good for storing large files, and it is more efficient in reading them back into the R environment. Also, I see no point in re-testing and giving benchmarks, as they are already available at the fst site.
------ Happy R programming! Let me know your experiences.