Though Arrow has been around for some time (more than a year), its recent release added compression support, which makes it more user friendly for data science work across different languages. In my earlier posts, I encouraged R users to use fst; however, as data scientists now frequently use both R and Python, small files that can be easily read and written from both languages are useful in the development and testing phases.
In my current tests, where we are using a specific customer data set of more than 3 GB (CSV) for developing and testing a new algorithm, the recent “arrow” version's Parquet writer, with the “gzip” compressor, has yielded files about half the size of R’s fst output. Similar results are obtained with Python.
--- Happy R & Python programming! Let me know your experiences.
Views expressed here are from the author’s industry experience. The author trains and blogs on Machine (Deep) Learning applications with various programming languages; for further details, he is available at mavuluri.pradeep@gmail.com. Find more about the author at http://in.linkedin.com/in/pradeepmavuluri