Feb 16, 2020

Tip (3): Arrow, columnar (binary file) formats for both R and Python


Though Arrow has been around for some time (more than a year), its recent release added compression support, which makes it more user-friendly for data science work across different languages. In my earlier posts, I encouraged R users to use fst; however, as data scientists now frequently use both R and Python, compact files that can be easily read and written from both languages are useful in both the development and testing phases.







In my current testing, where we are using a specific customer dataset of more than 3 GB (CSV) to develop and test a new algorithm, writing to Parquet with the recent “arrow” version using the “gzip” compressor yielded files half the size of R’s fst output. Similar results were obtained with Python.



--- Happy R & Python programming, and let me know your experiences.




Views expressed here are from the author’s industry experience. The author trains and blogs on Machine (Deep) Learning applications with various programming languages; for further details, he is available at mavuluri.pradeep@gmail.com. Find more about the author at http://in.linkedin.com/in/pradeepmavuluri


