RStudio has moved its website from rstudio.org to
rstudio.com/ide and has posted its new preview version at the link below. A few new
features, such as "package development" tools, are included in this version and make it easier to develop a new package.
http://www.rstudio.com/ide/download/preview
Enjoy R programming.
This blog discusses the empirical aspects of business analytics and addresses them through Data Science, Machine Learning and Deep Learning solutions using open-source tools, viz. R/Spark/Python.
Oct 28, 2012
Apr 27, 2012
Read Big Text Files Column by Column
Dear R Programmers,
There is a new package, "colbycol", on CRAN that makes our job easier when we have to read large files (i.e., more than a GB) into R, especially when we don't need all of the columns/variables for our analysis. Kudos to the author, Carlos J. Gil Bellosta.
I tried it on a 1.72 GB data file with more than 300 columns and 500,000 rows, where my main interest was in just a few columns. Since it is easy to find out how many columns exist by reading a few lines of the data (see also my earlier post http://costaleconomist.blogspot.in/2010/02/easy-way-of-determining-number-of.html and ?readLines), the R job of getting what I wanted was completed quickly with the few lines below:
library(colbycol)
cbc.data.7.cols <- cbc.read.table("D:/XYZ/filename.csv", just.read = c(1, 3, 21, 34, 108, 205, 227), sep = ",")
nrow(cbc.data.7.cols)
colnames(cbc.data.7.cols)
# then one can simply convert to a data.frame as follows
train.data <- as.data.frame(cbc.data.7.cols, columns = 1:7, rows = 1:50000)
Also, refer to http://colbycol.r-forge.r-project.org/ for a quick intro by the author.
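As a rough illustration of the column-counting step mentioned above, here is a minimal base R sketch. It assumes a comma-separated file with a header row and no quoted fields containing commas; the small temporary file and the column names in it are hypothetical stand-ins for the real data.

```r
# Hypothetical small CSV standing in for the large file described above.
csv.path <- tempfile(fileext = ".csv")
writeLines(c("id,price,qty,region",
             "1,9.5,3,north",
             "2,4.0,7,south"), csv.path)

# Read only the header line; the rest of the file is never loaded.
header.line <- readLines(csv.path, n = 1)

# Split on the separator to get the column names and their count.
# A plain split assumes no quoted fields that contain commas.
col.names <- strsplit(header.line, ",", fixed = TRUE)[[1]]
n.cols <- length(col.names)

# Look up the positions of the columns of interest by name.
wanted <- match(c("id", "qty"), col.names)

print(n.cols)
print(wanted)
```

Positions found this way are the kind of integer vector passed to just.read in the call above, so the columns of interest can be picked by name instead of being counted by hand.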
Happy programming with R. The author can be reached at mavuluri.pradeep@gmail.com.