R is one of the most popular programming languages for
data analysis, with an enormous number of packages contributed by developers from
all kinds of backgrounds. Around 4000
packages are listed on the CRAN website itself, but do you need to know all of them?
Certainly not.
This
blog familiarizes you with some of the essential R packages, drawn from different domains, that are most extensively used during data
analysis.
sqldf
sqldf is one of the core packages analysts use
to run SQL queries on R data frames. It uses SQLite syntax. If you want
to load data from an external source or a database, R has connecting
drivers for most of them. Some examples are:
·
RODBC, RMySQL, RPostgreSQL, RSQLite for reading data from
databases.
·
XLConnect, xlsx for reading and writing
Microsoft Excel files from R.
·
foreign
reads SAS and SPSS datasets into R. It also helps you load data files written by other
statistical programs.
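Returning to sqldf itself, a query against a data frame might look like the minimal sketch below. It assumes the sqldf package is installed and uses R's built-in mtcars dataset purely for illustration; the result name avg_mpg is made up for this example.

```r
library(sqldf)

# Treat the built-in mtcars data frame as a SQL table and
# aggregate it by number of cylinders (SQLite syntax).
avg_mpg <- sqldf("
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
  FROM mtcars
  GROUP BY cyl
  ORDER BY cyl
")

print(avg_mpg)
```

The result is an ordinary data frame, so it can be passed straight to any other R function.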
ggplot2
ggplot2 is the most essential of the data visualization
packages and is widely used by R programmers. It is fundamentally an implementation
of the grammar of graphics in R, which lets you present your results in a more understandable
way by building customizable plots.
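As a rough illustration of the grammar-of-graphics idea, the sketch below builds a plot layer by layer. It assumes ggplot2 is installed and again uses the built-in mtcars dataset; the choice of variables and labels is only an example.

```r
library(ggplot2)

# Start from data + aesthetic mappings, then add layers one at a time.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +                                    # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: linear fit per group
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders") +
  theme_minimal()

print(p)
```

Each `+` adds or changes one component of the plot, which is what makes the result easy to customize.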
plyr
Data
manipulation in R is the most essential step for
reshaping your data according to your requirements. It is often said that
almost 80% of analysis time is devoted to data preparation, and data manipulation
is one of the steps involved in preparing data.
The plyr
package helps with data manipulation by
providing essential functions for rearranging, subsetting, combining datasets, summarizing,
etc. plyr is recommended if you are dealing with the apply family of
functions for data manipulation in R.
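A minimal sketch of the split-apply-combine pattern that plyr offers as an alternative to the apply family is shown below. It assumes plyr is installed; the dataset (mtcars) and the summary column names are illustrative choices, not anything prescribed by the package.

```r
library(plyr)

# Split mtcars by cylinder count, summarise each piece,
# and combine the results back into one data frame.
by_cyl <- ddply(mtcars, "cyl", summarise,
                n        = length(mpg),
                mean_mpg = mean(mpg),
                max_hp   = max(hp))

# Reorder the summary so the most fuel-efficient group comes first.
by_cyl <- arrange(by_cyl, desc(mean_mpg))

print(by_cyl)
```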
Some of the other essential packages used for data
manipulation are:
·
lubridate mainly deals with handling dates and
times.
·
stringr works with regular expressions
and character strings.
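To give a feel for these two packages, here is a small sketch that assumes both lubridate and stringr are installed. The date and the ID strings are invented for the example.

```r
library(lubridate)
library(stringr)

# lubridate: parse a date and pull out or shift its components.
d      <- ymd("2015-03-14")
d_year <- year(d)
d_mon  <- month(d)
d_plus <- d + months(2)   # same day of the month, two months later

# stringr: match and extract with regular expressions.
ids      <- c("cust_001", "cust_042", "order_7")
is_cust  <- str_detect(ids, "^cust_")   # which IDs start with "cust_"?
id_digit <- str_extract(ids, "[0-9]+")  # first run of digits in each ID

print(d_plus)
print(ids[is_cust])
print(id_digit)
```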
randomForest
·
One of the major packages used for building non-linear models. It is
simple to use and works on many different types of datasets.
·
An added advantage of this package is that it
can be used as a feature reduction
algorithm.
·
For instance, when your dataset has more
than 200 variables and you need to find the most important ones, the randomForest
package has a variable importance function that scores every variable, so you can
keep only the top-ranked ones. If you want to start working on
non-linear models, this package is a good place to begin.
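The variable importance idea can be sketched as follows. This assumes the randomForest package is installed and uses the small built-in iris dataset rather than a 200-variable one; the seed, the number of trees, and the decision to keep the top two variables are arbitrary choices for illustration.

```r
library(randomForest)

set.seed(42)  # forests are random, so fix the seed for repeatable output

# Fit a classification forest and ask it to record importance measures.
fit <- randomForest(Species ~ ., data = iris, ntree = 200, importance = TRUE)

# type = 2 selects the mean decrease in node impurity (Gini) per variable.
imp    <- importance(fit, type = 2)
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(ranked)

# Keep only the highest-ranked variables for a reduced feature set.
top2 <- rownames(ranked)[1:2]
print(top2)
```

Calling `varImpPlot(fit)` draws the same ranking as a chart, which can be easier to read when there are many variables.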
caret
·
The caret package is used for building
better predictive models. It deals with data
handling, feature selection, and
building multiple predictive models
using various techniques.
·
It performs validation checks and prints
out model performance diagnostics.
·
This certainly looks like a lot, and
getting used to all these functions takes some time, but once you are
through with that, it will make model building more enjoyable. For
this reason, caret has become popular in recent years among R programmers,
especially in the predictive analytics field.
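A typical caret workflow, covering data splitting, cross-validated training, and diagnostics, might look like the sketch below. It assumes caret and rpart are installed; the iris dataset, the 80/20 split, the 5-fold cross-validation, and the choice of a decision tree ("rpart") as the model are all illustrative assumptions, and any other method caret supports could be swapped in.

```r
library(caret)

set.seed(42)

# Data handling: stratified 80/20 split on the outcome.
idx       <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# Validation: 5-fold cross-validation while tuning the model.
ctrl  <- trainControl(method = "cv", number = 5)
model <- train(Species ~ ., data = train_set,
               method = "rpart", trControl = ctrl)

# Diagnostics: confusion matrix and accuracy on the held-out data.
pred <- predict(model, newdata = test_set)
cm   <- confusionMatrix(pred, test_set$Species)
print(cm$overall["Accuracy"])
```

Because `train()` has the same interface for every model type, trying a different technique is usually a matter of changing the `method` argument.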