Tuesday 2 June 2015

Essential Packages in R Programming Language



R is one of the trending programming languages for data analysis, with an enormous number of packages contributed by developers from all kinds of backgrounds. Around 4000 packages are listed on the CRAN website alone, but are they all essential? Certainly not.

This post introduces some of the essential R packages, drawn from different domains, that are most widely used in data analysis.

sqldf

It is one of the core packages analysts use to run SQL queries on R data frames. sqldf uses SQLite syntax. If you want to load data from an external source or a database, R has drivers for most of them. Some examples:

·         RODBC, RMySQL, RPostgreSQL and RSQLite for reading data from databases.
·         XLConnect and xlsx for reading and writing Microsoft Excel files from R.
·         foreign reads SAS and SPSS datasets into R. It also helps you load data files written by other programs.
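To give a feel for what querying a data frame with sqldf looks like, here is a minimal sketch. The `orders` data frame and its values are made up for illustration, and the example assumes the sqldf package is installed.

```r
library(sqldf)

# Toy data frame standing in for real analysis data (hypothetical values)
orders <- data.frame(
  customer = c("a", "b", "a", "c"),
  amount   = c(10, 25, 5, 40)
)

# Query the data frame by name, as if it were a SQL table (SQLite syntax)
totals <- sqldf("
  SELECT customer, SUM(amount) AS total
  FROM orders
  GROUP BY customer
  ORDER BY customer
")

print(totals)
```

The result is an ordinary data frame, so it can be passed straight on to other R functions.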

ggplot2

The most essential of the data visualization packages, and widely used by R programmers. It is an implementation of the grammar of graphics in R, and it lets you present your results in a more understandable way by building customizable plots.
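As a small sketch of the layered, grammar-of-graphics style, the following builds a scatter plot from the `mtcars` dataset that ships with R. The choice of dataset and variables is only for illustration, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Map variables to aesthetics, then add a geometry layer and labels
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders")

print(p)
```

Each `+` adds another piece to the plot, which is what makes the plots easy to customize step by step.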

  

plyr

Data manipulation is an essential step for reshaping your data to fit your requirements. It is often said that almost 80% of analysis time goes into data preparation, and data manipulation is one of the steps in that preparation.
The plyr package helps with data manipulation by providing functions for reshaping, subsetting, combining datasets, summarizing and so on. plyr is recommended if you find yourself using the apply family of functions for data manipulation in R.
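A minimal sketch of the split, summarize and combine pattern that plyr is known for, using the built-in `mtcars` dataset. The column names `mean_mpg` and `n_cars` are chosen here for illustration, and the plyr package is assumed to be installed.

```r
library(plyr)

# Split mtcars by cylinder count, summarise each piece,
# and combine the results into one data frame
by_cyl <- ddply(mtcars, "cyl", summarise,
                mean_mpg = mean(mpg),
                n_cars   = length(mpg))

print(by_cyl)
```

This does in one call what would otherwise take a `split()`, an `lapply()` and a `do.call(rbind, ...)`.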
Some other essential packages used for data manipulation are:
·         lubridate mainly deals with handling dates and times.
·         stringr works with regular expressions and character strings.
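For a quick taste of these two helpers, here is a short sketch. The date and the file names are made-up examples, and both packages are assumed to be installed.

```r
library(lubridate)
library(stringr)

# lubridate: parse a date and do simple date arithmetic
d <- ymd("2015-06-02")
yr    <- year(d)        # 2015
mon   <- month(d)       # 6
later <- d + days(30)   # 30 days after d

# stringr: regular expressions on character vectors
files  <- c("sales_2014.csv", "sales_2015.csv", "notes.txt")
is_csv <- str_detect(files, "\\.csv$")    # which names end in .csv
years  <- str_extract(files, "[0-9]{4}")  # first 4-digit run, NA if none

print(later)
print(files[is_csv])
print(years)
```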

randomForest

·         One of the major packages used for building non-linear models. It is simple to use and works on a wide variety of datasets.
·         An added advantage of this package is that it can be used for feature reduction.
·         For instance, when your dataset has more than 200 variables and you need to find the most relevant ones, the randomForest package has a variable importance function that scores every variable, so you can rank them and keep only the important ones. If you want to start working with non-linear models, this package is a good place to begin.
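The variable importance idea can be sketched as follows on the built-in `iris` dataset, which has only four predictors but shows the same workflow. The seed and `ntree` value are arbitrary choices, and the randomForest package is assumed to be installed.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only to make the run repeatable

# Fit a classification forest and ask it to record importance measures
fit <- randomForest(Species ~ ., data = iris,
                    ntree = 200, importance = TRUE)

# type = 1 requests the mean decrease in accuracy for each variable
imp <- importance(fit, type = 1)

# Sort variables from most to least important
ranked <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
print(ranked)
```

Note that `importance()` returns a score for every variable; deciding how many of the top-ranked ones to keep is left to you. `varImpPlot(fit)` draws the same information as a chart.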

caret

·         The caret package is used for building better predictive models. It covers data handling, feature selection and building multiple predictive models using various techniques.
·         It performs validation checks and prints out model performance diagnostics.
·         This certainly looks like a lot, and getting used to all these functions takes some time, but once you are through with that, model building becomes much more enjoyable. For this reason caret has become popular among R programmers in recent years, especially in the predictive analytics field.
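A minimal sketch of a caret workflow on the built-in `iris` dataset: hold out a test set, train a model with cross-validation, then print diagnostics. The k-nearest-neighbours method, the 80/20 split, the 5 folds and the seed are all illustrative choices, and the caret package is assumed to be installed.

```r
library(caret)

set.seed(42)  # arbitrary seed, only to make the run repeatable

# Stratified 80/20 split into training and test sets
idx       <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# 5-fold cross-validation on the training set
ctrl <- trainControl(method = "cv", number = 5)

# Train a k-nearest-neighbours classifier on centred and scaled predictors
model <- train(Species ~ ., data = train_set,
               method = "knn",
               preProcess = c("center", "scale"),
               trControl = ctrl)

# Evaluate on the held-out data
pred <- predict(model, newdata = test_set)
cm   <- confusionMatrix(pred, test_set$Species)

print(model)
print(cm)
```

Swapping `method = "knn"` for another supported method is usually all it takes to try a different model with the same validation setup.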
