Launching our R Training Path

Become a proficient R user in no time via DataCamp’s newly launched guided R Training Path!

We’re very excited to announce that we have just launched the R Training Path! This guided training path takes you, at your own pace, from getting to know R, the leading open-source programming language in statistics and data science, through data manipulation and visualization tools, to working with big data. All courses are top-quality thanks to our partners such as RStudio and Revolution Analytics.

You can access all current and future R tutorials in the R Training Path with a monthly subscription of just $25. There is no contract or commitment, so you are free to pause or cancel whenever you want. Current R tutorials include data manipulation tools such as dplyr and data.table, visualizations with ggvis, and many more. Courses on R Markdown and the RStudio IDE will be launched soon.

The R Training Path will give you access to over 100 short videos and slides, and more than 400 interactive coding challenges. The online interactive learning platform offers a unique, hands-on learning experience, and doesn’t require you to do any installation or set-up. Everything takes place in the comfort of your own browser.

Check it out.

A data.table R tutorial by DataCamp: intro to DT[i, j, by]

This data.table R tutorial explains the basics of the DT[i, j, by] command which is core to the data.table package. If you want to learn more on the data.table package, DataCamp provides an interactive R course on the data.table package. The course has more than 35 interactive R exercises – all taking place in the comfort of your own browser – and several videos with Matt Dowle, main author of the data.table package, and Arun Srinivasan, major contributor. Try it for free.

If you have already worked with large datasets in RAM (1 to more than 100GB), you know that a data.frame can be limiting: certain operations simply take too long. The data.table package solves this for you by reducing computing time. What’s more, it also makes it easier to do more with less typing. Once you master the data.table syntax from this data.table R tutorial, the simplicity of doing complicated operations will astonish you. So you will not only be reducing computing time, but programming time as well.

The DT[i,j,by] command has three parts: i, j and by. If you think in SQL terminology, the i corresponds to WHERE, j to SELECT and by to GROUP BY. We talk about the command by saying “Take DT, subset the rows using ‘i’, then calculate ‘j’ grouped by ‘by’”. So in a simple example and using the hflights dataset (so you can reproduce all the examples) this gives:

library(hflights)
library(data.table)
DT <- as.data.table(hflights)
DT[Month==10,mean(na.omit(AirTime)),
                  by=UniqueCarrier]
UniqueCarrier               V1
1:         AA         68.76471
2:         AS         255.29032
3:         B6         176.93548
4:         CO         141.52861
...

Here we subsetted the data table to keep only the rows from the 10th month of the year, calculated the average AirTime of the planes that actually flew (that’s why na.omit() is used: cancelled flights don’t have a value for their AirTime) and grouped the results by carrier. We can see, for example, that AA (American Airlines) has a very short average AirTime compared to AS (Alaska Airlines). Did you also notice that R base functions can be used in the j part? We will get to that later.
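If you don’t have hflights at hand, the same “subset, calculate, group” pattern can be tried on a few made-up rows. The sketch below is only an illustration: the `flights` table and its values are invented for this example, and it uses `mean(x, na.rm = TRUE)`, which gives the same result as `mean(na.omit(x))`.

```r
library(data.table)

# Hypothetical toy data standing in for hflights (values are made up)
flights <- data.table(
  Month         = c(10, 10, 10, 10, 11),
  UniqueCarrier = c("AA", "AA", "AS", "AS", "AA"),
  AirTime       = c(60, 80, NA, 250, 70)
)

# i:  keep only the October rows          (SQL: WHERE)
# j:  average AirTime, ignoring NAs       (SQL: SELECT)
# by: one result per carrier              (SQL: GROUP BY)
oct_avg <- flights[Month == 10, mean(AirTime, na.rm = TRUE), by = UniqueCarrier]
print(oct_avg)
```

As in the hflights example, the unnamed result of ‘j’ ends up in a column called V1.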

The i part

The ‘i’ part is used for subsetting on rows, just like in a data frame.

DT[2:5]
#selects the second to the fifth row of DT
Year Month DayofMonth DayOfWeek DepTime ArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime AirTime
1: 2011 1 2 7 1401 1501 AA 428 N557AA 60 45
2: 2011 1 3 1 1352 1502 AA 428 N541AA 70 48
3: 2011 1 4 2 1403 1513 AA 428 N403AA 70 39
4: 2011 1 5 3 1405 1507 AA 428 N492AA 62 44
ArrDelay DepDelay Origin Dest Distance TaxiIn    TaxiOut Cancelled CancellationCode Diverted
1: -9 1 IAH DFW 224 6 9 0 0
2: -8 -8 IAH DFW 224 5 17 0 0
3: 3 3 IAH DFW 224 9 22 0 0
4: -3 5 IAH DFW 224 9 9 0 0

But you can also use column names, as they are evaluated in the scope of DT.

DT[UniqueCarrier=="AA"]
#Returns all those rows where the Carrier is American Airlines
Year Month DayofMonth DayOfWeek DepTime ArrTime
UniqueCarrier FlightNum TailNum
ActualElapsedTime
1: 2011 1 1 6 1400 1500 AA 428 N576AA 60
2: 2011 1 2 7 1401 1501 AA 428 N557AA 60
3: 2011 1 3 1 1352 1502 AA 428 N541AA 70
4: 2011 1 4 2 1403 1513 AA 428 N403AA 70
5: 2011 1 5 3 1405 1507 AA 428 N492AA 62
---
3240: 2011 12 27 2 1021 1333 AA 2234 N3ETAA 132
3241: 2011 12 28 3 1015 1329 AA 2234 N3FJAA 134
3242: 2011 12 29 4 1023 1335 AA 2234 N3GSAA 132
3243: 2011 12 30 5 1024 1334 AA 2234 N3BAAA 130
3244: 2011 12 31 6 1024 1343 AA 2234 N3HNAA 139
AirTime ArrDelay DepDelay Origin Dest Distance
TaxiIn TaxiOut Cancelled CancellationCode
Diverted
1: 40 -10 0 IAH DFW 224 7 13 0 0
2: 45 -9 1 IAH DFW 224 6 9 0 0
3: 48 -8 -8 IAH DFW 224 5 17 0 0
4: 39 3 3 IAH DFW 224 9 22 0 0
5: 44 -3 5 IAH DFW 224 9 9 0 0
---
3240: 112 -12 1 IAH MIA 964 8 12 0 0
3241: 112 -16 -5 IAH MIA 964 9 13 0 0
3242: 110 -10 3 IAH MIA 964 12 10 0 0
3243: 110 -11 4 IAH MIA 964 9 11 0 0
3244: 119 -2 4 IAH MIA 964 8 12 0 0

Notice that you don’t have to use a comma for subsetting rows in a data table. In a data.frame, DF[2:5] would give all the rows of the 2nd to 5th columns. Instead (as everyone reading this obviously knows), we have to specify DF[2:5,]. Also notice that DT[,2:5] does not select columns 2 to 5 the way it would for a data.frame: in the data.table version this tutorial was written with, ‘j’ is evaluated as an expression, so the call simply returns the vector 2:5, as is explained in the first question of the FAQs of the data.table package. (Later data.table releases changed this behaviour so that DT[,2:5] does select those columns.)

Quirky and useful: when subsetting rows you can also use the symbol .N in the DT[…] command. It holds the number of rows of DT, so used in ‘i’ it refers to the last row. You can use it for selecting the last row or an offset from it.

DT[.N-1]
#Returns the penultimate row of DT
Year Month DayofMonth DayOfWeek DepTime ArrTime
UniqueCarrier FlightNum TailNum ActualElapsedTime AirTime
1: 2011 12 6 2 656 812 WN 621 N727SW 76 64
ArrDelay DepDelay Origin Dest Distance
TaxiIn TaxiOut Cancelled CancellationCode
Diverted
1: -13 -4 HOU TUL 453 3 9 0 0
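To see the row-subsetting rules side by side, here is a small sketch on invented data. The `DF` and `DT` objects below are hypothetical and unrelated to hflights; they only serve to compare the data.frame and data.table forms.

```r
library(data.table)

# Hypothetical six-row table, built once as a data.frame and once as a data.table
DF <- data.frame(id = 1:6, x = letters[1:6], y = 6:1)
DT <- as.data.table(DF)

rows_dt <- DT[2:5]     # data.table: rows 2 to 5, no comma needed
rows_df <- DF[2:5, ]   # data.frame: the comma is required to get rows

last_row    <- DT[.N]      # .N is the number of rows, so this is the last row
penultimate <- DT[.N - 1]  # an offset from the end
```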

The j part

The ‘j’ part is used to select columns and do stuff with them. And stuff can really mean anything. All kinds of functions can be used, which is a strong point of the data.table package.

DT[, mean(na.omit(ArrDelay))]
[1] 7.094334

Notice that the ‘i’ part is left blank, and the first thing in the brackets is a comma. This might seem counterintuitive at first. However, this simply means that we do not subset on any rows, so all rows are selected. In the ‘j’ part, the average delay on arrival of all flights is calculated. It appears that the average plane of the hflights dataset had more than 7 minutes delay. Be prepared when catching your next flight!

When selecting several columns and doing stuff with them in the ‘j’ part, you need to use the ‘.()’ notation. This notation is actually just an alias to ‘list()’. It returns a data table, whereas not using ‘.()’ only returns a vector, as shown above.

DT[, .(mean(na.omit(DepDelay)),
mean(na.omit(ArrDelay)))]
         V1       V2
1: 9.444951 7.094334

Another useful feature which requires the ‘.()’ notation allows you to rename columns inside the DT[…] command.

DT[, .(Avg_ArrDelay =
    mean(na.omit(ArrDelay)))]
Avg_ArrDelay
1: 7.094334
DT[, .(Avg_DepDelay = mean(na.omit(DepDelay)),
    Avg_ArrDelay = mean(na.omit(ArrDelay)))]
Avg_DepDelay Avg_ArrDelay
1:  9.444951     7.094334

Of course, new column names are not obligatory.
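The difference between a bare expression and the ‘.()’ notation in ‘j’ is easy to check for yourself. The following is a minimal sketch on a made-up `delays` table (the name and values are assumptions for illustration only):

```r
library(data.table)

# Hypothetical delays in minutes, with some missing values
delays <- data.table(DepDelay = c(5, NA, 10), ArrDelay = c(2, 4, NA))

# A bare expression in j returns a plain vector
as_vector <- delays[, mean(DepDelay, na.rm = TRUE)]

# Wrapping j in .() returns a data.table and lets you name the columns
as_table <- delays[, .(Avg_DepDelay = mean(DepDelay, na.rm = TRUE),
                       Avg_ArrDelay = mean(ArrDelay, na.rm = TRUE))]

is.data.table(as_vector)  # FALSE
is.data.table(as_table)   # TRUE
```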

Combining the above about ‘i’ and ‘j’ gives:

DT[UniqueCarrier=="AA", .(Avg_DepDelay =
mean(na.omit(DepDelay)),
Avg_ArrDelay = mean(na.omit(ArrDelay)),
plot(DepTime,DepDelay,ylim=c(-15,200)),
abline(h=0))]
Avg_DepDelay Avg_ArrDelay   V3  V4
1: 6.390144     0.8917558 NULL NULL

Here we took DT, selected all rows where the carrier was AA in the ‘i’ part, calculated the average delay on departure and on arrival, and plotted the time of departure against the delay on departure in the ‘j’ part.

To recap, the ‘j’ part is used to do calculations on columns specified in that part. As the columns of a data table are seen as variables, and the parts of ‘j’ are evaluated as expressions, virtually anything can be done in the ‘j’ part. This significantly shortens your programming time.

The by part

The final section of this data.table R tutorial focuses on the ‘by’ part. The ‘by’ part is used when we want to calculate the ‘j’ part grouped by a specific variable (or a manipulation of that variable). You will see that the ‘j’ expression is repeated for each ‘by’ group. It is simple to use: you just specify the column you want to group by in the ‘by’ argument.

DT[,mean(na.omit(DepDelay)),by=Origin]
Origin       V1
1: IAH 8.436951
2: HOU 12.837873

Here we calculated the average delay before departure, but grouped by where the plane is coming from.  It seems that flights departing from HOU have a larger average delay than those leaving from IAH.

Just as with the ‘j’ part, you can do a lot of stuff in the ‘by’ part. Functions can be used in the ‘by’ part so that results of the operations done in the ‘j’ part are grouped by something we specified in the DT[…] command. Using functions inside DT[…] makes that one line very powerful. Likewise, the ‘.()’ notation needs to be used when using several columns in the ‘by’ part.

DT[,.(Avg_DepDelay_byWeekdays =
   mean(na.omit(DepDelay))),
   by=.(Origin,Weekdays = DayOfWeek<6)]
Origin Weekdays Avg_DepDelay_byWeekdays
1: IAH FALSE     8.286543
2: IAH TRUE      8.492484
3: HOU FALSE    10.965384
4: HOU TRUE     13.433994

Here, the average departure delay of all planes (no subsetting in the ‘i’ part, so all rows are selected) was calculated per group: first by the origin of the plane and then by whether the flight left on a weekday. Weekdays is FALSE for weekend flights. It appears that the average departure delay was larger when the plane left from HOU than from IAH, and, surprisingly, that the delays were smaller on weekends.
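Grouping by an expression can look a bit magical the first time, so here is a reduced sketch on invented data. The `trips` table is hypothetical; it assumes, as in hflights, that DayOfWeek runs from 1 (Monday) to 7 (Sunday).

```r
library(data.table)

# Hypothetical flights: DayOfWeek 1-5 are weekdays, 6-7 the weekend
trips <- data.table(
  Origin    = c("IAH", "IAH", "HOU", "HOU", "HOU"),
  DayOfWeek = c(1, 6, 2, 7, 3),
  DepDelay  = c(10, 4, 20, 6, NA)
)

# by accepts both a column (Origin) and a named expression (Weekdays),
# which is evaluated per row and then used as a grouping variable
by_origin_week <- trips[, .(Avg_DepDelay = mean(DepDelay, na.rm = TRUE)),
                        by = .(Origin, Weekdays = DayOfWeek < 6)]
print(by_origin_week)
```

The result has one row per combination of Origin and Weekdays that occurs in the data.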

Putting it all together a typical DT[i,j,by] command gives:

DT[UniqueCarrier=="DL", .(Avg_DepDelay =
mean(na.omit(DepDelay)),
Avg_ArrDelay = mean(na.omit(ArrDelay)),
Compensation = mean(na.omit(ArrDelay - DepDelay))),
by = .(Origin, Weekdays = DayOfWeek<6)]
Origin Weekdays Avg_DepDelay Avg_ArrDelay Compensation
1: IAH FALSE    8.979730     4.116751     -4.825719
2: HOU FALSE    7.120000     2.656566     -4.555556
3: IAH TRUE     9.270948     6.281941     -2.836609
4: HOU TRUE     11.631387    10.406593    -1.278388

Here the subset of planes flown by Delta Air Lines (selected in ‘i’) was grouped by their origin and by Weekdays (in ‘by’). The time that was compensated in air was also calculated (in ‘j’). It appears that on weekends, irrespective of whether the plane was coming from IAH or HOU, the time compensated while in air (thus by flying faster) is bigger.

There is much more to discover in the data.table package, but this post illustrated the basic DT[i,j,by] command. The DataCamp course explains the whole data.table package extensively. You can do the exercises at your own pace in your browser while getting hints and feedback, and review the videos and slides as much as you want. This interactive way of learning allows you to gain profound knowledge and practical experience with data tables. Try it for free.

Hopefully, thanks to this data.table R tutorial, you now understand the fundamental syntax of data.table and are ready to experiment yourself. If you have questions concerning the data.table package, have a look here. Matt and Arun are very active. One of the next blog posts on the data.table package will be more technical, zooming in on the wide possibilities with data tables. Stay tuned!

Switch from SAS, SPSS or STATA to R with our latest course

If you already know SAS, SPSS or Stata, you don’t need to spend time learning how to analyze data. You need a course that focuses on translating your knowledge into R. A course that facilitates switching from SAS, SPSS or STATA to R. That’s why DataCamp’s latest interactive course focuses on statisticians, data analysts, academic institutions, and companies that are switching (or planning to switch) from these commercial statistical software packages to the free and powerful language R.

Like all DataCamp courses, this new course is self-paced, and offers you a great learning experience via a unique combination of challenging interactive exercises and to-the-point videos. It is given by Bob Muenchen, one of the leading instructors in the R community, and author of R for SAS and SPSS Users (Springer) and R for Stata Users (Springer).

Supplementary to the course content, Bob offers free email support to all course subscribers. Furthermore, many online classes are yours for only 30 days, or for as long as you make an annual payment. This course is yours “forever”.  So you can always go back to all the course materials when you need a refresher or some additional information.

Check out the full course, or take the free preview.  

About an introduction to R for SAS, SPSS, and STATA users

R is a free and powerful software for data analysis and graphics that is rapidly disrupting the market for data analytical tools and software. It is flexible (no need to wait 6 months for updates), extremely comprehensive (over 6000 packages), cross-platform, and has a great community. However, if you come from another statistical software tool it can be a challenge to master the versatility of R. Enter DataCamp’s new interactive course Introduction to R for SAS, SPSS, and STATA Users. An ideal course for those switching from SAS, SPSS or STATA to R. This course:

  • Introduces R jargon using language you’re familiar with.
  • Points out the errors you’re most likely to make. For example, many R functions let you specify which data set to use in a way that looks identical to SAS, but which differs in a way that is likely to lead to perplexing error messages.
  • Demonstrates add-on packages that produce output that is similar to your current software’s. R’s built-in functions tend to provide surprisingly sparse output.
  • Covers material to help you migrate to R, or to integrate the use of R into your current software.

In total the course contains over 16 hours of material, 20 chapters, and over 120 interactive exercises.

About the instructor

Robert A. Muenchen is the author of R for SAS and SPSS Users and R for Stata Users. He is a consulting statistician with over 30 years of experience and is currently the manager of the Research Computing Support at the University of Tennessee. Bob has conducted research for a variety of public and private organizations and has assisted on more than 1,000 graduate theses and dissertations. His workshops have been attended by people from over 500 organizations. He has written or co-authored over 70 articles published in scientific journals and conference proceedings.

Bob has served on the advisory boards of the SAS Institute, SPSS Inc., the Statistical Graphics Corporation and PC Week Magazine. His suggested improvements have been incorporated into SAS, SPSS, JMP, STATGRAPHICS and several R packages. In other words, he is the ideal instructor if you want to switch from SAS, SPSS or STATA to R.

About DataCamp

The course is set up in DataCamp’s interactive learning platform that aims to enhance your learning experience by allowing you to learn by doing. All the concepts that are introduced during the video lecture are directly tested through challenging interactive assignments with tailored feedback to consolidate your knowledge step by step. You will effectively learn hands on instead of losing time with suboptimal solutions like a four-hour screencast or webinar.

Check out the full course, or take the free preview.  

Revolution R Enterprise tutorial: Free 8h interactive tutorial on Big Data Analytics

In need of better ways to handle large data sets? Interested in manipulating, visualizing, and analysing large datasets with RevoScaleR? Then make sure to have a look at this free hands-on Revolution R Enterprise tutorial on Big Data Analytics by Revolution Analytics and DataCamp. Everything takes place in the online interactive learning interface of DataCamp, so there is no need to install anything. We’ve set up the Revolution R Enterprise (RRE) software in the cloud, allowing you to explore the power of Revolution R Enterprise and big data analytics in the comfort of your own browser via a live R environment.

No hassle, just learning. All the material is presented by short videos and slides to explain major elements. In order to consolidate your learning, every section ends with interactive exercises that let you practice the covered concepts while giving you tailored feedback. Like all DataCamp tutorials and courses, this is a stand-alone tutorial that can be taken wherever you want, whenever you want. You have unlimited access to videos, slides and related content.   

Course content

This interactive tutorial on Big Data Analytics (more than 8 hours of material) gets you started with RRE and the RevoScaleR package, and is ideal for accomplished R users and data analysts who want to experience the functionality of Revolution R Enterprise. Learn how to use RRE to process, visualize, and model terabyte-class data sets in a fraction of the time of legacy products without requiring expensive or specialised hardware. This free Revolution R Enterprise tutorial covers:

  • The RevoScaleR package that ships with Revolution R Enterprise, and how it deals with Big Data challenges.
  • How to summarize, cross-tabulate and visualise variables in large data sets.
  • How to manipulate and transform large data sets.
  • Building statistical and machine learning models on large data sets.

This is just the first of many Revolution Analytics courses that will follow. New courses such as fundamentals of the R programming language, Introductory Statistics with Revolution R Enterprise, Predictive Modeling, and Advanced R Programming, are already in development.

Stay tuned for more, and don’t forget to give feedback on the current course!

ggvis tutorial: become a data visualization expert with RStudio

The latest interactive course in the RStudio track is now available on DataCamp: ggvis tutorial. The first part of the tutorial is available for free, so everyone can now learn interactively how to start creating stunning ggvis data visualizations in R. All courses in the RStudio track are self-paced, and combine challenging interactive exercises with to-the-point videos. Garrett Grolemund, master instructor at RStudio and R enthusiast, is your guide along this journey.

Check out the full course, or take the free preview.

What is ggvis?

ggvis is the new standard tool for data visualization in R by RStudio. It lets you create static and interactive graphs to display distributions, relationships, model fits, and more. Similar to ggplot2, ggvis uses the grammar of graphics. The grammar provides an intuitive framework that lets you describe – and make – any plot that you can think of in your head. By learning the four components of the grammar, you empower yourself to make thousands of different types of ggvis data visualizations.

Best of all, ggvis plots are true web documents. You can save them as PNGs for publication, but they come ready to be shared over the internet. Each ggvis plot can be viewed in a web browser, which opens opportunities not available in R’s native graphics device. For example, with one or two lines of code, you can turn a ggvis plot into an animation or an interactive data exploration tool. This enables you to do rich data visualizations for analytics, communication and the web.

What is the ggvis tutorial?

This interactive ggvis tutorial will teach you how to use the ggvis package to make data visualizations like a pro. You’ll learn how to use the grammar of graphics to turn your data into sophisticated, layered graphics; and how to customize those graphics. Along the way, you will learn how to visualize statistical transformations of your data, as well as how to add interactive components to your graphs, such as sliders, checkboxes and more. Multiple ggvis examples are provided. Topics covered are:

  • Chapter One: The Grammar of Graphics
    Learn the philosophy that guides ggvis and discover a clear, logical way to think about data visualization.
  • Chapter Two: Lines and Syntax
    Examine each part of the grammar and learn the special syntax that ggvis introduces to make it easier to think about plots.
  • Chapter Three: Transformations
    Learn to build statistical transformations with ggvis’ compute functions, visualize the results, and integrate the dplyr package.
  • Chapter Four: Interactivity and Layers
    Create graphs that can be controlled through sliders, text fields, and other widgets. Build sophisticated, multi-layered graphs.
  • Chapter Five: Customizing Axes, Legends and Scales
    Change the appearance of axes and legends in your plots, and use ggvis’ scale system.

The DataCamp ggvis tutorial provides 10 videos, 30 exercises, and a surprise interview with one of the co-creators of ggvis. Go to the full course, and take the free preview. 

The data.table R package cheat sheet

The data.table R package provides an enhanced version of data.frame that allows you to do blazing fast data manipulations. The data.table R package is being used in different fields such as finance and genomics, and is especially useful for those of you that are working with large data sets (e.g. 1GB to 100GB in RAM).

Although its typical syntax structure is not hard to master, it is unlike other things you might have seen in R. Hence the reason to create this cheat sheet. DataCamp’s data.table cheat sheet is a quick reference for doing data manipulations in R with the data.table R package and syntax, and is a free-for-all supplement to DataCamp’s interactive course Data Analysis the data.table Way.

The cheat sheet will guide you from doing simple data manipulations using data.table’s basic i, j, by syntax, to chaining expressions, to using the famous set()-family. You can learn more about data.table at DataCamp.com or read all about it in this data.table tutorial post.
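To give a flavour of the two more advanced topics mentioned here, the sketch below chains two DT[…] calls and then uses a few members of the set()-family. It is only an illustration on an invented table; the column names `g`, `v`, `v2` and `v_doubled` are assumptions made for this example.

```r
library(data.table)

# Hypothetical data: two groups with two values each
DT <- data.table(g = c("a", "a", "b", "b"), v = c(1, 2, 3, 4))

# Chaining: the result of the first [...] is passed straight to the second,
# here to sum per group and then sort by that sum in descending order
top <- DT[, .(total = sum(v)), by = g][order(-total)]

# set() assigns a column by reference, without copying the table
set(DT, j = "v2", value = DT$v * 2)

# setnames() and setkey() also modify DT in place
setnames(DT, "v2", "v_doubled")
setkey(DT, g)
```

Because these functions work by reference, DT itself is changed and there is no need to assign the result back.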

Become a data scientist in 8 steps: the infographic

This post was written by the team behind DataCamp, the online interactive learning platform for data science.  

After being dubbed “sexiest job of the 21st Century” by Harvard Business Review, data scientists have stirred the interest of the general public. Many people are intrigued by this job, not least because the name has an interesting ring to it. But it is exactly the name that also raises a lot of questions. What is a data scientist, and what do data scientists do exactly? Many of us who devote our lives to data science have frequently been confronted with questions like these.

The answers to these questions are mostly not as straightforward as you would expect: a short search on Google with the string of words “How to become a data scientist” shows that the concept has different meanings to different people. In addition, many articles suggest various tools, courses and applications for people to become a data scientist, and with good reason: the options are unlimited. But let’s face it, for someone who is not familiar with the field, this advice may sometimes seem like a jungle of information. What’s more, it can be demotivating: the descriptions are sometimes dauntingly long and the many details often hit readers like an avalanche.

DataCamp’s Guide to Become a Data Scientist

With all this in mind, DataCamp decided to help those who can’t see the forest for the trees: we designed a step-by-step infographic that clearly outlines how you can become a data scientist in 8 easy steps.  This visual guide is meant for everyone that is interested in learning data science or for everyone that has already become a data scientist but wants some additional resources for further perfection.  The infographic is called “Become a data scientist in 8 easy steps”. Have a look at it!

How to become a data scientist

Source: blog.datacamp.com

If you are thinking about becoming a data scientist, do not be taken aback by the eight steps that are presented in the infographic. We would like to emphasize that becoming a data scientist takes time and personal investment, but that the journey is anything but dull! And don’t forget, there are plenty of courses available to set you on the right track.

If you are already a data scientist, drop us a line at info@datacamp.com if you think of other steps that you have undertaken in your professional journey.

Feel free to share!

Embed Code:

<a href="http://blog.datacamp.com/how-to-become-a-data-scientist-in-8-easy-steps-the-infographic/" ><img src="http://blog.datacamp.com/wp-content/uploads/2014/08/How-to-become-a-data-scientist.jpg" alt="Become a data scientist in 8 easy steps" /></a><br/>Source: <a href="http://blog.datacamp.com">blog.datacamp.com</a><br/>

Complete dplyr tutorial for data analytics and data manipulation in R

DataCamp just launched its latest interactive course: dplyr. This new course was developed in close collaboration with Garrett Grolemund, RStudio’s master instructor. By taking this dplyr tutorial, you will be challenged one step at a time to master the essentials about transforming data sets fast and intuitively with the dplyr package. Start the course here.

The dplyr package is an exciting new chapter in the mission to bring painless data manipulation to the crowd. It is an R package that provides you with a fast and intuitive way to transform data sets with R. dplyr is the successor of plyr and is mainly authored by Hadley Wickham and Romain Francois. It is designed to be intuitive and easy to learn, thereby making “doing things” in R more user friendly.

This dplyr tutorial introduces five key functions for straightforward data manipulation: select, mutate, filter, arrange and summarize. Thanks to optimization in C++, these functions allow you to work extremely fast with larger data sets. These ‘dplyr verbs’ can be understood as the atoms that combine into powerful molecular operations which can handle around 90% of data manipulation tasks. As such, dplyr lets you, as a data scientist, accomplish more things, with more data, in less time. However, dplyr isn’t limited to these five functions; it also enables automated groupwise operations in R, it provides a standard syntax for accessing and manipulating database data with R, and much more. All of this and more is covered and explained in this DataCamp course (check out the contents of the course).
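As a quick taste of the five verbs, here is a minimal sketch on a small invented data frame. The `flights` data and its column names are assumptions made for this illustration and are not taken from the course.

```r
library(dplyr)

# Hypothetical flights: distance in miles, air_time in minutes
flights <- data.frame(
  carrier  = c("AA", "AA", "AS", "AS"),
  distance = c(224, 964, 1874, 1874),
  air_time = c(40, 112, 255, NA)
)

selected   <- select(flights, carrier, air_time)                 # pick columns
with_speed <- mutate(flights, speed = distance / air_time * 60)  # add a column (mph)
long_hauls <- filter(flights, distance > 500)                    # keep matching rows
sorted     <- arrange(flights, desc(distance))                   # reorder rows
overall    <- summarize(flights,                                 # collapse to one row
                        avg_air_time = mean(air_time, na.rm = TRUE))
```

Each verb takes a data frame as its first argument and returns a new one, which is what makes them easy to combine.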

To help you fully grasp the power and ease-of-use of dplyr, DataCamp has developed a brand new interactive course together with Garrett Grolemund. Garrett is a Data Scientist and Master Instructor at RStudio, holds a Ph.D. in Statistics, and specializes in teaching. He is the author of Hands on Programming with R, as well as Data Science with R, an upcoming book from O’Reilly Media. He taught people how to use R at over 50 government agencies, small businesses, and multi-billion dollar global companies.

A video of Garrett Grolemund explaining the dplyr package in the DataCamp course

The dplyr tutorial

In the dplyr tutorial, you will learn how to use dplyr to perform basic data manipulation tasks using the five dplyr verbs, as well as combining these to solve challenging problems. You’ll also learn about groupwise operations using group_by(), about the pipe operator to chain your operations, and about the tbl structure which provides a cleaner layout so you can better understand your data. Finally, you will learn how to use the dplyr syntax to access data stored in a database outside R.
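Groupwise operations and the pipe operator can be sketched in a few lines. The example below is an illustration on invented data (the `flights` data frame and its columns are hypothetical); it groups by carrier, summarizes each group, and sorts the result.

```r
library(dplyr)

# Hypothetical flights: air_time in minutes
flights <- data.frame(
  carrier  = c("AA", "AA", "AS", "AS"),
  air_time = c(40, 112, 255, NA)
)

# %>% passes the result on the left as the first argument of the call on the right
by_carrier <- flights %>%
  group_by(carrier) %>%
  summarize(n_flights    = n(),
            avg_air_time = mean(air_time, na.rm = TRUE)) %>%
  arrange(desc(avg_air_time))

print(by_carrier)
```

Reading the pipeline top to bottom mirrors the order in which the operations are applied.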

The course is set up in DataCamp’s interactive learning platform that aims to enhance your learning experience by allowing you to learn by doing. The course comprises 10 sections distributed over five chapters and each section has an instructional video by Garrett, followed by a vast set of interactive exercises. As such, the concepts that are introduced during the video lecture are directly tested through challenging assignments with tailored feedback to consolidate your knowledge step by step. You will effectively learn hands on instead of losing time with suboptimal solutions like a four-hour screencast or webinar.

The DataCamp interactive learning environment

This is the first course of the RStudio DataCamp track that will cover some of the company’s flagship products: dplyr, ggvis, rmarkdown, and the RStudio IDE. These other courses are scheduled to launch later this year.

So, if you want to learn more about the powerful dplyr package to solve challenging data analysis problems, head over to DataCamp and start right away!


New Course! A hands-on introduction to statistics with R by A. Conway (Princeton University)

The best way to learn is at your own pace. Combining the interactive R learning environment of DataCamp and the expertise of Prof. Conway of Princeton, we offer you an extensive online course on introductory statistics with R.  Start learning now…

Whether you are a professional using statistics in your job, an academic wanting a refresher on specific statistical topics, or a student taking statistics classes, this new DataCamp course will match your needs. It is a comprehensive and friendly course that requires no background knowledge in statistics or R. The aim is to give you a solid foundation for future learning and to help you put your own work into context. All this takes place in your browser thanks to the DataCamp online learning environment. Try it for free!

So, how does it all work? You can choose to subscribe to the course as a whole, or to take individual modules according to your own specific needs. The course consists of 7 modules, ranging from Student’s t-test through ANOVA to simple and multiple linear regression, and ending with a module on Moderation and Mediation. In total there are more than 250 interactive R exercises, which are accompanied by videos and slides. This adds up to 24 hours of material on statistics with R.

Interested? To give you the opportunity to get a taste of the course content and to try out the DataCamp learning experience, we present you the first module for free. Furthermore, if you are a student, we want you to know that you get a 75% discount on the whole course.

So what are you waiting for? Grab this learning opportunity and check out the course! Remember that the first module is free, that you can buy separate modules according to your needs, and if you buy all 7 modules at once, you get a significant discount.  On top of that, students can get a 75% reduction on the whole statistics with R course.

On Professor Andrew Conway

Prof. Conway is a Senior Lecturer at Princeton and has been teaching undergraduate and graduate students for 20 years. His experience is reflected in the quality of this course. The content of this course was previously offered on Coursera, where more than 200,000 individuals followed it, making it the second most popular Coursera course using R. Psychology students at Princeton are already following the DataCamp course this semester.

On DataCamp

The course is set up in DataCamp’s interactive platform, which aims to enhance the learning experience through a learning-by-doing approach. The material is presented in short videos and slides that explain the major elements. To consolidate your learning, every section ends with interactive exercises that let you practice the covered concepts while giving you tailored feedback.

You will discover R’s capabilities, and how they interplay with each other, step by step. You can learn at your own pace, stopping to take a break or replay a segment at any time. The system tracks your progress, so you can stop whenever you like and pick up where you left off. This way, you will learn effectively instead of losing time with one-speed-fits-all solutions like a four-hour screencast or webinar.



Complete data.table tutorial: Data analysis the data.table way

Together with the key people behind the data.table package, Matt Dowle and Arun Srinivasan, DataCamp developed a brand new interactive course to bring your data analysis skillset up to date with the essentials of the powerful data.table package. Learn more about the data.table tutorial…

The popularity of the data.table package is increasing, and with good reason. Not only is the number of package downloads rising rapidly, but data.table is also the talk of the R town, given the numerous presentations by Matt and Arun at conferences such as useR!2014, EARL, R/Insurance and R/Finance.

Data.table allows you to reduce your programming time as well as your computing time considerably, and it is especially useful if you often find yourself working with large datasets. For example, to read in a 20GB .csv file with 200 million rows and 16 columns, data.table only needs 8 minutes thanks to the fread() function, compared with the hours it would take with the read.csv() function. Once you understand its concepts and principles, the speed and simplicity of the package are astonishing!
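To make the comparison concrete, here is a minimal sketch of reading the same file with both functions. The file and its columns are made up for illustration and are far smaller than the 20GB example above, so the timing difference will not be visible at this size; the point is only that fread() is used in much the same way as read.csv().

```r
library(data.table)

# Write a small illustrative CSV to a temporary file (toy data, not the 20GB file).
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  id    = 1:1000,
  group = rep(c("a", "b"), 500),
  value = seq(0.5, 500, by = 0.5)
)
write.csv(toy, csv_path, row.names = FALSE)

# fread() detects the separator and column types and returns a data.table.
dt <- fread(csv_path)

# read.csv() reads the same file into a plain data.frame.
df <- read.csv(csv_path)

# To compare speed on a large file of your own, wrap each call in system.time().
```

A data.table is also a data.frame, so the result of fread() can be passed to code that expects a data.frame.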

On the data.table tutorial

However, to get the most out of data.table’s functionality, you first have to overcome its learning curve: even though the syntax is not extremely difficult, it takes some practice to grasp it fully so that its built-in features can make your life easier. This is exactly why DataCamp has made an interactive online course on the data.table package for R, in collaboration with the key people behind it: Matt Dowle, main author, and Arun Srinivasan, co-author and major contributor. The data.table tutorial, the only one of its kind, is called Data Analysis: the data.table way. It is designed to help you get started with the essentials of the data.table package. Among other things, you will learn about operations such as selection and grouping in DT[i, j, by], and intermediate topics like chaining, setting keys and the different join types.
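As a small taste of those topics, the sketch below runs through the DT[i, j, by] form, chaining, keys and a join. It is not taken from the course; the tables and column names (region, product, sales, manager) are invented for illustration.

```r
library(data.table)

# Hypothetical sales table, made up for this example.
DT <- data.table(
  region  = c("north", "north", "south", "south", "south"),
  product = c("x", "y", "x", "y", "x"),
  sales   = c(10, 20, 30, 40, 50)
)

# DT[i, j, by]: i selects rows, j computes on columns, by groups the result.
totals <- DT[sales > 10, .(total = sum(sales)), by = region]

# Chaining: a second [] operates on the result of the first.
ordered <- DT[, .(total = sum(sales)), by = region][order(-total)]

# Setting a key sorts the table by that column and enables fast subsetting.
setkey(DT, region)
south <- DT["south"]

# Join: for each row of DT, look up the matching row in a second keyed table.
managers <- data.table(region = c("north", "south"),
                       manager = c("Ann", "Bob"))
setkey(managers, region)
joined <- managers[DT]
```

Here `managers[DT]` joins on the shared key, so `joined` has one row per row of `DT` with the manager column attached.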


The course is set up in DataCamp’s interactive learning platform, which aims to enhance the learning experience by centering on learning-by-doing. The course is supplemented by short videos and slides that explain the major elements. You will discover the package's features, and how they interplay with each other, step by step. This way, you will learn hands-on instead of losing time with suboptimal solutions like a four-hour screencast or webinar. What’s more, to consolidate your learning, every section ends with interactive exercises that let you practice the covered concepts while giving you tailored feedback.

So, if you are looking for a high-quality course that brings you up to speed with one of the hottest packages in R today, go to DataCamp, take the data.table tutorial, and add the power of data.table to your data analysis skillset!
