New Course! A hands-on introduction to statistics with R by A. Conway (Princeton University)

The best way to learn is at your own pace. Combining DataCamp's interactive R learning environment with the expertise of Prof. Conway of Princeton, we offer you an extensive online course on introductory statistics with R. Start learning now…

Whether you are a professional using statistics in your job, an academic wanting a refresher on specific statistical topics, or a student taking statistics classes, this new DataCamp course will match your needs. It is a comprehensive and friendly course that requires no background knowledge in statistics or R. The aim is to give you a solid foundation for future learning, as well as the ability to put your work into context. All this takes place in your browser thanks to the DataCamp online learning environment. Try it for free!

So, how does it all work? You can choose to subscribe to the course as a whole, or to take individual modules according to your own specific needs. The course consists of 7 modules, ranging from Student’s t-test through ANOVA to simple and multiple linear regression, and ending with a final module on Moderation and Mediation. In total there are more than 250 interactive R exercises, accompanied by videos and slides. This adds up to 24 hours of material.

Try the DataCamp course co-developed by Prof. Conway

Interested? To give you a taste of the course content and a chance to try out the DataCamp learning experience, we are offering the first module for free. Furthermore, if you are a student, you get a 75% discount on the whole course.

So what are you waiting for? Grab this learning opportunity and check out the course! Remember that the first module is free, that you can buy separate modules according to your needs, and that if you buy all 7 modules at once, you get a significant discount. On top of that, students get a 75% reduction on the whole course.

On Professor Andrew Conway
Prof. Conway is a Senior Lecturer at Princeton and has been teaching undergraduate and graduate students for 20 years. His experience is reflected in the quality of this course. The content of this course was previously offered on Coursera, where more than 200,000 people followed it, making it the second most popular Coursera course using R. Psychology students at Princeton are already following the DataCamp course this semester.

On DataCamp
The course is set up in DataCamp’s interactive platform, which aims to enhance the learning experience through a learning-by-doing approach. The material is presented through short videos and slides that explain the major elements. To consolidate your learning, every section ends with interactive exercises that let you practice the covered concepts while giving you tailored feedback.

You will discover R’s capabilities, and how they interplay with each other, step by step. You can learn at your own pace, taking a break or replaying a segment at any time. The system tracks your progress, so when you come back it starts up where you left off. This way, you learn effectively instead of losing time with one-speed-fits-all solutions like a four-hour screencast or webinar.


Data analysis the data.table way: introducing DataCamp’s newest course

Together with the key people behind the data.table package, Matt Dowle and Arun Srinivasan, DataCamp developed a brand new interactive course to bring your data analysis skill set up to date with the essentials of the powerful data.table package. Learn more…

The popularity of the data.table package is increasing, and with good reason. Not only is the number of package downloads rising rapidly, but data.table is also the talk of the R town thanks to numerous presentations by Matt and Arun at conferences such as useR! 2014, EARL, R/Insurance and R/Finance.

data.table allows you to reduce both your programming time and your computing time considerably, and it is especially useful if you often find yourself working with large datasets. For example, thanks to the fread() function, data.table needs only 8 minutes to read in a 20GB .csv file with 200 million rows and 16 columns, compared with the hours it would take with the read.csv() function. Once you understand its concepts and principles, the speed and simplicity of the package are astonishing!

However, to get the most out of data.table, you first have to overcome its learning curve: even though the syntax is not extremely difficult, it does take some practice to grasp it fully and let the built-in functionality make your life easier. This is exactly why DataCamp has made an interactive online course on the data.table package for R, in collaboration with the key people behind it: Matt Dowle, the main author, and Arun Srinivasan, co-author and major contributor. This course, the only one of its kind, is called Data Analysis: the data.table way. It is designed to help you get started with the essentials of the data.table package. Among other things, you will learn the ins and outs of operations such as selection and grouping in DT[i, j, by], and intermediate topics like chaining, setting keys and the different join types.
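For readers new to the DT[i, j, by] form, it is usually read as "take DT, subset rows using i, then calculate j, grouped by by". The sketch below mimics that reading in plain Python on a list of dicts. It is only an analogy to illustrate the idea, not data.table's API; the function name `dt_query` and the toy `flights` rows are made up for this example.

```python
from collections import defaultdict
from statistics import mean


def dt_query(rows, i=None, j=None, by=None):
    """Loose Python analogy of data.table's DT[i, j, by] (not its real API).

    i  -- predicate selecting rows (all rows if None)
    j  -- function computed on the selected rows (rows returned as-is if None)
    by -- column name to group on before applying j (no grouping if None)
    """
    # i: subset the rows
    selected = [row for row in rows if i is None or i(row)]
    if j is None:
        return selected
    if by is None:
        # j: compute on the whole subset
        return j(selected)
    # by: split the subset into groups that share the same value of `by`
    groups = defaultdict(list)
    for row in selected:
        groups[row[by]].append(row)
    # j: compute once per group
    return {key: j(group) for key, group in groups.items()}


# Toy rows, made up for illustration
flights = [
    {"origin": "JFK", "month": 6, "delay": 10},
    {"origin": "JFK", "month": 6, "delay": 20},
    {"origin": "LGA", "month": 6, "delay": 5},
    {"origin": "LGA", "month": 7, "delay": 50},
]

# Roughly what flights[month == 6, mean(delay), by = origin] expresses in data.table
june_mean_delay = dt_query(
    flights,
    i=lambda r: r["month"] == 6,
    j=lambda g: mean(r["delay"] for r in g),
    by="origin",
)
print(june_mean_delay)
```

In data.table itself the result would be another data.table rather than a dictionary; the point here is only the order of operations: filter with i, group with by, compute j per group.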


The course is set up in DataCamp’s interactive learning platform, which aims to enhance the learning experience by centering on learning-by-doing. The course is supplemented by short videos and slides that explain the major elements. You will discover the package's functionality, and how its parts interplay with each other, step by step. This way, you learn hands-on instead of losing time with suboptimal solutions like a four-hour screencast or webinar. What’s more, to consolidate your learning, every section ends with interactive exercises that let you practice the covered concepts while giving you tailored feedback.

So, if you are looking for a high-quality course that brings you up to speed with one of the hottest packages in R today, go to DataCamp and add the power of data.table to your data analysis skill set!


New R course on Coursera: Data Analysis and Statistical Inference

Yesterday (Monday 1st of September), a new session of Data Analysis and Statistical Inference, taught by Dr. Mine Çetinkaya-Rundel from Duke University, started on Coursera. Just like in the previous run, all labs take place in DataCamp’s interactive learning environment.

Data Analysis and Statistical Inference will teach you how to make use of data in the face of uncertainty. Throughout the course, you will learn how to collect, analyze, and use data to make inferences and draw conclusions about real-world phenomena. No formal background is required, but mathematical skills are definitely a plus.


The course makes intensive use of R for its statistical computing; the corresponding interactive exercises, available on DataCamp, were developed in close collaboration with Dr. Çetinkaya-Rundel. In sum, this course is perfectly tailored to your needs if you are an aspiring data scientist looking to expand your basic statistical knowledge.

We hope to welcome you in our online classroom soon.

P.S. If you prefer to complete the course self-paced, in your own time, we recommend having a look at the OpenIntro course. There you can find similar material, also supplemented with DataCamp’s interactive exercises.


Coursera course on computational finance with R

As of today (Tuesday 26th of August), a new session of Professor Eric Zivot’s course on computational finance and financial econometrics starts on Coursera. Just like the previous run of the course, most R labs and R assignments will take place in DataCamp’s interactive learning environment.

Designed by Professor Eric Zivot (University of Washington), Introduction to Computational Finance focuses on the mathematical and statistical tools and techniques used in quantitative and computational finance. With the help of real-life examples, you will be introduced to the dos and don’ts of financial data analysis, the estimation of statistical models and the construction of optimized portfolios. The course requires no formal background, but some basic mathematical skills will definitely come in handy.


DataCamp’s interactive R exercises were developed in close collaboration with Professor Zivot himself. They therefore meet the same high standards as the academic course, but are presented in DataCamp’s fun, learning-by-doing environment. All students who enroll in the course on Coursera will be directed to DataCamp to practice their skills and complete assignments.

If you have always wanted to learn more about computational finance, or if you are simply interested in doing financial econometrics with R, this course is a must. We hope to welcome you in our online classroom soon!

P.S. If you prefer to do only the interactive exercises, the course is also available on DataCamp as a stand-alone version, which does require prior knowledge of finance and R.


Package rankings, task view rankings and much more: the Rdocumentation poster

For useR! 2014, we created a poster on Rdocumentation.org, our R documentation aggregator that lets you search packages from CRAN, Bioconductor and GitHub. We received a lot of positive and valuable feedback on it, so we decided to share it again via our blog.


The Rdocumentation poster answers questions such as:

  • What are the 10 most downloaded R packages of all time?
  • Who maintains the most packages?
  • What are the most popular task views, and how did this ranking change over time?
  • How popular is GitHub in the R community?

So if you did not make it to the useR! 2014 conference, or could not attend the Wednesday evening poster session, here is another chance to take a detailed look at the poster. Feedback and questions can be sent to team@datacamp.com.

Feel free to share it with your network!

On Rdocumentation.org: Rdocumentation is a web application that helps you to easily find and browse the documentation of R packages on CRAN, Bioconductor and GitHub. It enables you to instantly search for functions and use advanced search on the documentation of all R packages.


Who wants to disrupt R training and R education?

Using EdTech applications as an academic, R trainer or training company is no longer a unique selling proposition, but a must-have commodity. In this post, we introduce the new DataCamp course creation tools for academics, trainers and enterprises, and make a call to those who are interested in using these tools. More information via team@datacamp.com.

Everyone who is involved in academic teaching or professional training is experiencing how a new wave of online education tools is changing the way things are done inside and outside the classroom. Using EdTech is no longer a differentiator, but a must-have. One must dare to go beyond the standard offering of webinars and the traditional collection of online instruction videos. All these new tools are disrupting the current business models around training, as well as the educational and pedagogical models used today.

New Features

For over a year now, DataCamp has been working on building tailored and scalable EdTech tools for R education and training. We do this for our own R courses and tutorials (in 2014 alone we have already trained over 42,000 new R enthusiasts), and for academics, trainers and companies using R in a narrow or a broad sense. In the past months, we have been working hard on serious improvements to our course creation tools, and today we make these new developments available to the public. You can now:

  • Integrate video material and add slides,
  • Write better and more complex submission correctness tests (more on that in a follow-up blog post soon),
  • And, upon request, set up private learning environments for your students and clients and track their individual performance. (We’ll make sure to cover the technicalities of these new features in more detail in one of our next posts.)

Furthermore, we have created a whole new FAQ section that answers questions on the course creation process itself, how to track student and employee performance, ways to use DataCamp for your courses, books and trainings, our offer for tailored online and live trainings, and much more. Additionally, to help future course creators, we have developed our own course creation style guide based on the style guide published by Hadley Wickham.


For academics, trainers and enterprises

With these new functionalities and tools we want to meet the new needs and challenges of:

  • academics who want to complement their lectures with new and exciting interactive exercises,
  • trainers and training companies that need to adapt their online and live course portfolio to the changing business environment,
  • package authors and R enthusiasts who want to create their own online course on their favourite topic or package,
  • and (large) enterprises in need of cost-effective, yet tailored and scalable, training.

If you are an academic interested in using DataCamp, a professional trainer or training company ready to integrate EdTech into your course portfolio, or a company in need of high-quality, cost-effective R training, contact us at team@datacamp.com or go to our teaching site.


Including GitHub and Bioconductor on Rdocumentation: Technical Details

In our last blog post we announced the addition of GitHub and Bioconductor R packages to Rdocumentation. For the more technical among you, I’ll give a short, high-level description of what’s under the hood of Rdocumentation. Along with that, I’ll zoom in on some of the challenges I encountered while adding the GitHub and Bioconductor repositories.


Rdocumentation in a (technical) nutshell
In a nutshell, the Rdocumentation web server communicates with an R server that’s running in the background. Using a cron job, this R server executes the following steps on a daily basis:

  1. Check for all available packages and their version numbers using available.packages().
  2. Compare these with the ones on Rdocumentation.
  3. Install/update the ones that are out of sync.
  4. Generate the documentation for the newly installed/updated packages and store it in a zip file.

The Rdocumentation web server then picks up the newly generated documentation from the R server, parses it, and stores it in its database.
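The daily sync in the steps above boils down to a comparison of package versions followed by an install-and-document loop. Below is a minimal Python sketch of that logic, assuming package lists are plain name-to-version mappings. The real service runs in R (available.packages() provides step 1); `install` and `generate_docs` here are hypothetical hooks standing in for the R-side work, and the package names are made up.

```python
from typing import Callable, Dict, List


def out_of_sync(available: Dict[str, str], documented: Dict[str, str]) -> List[str]:
    """Step 2: packages that are new or whose version differs from the documented one."""
    return sorted(
        name for name, version in available.items() if documented.get(name) != version
    )


def daily_run(
    available: Dict[str, str],
    documented: Dict[str, str],
    install: Callable[[str, str], None],
    generate_docs: Callable[[str], None],
) -> List[str]:
    """Steps 3 and 4 for every out-of-sync package; returns the packages that failed."""
    failed = []
    for name in out_of_sync(available, documented):
        try:
            install(name, available[name])  # step 3 (hypothetical hook)
            generate_docs(name)             # step 4 (hypothetical hook)
        except Exception:
            # e.g. a missing system library: leave it for manual intervention
            failed.append(name)
            continue
        documented[name] = available[name]  # record the version now documented
    return failed


# Hypothetical package names and versions, for illustration only
available = {"pkgA": "1.0.0", "pkgB": "1.9.4", "pkgC": "0.3-1"}
documented = {"pkgA": "1.0.0", "pkgB": "1.9.2"}


def fake_install(name: str, version: str) -> None:
    if name == "pkgC":
        raise RuntimeError("missing system library")


failed = daily_run(available, documented, fake_install, lambda name: None)
print(failed)       # ['pkgC']
print(documented)   # {'pkgA': '1.0.0', 'pkgB': '1.9.4'}
```

Collecting failures instead of aborting keeps one broken package from blocking the rest of the daily run; the failed package simply stays out of sync and is retried the next day.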

This setup effectively creates a fully automated documentation service. However, installing all R packages on a single machine is by no means a trivial task. Many packages depend on certain (often Linux-specific) system libraries, such as C++ header files from various development packages. When these are missing, installation fails and manual intervention is required. We hope to get to the point where we can run a setup procedure that prepares a server for installing all R packages, but for now this is a work in progress. Another problem is that when R is updated, many packages break on installation. We’ve opted to ignore this for now and simply not update packages that don’t install on R’s latest version.

Adding GitHub and Bioconductor repositories
The first version of Rdocumentation only included the packages available on CRAN. Our latest update expanded the package portfolio with the packages available on Bioconductor and GitHub.

Implementing Bioconductor packages was very similar to implementing CRAN packages, but with a few caveats. The biggest one to overcome was that Bioconductor packages sometimes download massive datasets (> 1GB) upon installation, which makes installing and updating very time- and storage-consuming. To overcome this, we used the `parallel` package to run package installations in separate processes that were killed (with a SIGKILL signal) if they didn’t terminate after some time. This way we avoided cluttering our machine, and the performance gain is worth the few packages we lose with this technique.
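The install-with-a-deadline idea can be sketched as follows. This is a Python stand-in, not the R `parallel` code used for Rdocumentation: each command runs as a child process supervised by a worker thread, and `subprocess.run()` kills the child once the time limit passes. The `install_all` helper and the two stand-in commands are hypothetical; in practice each command would be an R package installation.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_with_timeout(cmd: List[str], timeout: float) -> str:
    """Run one command; return 'ok', 'failed', or 'killed' if it overran."""
    try:
        done = subprocess.run(
            cmd, timeout=timeout, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child (SIGKILL on POSIX)
        return "killed"
    return "ok" if done.returncode == 0 else "failed"


def install_all(commands: Dict[str, List[str]], timeout: float, workers: int = 4) -> Dict[str, str]:
    """Run all commands concurrently, each with its own time limit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(run_with_timeout, cmd, timeout)
            for name, cmd in commands.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins: one quick job and one that would take far too long
commands = {
    "quick": [sys.executable, "-c", "pass"],
    "huge-dataset": [sys.executable, "-c", "import time; time.sleep(60)"],
}
results = install_all(commands, timeout=2)
print(results)
```

One design caveat: only the direct child is killed, so if an installation spawns its own subprocesses (compilers, downloaders), a real setup would probably need to start each job in its own process group and signal the whole group.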

Adding GitHub support was very different. Credit goes to Hadley Wickham’s r-on-github script, which uses the GitHub API to search for all R repositories and their details (owner, stars, latest update, etc.). We only made some minor changes to his script to filter repositories on the number of stars they have, in order to cut out the many test repositories. The following graph plots the number of R repositories by their number of stars.

[Figure: number of R repositories on GitHub by number of stars]

We decided that 3 or more stars was an acceptable criterion for a repository to be “popular enough” for Rdocumentation. It is an arbitrary measure, but given the numbers shown in the graph above, even requiring 1 or more stars already discards the vast majority of repositories. Once the repository information is collected, install_github() from devtools is used to install all of the packages on the server. After an initial install of all packages, only packages that have been updated or created on GitHub within the last week are considered, for obvious performance reasons.
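The selection rule described above (at least 3 stars, and after the initial install only repositories changed within the last week) can be sketched in Python as below. The dictionary keys `name`, `stars` and `updated` are assumptions standing in for whatever fields the GitHub search returns, and the repositories are toy data. On the real server, each selected `"user/repo"` name would then be handed to devtools' install_github().

```python
from datetime import datetime, timedelta
from typing import Dict, List

MIN_STARS = 3               # threshold chosen in the post
RECENT = timedelta(days=7)  # "updated or created within the last week"


def select_repos(repos: List[Dict], now: datetime, initial: bool) -> List[str]:
    """Names of repositories to install.

    Each repo is a dict with hypothetical keys 'name', 'stars' and 'updated' (a datetime).
    """
    # Drop the long tail of (mostly test) repositories with too few stars
    popular = [r for r in repos if r["stars"] >= MIN_STARS]
    if not initial:
        # After the initial install, only recently changed repositories are revisited
        popular = [r for r in popular if now - r["updated"] <= RECENT]
    return [r["name"] for r in popular]


# Toy data for illustration
now = datetime(2014, 6, 1)
repos = [
    {"name": "user/pkgA", "stars": 12, "updated": datetime(2014, 5, 30)},
    {"name": "user/pkgB", "stars": 5, "updated": datetime(2014, 3, 1)},
    {"name": "user/test", "stars": 0, "updated": datetime(2014, 5, 31)},
]
print(select_repos(repos, now, initial=True))   # ['user/pkgA', 'user/pkgB']
print(select_repos(repos, now, initial=False))  # ['user/pkgA']
```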

Any questions/remarks? Drop me a line at bram@datacamp.com


New! Search GitHub and Bioconductor packages on Rdocumentation

As of today, you can search Rdocumentation not only for CRAN packages, but also for the R packages available on GitHub and Bioconductor. This is our largest update yet, and it brings us one step closer to creating a central place for all R documentation-related info and questions.

Today, the rise of alternatives to the CRAN package management system (Bioconductor, GitHub, …) can make finding and installing packages tedious. Documentation for CRAN packages is on cran.r-project.org, documentation for Bioconductor packages is on bioconductor.org, etc.

In addition, package developers currently tend to no longer (immediately) release their packages on CRAN, but to make use of GitHub instead. Just think of well-known packages such as ggvis and slidify that are only available on GitHub. While the most popular GitHub R packages spread by word of mouth, many good packages that are not on CRAN remain relatively unknown. Wouldn’t it be nice to have an overview of all these packages that remain hidden in their GitHub repositories?


With this latest update, we aim to address these problems. Rdocumentation now automatically adds and updates packages from both Bioconductor and GitHub, effectively making us the first R documentation aggregator that combines these sources into one searchable website.

If you have other suggestions, feel free to contact us via info@datacamp.com.


Who wants to learn R? Sharing DataCamp’s user stats and insights.

When one builds an online education start-up for R, the number one criterion to meet is the following: identify an increasing interest in learning R online. Once this box is checked, it is time to start thinking of the second most important criterion: establish a teaching approach that makes people so excited that they keep coming back to learn more, thereby turning them, slowly but surely, into black-belt R masters.

To investigate how DataCamp is performing on both criteria, we decided to analyze our user data for February in more detail, and to open up and share the results via this (comprehensive) Slidify presentation. We put some effort into the visualizations as well, so all results are prettified via rMaps, rCharts and googleVis. (For the curious souls among us, the presentation also gives a unique view of the status of DataCamp back then.)


For DataCamp, February was one of the most interesting months so far in terms of user data, as we added two new and free online interactive courses to our curriculum: Data Analysis and Statistical Inference and Introduction to Computational Finance. Both courses are (or were) also used as interactive R complements to the Coursera courses of the same name. In February we welcomed over 14,000 new R enthusiasts from a total of 163 countries. Our servers handled peak traffic of 1,000 requests per minute and hundreds of concurrent users. Other insights that you will find in the presentation are:

  • Number of chapters started and finished by course
  • Geographical distribution of the DataCamp user base
  • Spillover effect across courses

Make sure to have a look, and if you want more information, send your requests to info@datacamp.com.


Decimal comma or decimal point? A googleVis visualization

As you all know, the decimal mark is a symbol used to separate the integer part from the fractional part of a number written in decimal form. Since I was born and raised in Continental Europe, I am quite fond of using the comma sign to indicate a decimal point. I’ve grown up with it, encountered this comma in both my literary and numerical escapades, and still shudder when thinking of its dual role in long divisions.

However, using the comma sign as a decimal mark has two consequences:

  • Since the comma “,” sign is already taken to mark the radix point, I’m obligated to use the dot “.” sign to separate the thousands.
  • As the comma “,” sign is only used by 24% of the world’s population, 76% of the people in the world are creating documents, writing texts, typing or working in spreadsheets, etc. that contain numbers with a fractional part that looks different from mine.

While the first consequence is the inevitable sacrifice one must make for the privilege of using the comma for fractional numbers, the second is a much harder obstacle to deal with in the real world of professional number crunchers. More often than I would like, I find myself struggling with sheets and documents that use the dot “.” sign to mark the radix point instead of my beloved comma. Why so often? According to Wikipedia, roughly 60% of the world’s population uses the dot “.” sign to mark the radix point. And when looking up these numbers, I learnt that there are even more dissidents. In the Arab world they use the Arabic decimal separator for Eastern Arabic numerals, in Persian the decimal mark is called momayyez, and in English Braille the decimal mark even has its own sign…

Since politely asking these people to change their disruptive behavior will most likely be hopeless, and finger pointing is only a prerogative of the majority, I was forced to make myself a little tool to guide me through what at first looked like an insurmountable problem. Using the data I found on Wikipedia and with the help of the googleVis package, I created a world map that indicates the decimal separator used in each country. That way, depending on the origin of the sheet I receive, I always know what decimal mark to expect.
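Once you know which decimal mark a sheet uses, converting its numbers is mechanical. The Python sketch below assumes the two conventions discussed above (comma decimal with dot thousands, or dot decimal with comma thousands). The `DECIMAL_MARK` table is a tiny illustrative subset, not the Wikipedia data behind the map, and other marks such as the Arabic decimal separator are deliberately not handled.

```python
# Tiny illustrative subset; not the Wikipedia data behind the map
DECIMAL_MARK = {
    "Belgium": ",",
    "Germany": ",",
    "United Kingdom": ".",
    "United States": ".",
}


def parse_number(text: str, decimal_mark: str) -> float:
    """Parse e.g. '1.234,56' (decimal_mark=',') or '1,234.56' (decimal_mark='.')."""
    if decimal_mark not in (",", "."):
        raise ValueError("only ',' and '.' decimal marks are handled in this sketch")
    # Whichever of the two signs is not the decimal mark is treated as the thousands separator
    thousands = "." if decimal_mark == "," else ","
    return float(text.strip().replace(thousands, "").replace(decimal_mark, "."))


print(parse_number("1.234,56", DECIMAL_MARK["Belgium"]))        # 1234.56
print(parse_number("1,234.56", DECIMAL_MARK["United States"]))  # 1234.56
```

The same string "1.234" comes out as 1234.0 under the comma convention and 1.234 under the dot convention, which is exactly why the origin of a sheet matters.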

The map (full-size) is not yet complete (mainly in Africa there are some blank spots left). So for those who are aware of the prevailing decimal mark culture in these regions, just let me know in the comment section and I’ll make sure to update them in case of a comma, or to proselytize them otherwise ;-). You can find the original code here.
