Get notified when R packages update

Today’s highly active R user base is developing, re-developing, and releasing R packages at a never-before-seen rate. While this is fantastic news for the R community as such, it inevitably also causes growing pains as mentioned before.

One of the often-cited problems is the painful and time-consuming task of keeping track of changes and version updates of packages and functions (see for example the paper of Jeroen Ooms in The R Journal). After all, nothing beats the fun of putting a lot of effort into a project or task, just to realize minutes after finishing the job that package xyz released its latest version. (To say nothing of the frightening but inevitable moment when loading in the new version, praying to God the fragile life of your precious code will be spared.)

A better way to deal with these package updates is to be informed automatically when changes are made to the packages you depend on. This is exactly what the brand new notification feature of Rdocumentation does. It gives you the option to subscribe to the R packages of your choice, and when one of these packages gets updated on CRAN, Rdocumentation automatically sends you an email.

Getting updates on future package versions via Rdocumentation is simple. Navigate to the package of your choice (let’s say ggplot2 on Rdocumentation), provide your email address, and hit the green subscribe button. A message will pop up to confirm your subscription, and that’s it. This is also shown in the following screenshot:

ggplot2 rdocumentation

Rdocumentation is a tool that enables you to easily find and browse the documentation of all current packages and functions on CRAN. It offers features such as advanced search, package popularity rankings, community forums, and package download statistics. Rdocumentation is supported by DataCamp, provider of free R tutorials.

April Fools’ Day: The 7 Funniest Data Cartoons

To give this year’s April Fools’ Day a more analytical touch, we decided last week to do a little poll on internet cartoons. We asked our friends and colleagues to select their favourite data-related cartoon on the web, and organized a voting session to construct a top 7 list. (You can always share your own favourites in the comments.)

We proudly present the winners of the April Fools’ 2014 Data Cartoon awards:

Number One: The Cloud 

cloud-cartoon

Number Two: A Study on Statistics

statistics

Number Three: Pacman Statistics

pacman

Number Four: Dilbert One

dilbertone

Number Five: Halloween Statistics

haloween

Number Six: Dilbert Two

dilbertwo

Number Seven: XKCD Correlation

correlation

Disqualified from the competition, but still funny:

big data

The Stack Overflow R Top 5

Like every start-up in the IT and data science sector, we often find ourselves spending more time on Stack Overflow than on our own site. For those of you who are not familiar with it, Stack Overflow is like a Q&A forum on steroids. It features questions and answers on a wide range of topics in programming, and it’s dedicated to answering any and all of these questions. Thanks to its clever reputation system based on points and badges, chances are high you will find a high-quality answer to your particular problem. Believe us, this will save you a lot of time!

Since we use it so often at DataCamp, we wanted to share our ‘Top Five’ list of the most popular R questions:

Position 5 = What statistics should a programmer (or computer scientist) know? (262 votes)     

This question is targeted at programmers who want to understand how their programming efforts can benefit from a more statistical approach. Not only does it provide an overview of statistical techniques, some of the answers also focus on the statistical tools programmers can use in their day-to-day activities.

Position 4 = R Grouping functions: sapply vs lapply vs apply vs tapply vs by vs aggregate (272 votes)     

This is something almost every new R programmer struggles with in the beginning: how and when to use the functions in the apply family. If you are one of them, just check out this Stack Overflow post and it will be a lot clearer to you. Multiple people have responded to the question, and most of them provide very clear answers, with some even including slide presentations.
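To give a feel for how the members of the apply family differ, here is a minimal sketch. The data and variable names are made up for illustration and are not taken from the Stack Overflow post.

```r
# Toy data: three numeric vectors of different lengths (invented for this example).
scores <- list(a = c(1, 2, 3), b = c(4, 5), c = c(6, 7, 8, 9))

# lapply() always returns a list, one element per input element.
means_list <- lapply(scores, mean)

# sapply() simplifies the result to a named vector when it can.
means_vec <- sapply(scores, mean)

# apply() works over the rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)

# tapply() applies a function to a vector split by a grouping variable.
height <- c(150, 160, 170, 180)
group <- c("x", "y", "x", "y")
group_means <- tapply(height, group, mean)
```

Roughly speaking: reach for lapply() when you want a list back, sapply() when a plain vector is more convenient, apply() for matrices, and tapply() for grouped summaries.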

Position 3 = How to sort a data frame by column(s) in R (302 votes)     

Again, a very easy but highly relevant question (certainly for new R users switching from Excel). Based on an example, the questioner wants to know how to sort a data frame by multiple columns. This is a standard task in R, but if you’re not familiar with using functions, the barrier to entry might be high. (Spoiler alert: the order function will take you a long way.)
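As a small illustration of the spoiler, here is a sketch of sorting by two columns with order(). The data frame and its column names are invented for this example.

```r
# Made-up data frame for illustration.
df <- data.frame(
  name  = c("Ann", "Bob", "Cid", "Dee"),
  dept  = c("B", "A", "B", "A"),
  score = c(70, 85, 90, 85),
  stringsAsFactors = FALSE
)

# order() returns the row indices in sorted order.
# Here: dept ascending, then score descending within each dept
# (the minus sign reverses the order of a numeric column).
sorted <- df[order(df$dept, -df$score), ]
```

Because order() only returns indices, the same pattern works for any number of columns: add more arguments to order() and index the data frame with the result.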

Position 2 = How can we make xkcd style graphs in R (307 votes)     

Close, but no cigar. This question on xkcd style graphs reached second place in our top five list. As a start-up we personally love xkcd style graphs since they have this arty-farty layer over them. They allow you to provide information in a very clear way, and their unique and fun style increases the chances your audience will pick them up. A must-read for everyone!

Position 1 = How to make a great R reproducible example (525 votes)  

Simply put: great question and great answers! Reproducible examples are fundamental for teaching, research, and even when asking questions on, for example, Stack Overflow. However, creating a reproducible example is not that easy, and requires a certain finesse. This post will guide you through the ins and outs of creating such reproducible examples, so make sure to check it out since it will definitely help you to better understand R in the long run.
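For those who have never written one, a reproducible example might look roughly like the sketch below. It is only one possible shape, with invented data, and not a summary of the answers in the post.

```r
# Fix the random seed so everyone who runs this gets the same numbers.
set.seed(42)

# Build a small, self-contained data set instead of referring to a local file.
example_df <- data.frame(
  id    = 1:5,
  value = round(rnorm(5), 2)
)

# dput() prints R code that recreates the object; paste that text into your question.
dput(example_df)

# Include the smallest piece of code that shows the problem...
result <- mean(example_df$value)

# ...and the session details, since behaviour can differ between R and package versions.
sessionInfo()
```

The idea is that someone else can copy the snippet into a fresh R session and see the same thing you do, without needing your files or your setup.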

Bonus: What’s your favorite data analysis cartoon 

For the not so serious moments…

A new series: R-fiddle of the Week

Now that our ‘Learning R’ series is coming to an end (for those who missed it, have a look at our Twitter or Facebook), it is time to announce the start of a new series: R-fiddle of the Week. Every week, we will share an R-fiddle link that contains the code of some popular or well-liked R blog posts. Since R-fiddle allows you to run and write R code right inside your browser, it is then easy to start playing around with the code yourself and make your own versions and adaptations of it. You can even share your code experiments with your friends, colleagues, students…

For the first week we wanted to start with something very tailored and visual, so we made an R-fiddle that uses Google’s API and your personal input. Based on the address you provide, it returns the corresponding coordinates and shows you the location on a Google Map via a plot. It is easy to think of variations (e.g. multiple addresses), so we are curious to see what you will come up with.

With this new set of posts we aim to show new R users the power of R, and introduce experienced users to some nifty R features they might not be aware of. You can follow the ‘R-fiddle of the Week’ series via Twitter or Facebook.

If you have any suggestions or ideas for an R-fiddle we should make, just send them to info@datacamp.com.

R-fiddle provides you with a free and powerful environment to write, run and share R code right inside your browser. We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very-easy-to-use ‘embed’ function for blogs and websites, so your visitors can edit and run R code on your own website or blog.

Learn to use Quandl in R: a free R tutorial with a shield

shield-quandl

Quandl is a “Wikipedia” for numerical data that allows you to search rapidly through 8 million ready-to-use data sets. At DataCamp we created a free in-browser coding tutorial on how to use the corresponding R package to access Quandl data from within R.

As every real-world data analyst knows, finding and formatting numerical data for analysis in R is often a hard and rigid task. Quandl wants to make this task less painful by providing you with a “search engine” for numerical data. Not only does it allow you to find data fast, but once you find it, it is ready to use. This is because Quandl’s bot returns data in a standard format, meaning you can translate it to any format you want. One of the great things is that Quandl has its own R package. This package is built on top of the Quandl API, and allows you to access many of the Quandl functionalities right inside the R console.

quandl2

Our free interactive Quandl course introduces you to the main functionality in the Quandl R package. In two short chapters you learn how to search through Quandl’s data sets, how to access them, and how you can easily manipulate them for your own purposes. All exercises are based on real-life examples (e.g. Bitcoin exchange rates), and take place in the comfort of your own browser thanks to DataCamp’s interactive learning platform for R.

We hope you will enjoy the course! If you have suggestions on future courses we should develop, or if you want us to develop a course for you, just contact us via info@datacamp.com.

Two new free interactive courses with R on DataCamp

We’re happy to announce that as of today, DataCamp has added two new and free online interactive courses to its curriculum: ‘Data Analysis and Statistical Inference’ and ‘Introduction to Computational Finance’. They will be the biggest DataCamp courses to date, so we’re very excited to see how they will be received.

We developed these courses in close collaboration with the teaching professors of the like-named Coursera courses. Hence, you can expect the same high-quality standards as from an academic course, but presented in DataCamp’s fun, learning-by-doing environment. Students who choose to enroll in the course on Coursera will be directed to DataCamp to practice their skills and to complete assignments.

In ‘Data Analysis and Statistical Inference’, taught by Dr. Mine Çetinkaya-Rundel from Duke University, you learn how to make use of data in the face of uncertainty. Throughout the course, you’ll learn how to collect, analyze, and use data to make inferences and draw conclusions about real-world phenomena.

‘Introduction to Computational Finance’ focuses on mathematical and statistical tools and techniques used in quantitative and computational finance. Professor Eric Zivot (University of Washington) designed the course, and with the help of real-life examples introduces you to the do’s and don’ts when analyzing financial data, estimating statistical models, and constructing optimized portfolios.

To follow the pace of the two Coursera courses, the different chapters will be released on DataCamp periodically over the next few weeks.  Once fully released, the courses will remain available on the DataCamp platform as a stand-alone version. The courses require no formal background, but some basic mathematical skills will come in handy. A genuine interest in data analysis is a plus!

We hope to welcome you to our online classroom soon!

Any ideas on new courses we should launch? Let us know via Facebook or Twitter!

DataMind goes to DataCamp

We’re happy to announce that effective immediately, we’ve officially changed our startup’s name from DataMind to DataCamp.

It was very obvious from the start that we did not want to become the next consultancy firm (one in a long row of many) that offered training and learning services on the side. We believed the time was ripe to build a company within the field of data science that had education and training as its sole core. A company that would develop tailored educational technology, and use it to offer something more exciting than the traditional two-week seminars or long monotonous webinars (depending on which of the two you can afford). The vision was to build a tailored online learning platform that offered students and professionals an engaging, learning-by-doing environment where they could build their knowledge through in-browser coding and exercises.

Today, it seems like there is indeed room for a vision like ours. Every day, more and more (soon-to-be) data analysts are finding their way to our free interactive intro to R course, and based on the increasing retention figures we have (at least) the impression that they like the interactive learning approach a lot. This traction allowed us to make improvements faster, and just recently we managed to get out of the beta stage.

So why the name change? In the process of building the learning platform, and spreading the message of it to students, professionals and academics, we learnt that a more professional image would benefit us if we wanted to access bigger players in the market, more funding sources, and better mentors. So for the benefit of the project’s growth and future we decided to do a name switch. Instead of the playful domain name DataMind.org you can now find us on the more professional DataCamp.com.

We felt the timing was right because in the upcoming months we’re releasing some interesting new features to the online interactive learning platform (like a new gamification system). Even more exciting is that we recently started working together with Coursera professors on how to integrate DataCamp with their courses. This will hopefully allow even more students and starting data scientists to become familiar with the power and benefits of R. But more on that in our next post…

We hope you’ll love our new name as much as we do!

@DataCamp_com
LinkedIn
Website

Complete list of Coursera courses using R ranked by “popularity”

Coursera – an online education startup – has rapidly expanded its curriculum of statistics and data analysis courses. Today, there are already 33 modules directly linked to the field, excluding the courses where statistics and data science are solely used as a supportive tool (e.g. finance). These courses make use of multiple statistical software packages like Python, MATLAB and of course R.

I decided to make a list of all Coursera courses that use R either as their first choice, or as one of the statistical software packages students are allowed to use for the homework assignments. Coursera does not publish all data on how many students enroll in their courses, but most (some?) courses reach well over a hundred thousand students each year.

To give some indication of their popularity, I list below all courses using R, ranked by the number of Facebook likes:

Ranking | Course title | Professor | University | Facebook likes | Tweets
1 | Social Network Analysis | Lada Adamic | University of Michigan | 12000 | 3543
2 | Statistics One | Andrew Conway | Princeton University | 9600 | 1421
3 | Computing for Data Analysis | Roger Peng | Johns Hopkins University | 8500 | 1934
4 | Data Analysis | Jeff Leek | Johns Hopkins University | 5200 | 1408
5 | Introduction to Data Science | Bill Howe | University of Washington | 2600 | 1103
6 | Introduction to Computational Finance and Financial Econometrics | Eric Zivot | University of Washington | 2100 | 351
7 | Mathematical Biostatistics Boot Camp 1 | Brian Caffo | Johns Hopkins University | 1400 | 239
8 | Statistics: Making Sense of Data | Alison Gibbs & Jeffrey Rosenthal | University of Toronto | 1400 | 243
9 | Asset Pricing | John H. Cochrane | University of Chicago Booth | 855 | 102
10 | Mathematical Methods for Quantitative Finance | Kjell Konis | University of Washington | 635 | 92
11 | Case-Based Introduction to Biostatistics | Scott L. Zeger | Johns Hopkins University | 424 | 110
12 | Financial Engineering 2 | Martin Haugh & Garud Iyengar | Columbia University | 109 | 13
13 | Data Analysis and Statistical Inference | Mine Çetinkaya-Rundel | Duke University | 80 | 18
14 | Core Concepts in Data Analysis | Boris Mirkin | Higher School of Economics | 77 | 15
15 | Mathematical Biostatistics Boot Camp 2 | Brian Caffo | Johns Hopkins University | 60 | 21

Since Coursera’s search function was of little help, I had to compile the list above manually. Therefore, it is possible I overlooked some of the courses. Feel free to mention them in the comment section, and I will make sure to update the list. In case you are interested in taking (or teaching) interactive data analysis courses, make sure to have a look at our own educational startup DataMind.

While I expect that most of you are familiar with Coursera, for those who aren’t, a quick summary: Coursera is one of the leading providers of Massive Open Online Courses (MOOCs). Today they have more than 100 institutional partners offering 500+ courses to over 5 million students worldwide. So despite being criticized by some, it is becoming more and more clear that they are here to stay.

R-Fiddle: An online playground for R code

www.R-fiddle.org is an early-stage beta that provides you with a free and powerful environment to write, run and share R code right inside your browser. It even offers the option to include packages. Over the past few days it has been gaining more and more traction, and it was mentioned on the front page of Hacker News.

We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very-easy-to-use ‘embed’ function for blogs and websites, so your visitors can edit and run R code on your own website or blog. This is the first version of R-fiddle, so do not hesitate to give us feedback.

Working together with the help of R-fiddle

You can use R-fiddle to share code snippets with colleagues when tossing around ideas, to find that annoying bug, or to make your own variations on other people’s code. It’s easy: just go to www.R-fiddle.org, type your code, and get your public URL by pressing ‘share’. This is a lot easier for your potential troubleshooter or colleague, since (s)he can immediately run and check the code, save it once finished and share it again. So by sharing your R code through R-fiddle, you not only help others to better understand your code, they can also help you!

Embedding an R-fiddle in your blog or website

Embedding the interactive code of your fiddle on a website or blog is easy. R-fiddle automatically generates a piece of code that you can then simply paste in your HTML at the desired place.

You can choose between two ways to embed the code: with or without the console. If you embed a fiddle with the console, your visitors can edit and run your code within the environment of your own site. If you embed a fiddle without the console, your visitors can see the code with a link to the r-fiddle website where they can edit and run it. For more information on how to embed interactive code, just check the documentation at http://www.r-fiddle.org/#/help

The R-fiddle working environment

Working with R-fiddle is very straightforward. The page consists of two sections. The main section of the site (on the left) is divided into two areas: the editor and the console. This is where you put your code. They work just like the standard editor and console you are familiar with from your IDE. For example, the editor colour-codes the syntax. The right pane is the discussion area. Here others can comment on your code, make suggestions, or ask questions. You can immediately see the comments others made, making collaboration easy.

rfiddle

The R-fiddle buttons

The R-fiddle interface provides plenty of features to assist in your development. The buttons at the top of the page include:

  • Save: By clicking save you activate the Embed and Share buttons. You always have to click save first; that’s when R-fiddle knows things are getting serious.
  • Embed: This allows you to embed your code on your website or blog with the help of an iframe.
  • Share: This allows you to share code from the R-fiddle page with other users. You can share it through a web link, Facebook and Twitter. These users can then provide feedback or even adapt/fix your code within their own browser.
  • Run: Executes the code entered in the editor, and displays the results in the console area.
  • Graph: Here you can find any graphs created by your code.

 In conclusion:

With this quick tour on R-fiddle, we hope to have given you a better understanding of what it provides and why you should use it. Please be aware that R-fiddle is a hosted application in beta, so performance can degrade during peak usage. As R-fiddle usage increases, we will add more servers to it asap. Check out www.R-fiddle.org today, and you will discover its power!

For any questions or suggestions, do not hesitate to contact us at info@datamind.org

Building DataMind: FREE Online Interactive Learning Platform for R

DataMind is the first free interactive online learning platform for R. Through an in-browser coding environment we offer exercise-based learning-by-doing. Our goal is to build a fun learning experience for data analysis and R, while allowing anyone to create courses! You can check out an early stage beta version at www.DataMind.org !

With DataMind, we focus on three things: (1) make the educational experience interactive and fun for students, (2) make the platform and the content available for free, and (3) stimulate content creation by the community (that means you! Drop us a line if you are interested in creating courses; the course creation interface is a work in progress). Our focus on interactivity and fun is driven by our belief that you learn data analytics by doing! We do not believe in copying the classroom online. That is why all our courses are constructed around an in-browser coding interface, allowing users to start coding R from day one with the help of instant feedback. Over time, challenges and competitions will be added to courses as well, so users can also interact with each other.

We were inspired to start this project by innovative start-ups who offer interactive web development courses. These start-ups put a focus on learning-by-doing through in-browser coding, elements of gamification, and community-provided content. It turned out this approach was a huge hit, but we got frustrated it didn’t exist for R and data analysis. Having experience in teaching statistics, we were convinced data analytics education could greatly benefit from such a didactic approach that focuses on learning-by-doing. Next, the data science industry itself is experiencing a huge increase in popularity. And last but not least, we strongly believe data analytics and its visualisation need a somewhat tailored learning approach compared to web development.

So we started coding!

We are developing DataMind in such a way that it supports, and even stimulates, content creation by the community. The key success factor of an online learning platform is the strength of the available content. Today, R is used in many domains that are often relatively unrelated (e.g. finance and biostatistics). With community content generation, experts of these diverse fields can share and create interactive content much faster and of much higher quality than we could ever do ourselves. For you as a course creator, it’s a scalable way to spread knowledge, build reputation and provide a fun learning experience to your students. In other words, we need you ;-)

Where do we stand today? At www.DataMind.org you can check out an early-stage beta version of the platform and enroll in our first course ‘Summer of R’. ‘Summer of R’ is aimed at those new to R who want to master the basics so they can start doing their own analysis. Furthermore, we’re working very hard on the course creation interface so everyone can start creating interactive courses soon.

If you feel enthusiastic about this project and want to create interactive courses, whether for academic purposes, professional reasons or just for fun, or if you have suggestions, feedback or questions, do not hesitate to send an e-mail to info@datamind.org. (We would love feedback!)

www.DataMind.org

P.S. The technical infrastructure behind DataMind will be covered in a future post.
