Quandl is a “wikipedia” for numerical data that allows you to search rapidly through 8 million ready-to-use data sets. At DataCamp we created a free in-browser coding tutorial on how to use the corresponding R package to access Quandl data from within R.
As every real world data analyst knows, finding and formatting numerical data for analysis in R is often a hard and tedious task. Quandl wants to make this task less painful by providing you with a “search engine” for numerical data. Not only does it allow you to find data fast, but once you find it, it is ready to use. This is because Quandl’s bot returns data in a standard format, meaning you can translate it to any format you want. One of the great things is that Quandl has its own R package. This package is built on top of the Quandl API, and allows you to access many of the Quandl functionalities right inside the R console.
Our free interactive Quandl course introduces you to the main functionality in the Quandl R package. In two short chapters you learn how to search through Quandl’s data sets, how to access them, and how you can easily manipulate them for your own purposes. All exercises are based on real-life examples (e.g. Bitcoin exchange rates), and take place in the comfort of your own browser thanks to DataCamp’s interactive learning platform for R.
We hope you will enjoy the course! If you have suggestions on future courses we should develop, or if you want us to develop a course for you, just contact us via email@example.com.
We developed these courses in close collaboration with the teaching professors of the like-named Coursera courses. Hence, you can expect the same high-quality standards as from an academic course, but presented in DataCamp’s fun and learning-by-doing environment. Students who choose to enroll in the course on Coursera will be directed to DataCamp to practice their skills and to complete assignments.
In ‘Data Analysis and Statistical Inference’, taught by Dr. Mine Çetinkaya-Rundel from Duke University, you learn how to make use of data in the face of uncertainty. Throughout the course, you’ll learn how to collect, analyze, and use data to make inferences and conclusions about real world phenomena.
‘Introduction to Computational Finance’ focuses on mathematical and statistical tools and techniques used in quantitative and computational finance. Professor Eric Zivot (University of Washington) designed the course, and with the help of real life examples introduces you to the do’s and don’ts when analyzing financial data, estimating statistical models, and constructing optimized portfolios.
To follow the pace of the two Coursera courses, the different chapters will be released on DataCamp periodically over the next few weeks. Once fully released, the courses will remain available on the DataCamp platform as a stand-alone version. The courses require no formal background, but some basic mathematical skills will come in handy. A genuine interest in data analysis is a plus!
We’re happy to announce that effective immediately, we’ve officially changed our startup’s name from DataMind to DataCamp.
It was very obvious from the start that we did not want to become the next consultancy firm (one in a long row of many) that offered training and learning services on the side. We believed the time was ripe to build a company within the field of data science that had education and training as its sole core. A company that would develop tailored educational technology, and use it to offer something more exciting than the traditional two-week seminars or long monotonous webinars (depending on which of the two you can afford). The vision was to build a tailored online learning platform that offered students and professionals an engaging, learning-by-doing environment where they could build their knowledge through in-browser coding and exercises.
Today, it seems like there is indeed room for a vision like ours. Every day, more and more (soon-to-be) data analysts are finding their way to our free interactive intro to R course, and based on the increasing retention figures we have (at least) the impression that they like the interactive learning approach a lot. This traction allowed us to make improvements faster, and just recently we managed to get out of the beta stage.
So why the name change? In the process of building the learning platform, and spreading the message of it to students, professionals and academics, we learnt that a more professional image would benefit us if we wanted to access bigger players in the market, more funding sources, and better mentors. So for the benefit of the project’s growth and future we decided to do a name switch. Instead of the playful domain name DataMind.org you can now find us on the more professional DataCamp.com.
We felt the timing was right because in the upcoming months we’re releasing some interesting new features to the online interactive learning platform (like a new gamification system). Even more exciting is that we recently started working together with Coursera professors on how to integrate DataCamp with their courses. This will hopefully allow even more students and starting data scientists to become familiar with the power and benefits of R. But more on that in our next post…
We hope you’ll love our new name as much as we do!
Coursera – an online education startup – has rapidly expanded its curriculum of statistics and data analysis courses. Today, there are already 33 modules directly linked to the field, excluding the courses where statistics and data science are solely used as a supportive tool (e.g. finance). These courses make use of multiple statistical software packages like Python, MATLAB and of course R.
I decided to make a list of all Coursera courses that use R, either as their first choice or as one of the statistical software packages students are allowed to use for the homework assignments. Coursera does not publish all data on how many students enroll in their courses, but most (some?) courses reach well over a hundred thousand students each year.
To have some kind of indication of their popularity, I list below all courses using R ranked by the number of Facebook likes:
Given how uncooperative Coursera’s search function is, I had to draft the list above manually. Therefore, it is possible I overlooked some of the courses. Feel free to mention them in the comment section, and I will make sure to update the list. In case you are interested in taking (or teaching) interactive data analysis courses, make sure to have a look at our own educational startup DataMind.
While I expect that most of you are familiar with Coursera, for those who aren’t, a quick summary: Coursera is one of the leading providers of Massive Open Online Courses (MOOCs). Today they have more than 100 institutional partners offering 500+ courses to over 5 million students worldwide. So despite being criticized by some, it is becoming more and more clear that they are here to stay.
www.R-fiddle.org is an early stage beta that provides you with a free and powerful environment to write, run and share R-code right inside your browser. It even offers the option to include packages. Over the past couple of days it has been gaining more and more traction, and it was mentioned on the front page of Hacker News.
We designed it for those situations where you have code that you need to prototype quickly and then possibly share with others for feedback. All this without needing a user account, or any scrap projects or files! We even included a very-easy-to-use ‘embed’ function for blogs and websites, so your visitors can edit and run R code on your own website or blog. This is the first version of R-fiddle, so do not hesitate to give us feedback.
Working together with the help of R-fiddle
You can use R-fiddle to share code snippets with colleagues when tossing around ideas, to find that annoying bug, or to make your own variations on other people’s code. It’s easy: Just go to www.R-fiddle.org, type your code, and get your public URL by pressing ‘share’. This is a lot easier for your potential troubleshooter or colleague, since (s)he can immediately run and check the code, save it once finished and share it again. So by sharing your R-code through R-fiddle, you can not only help others to better understand your code, but they can also help you!
Embedding an R-fiddle in your blog or website
Embedding the interactive code of your fiddle on a website or blog is easy. R-fiddle automatically generates a piece of code that you can then simply paste in your HTML at the desired place.
You can choose between two ways to embed the code: with or without the console. If you embed a fiddle with the console, your visitors can edit and run your code within the environment of your own site. If you embed a fiddle without the console, your visitors can see the code with a link to the r-fiddle website where they can edit and run it. For more information on how to embed interactive code, just check the documentation at http://www.r-fiddle.org/#/help
The R-fiddle working environment
Working with R-fiddle is very straightforward. The page consists of two sections. The main section of the site (on the left) is divided into two areas: the editor and the console. This is where you put your code. They work just like the standard editor and console you are familiar with from your IDE. For example, the editor colour-codes the syntax. The right pane is the discussion area. Here others can comment on your code, make suggestions, or ask questions. You can immediately see the comments others made, making collaboration easy.
The R-fiddle buttons
The R-fiddle interface provides plenty of features to assist in your development. The buttons at the top of the page include:
Save: By clicking save you activate the Embed and Share buttons. You always have to click save first, that’s when R-fiddle knows things are getting serious.
Embed: This allows you to embed your code on your website and blog with the help of an iframe.
Share: This allows you to share code from the R-fiddle page with other users. You can share it through a web link, Facebook and Twitter. These users can then provide feedback or even adapt/fix your code within their own browser.
Run: Executes the code entered in the editor, and displays the results in the console area.
Graph: Here you can find any graphs created by your code.
With this quick tour on R-fiddle, we hope to have given you a better understanding of what it provides and why you should use it. Please be aware that R-fiddle is a hosted application in beta, so performance can degrade during peak usage. As R-fiddle usage increases, we will add more servers to it asap. Check out www.R-fiddle.org today, and you will discover its power!
DataMind is the first free interactive online learning platform for R. Through an in-browser coding environment we offer exercise-based learning-by-doing. Our goal is to build a fun learning experience for data analysis and R, while allowing anyone to create courses! You can check out an early stage beta version at www.DataMind.org !
With DataMind, we focus on three things: (1) make the educational experience interactive and fun for students, (2) make the platform and the content available for free, and (3) stimulate content creation by the community (you! Drop us a line if you are interested in creating courses; the course creation interface is a work in progress). Our focus on interactivity and fun is driven by our belief that you learn data analytics by doing! We do not believe in copying the classroom online. That is why all our courses are constructed around an in-browser coding interface, allowing users to start coding R from day one with the help of instant feedback. Over time, challenges and competitions will be added to courses as well, so users can also interact with each other.
We were inspired to start this project by innovative start-ups who offer interactive web development courses. These start-ups put a focus on learning-by-doing through in-browser coding, elements of gamification, and community provided content. It turned out this approach was a huge hit, but we got frustrated it didn’t exist for R and data analysis. Having experience in teaching statistics, we were convinced data analytics education could greatly benefit from such a didactic approach that focuses on learning-by-doing. Next, the data science industry itself is experiencing a huge increase in popularity. And last but not least, we strongly believe data analytics and its visualisation needs a somewhat tailored learning approach compared to web development.
So we started coding!
We are developing DataMind in such a way that it supports, and even stimulates, content creation by the community. The key success factor of an online learning platform is the strength of the available content. Today, R is used in many domains that are often relatively unrelated (e.g. finance and biostatistics). With community content generation, experts of these diverse fields can share and create interactive content much faster and of much higher quality than we could ever do ourselves. For you as a course creator, it’s a scalable way to spread knowledge, build reputation and provide a fun learning experience to your students. In other words, we need you!
Where do we stand today? At www.DataMind.org you can check out an early stage beta version of the platform and enroll in our first course ‘Summer of R’. ‘Summer of R’ is aimed at those new to R who want to master the basics so they can start doing their own analysis. Furthermore, we’re working very hard on the course creation interface so everyone can start creating interactive courses soon.
If you feel enthusiastic about this project and want to create interactive courses, whether for academic purposes, professional reasons or just for fun, or if you have suggestions, feedback or questions, do not hesitate to send an e-mail to firstname.lastname@example.org. (We would love feedback!)
Last week, I was working on an educational R project when I needed to consult the help files of different R packages and functions online. After doing some Google searches, it appeared to me that finding an easy-to-use tool was not as simple as I had expected. The closest I got were the websites Inside-R and R search, but as a user it wasn’t as “smooth” as what I was looking for. (I needed something really user-friendly for this educational project). Therefore, inspired by the documentation websites of programming languages/frameworks such as Ruby on Rails and AngularJS, I decided to build an online documentation search interface for R myself together with colleagues. Check the result on www.Rdocumentation.org!
Checking R documentation online instead of with the built-in R help function can often provide some extra benefits. First, you are capable of searching through the latest version of all R packages, even those that are not installed on your device. This makes it not only a help tool, but also a tool for discovery. Second, I added the discussion system ‘Disqus’. For every function and package, Disqus allows users to ask questions, add extra examples to the documentation, etc. Furthermore, today’s web development tools allow you to build a more user-friendly interface. Especially for an R-beginner this can be helpful. And last but not least, since R is a “one letter word”, googling for “R” + “something” is always a challenge. Having all the documentation in one place can at least eliminate that frustration.
I wrote the code for www.Rdocumentation.org together with some colleagues. It is quite dirty code since it only needed to get the job done, but for those interested, just send me a request. Also, while coding, we discovered the great staticdocs package of Hadley Wickham. It was not exactly what we needed, but maybe it can be used for other/similar initiatives. For all packages on CRAN, the help files were generated in HTML. Next, these HTML files were parsed and inserted into an SQL database. We opted for Ruby on Rails to build the web app that serves all the documentation on R packages and functions. Finally, using jQuery and Twitter Bootstrap, we built the instant search tool that allows you to see all R packages and R functions immediately while typing.
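For readers curious what the parse-store-search steps described above might look like, here is a minimal sketch. It is not the code behind www.Rdocumentation.org: it uses Python's standard library and an in-memory SQLite database instead of Ruby on Rails, and the table name, function names and sample pages are all made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class HelpPageParser(HTMLParser):
    """Collect the <title> and the visible text of one HTML help page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data
        elif data.strip():
            self.text_parts.append(data.strip())


def parse_help_page(html):
    """Return (topic, body_text) extracted from one help page."""
    parser = HelpPageParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), " ".join(parser.text_parts)


def build_index(pages):
    """Store (package, html) pairs in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE help_pages (package TEXT, topic TEXT, body TEXT)")
    for package, html in pages:
        topic, body = parse_help_page(html)
        conn.execute(
            "INSERT INTO help_pages VALUES (?, ?, ?)", (package, topic, body)
        )
    conn.commit()
    return conn


def instant_search(conn, typed):
    """Return (package, topic) rows whose topic starts with the typed text."""
    rows = conn.execute(
        "SELECT package, topic FROM help_pages WHERE topic LIKE ? ORDER BY topic",
        (typed + "%",),
    )
    return rows.fetchall()


# Made-up sample pages, for illustration only.
SAMPLE_PAGES = [
    ("stats", "<html><head><title>median</title></head>"
              "<body><p>Compute the sample median.</p></body></html>"),
    ("base", "<html><head><title>mean</title></head>"
             "<body><p>Arithmetic mean.</p></body></html>"),
    ("base", "<html><head><title>paste</title></head>"
             "<body><p>Concatenate strings.</p></body></html>"),
]

conn = build_index(SAMPLE_PAGES)
print(instant_search(conn, "me"))
```

Calling `instant_search` again after every keystroke gives the "results while typing" behaviour. Note that `%` and `_` in the typed text act as SQL wildcards in this sketch; a real search box would need to escape them.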
In this post, we briefly summarize and discuss the results of our survey on “R and education”. Before diving into the figures, we would like to express our sincere gratitude and appreciation to the 286 R enthusiasts that invested their valuable time to fill out this survey. Furthermore, you can download the complete dataset of the survey or browse an overview of all questions (see bottom of the post for more information), so feel free to do your own analysis, and share it. Note that the right panel of this page provides the answers to some open-ended questions in the survey.
Interestingly, respondents came from diverse backgrounds, both geographically as well as in terms of occupation. The left panel of Figure 1 illustrates respondents are mainly active as academics (50.5%), followed by professionals (30%) and students (19.5%). Academics from about 80 different universities, mainly located in the US and Europe, participated. About 24 respondents were R package authors.
The online survey was distributed through the R mailing lists and our personal contacts. Figure 1 demonstrates the geographical origin of the respondents. Individuals from all 4 continents participated, with the majority based in the US. Although there is selection bias when conducting an online survey in this way, we believe the current diversity of respondents is interesting and adds some flavor to the results.
Next, we first discuss the main takeaways regarding the respondents’ views on R in general. A more focused section follows on R and education. To end, we discuss the next steps we want to undertake based on this survey’s results.
Why you love R and expect its market share to go up
Respondents (from the group “professionals that use R”) are very optimistic when asked about the future spread of R in the world, as illustrated in Figure 2. An impressive 79.7% of respondents expect the future usage to go up in comparison to other statistical packages such as SAS and SPSS, only 11.9% expect it will remain stable, and just 3.4% of the respondents take a pessimistic view, expecting it will go down.
Figure 3 shows that respondents (from the group “professionals that use R”) mainly love R because of its functionality (86.2%) and the community (65.5%). Other reasons to love R cited under “other” are (among others): “many packages”, “cross platform” and “wonderful for graphics”. All that glitters is not gold though. When asked about their biggest frustration when using R, only 19% answer “Nothing, R is perfect”. The biggest frustrations reported by respondents are “the lack of documentation” (29.3%) and “the lack of consistency” (22.4%). A large number of respondents (34.5%) provided an open-ended response to this question as well. We listed the open-ended responses to this question in the right panel of this page, as well as the open-ended responses to what respondents consider the main disadvantages of R.
Major interest in online learning and teaching R
“R best matches the concept of ‘computational thinking’, a core idea that my students need”
Whether you are completely new to R, or you are a veteran with multiple years of experience, there is always room to learn and improve. As illustrated in Figure 4, one of the main sources for developing new R skills is online resources such as websites and online communities. This is true for both academics (92.4%) and professionals (94.9%). The second most cited educational source is the built-in R help feature, mentioned by 77.2% of the academics and 83.1% of the professionals. Textbooks, which can be seen as a more traditional way to learn and teach, are placed third.
Today, numerous online courses on statistics are already making use of the R language to explain data analytics concepts. Some of the most noteworthy and successful examples are the Coursera courses from Roger D. Peng (Computing for Data Analysis), and Eric Zivot (Introduction to Computational Finance and Financial Econometrics). This proven need for online educational sources for statistics and R raises the question of whether it would be possible to identify different and even more engaging ways to learn R online. The ‘R in Education’ survey indicates over 75% of students are interested in taking online courses with an interactive component. Of the academic respondents, 68.6% show interest in online interactive courses and 13% would be willing to pay for these courses (see Figure 5). Our survey results are thus in line with the observation that online interactive courses as offered by codecademy.com, codeschool.com, etc. have gained enormous popularity recently.
Naturally, in open-source communities most things are developed and offered for free. As noted in the previous paragraph, interactive online courses would be a valuable addition to the current spectrum of R’s educational sources. Since our results indicate that demand for free courses would be high, the question arises: Who will develop these free courses? A reasonable place to look would be people already developing free software, such as the R package authors. Indeed, 70% of R package authors in the survey indicated that, provided an easy-to-use development platform exists, they would be willing to create such interactive learning tools for their packages for free (note that the sample is small though). Therefore, it might be interesting to develop and eventually provide such a platform as a way to spread data analytics knowledge in general, and the R statistical programming language in particular.
New educational tools to teach R and statistics?
This survey largely confirmed our belief that there is a need for more online educational tools to teach R. These tools should take into account the added value of an interactive approach, as well as the characteristics and benefits of an open-source community. Therefore, we started working on an open interactive exercise platform for statistics and R.
To receive updates on our future progress, or if you are willing to provide us with feedback while building this learning platform, please leave your e-mail address below.
Download the full dataset of the survey here. The dataset is structured as follows: qla is a list in which each list-item contains the information of exactly one question in the survey. Each list-item in qla is itself again a list with the following items:
First list-item: The question asked
Second list-item: The answer possibilities
Third list-item: The data with the answers. Rows for respondents, columns for answers.
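To make the three-part layout concrete, here is a small sketch of how one question item could be tabulated. Python is used purely for illustration (in R, the three parts of question `i` would be reached with `qla[[i]][[1]]`, `qla[[i]][[2]]` and `qla[[i]][[3]]`). The question text, options and 0/1 answer matrix are placeholders, not data from the survey, and the assumption that answers are stored as 0/1 indicators is ours.

```python
# Hypothetical stand-in for one item of `qla`:
# [question asked, answer possibilities, data (rows = respondents, columns = answers)].
# All values below are placeholders, not real survey data.
example_item = [
    "Placeholder question?",
    ["Option A", "Option B", "Option C"],
    [
        [1, 0, 0],
        [0, 1, 0],
        [1, 0, 1],
        [1, 0, 0],
    ],
]


def answer_shares(item):
    """Return the question and the percentage of respondents per answer option.

    Assumes each cell is 1 if the respondent picked that answer and 0 otherwise.
    """
    question, options, data = item
    n_respondents = len(data)
    shares = {}
    for col, option in enumerate(options):
        chosen = sum(row[col] for row in data)
        shares[option] = round(100 * chosen / n_respondents, 1)
    return question, shares


question, shares = answer_shares(example_item)
print(question)
print(shares)
```

Because a respondent may tick several options in a multiple-choice question, the shares need not add up to 100%.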
NOTE: For privacy reasons we removed all information from the dataset that could result in identification of the respondents (e.g. emails, university affiliation, ...). Please contact us in case we overlooked something.
We would like to offer our apologies for the following errors that ended up in the survey:
When selecting that R is more complex to learn than other statistical languages, one of the following questions stated that you indicated that R was less complex to learn.
In order to better target the questions and to avoid making the survey even longer, we opted to mostly ask different questions to each type of respondent (Students/Academics/Professionals). Therefore, it is often not possible to make comparisons between the different types of respondents, which is a pity in hindsight.