
Workshop Report on “Introduction to Data Analysis with R”

Kayoung KIM (Project Researcher of the B’AI Global Forum)

・Date : April 2, 7, 9, 14, 16, 2021 10:00 am - 12:00 pm (JST)
・Venue : Zoom Meeting
・Language : Japanese

In April 2021, the B’AI Global Forum, in collaboration with The Carpentries, held a hands-on workshop, “Introduction to Data Analysis with R,” for graduate students and other researchers at the University of Tokyo. The Carpentries is an international NPO that provides education in data analysis and programming skills, and it has worked with a variety of organizations, including universities such as MIT and major government agencies. In this workshop, instructors from the Japanese team of Software Carpentry, one of the three groups that make up The Carpentries, taught beginners how to use R. Because of the COVID-19 pandemic, the workshop was held online using Zoom.

 

R is an open-source programming language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, which has made it widely used in recent years for analyzing and visualizing data. The instructors cited three advantages of R: it is free software, so a wide variety of statistical analysis programs are available at no cost; analyses written as R code are highly reproducible, which in turn improves the reliability of the results; and R makes it easy to produce well-designed, publication-quality plots.

 

The workshop consisted of five two-hour sessions, each attended by 11-14 participants, two instructors, and about four helpers from Software Carpentry Japan. The participants came from a wide range of fields, including sociometry, pedagogy, social psychology, socio-information studies, art, and games, which suggests that interest in data analysis has been growing across research fields in recent years.

 

The first session was an orientation, in which the instructor introduced R and the participants then prepared for the course by making sure that R and RStudio (free software that makes R easier to use) were installed properly. From the second session onward, the participants learned about variables and functions, the most basic building blocks of R, and then practiced extracting only the data they needed from pre-prepared datasets and creating graphs from the extracted data. In the last session, time was set aside to review everything covered so far, which deepened the participants’ understanding of R.

 

<Figure 1> A screenshot of RStudio

 

Here are more details about how the workshop was run. RStudio is an Integrated Development Environment (IDE) that combines a space for writing code (the editor) with the program that executes it (R). RStudio consists of several panes, including an editor (upper left in Figure 1) and a console (lower left in Figure 1). When code built from variables and functions is entered in the editor and executed, the results, such as a calculation or an extract of the required data, are displayed in the console. The workshop used the “live coding” method, in which the participants typed the same code as the instructor on their own computers and checked whether they got the same results. In addition, the instructors prepared quizzes for each function to see whether the participants could write the code by themselves. Those who needed additional support at this stage were moved to a breakout room by the helpers and received individual explanations.

 

One of the major difficulties this time came from the online environment. This hands-on workshop was originally designed as a face-to-face course, but it had to be held online because of the pandemic. Although Zoom features such as reactions, chat, and breakout rooms were actively used to overcome the limitations of the online format, problems remained: it was difficult for participants to ask questions, and difficult to support those who had trouble operating RStudio. Fortunately, the situation improved greatly from the third session onward, because Software Carpentry conducted a questionnaire after each session and reflected the participants’ feedback in the next one. Even so, the workshop made it clear that the online format, which has become common during the pandemic, has many limitations as well as advantages.

 

In the questionnaire conducted by the B’AI Global Forum after all the sessions had finished, one comment pointed to the challenges of online learning: “it was difficult to keep up with the class and ask questions.” However, overall satisfaction with the workshop was very high, with comments such as, “I am new to R, but I was able to grasp the basics through the five sessions,” “while those who are familiar with R often don’t notice what beginners don’t understand, the instructors of this workshop answered every single question carefully no matter how basic it was, which was very impressive,” and “both the organization of the contents and the management of the class were excellent.” Some participants said they would like to take part in this kind of workshop again, which confirmed the value of organizing it.