"Data is the sword of the 21st century, those who wield it the samurai."--Jonathan Rosenberg, former SVP of Google
While K-12 computer science has recently seen growth in awareness and learning opportunities in high schools through the efforts of programs like Exploring Computer Science and groups like Code.org, there is less public awareness of, and limited opportunity for learning in, areas like Statistics, and in particular the burgeoning field of Data Science, which blends statistics and computing. Colleges, particularly at the graduate level, have responded to growing demand by increasing the number of courses and programs focusing on data science, but the K-12 space has thus far seen very little activity that might introduce students to these concepts and learning experiences.
Into this mix comes Introduction to Data Science (IDS), a program funded by the National Science Foundation as part of a Math-Science Partnership grant called Mobilize, involving UCLA and LAUSD. This school year, ten LAUSD high school teachers are piloting IDS with roughly 365 students. The course has been approved by the University of California Office of the President as a college-prep core mathematics course. According to LeeAnn Trusela, Project Director for the Mobilize grant, "IDS is a ground-breaking course in that it not only addresses the Common Core standards for Statistics and Probability and provides an alternate pathway to AP Stats, but also strives to open up the world of statistical thinking and data science for students by encouraging them to think critically about the data that surrounds them." IDS also "fuses mathematics with computer science through the use of RStudio, an open-source programming language," giving students hands-on exposure to computing.
So what exactly is data science, and how does it differ from traditional Statistics? According to Amelia McNamara, a Statistics doctoral student at UCLA and one of the co-authors of IDS, "a data scientist knows how to get meaning from data, using techniques from statistics and computer science. Unlike statistics, though, which can sometimes be done by hand, data science requires the use of computational tools." James Molyneux, also a Statistics doctoral student and co-author, affirms that "data science is a mix of ideas from statistics and computer science. Whereas both of these fields on their own are interested in developing new methods to make inferences/predictions, data science is more interested in studying what we can learn from data."
With the IDS course, students are learning statistical concepts using the power of computing to gain insights into sets of data. McNamara explains that the students are "learning how to manipulate data with a computer, ask appropriate questions of data, and understand how to interpret the results you get back." To do so, they are engaging in hands-on labs using the R programming environment within the RStudio interface.
When the students engage in the activities within IDS, their own data are involved. Sometimes this is done through participatory sensing, where students use networked devices (smartphones) to gather data around topics relevant to their own lives, such as their snacking habits. As a result, they are better able to "see themselves" in the data and feel connected to the process. This is all done while connecting to the key Common Core standards involved with Statistics and Probability, including:
- Interpreting Categorical and Quantitative Data
- Making Inferences and Justifying Conclusions
- Conditional Probability and the Rules of Probability
- Using Probability to Make Decisions
The result of the IDS approach to content and pedagogy is that students are engaged in examining data, finding meaning in data, and revising initial questions in light of new conclusions. Rob Gould, Professor of Statistics at UCLA, indicates that this experience with "statistical modeling ties into one of the [Common Core] standards of practice and is one of the big selling points of IDS, since there aren't many curricula that actively promote modeling in a real way." Multiple student comments suggest that they appreciate the "real life" approach to learning, which "makes it seem like we were interacting and applying what we learn with the environment instead of our traditional way (of) staying in a classroom and just learning from a textbook."
To unpack this "real-life approach to learning," consider the following example from the IDS course as relayed by Suyen Moncada-Machado, the LAUSD lead for the IDS project. In the Nutrition Campaign, "students start with a discussion about nutrition and nutrition labels. During the discussion, students are informed that they will be collecting data about their own snacking habits. After the discussion, they generate statistical questions that they predict may be answered with the data they will collect. A simple example: Is there a relationship between the number of calories and the total fat of the snacks we eat?
"Students then begin collecting data about their snacking habits by answering questions in a survey with their phones or other networked devices. They collect data about grams of sugar, fat, carbohydrates, to name a few of the variables. Once the students collect this data over a period of five days (minimum), they download the data, upload it, load it into RStudio, and begin to do analysis on their own snacking habits and the class' habits as a whole. The data may prompt other questions, which allow them to think about ways to investigate the topic further."
In addition to the real-world approach to learning, Molyneux also reports that students like the coding aspects of the labs and want to know what other languages they could pick up. One female student had failed every math course she'd taken in high school and wasn't going to graduate because she lacked the necessary number of units. IDS was the first math course she had ever taken in which she was doing really well. Another student, who is an English Language Learner, was in a similar situation. He struggled in high school courses, yet when given some data and a keyboard, found that he excelled in data analysis.
A striking facet of the student experiences is that none of the students who shared these experiences with Molyneux came from the white, male background that continues to prevail in the coding fields (statistics, which has done comparatively well at attracting women, being an exception). In fact, demographic data from IDS students in the Spring of 2015 indicate that 90% are Latino/a and 52% are female. Forty-eight percent of the students are in 11th grade, which positions them to take Advanced Placement Statistics should they desire to (and should it be offered at their school site). These statistics are in striking contrast to participation in computing-related courses for these groups and speak to the equity mission of the IDS program.
To prepare to deliver the IDS learning experience, a cadre of ten pilot teachers met with members of the IDS team for a total of 11 days (66 hours) between last summer and April. According to Trusela, their learning has focused on:
- Thinking critically about and generating hypotheses based on data
- Building statistical models
- Statistical & Computational thinking
- Using R to develop graphical/numerical summaries to communicate findings & generate further questioning
- Common Core Standards for Statistics & Probability relevant to Data Science
- Increasing Classroom Discourse through the use of Talk Moves
Arlene Pascua, an IDS teacher from Narbonne High School, indicates that "the IDS training has helped me to become a better teacher. I learned different teaching strategies that I was able to use for IDS as well as for my Algebra 2 classes. My students are learning through their own experiences using surveys and campaigns; collecting, graphing, calculating, and interpreting their own data."
The IDS teachers' dedication to learning and their pioneering spirit have been critical factors in the early success of the program. Seven of those teachers will be "IDS Fellows" and will help to facilitate the learning of a new cohort of twenty LAUSD teachers who will attend the second IDS Summer Institute this summer. The hope is that as the curriculum is refined, the program can expand nationally. In the meantime, all of the materials are available at wiki.mobilizingcs.org.
The IDS program is charting a new course for data science -- and potentially other STEM-related disciplines -- at the high school level. By creating an integrated and applied approach to learning mathematics, statistics, and computer science in a way that connects to students' lives, the program is equipping and apprenticing students to be more analytical and scientific in their thinking. While students are not necessarily going to be able to jump into an entry-level data science job, their eyes are being opened to new vistas in STEM disciplines. In addition, as Moncada-Machado indicates, this learning is helping students to "ultimately become inspired to be civically engaged citizens." The four-fold connection to standards, to technology as a tool for analyzing and creating meaning (not just for consuming information), to topics that are relevant to students' lives, and ultimately to their civic engagement is a well-rounded and comprehensive approach, one worthy of emulation by all who would be educational innovators in our schools today.