SC19 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Dask Processing and Analytics for Large Datasets

Workshop: Dask Processing and Analytics for Large Datasets

Abstract: This paper describes the "Dask Analytics" assignment, which is used for student evaluation in a graduate data science and data mining course. Students must read, process, and answer queries over a large dataset that does not fit in the RAM of a commodity laptop. Using the Python framework Dask, which provides parallel versions of a subset of pandas operations, students become familiar with parallel and distributed processing. The assignment also teaches the basic operations implemented in Dask in an applied way, along with operations and algorithms that are harder to parallelize.
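
To make the idea of querying a dataset larger than memory concrete, the following is a minimal plain-Python sketch of the partition-then-combine pattern that frameworks such as Dask automate. It does not use Dask and is not taken from the assignment. The column names `key` and `value`, the `SAMPLE` data, and all function names are hypothetical, chosen only for illustration.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of at most chunk_size rows, so only one chunk is held at a time."""
    rows = iter(rows)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


def partial_sums(chunk):
    """Per-chunk aggregate: maps each key to [sum of values, row count]."""
    acc = defaultdict(lambda: [0.0, 0])
    for row in chunk:
        entry = acc[row["key"]]
        entry[0] += float(row["value"])
        entry[1] += 1
    return acc


def chunked_mean(rows, chunk_size=2):
    """Group-by mean computed by combining per-chunk sums and counts."""
    total = defaultdict(lambda: [0.0, 0])
    for chunk in iter_chunks(rows, chunk_size):
        for key, (s, n) in partial_sums(chunk).items():
            total[key][0] += s
            total[key][1] += n
    return {key: s / n for key, (s, n) in total.items()}


# Hypothetical in-memory stand-in for a file too large to load at once.
SAMPLE = "key,value\na,1\nb,10\na,3\nb,20\na,5\n"
means = chunked_mean(csv.DictReader(io.StringIO(SAMPLE)), chunk_size=2)
print(means)
```

A mean combines cleanly because per-chunk sums and counts can simply be added together. An exact median cannot be recovered from per-chunk medians in the same way, which is one example of an operation that is harder to parallelize.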

Back to Workshop on Education for High Performance Computing (EduHPC) Archive Listing
