2024

Cloud Computing and Virtualisation Project Report With PySpark and Jupyter

Yiğit Çakmak / Emre Yapıcı / Taha Ahmet Ok

Project Topic: Dockerizing Apache Spark in a portable way and making it available to everyone.

How to Use Docker for This Project?

This project runs an Apache Spark word count application in three ways, each packaged with Docker for the user's convenience.

Important Note on Using requirements.txt

In Python projects, the requirements.txt file lists the dependencies the project needs, along with their specific versions. It lets developers install the same dependency versions in every environment, which keeps behaviour consistent and avoids problems caused by version incompatibilities.

Editing requirements.txt for Custom Versions

To change the versions of dependencies listed in requirements.txt, you simply need to update the version numbers next to each package name.
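As a small illustration of such an edit, the sketch below rewrites one pinned version in the text of a requirements file. The helper name set_pinned_version, the package names, and the version numbers are only examples and are not taken from this project's requirements.txt. It assumes the file pins versions with "==".

```python
import re


def set_pinned_version(requirements_text: str, package: str, new_version: str) -> str:
    """Return the requirements text with `package` pinned to `new_version`."""
    # Only lines of the form "package==version" are rewritten;
    # every other line is copied through unchanged.
    pattern = re.compile(rf"^{re.escape(package)}==\S+", re.IGNORECASE)
    updated_lines = []
    for line in requirements_text.splitlines():
        if pattern.match(line.strip()):
            updated_lines.append(f"{package}=={new_version}")
        else:
            updated_lines.append(line)
    return "\n".join(updated_lines)


# Illustrative content only, not the project's real dependency list.
example = "pyspark==3.4.1\njupyterlab==4.0.0"
print(set_pinned_version(example, "pyspark", "3.5.0"))
```

If the Docker image installs its dependencies from requirements.txt at build time, which is the usual setup, the image needs to be rebuilt after the file changes.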

First way: Using .py file with Docker

1- Start Docker.

2- Create a data folder within the pyspark folder.

3- Copy the .txt file you want to use into the pyspark folder.

4- Right-click the folder named spark-env-main and choose “Open in Terminal”.

5- Type “cd pyspark” in the terminal and press Enter.

6- Type “docker build -t pyspark-app .” in the terminal and press Enter. Wait for the build to finish.

7- Type “docker run pyspark-app” in the terminal and press Enter.

8- Done. The result is printed in the terminal.
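For anyone who prefers to script steps 5 to 7 rather than type them, a minimal sketch follows. It assumes Docker is installed and that the script is started from the spark-env-main folder. The function names are hypothetical and not part of the project. Note that the build context "." is passed to docker build as a separate argument after the image tag.

```python
import subprocess
from typing import List

IMAGE_TAG = "pyspark-app"  # image tag used in the steps above


def build_command(context_dir: str = ".") -> List[str]:
    """Command line for building the image from the given build context."""
    return ["docker", "build", "-t", IMAGE_TAG, context_dir]


def run_command() -> List[str]:
    """Command line for running a container from the built image."""
    return ["docker", "run", IMAGE_TAG]


def build_and_run(project_dir: str = "pyspark") -> None:
    """Build the image inside project_dir, then run it."""
    # check=True raises CalledProcessError if either Docker command fails.
    subprocess.run(build_command(), cwd=project_dir, check=True)
    subprocess.run(run_command(), check=True)
```

Calling build_and_run() performs the build and then starts the container, whose output appears in the same terminal.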

Second way: Using Jupyter with Docker

1- Start Docker.

2- Right-click the folder named spark-env-main and choose “Open in Terminal”.

3- Type “cd pyspark” in the terminal and press Enter.

4- Type “docker-compose up --build” in the terminal and press Enter. Wait for the build to finish.

5- Open http://127.0.0.1:8888/lab in your browser.

6- Click this icon.

7- Click this icon.

8- Select your .ipynb and .txt files.

9- Done. You can now see the result.
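The notebook's code is not reproduced in this report. To give a rough picture of what a word count computes, here is a plain-Python sketch that can be pasted into a notebook cell. It is a stand-in, not the project's PySpark code. In PySpark the same idea is commonly written with flatMap, map and reduceByKey. The function name count_words and the choice to lowercase words are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count how often each whitespace-separated word occurs."""
    counts = Counter()
    for line in lines:
        # split() with no arguments splits on any run of whitespace.
        # Lowercasing is an assumption; drop it to keep case distinctions.
        counts.update(line.lower().split())
    return counts


sample = ["spark makes word count easy", "word count with spark"]
for word, n in count_words(sample).most_common(3):
    print(word, n)
```

Passing an open text file instead of the sample list works the same way, because a file object yields its lines one at a time.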

Third way: Using web-app with Docker

We use AWS S3 to store the .txt file, and the web page shows the number of words in the uploaded text.

1- Start Docker.

2- Right-click the folder named spark-env-main and choose “Open in Terminal”.

3- Type “cd web-app” in the terminal and press Enter.

4- Type “docker-compose up --build” in the terminal and press Enter. Wait for the build to finish.

5- Open http://127.0.0.1:8000/ in your browser.

6- Click the “Choose File” button and select the .txt file whose words you want to count.

7- After selecting the file, click the “File Upload and View” button to see the result.
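The number shown on the page is a total word count. As an illustration of how such a total can be computed for a large upload without holding the whole file in memory, the following plain-Python sketch reads a binary stream in chunks. It is not the web app's implementation and does not involve Spark or S3. The function name and the default chunk size are assumptions.

```python
import io
from typing import BinaryIO


def count_words_in_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count whitespace-separated words in a binary stream, chunk by chunk."""
    total = 0
    prev_ended_in_word = False  # did the previous chunk end mid-word?
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk.split())
        # A word cut in two by a chunk boundary was counted in both chunks,
        # so remove the duplicate.
        if prev_ended_in_word and not chunk[:1].isspace():
            total -= 1
        prev_ended_in_word = not chunk[-1:].isspace()
    return total


# A tiny chunk size forces words to straddle chunk boundaries.
print(count_words_in_stream(io.BytesIO(b"hello world foo"), chunk_size=4))
```

The same function accepts a file opened with open(path, "rb"), so memory use stays bounded by the chunk size regardless of file size.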

Comparison: With Docker and Without Docker

This part of the report presents screenshots and explanations from the Big Data course lab, carried out both with Docker and without Docker. Our goal is to show that the results obtained with Docker are correct. We built the word count application with Docker in three ways: running a .py file in the terminal, working in a Jupyter notebook, and showing the result in a browser tab.
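The comparison in this report is made by inspecting screenshots. If the word counts from both runs were available as dictionaries, the same check could be automated along the lines of the sketch below. The function name and the input format are assumptions made for illustration and are not part of the project.

```python
from typing import Dict, Tuple


def diff_counts(
    with_docker: Dict[str, int], without_docker: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return the words whose counts differ between the two runs.

    Each differing word maps to (count_with_docker, count_without_docker).
    An empty result means both runs produced identical counts.
    """
    all_words = set(with_docker) | set(without_docker)
    return {
        word: (with_docker.get(word, 0), without_docker.get(word, 0))
        for word in sorted(all_words)
        if with_docker.get(word, 0) != without_docker.get(word, 0)
    }


# Made-up counts, purely to show the shape of the output.
print(diff_counts({"spark": 3, "docker": 2}, {"spark": 3, "docker": 1}))
```

An empty dictionary is the outcome that supports the claim that the Docker run is correct.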

Jupyter Notebook Without Docker for 124 MB

Jupyter Notebook With Docker for 124 MB

Jupyter Notebook Without Docker for 1 GB

Jupyter Notebook With Docker for 1 GB

Jupyter Notebook Without Docker for 12 GB

Jupyter Notebook With Docker for 12 GB

Terminal Results With Docker for 124 MB

Terminal Results With Docker for 1 GB

Terminal Results With Docker for 12 GB

Web-App Results With Docker for 124 MB

Web-App Results With Docker for 1 GB

Web-App Results With Docker for 12 GB