2024
Cloud Computing and Virtualisation Project Report With PySpark and Jupyter
Yiğit Çakmak / Emre Yapıcı / Taha Ahmet Ok
Project Topic: Dockerizing Apache Spark in a portable way and making it available to everyone.
How to Use Docker for This Project?
Our project runs Apache Spark in three different ways, each packaged with Docker for the convenience of the users.
Important Note on Using requirements.txt
In Python projects, the requirements.txt file is used to list all the dependencies required by the project along with their specific versions. This file allows developers to ensure that the same versions of dependencies are installed in different environments, ensuring consistency and avoiding potential issues caused by version incompatibilities.
Editing requirements.txt for Custom Versions
To change the versions of dependencies listed in requirements.txt, you simply need to update the version numbers next to each package name.
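The edit described above can also be scripted. The following is a minimal sketch, assuming the file only contains simple "name==version" lines; the helper name set_pinned_version and the version numbers in the example are placeholders, not values taken from our project.

```python
import re


def set_pinned_version(requirements_text: str, package: str, new_version: str) -> str:
    """Return the requirements text with `package` pinned to `new_version`.

    Only plain "name==version" lines are handled (an assumption of this sketch);
    every other line is left untouched.
    """
    # Package names are matched case-insensitively.
    pattern = re.compile(rf"^{re.escape(package)}==\S+$", re.IGNORECASE)
    updated_lines = []
    for line in requirements_text.splitlines():
        if pattern.match(line.strip()):
            updated_lines.append(f"{package}=={new_version}")
        else:
            updated_lines.append(line)
    return "\n".join(updated_lines)


# Hypothetical file contents; the versions are placeholders.
example = "pyspark==3.4.0\njupyterlab==4.0.0"
print(set_pinned_version(example, "pyspark", "3.5.1"))
```

After changing a version, the Docker image has to be rebuilt so that the new dependency is installed inside the container.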
First way: Using a .py file with Docker
1- Start Docker.
2- Create a data folder within the pyspark folder.
3- Copy the .txt file you want to use into the pyspark folder.
4- Right-click the folder named spark-env-main, then select "Open in Terminal".
5- Type "cd pyspark" in the terminal, then press Enter.
6- Type "docker build -t pyspark-app ." in the terminal, then press Enter. Wait for the build to finish.
7- Type "docker run pyspark-app" in the terminal, then press Enter.
8- Finished! You can see the result in the terminal.
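To make clear what result to expect in the terminal, the word count logic can be sketched in plain Python. This is not the script shipped in the pyspark folder; it only mirrors the two steps of a word count (split each line into words, then add up the occurrences), which a Spark job would typically express with flatMap and reduceByKey. The function names and the sample lines are assumptions made for illustration.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        # str.split() with no argument splits on any run of whitespace.
        counts.update(line.split())
    return counts


def count_words_in_file(path):
    """Count the words of a text file, reading it line by line."""
    with open(path, encoding="utf-8") as handle:
        return count_words(handle)


# Hypothetical sample text used instead of a real input file.
sample = ["spark makes big data simple", "big data with spark"]
result = count_words(sample)
print("total words:", sum(result.values()))
print("occurrences of 'spark':", result["spark"])
```

A plain Python version like this is only practical for small files; it can serve as a quick check of the Spark output on a small input.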
Second way: Using Jupyter with Docker
1- Start Docker.
2- Right-click the folder named spark-env-main, then select "Open in Terminal".
3- Type "cd pyspark" in the terminal, then press Enter.
4- Type "docker-compose up --build" in the terminal, then press Enter. Wait for the build to finish.
5- Go to http://127.0.0.1:8888/lab
6- Click this icon.
7- Click this icon.
8- Select your .ipynb and .txt files.
9- Finished! You can see the result.
Third way: Using the web-app with Docker
We use AWS S3 to store the .txt file, and the web page shows the number of words in the uploaded text.
1- Start Docker.
2- Right-click the folder named spark-env-main, then select "Open in Terminal".
3- Type "cd web-app" in the terminal, then press Enter.
4- Type "docker-compose up --build" in the terminal, then press Enter. Wait for the build to finish.
5- Go to http://127.0.0.1:8000/
6- Press the "Choose File" button, then choose the .txt file whose words you want to count.
7- After selecting the file, click the "File Upload and View" button to see the result.
Comparison: With Docker and Without Docker
The report includes screenshots and explanations for the Big Data course lab, carried out both with Docker and without Docker. Our goal is to show that the results obtained with Docker are correct. We built the word count application with Docker in three ways: running a .py file in the terminal, working in a Jupyter notebook, and showing the result in a browser tab.
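Besides comparing screenshots, the correctness check can be expressed in code. The sketch below assumes that the word counts of both runs are available as dictionaries mapping each word to its count; the function name diff_word_counts and the sample values are hypothetical and do not come from our measurements.

```python
def diff_word_counts(with_docker, without_docker):
    """Return {word: (count_with_docker, count_without_docker)} for every mismatch.

    An empty result means that both runs produced identical word counts.
    """
    mismatches = {}
    for word in set(with_docker) | set(without_docker):
        docker_count = with_docker.get(word, 0)
        local_count = without_docker.get(word, 0)
        if docker_count != local_count:
            mismatches[word] = (docker_count, local_count)
    return mismatches


# Hypothetical counts, for illustration only.
run_with_docker = {"spark": 2, "data": 2, "big": 2}
run_without_docker = {"spark": 2, "data": 2, "big": 2}
print(diff_word_counts(run_with_docker, run_without_docker) == {})
```

A word that is missing from one of the runs is treated as having a count of 0, so it also shows up as a mismatch.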
Jupyter Notebook Without Docker for 124 MB
Jupyter Notebook With Docker for 124 MB
Jupyter Notebook Without Docker for 1 GB
Jupyter Notebook With Docker for 1 GB
Jupyter Notebook Without Docker for 12 GB
Jupyter Notebook With Docker for 12 GB
Results in the Terminal With Docker for 124 MB
Results in the Terminal With Docker for 1 GB
Results in the Terminal With Docker for 12 GB
Results in the Web-App With Docker for 124 MB
Results in the Web-App With Docker for 1 GB
Results in the Web-App With Docker for 12 GB