@author: Minghao WANG
Update on 11/Sep/2023: We'll soon publish a portable Python environment for Geneformer.
This tutorial consists of the following parts:
1. How to install the Python and R environments and the relevant dependencies/packages on hpc3
2. How to transform Seurat data to AnnData format, environment: R
3. How to convert AnnData to .loom format, environment: Python 3.10
4. How to use the .loom data to generate .dataset data, environment: Python 3.10
5. How to use the tokenized data to fine-tune your own model, environment: Python 3.10
6. How to use the fine-tuned / pretrained model to extract and plot cell embeddings, environment: Python 3.10
This is the pipeline for the whole process. Step 6 (extracting and plotting cell embeddings) is optional.
Python is the main language for the deep-learning steps. Anaconda is a convenient tool for managing Python packages, and it ships with Python, so installing Anaconda also installs Python.
First, log in to your hpc3 cluster and move to Prof. WANG's directory with `cd /scratch/PI/jgwang/`.
If you don't have your own folder yet, create one with `mkdir <your name>`.
Then move into it with `cd <your name>`.
Next, install Anaconda in your folder. First download the installer with `wget -c https://repo.anaconda.com/archive/Anaconda3-2023.07-2-Linux-x86_64.sh`.
Then run it with `bash Anaconda3-2023.07-2-Linux-x86_64.sh`. Press ENTER to scroll through the license and accept it. When the installer asks for the install location, enter `/scratch/PI/jgwang/<your name>/anaconda3` (the installer refuses a directory that already exists, and this path matches the PATH setting below). Once the installer finishes, Anaconda is installed.
Next, put Anaconda's Python on your system path. Open your shell configuration file with `vim ~/.bashrc`, press `i` to enter insert mode, use the arrow keys to move to a new line, and add:
`export PATH=/scratch/PI/jgwang/<your name>/anaconda3/bin:$PATH`
Then press `Esc`, type `:wq` and press ENTER to save and quit. Finally, run `source ~/.bashrc` to apply the change to your current shell.
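To confirm that the PATH change took effect, you can check which `python` the shell now finds first. The Python sketch below is one way to do this; the helper names `first_python_on_path` and `uses_anaconda` are hypothetical, and it assumes Anaconda lives in `/scratch/PI/jgwang/<your name>/anaconda3`, the prefix used in the `export PATH` line.

```python
import os
import shutil


def first_python_on_path(path_env=None):
    """Return the full path of the first 'python' executable on PATH, or None.

    path_env defaults to the current PATH (os.environ["PATH"]).
    """
    return shutil.which("python", path=path_env)


def uses_anaconda(prefix, path_env=None):
    """Return True if the first 'python' on PATH sits in <prefix>/bin."""
    found = first_python_on_path(path_env)
    if found is None:
        return False  # no python at all: the 'command not found' situation
    expected_bin = os.path.abspath(os.path.join(prefix, "bin"))
    return os.path.dirname(os.path.abspath(found)) == expected_bin


if __name__ == "__main__":
    # Hypothetical prefix: replace <your name> with your own folder name.
    prefix = "/scratch/PI/jgwang/<your name>/anaconda3"
    print("first python on PATH:", first_python_on_path())
    print("Anaconda python first:", uses_anaconda(prefix))
```

Run it in the base shell, before any `conda activate`: once an environment is active, the first `python` is the environment's own copy under `anaconda3/envs/`, so `uses_anaconda` returns `False` by design.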
To check that Python is available, run `python --version`. If it prints a version number instead of `command not found`, Python is installed successfully.
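The same check can be scripted, which is handy inside batch job scripts. This sketch, with the hypothetical helper `python_version`, runs `<executable> --version` and treats a missing executable as the `command not found` case.

```python
import re
import subprocess


def python_version(executable="python"):
    """Run '<executable> --version' and return (major, minor, micro) as ints.

    Returns None when the executable cannot be found, i.e. the shell's
    'command not found' case, or when the output cannot be parsed.
    """
    try:
        result = subprocess.run(
            [executable, "--version"], capture_output=True, text=True, check=True
        )
    except FileNotFoundError:
        return None
    # Python 3 prints e.g. "Python 3.x.y" to stdout; Python 2 used stderr.
    output = result.stdout.strip() or result.stderr.strip()
    match = re.match(r"Python (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    print(python_version())
```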
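Python environment: run `conda create -n py310 python=3.10 anaconda` and follow the prompts to create the environment. To enter it, run `conda activate py310`. If conda reports that your shell is not configured to use `conda activate`, run `conda init bash`, then `source ~/.bashrc`, and try again.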
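Inside a Python session you can confirm that you are really working in `py310`. This sketch, with the hypothetical helper `in_expected_env`, relies on `CONDA_DEFAULT_ENV`, the environment variable that `conda activate` sets to the active environment's name, and compares the interpreter's major.minor version against 3.10.

```python
import os
import sys


def in_expected_env(env_name="py310", version=(3, 10)):
    """Return True if this interpreter runs inside conda env `env_name`
    and its major.minor version equals `version`."""
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment's name.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return active_env == env_name and tuple(sys.version_info[:2]) == tuple(version)


if __name__ == "__main__":
    print("active conda env:", os.environ.get("CONDA_DEFAULT_ENV"))
    print("Python", sys.version.split()[0])
    print("in py310 with Python 3.10:", in_expected_env())
```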
R environment: run `conda create -n r_env r-essentials r-base`, then enter it with `conda activate r_env`.
Dependencies/packages: there are too many to list here, so install them as the error messages come up. For example, if you see `No module named A`, search Google for "How to install A with conda".
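Before running the pipeline, you can list which modules are still missing instead of discovering them one error at a time. In this sketch the helper `missing_modules` is hypothetical, and the module list in the example is only an illustrative guess at what the Geneformer steps import, not an official requirements list.

```python
import importlib.util


def missing_modules(names):
    """Return the module names in `names` that cannot be imported here."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative guess at modules the pipeline uses; adjust to your scripts.
    wanted = ["anndata", "loompy", "datasets", "transformers", "torch"]
    for name in missing_modules(wanted):
        print(f"No module named {name}: search 'How to install {name} with conda'")
```

Note that the import name and the conda package name can differ (for example, the module `sklearn` comes from the package `scikit-learn`), so check the package name before running `conda install`.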