Commit a71dd9df authored by Marie Weiel

add submit scripts and job outputs

parent 5066c26a
[1/2]: Loading data using truly parallel dataloader...
[0/2]: Loading data using truly parallel dataloader...
File size is 2390277560 bytes.
After Allgatherv: All line starts: [ 0 479 957 ... 2390276125 2390276602 2390277082]
[0/2]: Construct array with line starts and lengths in bytes.
[1/2]: Construct array with line starts and lengths in bytes.
[0/2]: Make global train-test split.
[1/2]: Make global train-test split.
[0/2]: Decode 1250000 test samples from file.
[1/2]: Decode 1250000 test samples from file.
[0/2]: Draw local 1875000 train indices.
[0/2]: Decode train lines from file.
[1/2]: Draw local 1875000 train indices.
[1/2]: Decode train lines from file.
Elapsed time truly parallel data loading: global average 2.5e+02s, local 2.5e+02s
[0/2]: Loading data using root-based dataloader...
[1/2]: Loading data using root-based dataloader...
There are 3750000 train and 1250000 test samples.
Local train samples: [1875000 1875000]
train_indices have shape (3750000,).
Elapsed time root-based data loading: global average 36s, local 36s
[0/2]: DONE.
Parallel: Local train samples / targets have shapes (1875000, 18) / (1875000,).
Parallel: Global test samples / targets have shapes (1250000, 18) / (1250000,).
Root: Local train samples / targets have shapes (1875000, 18) / (1875000,).
Root: Global test samples / targets have shapes (1250000, 18) / (1250000,).
[1/2]: DONE.
Parallel: Local train samples / targets have shapes (1875000, 18) / (1875000,).
Parallel: Global test samples / targets have shapes (1250000, 18) / (1250000,).
Root: Local train samples / targets have shapes (1875000, 18) / (1875000,).
Root: Global test samples / targets have shapes (1250000, 18) / (1250000,).
============================= JOB FEEDBACK =============================
NodeName=uc2n[113,116]
Job ID: 22915408
Cluster: uc2
User/Group: ku4408/scc
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 80
CPU Utilized: 00:09:44
CPU Efficiency: 1.12% of 14:26:40 core-walltime
Job Wall-clock time: 00:05:25
Memory Utilized: 2.58 GB
Memory Efficiency: 1.47% of 175.78 GB
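
The "Construct array with line starts and lengths in bytes" and "After Allgatherv" steps logged above can be illustrated with a minimal sketch. It is not the repository's dataloader: the function name `local_line_starts`, the toy byte string, and the chunking scheme are assumptions, and a plain concatenation stands in for the MPI `Allgatherv` call so the example runs without MPI.

```python
import numpy as np


def local_line_starts(data: bytes, rank: int, size: int) -> np.ndarray:
    """Return byte offsets of lines that start inside this rank's chunk (hypothetical helper)."""
    file_size = len(data)
    chunk = file_size // size
    begin = rank * chunk
    # The last rank also takes the remainder of the file.
    end = file_size if rank == size - 1 else begin + chunk
    buffer = np.frombuffer(data[begin:end], dtype=np.uint8)
    # A new line starts one byte after every newline character.
    starts = np.flatnonzero(buffer == ord("\n")) + begin + 1
    if rank == 0:
        # The very first line starts at offset 0.
        starts = np.concatenate(([0], starts))
    # Drop the offset produced by a trailing newline at the end of the file.
    return starts[starts < file_size]


# Toy stand-in for the data file: three lines of 4, 6, and 8 bytes.
data = b"a,b\ncc,dd\neee,fff\n"
size = 2  # Number of simulated ranks.

# Concatenating the per-rank results stands in for Allgatherv.
all_starts = np.concatenate([local_line_starts(data, rank, size) for rank in range(size)])
# Each line's length is the distance to the next line start (or to the file end).
lengths = np.diff(np.append(all_starts, len(data)))
print(all_starts, lengths)
```

With start offsets and lengths available on every rank, each rank can seek directly to the lines it was assigned and decode only those.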
[1/2]: Loading data...
######################################################
# Distributed Random Forest in Scikit-Learn with MPI #
######################################################
[0/2]: Loading data...
Using truly parallel dataloader...
File size is 2390277560 bytes.
[1/2]: Construct array with line starts and lengths in bytes.
After Allgatherv: All line starts: [ 0 479 957 ... 2390276125 2390276602 2390277082]
[0/2]: Construct array with line starts and lengths in bytes.
[1/2]: Make global train-test split.
[0/2]: Make global train-test split.
[1/2]: Decode 1250000 test samples from file.
[0/2]: Decode 1250000 test samples from file.
[1/2]: Draw local 1875000 train indices.
[1/2]: Decode train lines from file.
[0/2]: Draw local 1875000 train indices.
[0/2]: Decode train lines from file.
Elapsed time data loading: global average 2.5e+02s, local 2.5e+02s
[0/2]: DONE.
Local train samples and targets have shapes (1875000, 18) and (1875000,).
Global test samples and targets have shapes (1250000, 18) and (1250000,).
Labels are [0. 0. 0. ... 0. 0. 1.]
Elapsed time forest creation: global average 2.8e-05s, local 3.8e-05s
[0/2]: Set up and train local random forest with 50 trees and random state 2.
[1/2]: DONE.
Local train samples and targets have shapes (1875000, 18) and (1875000,).
Global test samples and targets have shapes (1250000, 18) and (1250000,).
Labels are [1. 0. 0. ... 0. 0. 1.]
[1/2]: Set up and train local random forest with 50 trees and random state 3.
Elapsed time training: global average 8.8e+02s, local 8.8e+02s
[0/2]: Evaluate random forest.
[0/2]: Get predictions of individual sub estimators.
[1/2]: Evaluate random forest.
[1/2]: Get predictions of individual sub estimators.
[1/2]: Calculate majority vote via histograms.
[0/2]: Calculate majority vote via histograms.
[1/2]: Local accuracy is 0.7971136, global accuracy is 0.7998568.
[0/2]: Local accuracy is 0.79728, global accuracy is 0.7998568.
Elapsed time test: global average 56s, local 56s
[1/2]: Loading data...
######################################################
# Distributed Random Forest in Scikit-Learn with MPI #
######################################################
[0/2]: Loading data...
Using root-based dataloader with Scatterv...
There are 3750000 train and 1250000 test samples.
Local train samples: [1875000 1875000]
train_indices have shape (3750000,).
Elapsed time data loading: global average 35s, local 35s
[0/2]: DONE.
Local train samples and targets have shapes (1875000, 18) and (1875000,).
Global test samples and targets have shapes (1250000, 18) and (1250000,).
Labels are [0. 0. 0. ... 1. 0. 0.]
Elapsed time forest creation: global average 2e-05s, local 1.9e-05s
[0/2]: Set up and train local random forest with 50 trees and random state 2.
[1/2]: DONE.
Local train samples and targets have shapes (1875000, 18) and (1875000,).
Global test samples and targets have shapes (1250000, 18) and (1250000,).
Labels are [0. 1. 1. ... 0. 1. 0.]
[1/2]: Set up and train local random forest with 50 trees and random state 3.
Elapsed time training: global average 8.8e+02s, local 8.8e+02s
[0/2]: Evaluate random forest.
[0/2]: Get predictions of individual sub estimators.
[1/2]: Evaluate random forest.
[1/2]: Get predictions of individual sub estimators.
[0/2]: Calculate majority vote via histograms.
[1/2]: Calculate majority vote via histograms.
[0/2]: Local accuracy is 0.7977704, global accuracy is 0.800064.
[1/2]: Local accuracy is 0.7974024, global accuracy is 0.800064.
Elapsed time test: global average 57s, local 57s
============================= JOB FEEDBACK =============================
NodeName=uc2n[222,237]
Job ID: 22915390
Cluster: uc2
User/Group: ku4408/scc
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 80
CPU Utilized: 01:11:56
CPU Efficiency: 1.23% of 4-01:36:00 core-walltime
Job Wall-clock time: 00:36:36
Memory Utilized: 3.12 GB
Memory Efficiency: 1.77% of 175.78 GB
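
The "Get predictions of individual sub estimators" and "Calculate majority vote via histograms" steps logged above can be sketched as follows. This is an illustration under stated assumptions, not the code from `distributed_forest.py`: the helper `vote_histogram` and the toy prediction arrays are hypothetical, and a NumPy sum over the per-rank histograms stands in for an MPI sum reduction.

```python
import numpy as np


def vote_histogram(tree_predictions, n_classes: int) -> np.ndarray:
    """Count, for every sample, how many trees voted for each class (hypothetical helper)."""
    tree_predictions = np.asarray(tree_predictions, dtype=int)  # Shape (n_trees, n_samples).
    n_samples = tree_predictions.shape[1]
    histogram = np.zeros((n_samples, n_classes), dtype=int)
    for class_label in range(n_classes):
        histogram[:, class_label] = (tree_predictions == class_label).sum(axis=0)
    return histogram


# Toy per-tree predictions of two simulated ranks (3 trees each, 4 test samples).
rank0 = np.array([[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]])
rank1 = np.array([[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 0]])

local_hists = [vote_histogram(predictions, n_classes=2) for predictions in (rank0, rank1)]
# Local vote: each rank decides using only its own trees.
local_votes = [hist.argmax(axis=1) for hist in local_hists]
# Global vote: summing the histograms stands in for a sum reduction across ranks.
global_hist = np.sum(local_hists, axis=0)
global_vote = global_hist.argmax(axis=1)
print(local_votes, global_vote)
```

Summing vote counts rather than exchanging raw per-tree predictions keeps the communicated array at size (n_samples, n_classes) regardless of how many trees each rank holds, and it lets the local and global accuracy be computed from the same histograms.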
########################################
# Serial Random Forest in Scikit-Learn #
########################################
Loading data...
DONE.
Train samples and targets have shapes (3750000, 18) and (3750000,).
First ten elements are: [[ 1.43997777e+00 1.63248479e+00 -9.67991173e-01 5.68317890e-01
1.46224272e+00 1.36972353e-01 8.19436967e-01 9.11546707e-01
1.21039116e+00 -5.06843209e-01 9.31933105e-01 1.27073431e+00
1.21000016e+00 1.47810447e+00 8.52468789e-01 1.11379075e+00
2.47141235e-02 6.13238990e-01]
[ 3.43844175e-01 -7.04108357e-01 -1.51597571e+00 5.51238716e-01
5.92378020e-01 1.29997504e+00 6.11836970e-01 2.91105419e-01
9.18442786e-01 -1.56051174e-01 4.62793738e-01 6.52638137e-01
1.25141346e+00 1.32741857e+00 4.72825408e-01 1.00110984e+00
9.48828042e-01 1.48614004e-01]
[ 5.07223248e-01 3.59210372e-01 4.22794193e-01 7.39285111e-01
8.21336746e-01 -2.84344971e-01 6.44737303e-01 -1.26930571e+00
8.19674611e-01 -1.98983908e-01 5.15161872e-01 7.95682371e-01
1.37060583e+00 1.57789564e+00 4.93557423e-01 1.17942798e+00
7.52048492e-01 2.89388001e-01]
[ 5.56809664e-01 -1.57331979e+00 1.35683000e+00 7.13156343e-01
7.61644185e-01 1.53109705e+00 7.97575951e-01 -2.99961656e-01
1.19725823e+00 -4.84998196e-01 1.09257638e+00 9.92057800e-01
8.05751562e-01 2.16137958e+00 1.08722878e+00 1.62078691e+00
4.51993123e-02 3.25273015e-02]
[ 3.59211493e+00 2.13520462e-03 -6.44264281e-01 1.90623689e+00
3.20609003e-01 3.08485538e-01 3.18263960e+00 -2.74604857e-01
2.91751528e+00 3.33075738e+00 2.50526094e+00 1.63983834e+00
5.80852270e-01 0.00000000e+00 1.69048905e+00 5.32282293e-01
7.83510923e-01 1.63441002e-01]
[ 7.58597434e-01 -1.89242971e+00 -1.67973864e+00 6.62953973e-01
-1.17093217e+00 -1.02506316e+00 1.43694293e+00 3.01254123e-01
2.15702868e+00 -1.12557161e+00 6.50241256e-01 1.39164054e+00
1.89918971e+00 2.74402332e+00 7.75830925e-01 2.08014536e+00
1.54032695e+00 4.80755001e-01]
[ 9.87198830e-01 1.21461833e+00 1.33140802e-01 1.17224276e+00
-9.93977427e-01 -9.41504121e-01 6.46450996e-01 1.56007898e+00
9.70402718e-01 -3.25230628e-01 1.75528979e+00 1.04119718e+00
5.26380122e-01 1.83662927e+00 1.74301934e+00 1.45607185e+00
2.37897053e-01 4.61494997e-02]
[ 1.70994639e+00 -8.22762012e-01 -1.13427246e+00 9.53481674e-01
-1.90318573e+00 4.29294139e-01 1.42728555e+00 3.82643938e-01
2.05661044e-01 -9.03856218e-01 1.40107632e+00 1.72245789e+00
1.09095061e+00 0.00000000e+00 1.44269097e+00 1.23851955e+00
1.05762923e+00 4.25258994e-01]
[ 9.43593442e-01 1.99409112e-01 8.14792871e-01 7.85433173e-01
-4.54714209e-01 -1.14372945e+00 5.78886461e+00 6.90133393e-01
2.04307199e+00 5.17880249e+00 7.89597631e-01 1.86898780e+00
2.10046601e+00 1.05997956e+00 1.11436117e+00 1.30555189e+00
1.50189817e+00 7.06036985e-01]
[ 7.64213026e-01 1.81087300e-01 -1.32228279e+00 6.47381306e-01
-7.69020736e-01 -2.73123175e-01 1.21376109e+00 7.54100859e-01
1.82200515e+00 -8.98288131e-01 6.92662835e-01 1.20266879e+00
1.54078197e+00 2.04965854e+00 7.89022744e-01 1.62554586e+00
1.52046657e+00 3.61533999e-01]] and [0. 0. 1. 0. 1. 0. 0. 1. 1. 0.]
Test samples and targets have shapes (1250000, 18) and (1250000,).
First ten elements are: [[ 6.91279709e-01 1.76920056e+00 -1.71181107e+00 4.76238310e-01
1.54056251e+00 -2.67858446e-01 8.72983932e-01 2.78744638e-01
1.08713210e+00 -5.51878333e-01 5.12543857e-01 8.91626298e-01
1.54371738e+00 8.60364318e-01 5.83059669e-01 6.91559553e-01
1.50008941e+00 5.59077024e-01]
[ 4.17667389e-01 -1.88861191e+00 -2.25399002e-01 5.72387338e-01
-1.19409788e+00 7.04542026e-02 6.95429265e-01 1.66337061e+00
1.04392505e+00 -3.82097363e-01 4.27827954e-01 8.09540808e-01
1.67913723e+00 1.76087189e+00 4.39944327e-01 1.31074250e+00
1.36796749e+00 1.91391006e-01]
[ 2.74744093e-01 -1.38396299e+00 -1.03303587e+00 4.39079940e-01
1.67037070e+00 -6.75462365e-01 7.48247623e-01 9.07953739e-01
1.12321198e+00 -4.34616148e-01 9.00804639e-01 7.02887774e-01
6.92424834e-01 1.52986670e+00 9.32529151e-01 1.16853487e+00
1.54546940e+00 2.30877008e-03]
[ 1.16500580e+00 1.00420511e+00 -1.22337127e+00 1.62781668e+00
-3.76755238e-01 8.39448214e-01 2.00318694e+00 1.60608697e+00
2.76874542e+00 2.29274392e+00 1.51524031e+00 1.37557471e+00
8.05598021e-01 0.00000000e+00 1.52915168e+00 1.25636113e+00
1.52380025e+00 4.66813985e-03]
[ 1.40449703e+00 -1.45557001e-01 5.21603674e-02 6.55963719e-01
1.24605799e+00 1.40364683e+00 6.71340346e-01 -1.68536985e+00
6.29944921e-01 -3.38737279e-01 1.19020391e+00 1.10914612e+00
8.26955855e-01 7.59544134e-01 1.14556456e+00 1.11023533e+00
2.60205507e-01 3.26079011e-01]
[ 2.63595748e+00 6.21905744e-01 1.27720702e+00 2.45305467e+00
1.10806942e+00 -5.16619682e-01 7.82781899e-01 1.04785419e+00
4.86200094e-01 9.83858526e-01 2.23186541e+00 1.23496139e+00
4.91022170e-01 6.09043658e-01 2.15673685e+00 4.74038422e-01
5.26653826e-01 4.24163006e-02]
[ 8.90232325e-01 -9.47344065e-01 6.88447535e-01 7.76006341e-01
-1.44117546e+00 -4.05649900e-01 4.55961943e-01 3.85926753e-01
3.63408327e-01 7.74472058e-01 7.38129973e-01 3.90631944e-01
4.69624609e-01 0.00000000e+00 6.06057763e-01 2.01749176e-01
2.38361638e-02 6.27153963e-02]
[ 8.09924126e-01 1.76636660e+00 1.50027168e+00 8.24404478e-01
9.41239297e-01 -9.60501552e-01 1.00532985e+00 -1.67402878e-01
1.49302173e+00 -5.72425246e-01 7.54992843e-01 1.13767517e+00
1.33718264e+00 1.82880926e+00 7.92772830e-01 1.42058229e+00
1.07851815e+00 3.78154993e-01]
[ 2.71482444e+00 -1.04809391e+00 -9.96524235e-04 2.07914996e+00
-7.31364310e-01 -1.69832718e+00 1.54993641e+00 1.68483472e+00
3.84886861e-01 -1.23952270e+00 2.09488463e+00 2.30906534e+00
9.78124380e-01 3.79600048e-01 2.12716794e+00 5.43927491e-01
1.45123398e+00 3.90201986e-01]
[ 6.09283328e-01 -8.08349609e-01 2.94039518e-01 9.17876959e-01
-3.87946656e-03 1.57775986e+00 9.13081244e-02 1.04072893e+00
1.14134960e-01 4.03893739e-01 6.74048424e-01 2.03140706e-01
2.67436147e-01 0.00000000e+00 6.38434172e-01 3.05932641e-01
7.21062347e-02 2.93387994e-02]] and [0. 0. 1. 1. 1. 1. 1. 1. 1. 0.]
Time for data loading is 39.535260654985905 s.
Set up classifier.
Train.
Time for training is 4739.061137255281 s.
Accuracy is 0.8003664.
============================= JOB FEEDBACK =============================
NodeName=uc2n378
Job ID: 22915357
Cluster: uc2
User/Group: ku4408/scc
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 40
CPU Utilized: 01:19:24
CPU Efficiency: 2.47% of 2-05:34:40 core-walltime
Job Wall-clock time: 01:20:22
Memory Utilized: 7.58 GB
Memory Efficiency: 17.25% of 43.95 GB
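
The "CPU Efficiency" figures in the three JOB FEEDBACK blocks above are consistent with CPU time divided by core-walltime, where core-walltime is nodes × cores per node × wall-clock time. The short sketch below recomputes them from the reported fields; the helper names are hypothetical and the formula is inferred from the reported numbers rather than taken from the cluster's documentation.

```python
def slurm_duration_to_seconds(text: str) -> int:
    """Convert a '[D-]HH:MM:SS' duration string into seconds (hypothetical helper)."""
    days, _, clock = text.rpartition("-")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_utilized: str, wall_clock: str, nodes: int, cores_per_node: int) -> float:
    """Return CPU efficiency in percent of the allocated core-walltime."""
    core_walltime = nodes * cores_per_node * slurm_duration_to_seconds(wall_clock)
    return 100.0 * slurm_duration_to_seconds(cpu_utilized) / core_walltime


# (CPU Utilized, Job Wall-clock time, Nodes, Cores per node), copied from the job feedback.
jobs = {
    "22915408": ("00:09:44", "00:05:25", 2, 80),
    "22915390": ("01:11:56", "00:36:36", 2, 80),
    "22915357": ("01:19:24", "01:20:22", 1, 40),
}
efficiencies = {job: round(cpu_efficiency(*fields), 2) for job, fields in jobs.items()}
print(efficiencies)
```

The recomputed values match the reported 1.12%, 1.23%, and 2.47%. In all three jobs the CPU time is close to one core's worth of wall-clock time per task, which is why the efficiency relative to the full allocation is so low.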
#!/bin/bash
#SBATCH --job-name=RF2 # Job name
#SBATCH --partition=multiple # Queue for the resource allocation
#SBATCH --nodes=2 # Number of nodes
#SBATCH --time=70:00 # Wall-clock time limit
#SBATCH --ntasks-per-node=1 # Maximum count of tasks per node
#SBATCH --cpus-per-task=40 # Number of CPUs per task
#SBATCH --mail-type=ALL # Notify user by email when certain event types occur.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export VENVDIR=<path/to/your/venv> # Export path to your virtual environment.
export PYDIR=<path/to/your/python/script> # Export path to directory containing Python script.
# Set up modules.
module purge # Unload all currently loaded modules.
module load compiler/gnu/13.3 # Load required modules.
module load mpi/openmpi/4.1
module load devel/cuda/12.4
module load lib/hdf5/1.14.4-gnu-13.3-openmpi-4.1
source ${VENVDIR}/bin/activate # Activate your virtual environment.
mpirun python ${PYDIR}/distributed_forest.py --dataloader parallel # Use truly parallel dataloader.
mpirun python ${PYDIR}/distributed_forest.py --dataloader root # Use root-based dataloader.
#!/bin/bash
#SBATCH --job-name=RF1 # Job name
#SBATCH --partition=single # Queue for the resource allocation
#SBATCH --time=24:00:00 # Wall-clock time limit
#SBATCH --cpus-per-task=40 # Number of CPUs per task
#SBATCH --mail-type=ALL # Notify user by email when certain event types occur.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export VENVDIR=<path/to/your/venv> # Export path to your virtual environment.
export PYDIR=<path/to/your/python/script> # Export path to directory containing Python script.
# Set up modules.
module purge # Unload all currently loaded modules.
module load compiler/gnu/13.3 # Load required modules.
module load mpi/openmpi/4.1
module load devel/cuda/12.4
module load lib/hdf5/1.14.4-gnu-13.3-openmpi-4.1
source ${VENVDIR}/bin/activate # Activate your virtual environment.
python -u ${PYDIR}/serial_forest.py # Run your Python script.
#!/bin/bash
#SBATCH --job-name=dataloader_test # Job name
#SBATCH --partition=dev_multiple # Queue for the resource allocation
#SBATCH --nodes=2 # Number of nodes
#SBATCH --time=30:00 # Wall-clock time limit
#SBATCH --ntasks-per-node=1 # Maximum count of tasks per node
#SBATCH --cpus-per-task=40 # CPUs per task
#SBATCH --mail-type=ALL # Notify user by email when certain event types occur.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export VENVDIR=<path/to/your/venv> # Export path to your virtual environment.
export PYDIR=<path/to/your/python/script> # Export path to directory containing Python script.
# Set up modules.
module purge # Unload all currently loaded modules.
module load compiler/gnu/13.3 # Load required modules.
module load mpi/openmpi/4.1
module load devel/cuda/12.4
module load lib/hdf5/1.14.4-gnu-13.3-openmpi-4.1
source ${VENVDIR}/bin/activate # Activate your virtual environment.
mpirun python ${PYDIR}/test_dataloaders.py