How to Access Rubin DP1 at LIneA
This guide provides a step-by-step workflow for accessing Rubin Data Preview 1 (DP1) through the LIneA infrastructure, including the HPC environment and the Open OnDemand platform. All links below point to the official LIneA documentation.
Step-by-step instructions
1) Register as a LIneA user
Follow the onboarding instructions and complete your registration:
https://docs.linea.org.br/en/primeiros_passos.html
2) Request access to the LIneA HPC environment
Open a ticket on the Helpdesk requesting HPC access, specifically stating your interest in using Open OnDemand:
https://docs.linea.org.br/en/suporte.html
3) Access the Open OnDemand platform
Once access is granted, follow the guide to log in:
https://docs.linea.org.br/en/processamento/uso/openondemand.html
4) Create a Conda environment and expose the kernel to Open OnDemand
Full instructions are available here:
https://docs.linea.org.br/en/processamento/uso/openondemand.html#creating-python-kernels
Inside your Conda environment, install LSDB, Dask, Dask-Jobqueue, and the distributed scheduler:
conda install -c conda-forge lsdb dask dask-jobqueue distributed
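The linked documentation is authoritative for kernel setup; as an illustration, the whole of step 4 can be sketched as below. The environment name "rubin-dp1" and the display name are placeholders, not LIneA conventions:

```shell
# Create and activate a Conda environment (the name "rubin-dp1" is a placeholder).
conda create -n rubin-dp1 -c conda-forge python=3.11 -y
conda activate rubin-dp1

# Install the analysis stack plus ipykernel, which Jupyter needs
# in order to discover the environment as a kernel.
conda install -c conda-forge lsdb dask dask-jobqueue distributed ipykernel -y

# Register the environment as a Jupyter kernel visible to Open OnDemand.
python -m ipykernel install --user --name rubin-dp1 --display-name "Rubin DP1"
```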
5) Start a Jupyter Lab session
Launch a session following the official documentation:
https://docs.linea.org.br/en/processamento/uso/openondemand.html#jupyterlab
6) Open a notebook and select your custom Conda kernel
Once Jupyter Lab is open, select the kernel corresponding to your Conda environment.
Configuring a Dask cluster
The example below includes two options: a local cluster for testing and a SLURM cluster for real workloads on the HPC system.
from dask.distributed import Client

# Choose "local" for quick tests or "slurm" for real workloads on the HPC system.
CLUSTER_TYPE = "slurm"

if CLUSTER_TYPE == "local":
    from dask.distributed import LocalCluster

    cluster = LocalCluster(
        n_workers=3,
        threads_per_worker=1,
        memory_limit="2GB",
    )
    client = Client(cluster)

elif CLUSTER_TYPE == "slurm":
    from dask_jobqueue import SLURMCluster

    # Adaptive scaling bounds: keep between 5 and 10 SLURM jobs alive.
    minimum_jobs = 5
    maximum_jobs = 10

    cluster = SLURMCluster(
        n_workers=minimum_jobs,
        queue="cpu",
        account="hpc-bpglsst",
        cores=8,
        memory="16GB",
        processes=1,
        interface="ib0",
        walltime="01:00:00",
        job_extra_directives=["--propagate"],
    )
    cluster.adapt(minimum_jobs=minimum_jobs, maximum_jobs=maximum_jobs)
    client = Client(cluster)
    # Timeout is in seconds; increase it if the SLURM queue is busy.
    client.wait_for_workers(minimum_jobs, timeout=15)
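Once the client is up, a quick round-trip through a worker confirms the cluster is usable. The snippet below is a self-contained sketch using a throwaway local cluster; on the HPC system the same check works against the SLURM-backed `client` created above:

```python
from dask.distributed import Client, LocalCluster

# Throwaway local cluster purely for the smoke test; on the HPC system
# you would reuse the `client` created in the previous block instead.
cluster = LocalCluster(n_workers=1, threads_per_worker=1, memory_limit="1GB")
client = Client(cluster)

# A trivial task submitted to a worker and gathered back.
result = client.submit(lambda x: x + 1, 41).result()
print(result)  # 42

client.close()
cluster.close()
```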
Reading the Rubin DP1 catalog using LSDB
You can open the Rubin DP1 catalog directly using the environment variable DP1_HATS, which should point to the base directory containing the DP1 collections. Then simply use LSDB to load the catalog.
import os

import lsdb

dp1_base = os.environ.get("DP1_HATS")
catalog_path = os.path.join(dp1_base, "object_collection")
cat = lsdb.open_catalog(catalog_path)
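If the notebook is run outside the LIneA nodes, DP1_HATS may be unset and `os.path.join` will fail with an opaque TypeError. A small guard makes the failure explicit; the helper below is a sketch (the function name `dp1_catalog_path` is not part of any LIneA or LSDB API):

```python
import os


def dp1_catalog_path(collection="object_collection"):
    """Resolve a DP1 collection path from the DP1_HATS environment variable."""
    dp1_base = os.environ.get("DP1_HATS")
    if not dp1_base:
        # Fail fast with a clear message instead of a TypeError from os.path.join.
        raise RuntimeError(
            "DP1_HATS is not set; export it or run on the LIneA HPC nodes."
        )
    return os.path.join(dp1_base, collection)
```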