
Google Cloud SDK and Local BigQuery Setup

Install Google Cloud CLI globally on the host while keeping each client's gcloud config, Python venv, and BigQuery auth isolated per project.

This setup assumes the host may hold several Google Cloud accounts and projects over its lifecycle. The Google Cloud CLI is installed once globally, but each project keeps its own gcloud config directory, Python venv, and notebook kernel. The CLOUDSDK_CONFIG environment variable, which tells gcloud which config directory to read and write, is what keeps client accounts from mixing.

Install gcloud globally (Arch)

Install the Google Cloud CLI system-wide, not inside a Python venv. On Arch, the AUR package is the simplest route.

yay -S google-cloud-cli

Verify the install. Expected path is /usr/bin/gcloud.

gcloud version
which gcloud

Create an isolated gcloud config directory

Each client or project gets its own gcloud config directory. Pointing CLOUDSDK_CONFIG at this directory keeps accounts, projects, and ADC credentials separated from other work on the same host.

mkdir -p ~/gcloud-configs/client1
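The same directory can be created from Python, which helps when onboarding several clients at once. This is a minimal sketch: client_config_dir is a hypothetical helper that mirrors the mkdir -p above, and the ~/gcloud-configs/<client> layout is this guide's convention, not something gcloud requires.

```python
from pathlib import Path
from typing import Optional


def client_config_dir(client: str, base: Optional[Path] = None) -> Path:
    """Create (if needed) and return the isolated gcloud config dir for a client.

    Hypothetical helper; equivalent to `mkdir -p ~/gcloud-configs/<client>`.
    """
    # Reject names that would escape the base directory
    if not client or "/" in client or client in {".", ".."}:
        raise ValueError(f"invalid client name: {client!r}")
    base = Path.home() / "gcloud-configs" if base is None else base
    path = base / client
    # Idempotent, like mkdir -p
    path.mkdir(parents=True, exist_ok=True)
    return path
```

For example, client_config_dir("client2") would create ~/gcloud-configs/client2, which is the value to export as CLOUDSDK_CONFIG for that client.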

Create the project venv and notebook kernel

Python dependencies stay inside the project venv, separate from the global gcloud install.

mkdir -p ~/projects/client1_data_analysis
cd ~/projects/client1_data_analysis
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

Install notebook and BigQuery packages. google-cloud-bigquery-storage is optional but speeds up to_dataframe() downloads.

pip install ipykernel pandas db-dtypes pyarrow jupyter google-cloud-bigquery google-cloud-bigquery-storage

Register the venv as a notebook kernel so VS Code and Jupyter can pick it up by name.

python -m ipykernel install --user --name client1-bq
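To confirm that the registered kernel really launches the venv interpreter, you can read its kernel spec. A minimal sketch, assuming the default Linux location for --user kernels (~/.local/share/jupyter/kernels/<name>/kernel.json); kernel_python and uses_venv are hypothetical helpers, and jupyter kernelspec list shows the actual spec path on your machine.

```python
import json
from pathlib import Path
from typing import Optional


def kernel_python(name: str, kernels_dir: Optional[Path] = None) -> str:
    """Return the interpreter path that a Jupyter kernel spec launches."""
    # Assumed default location for `ipykernel install --user` on Linux
    if kernels_dir is None:
        kernels_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    spec = json.loads((kernels_dir / name / "kernel.json").read_text())
    # argv[0] is the Python executable used to start the kernel
    return spec["argv"][0]


def uses_venv(name: str, venv: Path, kernels_dir: Optional[Path] = None) -> bool:
    """True if the kernel's interpreter lives inside the given venv directory."""
    # Compare unresolved paths: .venv/bin/python is usually a symlink
    return venv in Path(kernel_python(name, kernels_dir)).parents
```

For this project, uses_venv("client1-bq", Path.home() / "projects/client1_data_analysis/.venv") should return True.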

Create a sourced project shell file

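A small shell file activates the venv and exports the isolated gcloud config and project ID in one step. It must be sourced, not executed: running it as a script would change directory, activate the venv, and set the variables only in a child shell that exits immediately.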

nano ~/projects/client1_data_analysis/client1.sh

Paste the following and save.

cd ~/projects/client1_data_analysis
source .venv/bin/activate
export CLOUDSDK_CONFIG=$HOME/gcloud-configs/client1
export GOOGLE_CLOUD_PROJECT=your-gcp-project-id

Source it in any new shell or tmux pane for this project.

source ~/projects/client1_data_analysis/client1.sh  # works from any directory; the file cd's into the project
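When onboarding another client, the file can be generated instead of hand-edited. A minimal sketch: render_project_sh is a hypothetical helper that reproduces the four lines above with new values, takes the project directory relative to $HOME, and rejects any value that would need shell quoting rather than trying to escape it.

```python
import re

# Values are restricted to characters that are safe unquoted in a shell file
_SAFE = re.compile(r"[A-Za-z0-9._/-]+")


def render_project_sh(project_rel: str, client: str, project_id: str) -> str:
    """Render a sourced per-client shell file (hypothetical helper).

    project_rel is the project directory relative to $HOME,
    e.g. "projects/client2_data_analysis".
    """
    for value in (project_rel, client, project_id):
        if not _SAFE.fullmatch(value):
            raise ValueError(f"unsafe value for a shell file: {value!r}")
    return (
        f'cd "$HOME/{project_rel}"\n'
        "source .venv/bin/activate\n"
        f'export CLOUDSDK_CONFIG="$HOME/gcloud-configs/{client}"\n'
        f"export GOOGLE_CLOUD_PROJECT={project_id}\n"
    )
```

The result can then be written with pathlib, for example Path.home().joinpath("projects/client2_data_analysis/client2.sh").write_text(...), and sourced exactly like client1.sh.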

Verify the current shell context points at the right venv, config, and project.

echo $VIRTUAL_ENV
echo $CLOUDSDK_CONFIG
echo $GOOGLE_CLOUD_PROJECT
which python
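The same checks can run inside Python, for example as the first notebook cell, to catch a kernel started from an unsourced shell. A minimal sketch: check_project_env is hypothetical, and it compares sys.prefix with the venv path instead of reading VIRTUAL_ENV, because a notebook kernel can run the venv interpreter without that variable being set.

```python
import os
import sys
from typing import List, Mapping, Optional


def check_project_env(
    venv_dir: str,
    config_dir: str,
    project_id: str,
    env: Optional[Mapping[str, str]] = None,
    prefix: Optional[str] = None,
) -> List[str]:
    """Return problems with the current context; an empty list means all good."""
    env = os.environ if env is None else env
    # sys.prefix is the venv directory when a venv interpreter is running
    prefix = sys.prefix if prefix is None else prefix
    problems = []

    if os.path.realpath(prefix) != os.path.realpath(venv_dir):
        problems.append(f"interpreter prefix {prefix} is not {venv_dir}")
    if env.get("CLOUDSDK_CONFIG") != config_dir:
        problems.append(
            f"CLOUDSDK_CONFIG={env.get('CLOUDSDK_CONFIG')!r}, expected {config_dir!r}"
        )
    if env.get("GOOGLE_CLOUD_PROJECT") != project_id:
        problems.append(
            f"GOOGLE_CLOUD_PROJECT={env.get('GOOGLE_CLOUD_PROJECT')!r}, expected {project_id!r}"
        )
    return problems
```

For this project, call it with os.path.expanduser("~/projects/client1_data_analysis/.venv"), os.path.expanduser("~/gcloud-configs/client1"), and your project ID; CLOUDSDK_CONFIG is compared as the expanded absolute path because client1.sh exports $HOME/... already expanded.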

Authenticate gcloud (two logins)

Two separate logins are required. gcloud auth login authenticates the CLI. gcloud auth application-default login creates Application Default Credentials used by Python client libraries like google-cloud-bigquery.

Always source the project shell file first so both logins land inside the isolated config directory.

# See "Extras: Understand the two auth tracks" for why two logins are needed

source ~/projects/client1_data_analysis/client1.sh

# CLI auth
gcloud auth login
gcloud config set project your-gcp-project-id

# Application Default Credentials for Python libraries
gcloud auth application-default login

Verify auth, active config, and that the isolated config directory is populated.

gcloud auth list
gcloud config list
gcloud auth application-default print-access-token
ls -la ~/gcloud-configs/client1
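print-access-token proves a token can be minted, but not which kind of credential is stored. A minimal sketch that reads the ADC file gcloud wrote into the isolated config directory, assuming the standard file name application_default_credentials.json; summarize_adc is hypothetical and deliberately returns only non-secret fields, never the refresh token or client secret.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def summarize_adc(config_dir: Path) -> Dict[str, Optional[str]]:
    """Summarize the ADC file in a gcloud config dir without exposing secrets."""
    adc_file = config_dir / "application_default_credentials.json"
    if not adc_file.is_file():
        raise FileNotFoundError(
            f"no ADC file in {config_dir}; run gcloud auth application-default login"
        )
    data = json.loads(adc_file.read_text())
    return {
        # "authorized_user" after a user login, "service_account" for a key file
        "type": data.get("type"),
        # Present only if a quota project was recorded in the file
        "quota_project_id": data.get("quota_project_id"),
    }
```

After both logins, summarize_adc(Path.home() / "gcloud-configs/client1") should report type "authorized_user".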

Open your code editor from the sourced shell

Launch your code editor (VS Code used here) from a shell that has already sourced the project file so the environment is inherited cleanly. Then select the project kernel or .venv interpreter from the notebook kernel picker.

source ~/projects/client1_data_analysis/client1.sh
code .

Validate BigQuery access from a notebook

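Constructing bigquery.Client() only resolves credentials locally and makes no API call, so an expired login or a missing IAM permission surfaces only on the first real request. Always validate with a real table query.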

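import os
import sys

from google.auth import default
from google.cloud import bigquery

# The interpreter should live inside the project .venv
print(sys.executable)
# Exported by client1.sh; should match your-gcp-project-id
print(os.getenv("GOOGLE_CLOUD_PROJECT"))

# Resolve Application Default Credentials (raises if ADC is missing)
creds, project = default()
print("ADC project:", project)

# No API call happens here; the query below is the real test
client = bigquery.Client(project="your-gcp-project-id")
print("BigQuery client created")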

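Run a small query against a known table to confirm end-to-end access. Replace data_date with a column that exists in your table. LIMIT only trims the rows returned; in general BigQuery still reads every selected column, so on large tables select only the columns you need instead of *.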

query = """
SELECT *
FROM `your-gcp-project-id.your_dataset.your_table`
ORDER BY data_date DESC
LIMIT 10
"""
df = client.query(query).to_dataframe()
display(df)
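If you run this smoke test against several projects or tables, the query can be built from parts so the fully qualified table ID is always backtick-quoted. A minimal sketch: preview_query is hypothetical, data_date remains a placeholder column, and the identifier check is a simplified character whitelist, not BigQuery's full naming rules.

```python
import re
from typing import Optional

# Simplified whitelist for project, dataset, table and column names
_IDENT = re.compile(r"[A-Za-z0-9_-]+")


def preview_query(project: str, dataset: str, table: str,
                  order_by: Optional[str] = None, limit: int = 10) -> str:
    """Build a small SELECT used only to prove end-to-end access."""
    parts = (project, dataset, table) + ((order_by,) if order_by else ())
    for part in parts:
        if not _IDENT.fullmatch(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    sql = f"SELECT *\nFROM `{project}.{dataset}.{table}`"
    if order_by:
        sql += f"\nORDER BY {order_by} DESC"
    return sql + f"\nLIMIT {int(limit)}"
```

The result plugs into the same call as above, for example client.query(preview_query("your-gcp-project-id", "your_dataset", "your_table", "data_date")).to_dataframe().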

Fix the BigQuery Storage warning

If you see 'UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead', install the storage client and restart the notebook kernel.

pip install google-cloud-bigquery-storage
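After restarting the kernel, you can confirm that the interpreter running the notebook can see the storage client. A minimal sketch using only importlib: module_available is a hypothetical helper, and google.cloud.bigquery_storage is the module the google-cloud-bigquery-storage package installs. If it reports False, pip most likely installed into a different interpreter than the one printed.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """True if `name` can be imported by the running interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example google.cloud) is missing
        return False


print("interpreter:", sys.executable)
print("bigquery_storage available:", module_available("google.cloud.bigquery_storage"))
```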

Daily workflow

source ~/projects/client1_data_analysis/client1.sh  # also cd's into the project
code .

Operational notes

  • gcloud is installed globally on the host and Python packages stay inside the project .venv.
  • CLOUDSDK_CONFIG is what isolates this client's account and config from others on the same machine.
  • gcloud auth login authenticates the CLI; gcloud auth application-default login is what lets Python client libraries authenticate in notebooks.
  • A successful bigquery.Client() constructor does not prove auth works. Always confirm with a real table query.

Extras: Understand the two auth tracks

Two separate auth tracks run in parallel on the same machine. gcloud CLI auth is used by gcloud commands, while ADC (Application Default Credentials) is used by Python client libraries such as bigquery.Client() and storage.Client(). The two logins populate two different credential stores, and either can be active without the other.

  • The BigQuery UI in the browser uses whichever Google account is signed in to the browser.
  • gcloud auth login sets the CLI account.
  • gcloud auth application-default login sets the ADC account for Python libraries.
  • When the same account is used in all three places, UI, CLI, and Python all act as the same IAM principal.

Verify the CLI side.

gcloud auth list
gcloud config get-value project

Verify ADC exists. This prints an access token but does not show the email behind it.

gcloud auth application-default print-access-token

Check whether Python is being forced to a specific credential file. If this is empty, Python falls back to the ADC file created by application-default login, which is what we want for local development.

echo $GOOGLE_APPLICATION_CREDENTIALS
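The file-based part of the lookup that google.auth.default() performs can be approximated in a few lines. This is a simplified sketch for Linux and macOS: it covers only the two file steps (GOOGLE_APPLICATION_CREDENTIALS first, then the gcloud ADC file under CLOUDSDK_CONFIG or ~/.config/gcloud) and ignores later fallbacks such as the metadata server; adc_file_in_use is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def adc_file_in_use(env: Optional[Mapping[str, str]] = None,
                    home: Optional[Path] = None) -> Optional[Path]:
    """Approximate which credential file google.auth.default() would load."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # 1. An explicit credential file always wins (google-auth errors if it is missing)
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)

    # 2. The file written by `gcloud auth application-default login`
    config_dir = env.get("CLOUDSDK_CONFIG")
    base = Path(config_dir) if config_dir else home / ".config" / "gcloud"
    well_known = base / "application_default_credentials.json"
    return well_known if well_known.is_file() else None
```

With client1.sh sourced and GOOGLE_APPLICATION_CREDENTIALS empty, this should point at ~/gcloud-configs/client1/application_default_credentials.json.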

Resolve ADC in Python directly. This shows the credential class, the detected project, and whether the identity is a user account (service_account_email is None) or a service account.

import google.auth

creds, project = google.auth.default()
print("ADC credential class:", type(creds).__name__)
print("ADC project:", project)
print("service_account_email:", getattr(creds, "service_account_email", None))
print("quota_project_id:", getattr(creds, "quota_project_id", None))

Expected output for a local user-credential setup:

ADC credential class: Credentials
ADC project: your-gcp-project-id
service_account_email: None
quota_project_id: your-gcp-project-id
  • service_account_email: None confirms that ADC is using user credentials, not a service account, so BigQuery calls from Python are authorized with your user account's IAM permissions.

Resources