# Multi-user workspace management — concurrency, quotas, cleanup (#4)

## Current State (already done)

- DB isolation: all entities have user_id FK
- File isolation: data/projects/{project_id}/
- Auth: get_current_user on all routes
- Project sharing model exists

## Issues at Scale

### 1. No job concurrency limit (critical)
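Every run spawns a gpredomics subprocess via `BackgroundTasks`, with no limit on how many run at once. Ten users starting runs at the same time means ten gpredomics processes competing for RAM, which can exhaust memory (OOM).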

**Fix:** Add a job queue with a configurable maximum number of concurrent workers:
- Simple: wrap `_run_job` in an `asyncio.Semaphore(max_concurrent_jobs)`
- Better: use scitq (already partially integrated) or Celery
- Track: running/queued/completed job counts per user
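A minimal sketch of the tracking bullet, combined with the per-user concurrency cap proposed in item 3. `JobTracker`, `UserJobCounts`, and both limits are hypothetical names; the sketch assumes jobs are plain synchronous callables that are run off the event loop with `asyncio.to_thread` (Python 3.9+). Counts live in process memory, so they reset on restart and are not shared between worker processes; persisting them (DB table, scitq, or Celery) would be needed for that.

```python
import asyncio
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UserJobCounts:
    queued: int = 0
    running: int = 0
    completed: int = 0  # finished jobs, successful or failed


class JobTracker:
    """Hypothetical per-process tracker: a global cap plus a per-user cap."""

    def __init__(self, max_global: int, max_per_user: int):
        # On Python < 3.10, construct this inside a running event loop.
        self._global = asyncio.Semaphore(max_global)
        self._per_user = defaultdict(lambda: asyncio.Semaphore(max_per_user))
        self.counts = defaultdict(UserJobCounts)

    async def run(self, user_id, job, *args):
        """Queue job(*args) for user_id; run it in a thread once slots are free."""
        counts = self.counts[user_id]
        counts.queued += 1
        started = False
        try:
            # Take the user's slot first, so a user waiting on their own
            # backlog does not hold one of the global slots.
            async with self._per_user[user_id]:
                async with self._global:
                    counts.queued -= 1
                    counts.running += 1
                    started = True
                    return await asyncio.to_thread(job, *args)
        finally:
            if started:
                counts.running -= 1
                counts.completed += 1
            else:  # cancelled while still waiting for a slot
                counts.queued -= 1
```

Because the global semaphore is only requested after the per-user one, a single user submitting many jobs cannot starve everyone else of global slots.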

### 2. No disk quotas
Users can upload unlimited data and run unlimited jobs, and every job leaves its results on disk, so storage grows without bound until the disk fills up.

**Fix:**
- Per-user disk quota (e.g., 1GB default)
- Auto-cleanup: delete job results older than N days
- Show disk usage in user profile
- Warn at 80%, block uploads at 100%
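The quota and cleanup bullets could start from small helpers like these. This is a sketch under assumptions: the 1 GiB constant stands in for the suggested "1GB default", `jobs_root` is a hypothetical directory holding one subdirectory per job result, and a result's age is judged by its directory modification time.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

DEFAULT_QUOTA_BYTES = 1024**3  # 1 GiB, assumed reading of "1GB default"
WARN_FRACTION = 0.8


def dir_usage_bytes(root: Path) -> int:
    """Total size of regular files under root; symlinks are not counted."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def quota_status(used: int, incoming: int = 0, quota: int = DEFAULT_QUOTA_BYTES) -> str:
    """'block' if already at quota or the upload would exceed it, 'warn' at >= 80%, else 'ok'."""
    after = used + incoming
    if used >= quota or after > quota:
        return "block"
    if after >= WARN_FRACTION * quota:
        return "warn"
    return "ok"


def expired_job_dirs(jobs_root: Path, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Job result directories (assumed one per job) not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(p for p in Path(jobs_root).iterdir() if p.is_dir() and p.stat().st_mtime < cutoff)
```

A scheduled cleanup task could pass each path from `expired_job_dirs` to `shutil.rmtree`; keeping selection separate from deletion makes a dry run trivial.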

### 3. No job timeout per user
A single user can monopolize the server with a long-running GA (genetic algorithm) run, e.g. 200 epochs with a population of 5000.

**Fix:**
- Per-user max job duration (e.g., 30 minutes)
- Per-user max concurrent jobs (e.g., 3)
- Kill long-running jobs with SIGTERM
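For the kill step, one possible pattern with the standard `subprocess` module: wait up to the limit, send SIGTERM via `Popen.terminate()`, and fall back to SIGKILL via `Popen.kill()` if the process ignores it. `run_with_timeout` and the 10-second grace period are assumptions, not existing code. If gpredomics spawns child processes of its own, signalling the whole process group (`start_new_session=True` plus `os.killpg`) would be needed to stop those too.

```python
import subprocess
from typing import Sequence, Tuple


def run_with_timeout(cmd: Sequence[str], timeout_s: float, grace_s: float = 10.0) -> Tuple[int, bool]:
    """Run cmd and return (exit_code, timed_out).

    After timeout_s, send SIGTERM (Popen.terminate on POSIX); if the process
    is still alive grace_s later, send SIGKILL (Popen.kill).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.terminate()
        try:
            return proc.wait(timeout=grace_s), True
        except subprocess.TimeoutExpired:
            proc.kill()
            return proc.wait(), True
```

On POSIX a process ended by a signal reports a negative return code (e.g. `-15` for SIGTERM), which lets the job record distinguish "timed out" from "failed".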

### 4. Dataset deduplication
When several users upload the same dataset (e.g. Qin2014), each upload is stored as a separate copy, wasting storage.

**Fix:**
- Content-hash datasets on upload
- Share identical datasets across users (copy-on-write)
- Reference counting for cleanup
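A rough sketch of hash-on-upload with reference counting: identical files map to a single SHA-256-named copy that is stored read-only, so a user who needs to modify a dataset would work on a private copy instead. `DatasetStore` and its on-disk layout are hypothetical; the in-memory `refs` dict stands in for what would be a DB column, and the class does no locking, so concurrent uploads would need a lock or transaction around `add` and `release`.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large uploads aren't read into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class DatasetStore:
    """Hypothetical content-addressed store: one read-only copy per unique file."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.refs = {}  # digest -> reference count; a DB column in practice

    def add(self, upload_path) -> str:
        digest = sha256_file(upload_path)
        target = self.root / digest
        if not target.exists():
            shutil.copyfile(upload_path, target)
            os.chmod(target, 0o444)  # shared copy is read-only; edit a private copy
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def release(self, digest: str) -> None:
        self.refs[digest] -= 1
        if self.refs[digest] == 0:  # last reference gone: reclaim the space
            del self.refs[digest]
            target = self.root / digest
            os.chmod(target, 0o644)  # allow removal where read-only blocks unlink
            target.unlink()
```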

### 5. Admin dashboard
- View all running jobs, users, disk usage
- Kill stuck jobs
- Set per-user quotas
- View system health (CPU, RAM, disk)
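The system-health panel could begin with a stdlib-only snapshot like the one below (`system_health` is a hypothetical helper). `os.getloadavg` exists only on Unix, and the standard library has no portable RAM query; a third-party package such as `psutil` (`psutil.virtual_memory()`) would be one option for that.

```python
import os
import shutil


def system_health(path: str = ".") -> dict:
    """CPU count, load average (Unix only) and disk usage for the filesystem holding path."""
    disk = shutil.disk_usage(path)
    health = {
        "cpu_count": os.cpu_count(),
        "disk_total_bytes": disk.total,
        "disk_free_bytes": disk.free,
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }
    if hasattr(os, "getloadavg"):  # not available on Windows
        health["load_avg_1_5_15"] = os.getloadavg()
    return health
```

Pointing `path` at the data directory (e.g. the one containing `data/projects/`) reports the disk that actually fills up.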

## Quick Win: Semaphore-based concurrency limit

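```python
# In analysis.py
import asyncio
import os

# Process-local cap on concurrently running jobs (default 4).
_JOB_SEMAPHORE = asyncio.Semaphore(int(os.environ.get("MAX_CONCURRENT_JOBS", "4")))

async def _run_job_limited(*args, **kwargs):
    # Jobs beyond the limit wait here until a slot frees up.
    async with _JOB_SEMAPHORE:
        # _run_job is the existing sync function. Calling it directly would
        # block the event loop, so run it in a worker thread while holding the slot.
        await asyncio.to_thread(_run_job, *args, **kwargs)
```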

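This caps how many gpredomics processes run at once: extra submissions wait for a free slot instead of all starting together and exhausting RAM. Caveats: the semaphore is per process, so with several server worker processes the effective limit is workers × `MAX_CONCURRENT_JOBS`; waiting jobs exist only in memory and are lost on restart; and it does not limit the memory of any single job.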
