Soham Roy: Researcher & Engineer

Research & Publications

ICCVW 2025 · Accepted

Guardians of Generation: Dynamic Inference-Time Copyright Shielding with Adaptive Guidance for AI Image Generation

Soham Roy, Abhishek Mishra, Shirish Karande, Murari Mandal

A method for keeping copyrighted material out of diffusion-model outputs at the point of generation, rather than by retraining the model. The work intervenes during sampling through adaptive guidance, so an already-trained model can be steered away from protected content without the cost of fine-tuning or a fresh training run.

arXiv

EMNLP (ARR) · Under Review

"I Strongly Suspect This Website Is a Scam": Benchmarking PII Leakage and Detection without Defense in Autonomous Web Agents

A benchmark built around a single question: when an autonomous web agent is carrying out a task and lands on a website designed to manipulate it, will the agent give up the user's personal information? It reports two quantities together: how much private data actually leaks, and how reliably the agent recognises that it is under attack. The evaluation covers 91 adversarial environments, each paired with a benign control, organised under an eight-part taxonomy of manipulation tactics and tested on four current frontier models. The protocol was pre-registered before any data was collected.

arXiv

EMNLP (ARR) · Under Review

GeoAgent: A Benchmark for Active Visual Geolocation

GeoAgent introduces a benchmark for geolocalization in vision-language models based on embodied navigation rather than static image analysis. Instead of predicting a location from a fixed set of images, an agent actively explores Google Street View, moving and rotating before committing to a guess. The benchmark spans 1,200 navigable locations across 100 cities worldwide, covering both developed and developing regions and a range of recognisability. Across frontier VLMs, active exploration generally improves accuracy over static multi-view baselines, with Gemini 3.0 Flash and GPT-5 Mini performing best, though models rarely recover from a wrong initial hypothesis and tend to reinforce early mistakes. The results also reveal a marked geographic bias, with most models doing considerably better in developed regions than developing ones. The paper argues that embodied navigation exposes weaknesses in spatial reasoning that static geolocation benchmarks miss, positioning GeoAgent as a more realistic test of agentic geospatial intelligence.

Current Research Areas

01

Agentic Systems

AI systems that take actions in an environment, such as browsing, navigating, or calling tools, rather than only returning text. My interest is in how they behave when conditions are adversarial or simply unlike their training data.

02

Benchmarking & Evaluation

Designing evaluations that say something honest about a model: controlled comparisons against a baseline, a clear taxonomy of how a system fails, and human reference points, instead of a single headline number.

03

Retrieval-Augmented Generation

Grounding a model's answers in retrieved source material, so that what it says can be traced back to real documents rather than produced from its parameters alone.

Experience

Work on CVP Insights, an internal platform that pulls network telemetry from across Cisco's product lines into a single data warehouse and makes it queryable through a natural-language chat interface.
Designed and documented the system's retrieval-augmented generation (RAG) architecture, so the assistant's answers are grounded in actual Cisco documentation rather than unsupported generation.
Took on upstream data-quality work: detecting junk and unidentified telemetry, and classifying customers by industry, so the analytics layer stays trustworthy.
Work across the full stack, from the semantic data definitions and orchestration through the agent-and-analyst split to the FastAPI backend and React frontend.

Built EVA, an automated hiring pipeline on FastAPI, MongoDB Atlas, and AWS.
Designed it to process resumes at scale, exposing more than 80 REST API endpoints.
Integrated the VAPI.ai, Twilio, and OpenAI APIs, with logging and monitoring throughout.

Work on copyright protection and machine unlearning for diffusion models, framed around safety in generative systems.
Co-authored the ICCVW 2025 paper on inference-time copyright shielding, and contributed to benchmark work now under review at EMNLP.

Built data pipelines for fine-tuning Llama 3 on travel-planning tasks.
Set up the preprocessing workflows that fed the model, which improved the quality of its outputs on the task.

Lead a six-person AI/ML team, handling project planning and technical direction.
Run workshops for other students, including InferenceX, a session on the fundamentals of large language models and how inference actually works underneath.
Maintain an open-source Stable Diffusion fine-tuning framework that has grown to more than ten contributors.

Technical Skills

ML & AI

PyTorch · Vision-Language Models · DINOv2 · Diffusion Models · RAG · FAISS

Agentic Tooling

LangChain · LangGraph · CrewAI · Playwright · Claude Code · Gemini APIs

Data & Backend

Snowflake (Cortex) · FastAPI · Python

Frontend

React

Achievements

Amazon ML Summer School

Selected for the Amazon ML Summer School, 2025.

AIR 82, Amazon ML Challenge

All India Rank 82 in the Amazon ML Challenge, 2025.

Department Rank 1

Department rank 1 with an SGPA of 10.00 in the fifth semester of Computer Science & Engineering.

HackNITR 5.0 Finalist

Finalist at HackNITR 5.0, the national hackathon hosted by NIT Rourkela.

MLSA Hackathon Winner

First place at the Microsoft Learn Student Ambassadors hackathon, in both 2024 and 2025.

Education

Kalinga Institute of Industrial Technology (KIIT DU)

B.Tech in Computer Science & Engineering

CGPA: 9.34 / 10.00 · 2023 - 2027

Let's Connect

I'm glad to talk about research, the evaluation of AI systems, or relevant roles and collaborations. Email is the most reliable way to reach me.

sohamroy.dev@gmail.com

GitHub · LinkedIn
