Presented by
datacraft
The club for data scientists, researchers, and AI engineers

REX - Data Provenance and Responsible AI for Social Media Privacy and Workforce Intelligence

Paris, Île-de-France

About Event

by Guilherme Machado Medeiros, associate professor, and Subhankar Maity, postdoctoral researcher, at ECE

— Presentation in English —

As large language models (LLMs) increasingly power intelligent systems, from conversational interfaces to code generation, the need to understand the provenance of their training data has become critical.

Our guests' research tackles the question of whether specific data samples contribute to LLM behaviour, focusing on code repositories and social media content, two data types that differ sharply from traditional text datasets.

Through data provenance analysis and membership inference attacks, we deliver measurable transparency into how models learn, memorise, and potentially expose sensitive information. The project tests authentic code and social media samples across state-of-the-art LLMs (GPT-5, Gemini 2.5 Flash, Mistral 24B, LLaMA 3.3 70B, DeepSeek, etc.).
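
As a concrete illustration of the membership inference side, the sketch below shows a simple loss-based test: samples that a model scores with unusually low per-token loss are more likely to have appeared in its training data. The model name, threshold, and calibration strategy here are illustrative assumptions, not the speakers' actual pipeline.

```python
# A minimal sketch of a loss-based membership inference test, one standard
# signal in this literature. Model name, threshold, and calibration are
# illustrative assumptions, not the speakers' setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # small stand-in; the talk evaluates much larger LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def average_nll(text: str) -> float:
    """Mean per-token negative log-likelihood of `text` under the model.
    Unusually low values suggest the sample may have been memorised."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(input_ids=ids, labels=ids)
    return out.loss.item()

def is_likely_member(text: str, threshold: float = 2.5) -> bool:
    """Naive thresholded decision. In practice the threshold is calibrated
    on samples with known membership status (or via a reference model)."""
    return average_nll(text) < threshold
```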

We develop statistical and visual frameworks to reliably detect the presence of training data.
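
One way such a statistical check might look in practice is sketched below: comparing score distributions for suspected training samples against known held-out samples, with a one-sided significance test and a histogram as the visual companion. All data here are synthetic placeholders, not results from the project.

```python
# Sketch of one statistical check such a framework might run: compare NLL
# scores of suspected training samples against known held-out samples with
# a one-sided Mann-Whitney U test. The data below are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
candidate_nll = rng.normal(2.0, 0.5, 200)  # synthetic scores, suspected members
heldout_nll = rng.normal(2.6, 0.5, 200)    # synthetic scores, known non-members

# A low p-value means candidates score systematically lower,
# consistent with training-set membership.
stat, p_value = mannwhitneyu(candidate_nll, heldout_nll, alternative="less")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.2e}")

# Simple visual companion: overlapping score histograms.
plt.hist(candidate_nll, bins=30, alpha=0.5, label="suspected members")
plt.hist(heldout_nll, bins=30, alpha=0.5, label="held-out")
plt.xlabel("mean per-token NLL")
plt.ylabel("count")
plt.legend()
plt.show()
```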

In professional environments, employee-contributed code often feeds the LLMs used for code assistance, creating blind spots in an organisation's view of its workforce capabilities. Our detection methods reveal when proprietary code influences model outputs beyond documented team knowledge, highlighting specific learning gaps.
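
A minimal sketch of what one such detection signal could look like (an n-gram overlap check between a model completion and an internal codebase, not the speakers' actual method) follows; the function names and the 8-token window are illustrative assumptions.

```python
# Hypothetical verbatim-overlap check for code assistants: does a model
# completion reproduce long n-grams from an internal repository? Names and
# the 8-token window are illustrative, not the speakers' method.
def ngrams(tokens: list[str], n: int = 8) -> set[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(completion: str, internal_code: str, n: int = 8) -> float:
    """Fraction of the completion's n-grams found verbatim in internal code;
    high values hint that proprietary code influenced the output."""
    comp = ngrams(completion.split(), n)
    repo = ngrams(internal_code.split(), n)
    return len(comp & repo) / max(len(comp), 1)
```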

This drives targeted upskilling programs, ensuring organisations maintain competitive expertise while protecting intellectual property from unauthorised model memorisation.

Social media content creates acute privacy risks when absorbed into LLMs, as user-generated posts can reveal personal details through model outputs. Our methods allow individuals to flag whether their personal content was used in training, enabling proactive privacy protection. This user-centric approach demands accountability from AI systems trained on public data, aligning with GDPR requirements for transparency and data rights.
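
To make the user-centric idea concrete, here is a hedged sketch of how per-post membership signals might be aggregated into a single exposure score a person can act on; the helper names and the 50% reporting cut-off are hypothetical, assuming a per-post test like the loss-based one sketched earlier.

```python
# Sketch of the user-facing step: aggregate per-post membership signals into
# one exposure score. `is_member` would be a test like `is_likely_member`
# above; the 50% reporting cut-off below is purely illustrative.
from typing import Callable

def user_exposure_score(posts: list[str],
                        is_member: Callable[[str], bool]) -> float:
    """Share of a user's posts flagged as likely training-set members."""
    flags = [is_member(post) for post in posts]
    return sum(flags) / max(len(flags), 1)

# Hypothetical usage: a user submits their public posts for a check.
# if user_exposure_score(my_posts, is_likely_member) > 0.5:
#     print("Strong evidence this account's content was used in training.")
```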

This research converts regulatory challenges into strategic opportunities:

• Privacy-first AI respecting user consent and personal data rights
• IP protection for proprietary code and internal knowledge assets
• Workforce analytics linking AI outputs to actionable training needs
• Market differentiation through transparent, auditable intelligence systems


datacraft* is the club for Data Scientists, Researchers, and AI Engineers. Join us!

Location
Paris, Île-de-France