Research Demo · SRM Institute of Science and Technology

CrossFakeNet

Multimodal fake news detection: select input type to route through the appropriate AI model pipeline

// select input modality
Pipeline (text input): RoBERTa-Large → Sentiment → Credibility Scorer
Pipeline (audio input): Whisper ASR → Wav2Vec2 prosody → RoBERTa-Large
🎙️
Click to upload

Supports MP3, WAV, M4A, OGG (max 50 MB)

Pipeline (image input): ViT-L/16 → CLIP alignment → GradCAM manipulation detection
🖼️
Click to upload

Supports JPG, PNG, WEBP (max 10 MB)

Pipeline (image+comments input): ViT-L/16 + RoBERTa → CMAF Cross-Attention → Social GAT
💬
Click to upload

Supports JPG, PNG, WEBP (max 10 MB)

Pipeline (video input): ViT frames + Whisper + Wav2Vec2 → BiLSTM-GCN → CMAF fusion
🎬
Click to upload

Supports MP4, AVI, MOV (max 100 MB)
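
The routing described above can be sketched as a lookup table plus an upload check. This is a minimal illustration, not the demo's actual code: the `Route` class, the `route_upload` function, and the modality keys are hypothetical names, the stage names, file types, and size limits are copied from the page, and "MB" is assumed to mean 2**20 bytes.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

MB = 1024 * 1024  # assumption: the page's "MB" means 2**20 bytes


@dataclass(frozen=True)
class Route:
    """Hypothetical record for one input modality."""
    stages: tuple[str, ...]     # pipeline stages, in order
    extensions: frozenset[str]  # accepted file suffixes (empty for text)
    max_bytes: int | None       # upload limit; None means no file upload


IMAGE_TYPES = frozenset({".jpg", ".png", ".webp"})

# Stage names and limits as listed on the page.
ROUTES = {
    "text": Route(("RoBERTa-Large", "Sentiment", "Credibility Scorer"),
                  frozenset(), None),
    "audio": Route(("Whisper ASR", "Wav2Vec2 prosody", "RoBERTa-Large"),
                   frozenset({".mp3", ".wav", ".m4a", ".ogg"}), 50 * MB),
    "image": Route(("ViT-L/16", "CLIP alignment",
                    "GradCAM manipulation detection"),
                   IMAGE_TYPES, 10 * MB),
    "image+comments": Route(("ViT-L/16 + RoBERTa", "CMAF Cross-Attention",
                             "Social GAT"),
                            IMAGE_TYPES, 10 * MB),
    "video": Route(("ViT frames + Whisper + Wav2Vec2", "BiLSTM-GCN",
                    "CMAF fusion"),
                   frozenset({".mp4", ".avi", ".mov"}), 100 * MB),
}


def route_upload(modality: str, filename: str | None = None,
                 size_bytes: int = 0) -> tuple[str, ...]:
    """Return the pipeline stages for an input, or raise ValueError."""
    if modality not in ROUTES:
        raise ValueError(f"unknown modality: {modality!r}")
    route = ROUTES[modality]
    if route.max_bytes is not None:  # file-based modality: check the upload
        suffix = Path(filename or "").suffix.lower()
        if suffix not in route.extensions:
            raise ValueError(f"unsupported file type for {modality}: {suffix!r}")
        if size_bytes > route.max_bytes:
            raise ValueError(f"file exceeds the {modality} size limit")
    return route.stages
```

For example, `route_upload("audio", "clip.mp3", 5 * MB)` returns the three audio stages, while a 101 MB video or a `.gif` image raises `ValueError`.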

// models in this system

🔀
RoBERTa-Large
Text encoding (text, audio, video)
👁️
ViT-L/16
Visual encoding (image, video)
🎙️
Whisper ASR
Audio transcription (audio, video)
🔊
Wav2Vec2
Prosody features (audio, video)
🔗
CLIP proxy
Image–text alignment (image)
🕸️
Graph Attention Network (GAT)
Social context (image+comments, video)
🔀
CMAF module
Cross-modal fusion (image+comments, video)
⏱️
BiLSTM-GCN
Temporal + spatial encoding (video)
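
The model list above can also be read as a small registry that answers "which models does a given modality touch?". The sketch below copies the modality tags from the list as written. `MODEL_MODALITIES` and `models_for` are hypothetical names, and the graph attention model is keyed by its abbreviation, GAT.

```python
# Modalities per model, transcribed from the list above.
MODEL_MODALITIES = {
    "RoBERTa-Large": {"text", "audio", "video"},
    "ViT-L/16": {"image", "video"},
    "Whisper ASR": {"audio", "video"},
    "Wav2Vec2": {"audio", "video"},
    "CLIP proxy": {"image"},
    "GAT": {"image+comments", "video"},
    "CMAF module": {"image+comments", "video"},
    "BiLSTM-GCN": {"video"},
}


def models_for(modality: str) -> list[str]:
    """List the models tagged with a modality, in the order shown on the page."""
    return [name for name, tags in MODEL_MODALITIES.items() if modality in tags]


print(models_for("image"))  # ['ViT-L/16', 'CLIP proxy']
```

Under these tags, video is the heaviest route: it touches every model except the CLIP proxy.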