CrossFakeNet — Multimodal Fake News Detector

// select input modality

Text pipeline: RoBERTa-Large → sentiment analysis → credibility scorer

News article / headline / claim

Source URL (optional)

Language
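The text pipeline ends in a credibility scorer, but this page does not say how it combines its inputs. The sketch below assumes a simple weighted blend of three signals: the classifier's "fake" probability, the strength of the sentiment, and an optional source-reputation lookup keyed on the URL's host. The `SOURCE_REPUTATION` table, the weights, and every function name here are hypothetical placeholders, not the system's actual scorer.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical reputation table; a real deployment would use a curated source list.
SOURCE_REPUTATION = {"reuters.com": 0.9, "example-hoax.net": 0.1}


@dataclass
class TextSignals:
    p_fake: float                      # classifier probability of "fake", in [0, 1]
    sentiment: float                   # sentiment polarity, in [-1, 1]
    source_url: Optional[str] = None   # optional "Source URL" field


def source_reputation(url: Optional[str]) -> Optional[float]:
    """Return the reputation of the URL's host, or None if absent or unknown."""
    if not url:
        return None
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return SOURCE_REPUTATION.get(host)


def credibility_score(s: TextSignals, w_text: float = 0.7,
                      w_sent: float = 0.1, w_src: float = 0.2) -> float:
    """Blend the signals into a score in [0, 1]; higher means more credible.

    The weights are illustrative assumptions, not tuned values.
    """
    text_cred = 1.0 - s.p_fake
    # Strongly polarized wording is treated as a weak warning sign.
    sent_cred = 1.0 - abs(s.sentiment)
    rep = source_reputation(s.source_url)
    if rep is None:
        # No usable source: renormalize over the two remaining signals.
        return (w_text * text_cred + w_sent * sent_cred) / (w_text + w_sent)
    return w_text * text_cred + w_sent * sent_cred + w_src * rep
```

For example, `credibility_score(TextSignals(0.2, 0.5, "https://www.reuters.com/world"))` yields about 0.79. Dropping the URL renormalizes the remaining weights, giving 0.7625, so a missing optional field does not drag the score down.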

Audio pipeline: Whisper ASR → Wav2Vec2 prosody features → RoBERTa-Large

Upload audio file

Supports MP3, WAV, M4A, OGG; max 50 MB

News context / title (optional)
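Whisper's encoder operates on fixed 30-second windows of 16 kHz audio. The openai-whisper package pads or trims each window to that length (`whisper.pad_or_trim`), and its high-level `transcribe` already slides over long files on its own. If the app instead feeds the model window by window, it needs chunk boundaries like the ones computed below. This is a minimal sketch: the optional overlap and the function name `chunk_bounds` are assumptions, not part of the described pipeline.

```python
from typing import List, Tuple

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30    # length of one Whisper input window


def chunk_bounds(num_samples: int, overlap_seconds: float = 0.0,
                 sample_rate: int = SAMPLE_RATE,
                 window_seconds: int = WINDOW_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices that cover the clip in fixed windows.

    Consecutive windows share `overlap_seconds` of audio; the last window may be
    shorter and would be padded to 30 s before it reaches the model.
    """
    window = window_seconds * sample_rate
    step = window - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += step
    return bounds
```

A 75-second clip (1,200,000 samples) splits into three windows ending at 30 s, 60 s and 75 s. With `overlap_seconds=5` the windows start at 0 s, 25 s and 50 s. A small overlap reduces the chance that a word is cut in half at a window boundary.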

Image pipeline: ViT-L/16 → CLIP alignment → Grad-CAM manipulation detection

Upload image

Supports JPG, PNG, WEBP; max 10 MB

Image caption / accompanying text (optional)
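The CLIP alignment stage checks whether an image and its caption describe the same thing. CLIP-style dual encoders place both in a shared embedding space and compare them by cosine similarity. The sketch below assumes the embeddings are already computed and flags an out-of-context pair when similarity falls below a threshold. The threshold of 0.25 is a hypothetical value that would need calibration on labeled data, and the toy 2-D vectors stand in for real, much larger embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def caption_mismatch(image_emb: Sequence[float], text_emb: Sequence[float],
                     threshold: float = 0.25) -> bool:
    """Flag an image-caption pair whose similarity is below a (hypothetical) threshold."""
    return cosine_similarity(image_emb, text_emb) < threshold
```

A low score only signals that the caption does not describe the image. It says nothing about pixel-level edits, which is why the pipeline follows this stage with a separate manipulation-detection step.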

Image + comments pipeline: ViT-L/16 + RoBERTa → CMAF cross-attention → social GAT

Upload image

Supports JPG, PNG, WEBP; max 10 MB

Post caption / headline

User comments (one per line)

Share count

Platform
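This page does not describe the internals of the CMAF module beyond "cross-attention". The sketch below shows generic scaled dot-product cross-attention, softmax(QKᵀ/√d)·V, in which text-token vectors act as queries over image-patch vectors. The identity projections (no learned W_Q, W_K, W_V), the single head, and the function names are simplifying assumptions. A trained module would have learned projections and usually several heads.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity projections.

    queries: one row per text token; keys/values: one row per image patch.
    Each output row is a weighted average of the value rows.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                     for j in range(len(values[0]))])
    return out
```

With one key the output is simply that key's value. With identical keys the weights are uniform and the output is the mean of the values. A query aligned with one key concentrates nearly all of its weight on that key's value.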

Video pipeline: ViT (per-frame) + Whisper + Wav2Vec2 → BiLSTM-GCN → CMAF fusion

Upload video

Supports MP4, AVI, MOV; max 100 MB

Title / headline

Source domain

View count

Share count
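The upload panels state accepted formats and size caps for each modality. A server-side check that mirrors those limits is sketched below. Two assumptions are involved. First, "MB" is read as 2²⁰ bytes. Second, `.jpeg` is accepted alongside `.jpg`; the UI lists only JPG. The rule table and function name are hypothetical. An extension check also does not prove the file's real content type, so a production service would still inspect the data itself.

```python
import os
from typing import Dict, FrozenSet, Tuple

MB = 1024 * 1024  # assumption: "MB" in the UI means 2**20 bytes

# Accepted extensions and size caps, taken from the upload panels.
UPLOAD_RULES: Dict[str, Tuple[FrozenSet[str], int]] = {
    "audio": (frozenset({".mp3", ".wav", ".m4a", ".ogg"}), 50 * MB),
    "image": (frozenset({".jpg", ".jpeg", ".png", ".webp"}), 10 * MB),
    "video": (frozenset({".mp4", ".avi", ".mov"}), 100 * MB),
}
# The image + comments tab accepts the same files as the image tab.
UPLOAD_RULES["image+comments"] = UPLOAD_RULES["image"]


def validate_upload(modality: str, filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file's extension or size is not allowed."""
    if modality not in UPLOAD_RULES:
        raise ValueError(f"no file upload for modality {modality!r}")
    allowed, max_bytes = UPLOAD_RULES[modality]
    ext = os.path.splitext(filename)[1].lower()
    if ext not in allowed:
        raise ValueError(f"{ext or 'no extension'} is not accepted for {modality}")
    if size_bytes > max_bytes:
        raise ValueError(f"{size_bytes} bytes exceeds the {max_bytes // MB} MB limit")
```

Extensions are compared case-insensitively, so `clip.MP3` passes. A file exactly at the cap is accepted, and a single byte over is rejected.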

// models in this system

🔤

RoBERTa-Large

Text encoding (text, audio, video)

👁️

ViT-L/16

Visual encoding (image, image+comments, video)

🎙️

Whisper ASR

Audio transcription (audio, video)

🔊

Wav2Vec2

Prosody features (audio, video)

🔗

CLIP proxy

Image–text alignment (image)

🕸️

Graph Attention Network (GAT)

Social context (image+comments, video)

🔀

CMAF module

Cross-modal fusion (image+comments, video)

⏱️

BiLSTM-GCN

Temporal + spatial encoding (video)
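The social-context model is a GAT. A single-head graph attention layer in the standard formulation computes z_i = W·h_i, scores each neighbor with e_ij = LeakyReLU(aᵀ[z_i ‖ z_j]), normalizes the scores with a softmax over j ∈ N(i) ∪ {i}, and outputs h_i' = Σ_j α_ij·z_j. The sketch below implements that layer in plain Python and omits the output nonlinearity. How this system builds its social graph (for example, nodes for posts, comments, or users) is not stated here, so the graph inputs and function names are assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, h: Vector) -> Vector:
    """Multiply matrix W by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
              W: Matrix, a: Vector) -> Dict[int, Vector]:
    """One single-head graph attention layer (output nonlinearity omitted).

    h: node features; neighbors: adjacency lists (list both directions for an
    undirected graph); W: shared projection; a: attention vector of length
    2 * output_dim.
    """
    z = {i: matvec(W, hi) for i, hi in h.items()}
    out: Dict[int, Vector] = {}
    for i in h:
        # Each node also attends to itself (self-loop).
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        scores = [leaky_relu(sum(ak * xk for ak, xk in zip(a, z[i] + z[j])))
                  for j in nbrs]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        out[i] = [sum(alpha * z[j][k] for alpha, j in zip(alphas, nbrs))
                  for k in range(len(z[i]))]
    return out
```

With a zero attention vector every neighbor gets equal weight, so the layer reduces to mean aggregation over each node and its neighbors. A learned `a` lets the model up-weight the neighbors that are most informative for the credibility decision.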