deep learning MMBT: Supervised Multimodal Bitransformers for Classifying Images and Text How to use a BERT-like model with a convolutional network as image encoder to perform a classification task using images, texts and self attention over both modalities at the same time.
web How I created this app Which technologies are envolved here? Do you want to know how to create your own site like this one? Let's check how I created this server-side rendered page.