VQA for the Visually Impaired
We built a Visual Question Answering (VQA) model to answer navigation-related questions asked by visually impaired users. The dataset consists of images captured by people who are visually impaired, so many of the images are blurry, poorly focused, or otherwise of low quality. We apply clustering to identify the images relevant to our use case, navigation. For the model, we use a pre-trained VGGNet to extract image features and a pre-trained BERT-large model to extract text embeddings of the questions. Finally, we add a feed-forward neural network on top of the concatenated image and text embeddings to answer queries for the VQA task.
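The sketch below illustrates this fusion architecture in PyTorch. It is a minimal illustration, not the exact implementation: the feature dimensions (4096 for VGG16, 1024 for BERT-large), the single hidden layer, the dropout rate, and the answer-vocabulary size are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from transformers import BertTokenizer, BertModel

# Pre-trained VGG16 as the image encoder; dropping the final classifier
# layer leaves a 4096-d feature vector.
vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg.classifier = nn.Sequential(*list(vgg.classifier.children())[:-1])
vgg.eval()

# Pre-trained BERT-large as the question encoder; the [CLS] token gives a
# 1024-d text embedding.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
bert = BertModel.from_pretrained("bert-large-uncased")
bert.eval()

class VQAHead(nn.Module):
    """Feed-forward classifier over the concatenated image + question embeddings."""
    def __init__(self, img_dim=4096, txt_dim=1024, hidden_dim=1024, num_answers=1000):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, num_answers),  # scores over the answer vocabulary
        )

    def forward(self, img_feat, txt_feat):
        # Simple concatenation fusion of the two modalities.
        return self.mlp(torch.cat([img_feat, txt_feat], dim=1))

# Standard ImageNet preprocessing for the VGG encoder.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def answer(image, question, head):
    """Return the index of the predicted answer for one (image, question) pair."""
    with torch.no_grad():
        img_feat = vgg(preprocess(image).unsqueeze(0))         # (1, 4096)
        tokens = tokenizer(question, return_tensors="pt")
        txt_feat = bert(**tokens).last_hidden_state[:, 0, :]   # (1, 1024), [CLS]
        logits = head(img_feat, txt_feat)
    return logits.argmax(dim=1)
```

Concatenation followed by a feed-forward head is the simplest joint-embedding approach to VQA; only the head is trained here, while the VGG and BERT encoders are kept frozen as feature extractors.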