Optimizing Hugging Face Transformer models for sub-millisecond inference latency, plus deployment on a production-ready inference server
Hi,
I just released a project showing how to optimize large NLP models and deploy them on the Nvidia Triton inference server.
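
To give a feel for the general approach, here is a minimal sketch of a typical first step in this kind of pipeline: exporting a Hugging Face model to ONNX and timing it with ONNX Runtime. This is my own illustration, not the project's actual code; the model name (`bert-base-uncased`), output path, and benchmark loop are assumptions.

```python
# Illustrative sketch only: export a Hugging Face encoder to ONNX,
# then benchmark mean latency with ONNX Runtime.
import time

import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # assumption: any encoder model works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
# torchscript=True makes the model return plain tuples, which traces cleanly.
model = AutoModel.from_pretrained(model_name, torchscript=True).eval()

# Trace the model with a dummy input and export it to ONNX.
dummy = tokenizer("hello world", return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",  # assumed output path
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
    },
    opset_version=13,
)

# Run the exported graph with ONNX Runtime and measure mean latency.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
feed = {
    "input_ids": dummy["input_ids"].numpy(),
    "attention_mask": dummy["attention_mask"].numpy(),
}
start = time.perf_counter()
for _ in range(100):
    session.run(None, feed)
print(f"mean latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")
```

The exported `model.onnx` is also the kind of artifact you would then register in a Triton model repository for serving; swapping the execution provider (e.g. to a GPU or TensorRT provider) is where most of the latency gains usually come from.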