It's been a long time since I last updated this blog.
Recently, I finally understood the idea behind machine learning, what GPU acceleration is, and why CUDA is so popular.
ML, in fact, is just using calculus (matrices, mainly) to turn a question with too many possible answers into a problem whose result can be "inferred/concluded" through probability plus calculus. For example, identifying a digit in a picture can be framed as building a complex matrix that weighs every single pixel value, and formulating a formula (so to say, "modeling") that can predict the answer for new inputs as well. (https://www.youtube.com/watch?v=_RPHiqF2bSs)
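To make the "matrix that checks every pixel" idea concrete, here is a minimal sketch in Python. The weights are random, not trained, so the prediction is meaningless; the point is only the shape of the computation: one matrix multiply touches every pixel and produces a score per digit.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))        # a fake 28x28 grayscale picture
x = image.reshape(-1)               # flatten to a 784-vector of pixel values

# Random (untrained) weights: one row per digit 0..9.
W = rng.standard_normal((10, 784))
b = rng.standard_normal(10)

scores = W @ x + b                  # one matrix multiply looks at every pixel
prediction = int(np.argmax(scores)) # the digit with the highest score
```

A real model would learn `W` and `b` from labeled examples, but the inference step stays exactly this: matrix arithmetic over pixel values.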
GPU acceleration: since ML is, in the end, very heavy calculus, and for a computer that calculus breaks down into an enormous number of arithmetic operations, it fits squarely into the GPU's scope. A GPU is designed for massive arithmetic throughput but is very weak at system control. (https://www.youtube.com/watch?v=kUqkOAU84bA&t=879s)
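Why does that huge pile of arithmetic suit a GPU? Because a matrix multiply decomposes into many *independent* multiply-add chains, and independent work is exactly what thousands of GPU cores can run in parallel. A small sketch of that decomposition (on the CPU, just to show the structure):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)

# Each output cell C[i][j] is its own independent dot product.
# On a GPU, thousands of these cells are computed at the same time;
# here we just spell the loops out to show there is no dependency
# between cells.
C = np.empty((2, 4), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(A.shape[1]))

assert (C == A @ B).all()  # same result as the library matmul
```

System-control work (branching, I/O, scheduling) has none of this independence, which is why it stays on the CPU.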
TensorFlow, or any ML framework: data scientists and ML engineers don't really care how the computer actually trains the "model", so CUDA is not directly visible to the ML engineer; the integration is handled by TensorFlow (https://www.tensorflow.org/install/gpu). More detailed optimization, beyond that, needs to be customized by each company's in-house system engineers.
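One way to picture how a framework keeps CUDA invisible is a dispatch layer: the user calls a generic operation, and the framework routes it to a GPU backend if one exists, otherwise to the CPU. This is only a hypothetical sketch (the names `gpu_available`, `run_on_gpu`, `run_on_cpu` are made up, not real TensorFlow APIs), but it shows the shape of the abstraction:

```python
def gpu_available():
    # Hypothetical probe; we pretend no GPU is present in this sketch.
    return False

def run_on_cpu(a, b):
    # Plain-Python matrix multiply as the CPU fallback.
    return [[sum(x * y for x, y in zip(row, col))
             for col in zip(*b)]
            for row in a]

def run_on_gpu(a, b):
    # In a real framework this would launch CUDA kernels.
    raise NotImplementedError("would call into CUDA")

def matmul(a, b):
    # The user only ever calls matmul(); the backend choice is hidden.
    return run_on_gpu(a, b) if gpu_available() else run_on_cpu(a, b)

result = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The real dispatch in TensorFlow is of course far more involved (device placement, kernel registries, memory transfers), but the user-facing contract is the same: you never touch CUDA directly.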