My supervisor has instructed another person in my lab to use both the training and testing data to impute missing values before building a machine learning model. The results of the analysis haven't been put into a publication, but my feeling is that A) this is wrong, because the model should be trained without receiving any information from the testing set, and B) if this were to go into a publication, it would be dodgy at best and potentially illegal if it wasn't reported. And if it was reported, I would be surprised if the results were published.
Are my suspicions correct? Is there a rigorous reason why?
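
To make the two procedures I am contrasting concrete, here is a minimal sketch. It assumes simple mean imputation on a single feature, which is only an illustration; the method actually used in the lab may differ, and the data and the helper names `fit_mean` and `impute` are made up for this example.

```python
import math
from statistics import fmean


def fit_mean(values):
    """Return the mean of the observed (non-NaN) entries."""
    observed = [v for v in values if not math.isnan(v)]
    return fmean(observed)


def impute(values, fill):
    """Replace every NaN in `values` with `fill`."""
    return [fill if math.isnan(v) else v for v in values]


# Hypothetical single-feature data; NaN marks a missing value.
train = [1.0, 2.0, math.nan, 3.0]
test = [10.0, math.nan, 14.0]

# Procedure A (what I think is correct): the fill value is estimated
# from the training data only, then applied to both sets.
fill_train_only = fit_mean(train)
train_a = impute(train, fill_train_only)
test_a = impute(test, fill_train_only)

# Procedure B (what was instructed): the fill value is estimated from
# the training and testing data pooled together, so the imputed
# training rows already depend on the test set.
fill_pooled = fit_mean(train + test)
train_b = impute(train, fill_pooled)
test_b = impute(test, fill_pooled)

print("train-only fill:", fill_train_only)
print("pooled fill:    ", fill_pooled)
```

In this toy case the fill value written into the training set changes depending on whether the test values are included, which is the kind of information flow from test to train that I am worried about.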