Comments on chem-bla-ics: When to stop including QSAR model variables... (Egon Willighagen)

Anonymous, 2005-11-08 15:37 (+01:00):

I agree with the observation that without values of the t-statistic and the corresponding p-values, it's difficult to say whether p3 (or p2 in the second case) really has any effect on the model.

In my opinion, the lack of these statistics makes the model meaningless. Yes, you could evaluate the range of the variable and look at its maximal influence, as you have done, but I don't think that should be required when a statistical model is presented.

OK, enough of the rant!

One question I have: were the input variables scaled? If not, that would explain the order-of-magnitude differences in the coefficients. In that case it would not be wise to discount p3 (or p2 in the second case), since these variables may well be explaining some of the variance; without scaling, the raw coefficients would not show it.

On the other hand, if the data were scaled, then yes, I would agree that p2 in the second model could probably be dropped.

Apart from statistical tests, a quick way to check that the model is not overfit (i.e. that p3 or p2 are not extraneous variables) is to run a PLS using the descriptors in the models. If overfitting is not occurring, all 3 components (or 2 in the case of the second model) will be validated. If not, then you know there's something wrong!

But in the end, as I said above, reporting a regression model without supporting statistics makes the use of the model pretty shaky.
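
The scaling point can be illustrated with a minimal sketch on made-up data. Everything here is an assumption for illustration: the ranges of `p1` and `p3`, the true coefficients, the noise level, and the helper names `invert`, `ols_t_statistics` and `standardize`. None of it comes from the models discussed in the post. It fits an ordinary least-squares model twice, once on raw descriptors and once on standardized ones, and reports the coefficients alongside their t-statistics.

```python
import math
import random

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_t_statistics(columns, y):
    """Fit y = b0 + sum(b_j * x_j); return (coefficients, t-statistics)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    residuals = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(r * r for r in residuals) / (n - k)  # residual variance
    t = [beta[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, t

def standardize(values):
    """Centre to zero mean and scale to unit sample standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical descriptors: p1 spans 0..1, p3 spans 0..1000 (invented ranges).
rng = random.Random(0)
n = 50
p1 = [rng.uniform(0, 1) for _ in range(n)]
p3 = [rng.uniform(0, 1000) for _ in range(n)]
# Invented "true" model: both descriptors contribute about equally to y.
y = [2.0 * a + 0.002 * b + rng.gauss(0, 0.1) for a, b in zip(p1, p3)]

raw_beta, raw_t = ols_t_statistics([p1, p3], y)
scaled_beta, scaled_t = ols_t_statistics([standardize(p1), standardize(p3)], y)

for label, beta, t in (("raw", raw_beta, raw_t), ("scaled", scaled_beta, scaled_t)):
    print(f"{label:>6}: b(p1)={beta[1]:.4g} t={t[1]:.1f} | b(p3)={beta[2]:.4g} t={t[2]:.1f}")
```

In this toy setup the raw coefficient of `p3` is orders of magnitude smaller than that of `p1`, yet after standardization the two are of comparable size. The slope t-statistics are the same in both fits, which is why reporting them removes the ambiguity that the raw coefficient magnitudes leave open.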