Ʈ  |  Contact Us
 
Ȩ > SAS Tech & Tip > SAS Programming
[SAS α׷] IML Kernel Regression ϱ 2019.05.13
ڼ 44 0
http://www.mysas.co.kr/sas_tiptech/a_question.asp?b_no=11094&cmd=content&bd_no=5

Kernel Regression


 


1. Introduction


ð 츮 Kernel ȸ ˰ п ҽϴ. е ˴ٽ, Ŀ ȸʹ 谡 Ǵ Դϴ. / һȭ( ⿩ϴ) ͷ Kernel Regression غ ϰڽϴ.


 2. Data

 

data gas;

label NOx = "Nitric oxide and nitrogen dioxide";

label E = "Air/Fuel Ratio";

input NOx E @@;

datalines;

4.818  0.831   2.849  1.045   3.275  1.021   4.691  0.97    4.255  0.825

5.064  0.891   2.118  0.71    4.602  0.801   2.286  1.074   0.97   1.148

3.965  1       5.344  0.928   3.834  0.767   1.99   0.701   5.199  0.807

5.283  0.902   3.752  0.997   0.537  1.224   1.64   1.089   5.055  0.973

4.937  0.98    1.561  0.665


;


run;


ڵ带 ̿Ͽ ͸ ҷ ݴϴ. ʹ Ӻ Ǿ ֽϴ. Label ̸ ٿִ ֽϴ. ڵ Ǹ ì鼭 ̸ մϴ.


 


3. Implementation


Ŀ ȸ ϴ.


 


3.1 Ŀ ¿ Bandwidth(Smoothing) Parameter մϴ.


Ŀ е Լ ϴ ũ ߿ ʽϴ. ǽ е Լ Ŀη մϴ. ߿ Bandwidth(=h) ϴ ε, ÿ ó ū Bandwidth Underfit ǰ, ʹ Bandwidth մϴ.


 


3.2 ġ


Ȯ е Լ Ͽ ġ ֽϴ. SAS/IML ͸ PDF ġ ֽϴ.


3.3 ȸ


, x0 ġ ŷ0 ش Ҵġ ȸ Դϴ. , Bandwidth ϴ ̰, ȸͰ Ű Դϴ. n-> ϼ  ε巯ϴ.


 

proc iml;

 


/* ڵ忡 ռ, Weighted regression ð PolyRegEst ߽ϴ. ڼ

https://blog.naver.com/statpark1014/221530810392

Ͻñ ٶϴ.


*/


 


/* h h=H*range(X)DLQSLEK. H Proportion of range, x ɰִ մϴ.

X0 ġ N(x0, h) Ȯе Լ f(x; x0,h) ˴ϴ.

̴ ǥԺ (1/h)*pdf("Normal", (X-x0)/h) ϴ. */


 

start GetKernelWeights(X, x0, H);

   return pdf("Normal", X, x0, H*range(X));

/* bandwidth()h = H*range(X) */

finish;


 


start KernelRegression(X, Y, H, _t=X, deg=1);

  t = colvec(_t);

   pred = j(nrow(t), 1, .);

   do i = 1 to nrow(t);


/* Kernel regression ۼմϴ.

     t is a vector of values at which to evaluate the fit. By default t=X*/

      /* deg Ͽ Ŀο ȮеԼ մϴ. */

      W = GetKernelWeights(X, t[i], H);

      b = PolyRegEst(Y, X, W, deg);   

      pred[i] = PolyRegScore(t[i], b);  /* PolyRegScore Weighted regression ÿ Ͽϴ. ٷ ٿ ȸͰ b Ǿ, ̸ Pred[i] Ͽϴ */

   end;

   return pred;

finish;

 


Լ ƽϴ. ͸ ҷ, H Ķͳ ϸ м ϰڽϴ.


 

use gas;  read all var {E NOx} into Z;  close;

call sort(Z);   /* ӽ Z X,Y մϴ.*/

X = Z[,1];  Y = Z[,2];


H = 0.1;        /* H ݴϴ. H ޶ϴ. */

eg = 1;        /* */

Npts = 201;     /* Smoothing ( ) for Uniform Grid */

t = T( do(min(X), max(X), (max(X)-min(X))/(Npts-1)) ); /* Uniform Grid */

pred = KernelRegression(X, Y, H, t);                 /* */

Z=t||pred;

create RegFit from Z[c={"t" "pred"}];

append from Z;

quit;


IML մϴ. Ͽ ׷ غڽϴ.


4. Visualization


data RegAll; /* 򰡸 ͼ մϴ*/

 LABEL pred = 'Kernel model';

merge gas regfit; /* Ϳ */

run;



title1 'Kernel Estimator ';

title2 'h=0.1*Range(x)';

proc sgplot data=RegAll;

scatter x=E y=NOx / filledoutlinedmarkers markerattrs=(size=6 symbol=CircleFilled);

              series x=t  y=Pred / curvelabel;

xaxis grid; yaxis grid;

run;


 

H 1, 0.5 ׷Դϴ. ʽϴ.


0.1, 0.05 Դϴ.

H 0.1϶ յ˴ϴ. 0.05 ϴϱ Ա. 

 

Ϲȭ H=0.1 Kernel äϵ ϰڽϴ.


̻ Kernel ȸ͸ ϴ ͱ ϷϿϴ. ̾, IML ȸ͸ ϴ ־ϴ.


 


 


5. Appendix


ȭ н ؼ, Naradya-Watson Ŀ ÷ϰڽϴ. غ е Ͽ غñ ٶϴ.


 

proc iml;

/* Nadaraya-Watson Ŀ ġ ϴ Դϴ.

ڼ

https://blog.naver.com/statpark1014/221500082987 

Ͻñ ٶϴ.*/

 

start NWKerReg(Y, X, x0, H);

K = colvec( pdf("Normal", X, x0, H*range(X)) );

return( K`*Y / sum(K) );

finish;


start NWKernelSmoother(X, Y, H, _t=X);

   t = colvec(_t);

   pred = j(nrow(t), 1, .);

   do i = 1 to nrow(t);

      pred[i] = NWKerReg(Y, X, t[i], H);

   end;

   return pred;

finish;

quit;

 

 


6. Restriction


̷ Ŀ ȸͿ ֽϴ. 뿪 ʹ Ұմϴ. (Ex 0.001 . ) , h d/10 midpoint ϴ. (ex {x[i+1]+x[i]}/2) 쿣 ġ 0 ˴ϴ. ν ȸͰ ִµ, ν ȸʹ ̿ ϹǷ ̷ غ ֽϴ.


 

 
 
޴ ȣ
޴ ȣ
 
 [SAS α׷ ǽ] ƽ ȸ͸ ̿ PROC PLM ǽ
 [SAS α׷ ǽ] IML Weighted Regression ϱ