R - Peculiarity with scale() and Z-Scores
I am attempting to scale my data in R after doing some research on the scale() function, which seems to follow (x - mean) / std.dev. That is just what I am looking for: a scaled data frame in R. I want to make sure my assumptions are correct so that I don't draw the wrong conclusions.
Assumption
R scales each column independently. Therefore, column 1 will have its own mean and standard deviation, and column 2 will have its own.
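The per-column assumption can be checked directly. The following is a minimal sketch on a made-up data frame (the column names and distributions are hypothetical, chosen only for illustration) that compares the output of scale() with a z-score computed by hand for each column:

```r
# Hypothetical toy data; the columns deliberately have very different scales.
set.seed(42)
dat <- data.frame(a = rnorm(1000, mean = 10, sd = 2),
                  b = runif(1000, min = 0, max = 500),
                  c = rexp(1000, rate = 0.1))

# scale() returns a matrix with one centered and scaled column per input column.
scaled <- scale(dat)

# Manual z-score, computed separately for every column.
manual <- sapply(dat, function(x) (x - mean(x)) / sd(x))

# Compare the raw numbers (as.vector drops the attributes scale() attaches).
same <- isTRUE(all.equal(as.vector(scaled), as.vector(manual)))
print(same)

# Each scaled column should have mean ~0 and standard deviation ~1.
print(round(colMeans(scaled), 10))
print(apply(scaled, 2, sd))
```

If `same` is TRUE and every column ends up with mean 0 and standard deviation 1, then scale() is working column by column, as assumed.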
Assume I have a dataset of 100,000 rows and I scale 3 columns. If I then remove every row whose z-score is over 3 or less than -3 in any of those columns, I should lose at most about 3 * (100,000 * .003) = 900 rows!
However, when I went to truncate the data, my 100,000 rows were cut down to 94,798. That means 5,202 rows were removed.
Does this mean my assumption about scale() is wrong, and that it doesn't scale by column?
Update: I ran a test and did the z-score conversion on my own. The same number of rows was still removed in the end, so I believe scale() does work. Now I'm curious why more than .3% of the data is removed when only values more than 3 standard deviations out are removed.
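One possible explanation: the .3% figure (more precisely about 0.27%) only holds for normally distributed data. For an arbitrary distribution, Chebyshev's inequality only guarantees that no more than 1/9 (about 11.1%) of the values lie 3 or more standard deviations from the mean, so skewed or heavy-tailed columns can lose far more. The sketch below uses simulated data (the three distributions are assumptions for illustration, not the original dataset) to show the effect:

```r
set.seed(1)
n <- 100000

# Simulated columns: one normal, one skewed, one heavy-tailed (assumed shapes).
dat <- data.frame(normal = rnorm(n),
                  skewed = rexp(n),
                  heavy  = rt(n, df = 3))

z <- scale(dat)

# Share of values beyond 3 standard deviations, per column.
frac_out <- colMeans(abs(z) > 3)
print(frac_out)

# A row is dropped if ANY of its columns is an outlier,
# so the removals from the individual columns add up.
keep <- rowSums(abs(z) > 3) == 0
rows_removed <- sum(!keep)
print(rows_removed)
```

In this simulation only the normal column stays near the expected share, while the other two push the total number of removed rows well past 900. Checking a histogram or `colMeans(abs(z) > 3)` on the real data would show whether something similar is happening there.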