SQL Server - Count unique numbers in R and SQL
I have data that I use in both R and SQL Server. The problem is that when I count the unique numbers in a specific column, R shows 222 and SQL returns 216. What causes the difference?
The query used in SQL:
select count(distinct colname) from tablename
And in R:
length(unique(dataframename$colname))
It's difficult to say without the actual data, but R and SQL look at unique values differently. R (or, more to the point, unique()) treats NA and various sizes of spaces as unique values:
> unique(c("f","g","f",""," ",NA,NULL))
[1] "f" "g" "" " " NA
> length(unique(c("f","g","f",""," ",NA,NULL)))
[1] 5
SQL treats various sizes of space as being equal and therefore non-unique:
create table persons (personid int, lastname varchar(255));
insert into persons (personid, lastname) values (1, 'rockwell'),(2,''),(4,'cohen'),(5,' '),(6,' ');
select count(distinct lastname) from persons
This will give the answer 3.
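To find out which values are responsible for the gap, one option is to group the R values by what they look like once trailing spaces are stripped, then list the groups that contain more than one distinct raw value. This is only a minimal sketch in base R, assuming the column is a character vector; x is a hypothetical stand-in for dataframename$colname and reuses the values from the persons example.

```r
# Hypothetical stand-in for dataframename$colname (values from the persons example)
x <- c("rockwell", "", "cohen", " ", " ")

# Comparison key: the value with trailing spaces removed, which is how
# SQL Server compares strings
key <- sub(" +$", "", x)

# Number of distinct raw values behind each key
variants <- tapply(x, key, function(v) length(unique(v)))

# Keys that unique() counts more than once but SQL counts once
collapsing <- names(variants)[variants > 1]

# NA is counted by unique() but left out by count(distinct ...)
has_na <- anyNA(x)
```

Here collapsing holds the empty string, because "" and " " are separate values to R but one value to SQL. Each such key, plus one for NA if has_na is TRUE, is a candidate explanation for part of the 222 versus 216 difference.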
You can trim trailing and leading whitespace with str_trim from the stringr library in R:
library(stringr)
a <- str_trim(c("f","g","f",""," ",NA,NULL))
unique(a)
[1] "f" "g" "" NA
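If the goal is for the R count to match what count(distinct ...) returns, trimming alone may not be enough, because SQL leaves NULL out of the count while unique() keeps NA. The following is a minimal sketch in base R, assuming NA in R corresponds to NULL in the table and that only trailing spaces should be ignored; x is a hypothetical stand-in for dataframename$colname.

```r
# Hypothetical sample values standing in for dataframename$colname
x <- c("f", "g", "f", "", " ", NA)

# Count the way R does by default: NA and " " each count as a distinct value
r_count <- length(unique(x))

# Approximate count(distinct ...): drop NA (SQL NULL), then ignore trailing spaces
sql_like <- unique(sub(" +$", "", x[!is.na(x)]))
sql_like_count <- length(sql_like)
```

With these sample values r_count is 5 and sql_like_count is 3. The sub() call strips trailing spaces only, since SQL Server still treats leading spaces as significant, whereas str_trim removes both by default.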