Simple Outlook inbox analysis using R

Posted on 2015-06-01 by mike

Emails from eBay members asking questions arrive from random addresses such as randomstring@members.ebay.co.uk (or @members.ebay.com, depending on the eBay site), which makes them hard to count as a group.
The following R script recodes them to a single address so they can be counted.

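install.packages("plyr")   # only needed the first time
library(plyr)

setwd("/home/mike/Desktop")
dir()                      # check the exported inbox file is there
inbox <- read.csv("inbox.TXT", sep = "\t", stringsAsFactors = FALSE)  # tab-separated export
names(inbox)               # the sender column is read in as From...Address.

# Collapse every randomstring@members.ebay.co.uk sender into one label
# (fixed = TRUE so the dots are matched literally, not as regex wildcards)
inbox$EADD <- ifelse(grepl("members.ebay.co.uk", inbox$From...Address., fixed = TRUE),
                     "members.ebay.co.uk",
                     as.character(inbox$From...Address.))

str(inbox)

# Count emails per (recoded) sender address
f <- ddply(inbox, "EADD", summarize, N = length(EADD))
plot(f$N)
head(f[order(-f$N), ])     # the biggest senders first
str(f)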

Once you've identified the biggest culprits, create an Outlook rule to move or delete their messages.
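If plyr isn't installed, the same count can be sketched in base R with table(). This is an alternative, not the script above: the sender addresses below are made-up stand-ins for the inbox$From...Address. column.

```r
# Hypothetical sender addresses standing in for inbox$From...Address.
from <- c("a1b2@members.ebay.co.uk", "zz9@members.ebay.co.uk",
          "news@example.com", "x7@members.ebay.co.uk", "news@example.com")

# Same recode: any eBay member address becomes one label
eadd <- ifelse(grepl("@members.ebay.co.uk", from, fixed = TRUE),
               "members.ebay.co.uk", from)

# table() counts each distinct address; sort() puts the biggest first
counts <- sort(table(eadd), decreasing = TRUE)
print(counts)
```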

Network Diagram in R

Posted on 2014-08-05 by mike

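library(network)
m <- matrix(rbinom(15 * 15, 1, 0.4), 15, 15)  # random 15 x 15 adjacency matrix, ~40% of pairs linked
diag(m) <- 0                                  # no self-loops
m[lower.tri(m)] <- t(m)[lower.tri(m)]         # make it symmetric for an undirected graph
g <- network(m, directed = FALSE)
summary(g)
plot(g)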
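Before plotting a random graph it can help to sanity-check the adjacency matrix. A minimal base-R sketch, using a small hypothetical 4-node matrix instead of a random one: for an undirected graph the matrix is symmetric, each node's degree is its row sum, and every edge is counted twice.

```r
# Small hand-made symmetric adjacency matrix (hypothetical, 4 nodes)
m <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
stopifnot(isSymmetric(m), all(diag(m) == 0))  # undirected, no self-loops

deg <- rowSums(m)        # degree of each node
n_edges <- sum(m) / 2    # each undirected edge appears twice in the matrix
deg
n_edges
```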

Recoding a factor

Posted on 2014-06-05 by mike

When a factor has N levels but you only want M (M < N), you need to merge some of its levels.

When you run str(df), you can see that each factor is stored internally as integer codes, each code pointing at one of the factor's levels.

To merge levels, assign new names to them with ‘levels’:

levels(df$factor)[c(2,4,6,7)] = "Horse Whispering"

This takes the levels at positions 2, 4, 6 and 7 and renames them all to “Horse Whispering”, merging them into a single level.
To recode the rest, first look up the current positions of the levels:

levels(df$factor)

This matters because the levels that were formerly 2, 4, 6 and 7 have now been merged into a single level, so the positions of the remaining levels shift. You'll have to adjust the integers you use every time you run the command.
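A minimal sketch of that renumbering, assuming a hypothetical seven-level factor (the sport names are made up): after the merge, the merged level takes the position of the first level it replaced, and every later level moves up.

```r
# Hypothetical seven-level factor; levels are sorted alphabetically
f <- factor(c("Archery", "Boxing", "Croquet", "Darts", "Eventing", "Fencing", "Golf"))
levels(f)                     # 7 levels; "Eventing" is level 5

# Merge levels 2, 4, 6 and 7 into one
levels(f)[c(2, 4, 6, 7)] <- "Horse Whispering"
levels(f)                     # 4 levels: Archery, Horse Whispering, Croquet, Eventing
match("Eventing", levels(f))  # Eventing is now level 4, not 5
```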

Repeat until all the levels have been recoded.

To check the recoding, make a copy of the original factor and recode the copy rather than the original. That way you can compare old and new afterwards:

table(df$OrigFactor , df$RecodedFactor)

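This prints a table of counts of OrigFactor against RecodedFactor. If the recoding only merged levels, each original level should have non-zero counts in exactly one recoded column.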
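An alternative to tracking positions, not used in the post: the factor method of levels<- also accepts a named list, where each name is a new level and each element lists the old levels merged into it. A sketch with a hypothetical data frame whose column names follow the post:

```r
# Hypothetical data frame standing in for df
df <- data.frame(OrigFactor = factor(c("cat", "dog", "horse", "pony", "dog", "cat")))

# Recode a copy, never the original
df$RecodedFactor <- df$OrigFactor
levels(df$RecodedFactor) <- list(Pets = c("cat", "dog"),
                                 "Horse Whispering" = c("horse", "pony"))

# Cross-tabulate old against new: each row should be non-zero in one column only
tab <- table(df$OrigFactor, df$RecodedFactor)
print(tab)
```

Because the merge is written by level name, nothing has to be renumbered between steps.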

create a vector of dates in R Cran

Posted on 2014-02-18 by mike

dt <- as.Date("2010-01-01")
dts <- seq(dt, length.out = 17, by = "1 month")  # 17 dates: the 1st of each month from Jan 2010
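A slightly extended sketch of the same idea: seq() on Dates can stop at a given number of values (length.out) or at an end date (to), and by can also be "week" or "day". The weekly vector and the month-label format string are illustrative choices, not from the post.

```r
dt  <- as.Date("2010-01-01")
dts <- seq(dt, length.out = 17, by = "1 month")     # 1st of each month, 17 values
range(dts)                                          # first and last dates
format(dts[1:3], "%b %Y")                           # month labels, e.g. for axis text

# Alternative: give an end date with 'to' instead of a count
wk <- seq(dt, as.Date("2010-02-01"), by = "week")   # every 7 days up to 1 Feb 2010
wk
```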

treemap in R

Posted on 2013-09-21 by mike

library(RODBC)
library(treemap)

ch <- odbcConnect("mike_db", uid = "mike")
# Total length of stay (LOS) in days per ward per year:
# datediff in minutes divided by 1440 minutes per day
los <- sqlQuery(ch, paste("select ward, year(end_Dttm) as [year],",
                          "sum(datediff(mi, start_Dttm, end_Dttm) / 1440.0) as LOS",
                          "from [wardstays_examples]",
                          "group by ward, year(end_Dttm)"))
odbcClose(ch)
str(los)

treemap(los,
        index = c("year", "ward"),  # the nesting levels: years, then wards within each year
        vSize = "LOS")              # the value that sets each rectangle's area
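If the ward stays are already in a data frame rather than a database, the same ward/year totals can be sketched in base R. The stays below are hypothetical; the column names mirror the query's, and difftime() in minutes divided by 1440 plays the role of datediff(mi, …) / 1440.0.

```r
# Hypothetical ward stays with the same column names as the query
stays <- data.frame(
  ward       = c("A", "A", "B"),
  start_Dttm = as.POSIXct(c("2012-03-01 08:00", "2012-03-02 12:00", "2012-05-10 00:00"), tz = "UTC"),
  end_Dttm   = as.POSIXct(c("2012-03-03 08:00", "2012-03-02 18:00", "2012-05-11 12:00"), tz = "UTC")
)

# Length of stay in days: minutes / 1440
stays$LOS  <- as.numeric(difftime(stays$end_Dttm, stays$start_Dttm, units = "mins")) / 1440
stays$year <- as.integer(format(stays$end_Dttm, "%Y"))

# Total LOS per ward per year: one row per rectangle in the treemap
los <- aggregate(LOS ~ ward + year, data = stays, FUN = sum)
los
```

The result has the year, ward and LOS columns that the treemap() call expects.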

GIS in R cran

Posted on 2013-07-14 by mike

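library(maptools)   # readShapeSpatial(); also loads sp, which provides CRS()
# Meridian 2 shapefiles in British National Grid (EPSG:27700)
walesCoast <- readShapeSpatial("Z:/MAPPING DATA/Meridian 2 Shape/data/coast_ln_polyline.shp",
                               proj4string = CRS("+init=epsg:27700"))
walesUA <- readShapeSpatial("Z:/MAPPING DATA/Meridian 2 Shape/data/district_region.shp",
                            proj4string = CRS("+init=epsg:27700"))
# Map window in metres (eastings, northings)
x1x2 <- c(221000, 346594)
y1y2 <- c(269406, 395000)

# Districts first, then the coastline drawn in red on top
plot(walesUA, xaxs = "i", yaxs = "i", xlim = x1x2, ylim = y1y2, lwd = 1)
plot(walesCoast, xaxs = "i", yaxs = "i", xlim = x1x2, ylim = y1y2, lwd = 3, col = "red", add = TRUE)

# Margin labels; side 1 = bottom, 2 = left, 3 = top, 4 = right
mtext("upvar", side = 2, line = 2, col = 1)
mtext("Bottom", side = 1, line = 2, col = 2)
mtext("Top", side = 3, line = 2, col = 3)
mtext("Right", side = 4, line = 1, col = 4)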

Generate Random data in R

Posted on 2012-03-28 by mike

Generate a set of data where the distribution parameters change part way through:

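rnorm(n, mean, sd) draws n values from a normal distribution. Here the first 65 values have mean 98 and sd 45, and the last 35 have mean 67 and sd 35: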

d1<-rnorm(65,98,45)
d2<-rnorm(35,67,35)
d3<-c(d1,d2)
plot(d3,type="b")
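To make the change point easier to see, a sketch that fixes the random seed (so the same data come out every run), then marks where the parameters change and draws each segment's sample mean. The seed value and the plotting additions are illustrative choices, not from the post.

```r
set.seed(42)                            # reproducible random numbers
d1 <- rnorm(65, mean = 98, sd = 45)
d2 <- rnorm(35, mean = 67, sd = 35)
d3 <- c(d1, d2)

seg <- rep(1:2, times = c(65, 35))      # which segment each value came from
seg_means <- tapply(d3, seg, mean)      # sample mean of each segment

plot(d3, type = "b")
abline(v = 65.5, lty = 2)               # where the parameters change
segments(c(1, 66), seg_means, c(65, 100), seg_means, col = "red")
```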

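The following table lists common distributions and the R command that generates n random values from each. Where n is already a parameter of the distribution (rhyper, rwilcox, rsignrank), the number of values to generate is given as nn instead.

Distribution	Command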
Gaussian	rnorm(n, mean=0, sd=1)
Exponential	rexp(n, rate=1)
Gamma	rgamma(n, shape, scale=1)
Poisson	rpois(n, lambda)
Weibull	rweibull(n, shape, scale=1)
Cauchy	rcauchy(n, location=0, scale=1)
Beta	rbeta(n, shape1, shape2)
Student's t	rt(n, df)
Fisher-Snedecor (F)	rf(n, df1, df2)
Pearson (Chi-squared)	rchisq(n, df)
Binomial	rbinom(n, size, prob)
Multinomial	rmultinom(n, size, prob)
Geometric	rgeom(n, prob)
Hypergeometric	rhyper(nn, m, n, k)
Logistic	rlogis(n, location=0, scale=1)
Lognormal	rlnorm(n, meanlog=0, sdlog=1)
Negative Binomial	rnbinom(n, size, prob)
Uniform	runif(n, min=0, max=1)
Wilcoxon's statistics	rwilcox(nn, m, n), rsignrank(nn, n)
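A quick way to check a generator is doing what you expect is to compare the sample mean of many draws with the theoretical mean. A sketch using four commands from the table; the parameter values and the seed are arbitrary examples.

```r
set.seed(1)
n <- 10000
x_pois  <- rpois(n, lambda = 4)                  # theoretical mean: lambda = 4
x_exp   <- rexp(n, rate = 2)                     # theoretical mean: 1 / rate = 0.5
x_binom <- rbinom(n, size = 10, prob = 0.3)      # theoretical mean: size * prob = 3
# n is a parameter of rhyper, so the number of draws is passed as nn
x_hyper <- rhyper(nn = n, m = 7, n = 3, k = 5)   # theoretical mean: k * m / (m + n) = 3.5

sapply(list(pois = x_pois, exp = x_exp, binom = x_binom, hyper = x_hyper), mean)
```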

monkeymike.co.uk
