Calculate Percentiles with MS-SQL 2005 using interpolation…

You’ll need MS SQL 2005 or above, it looks like version 2012 has it built in see here!
This is my first use of the OVER bit so it may be a little ugly, also I have been using the old way to pivot stuff so it’s a bit messy at the end.

declare @t table (percentile numeric(10,5))
insert into @t values (0)
insert into @t values (0.25)
insert into @t values (0.5)
insert into @t values (0.75)
insert into @t values (0.8)
insert into @t values (1.0)

--give rownumbers for your data over a partition
drop table ##mgw
SELECT 
	s.gppraccode
	,age
	,ROW_NUMBER() OVER (partition by gppraccode ORDER BY age) AS "RowNumber"
INTO ##mgw
FROM mgw_ane_base AS s 

--calculate the individual values per percentile
drop table ##mgw_percentileHolding
select	t.percentile*(max(rownumber)-1)+1 as act
		,floor(t.percentile*(max(rownumber)-1)+1) as K
		,((t.percentile*(max(rownumber)-1)+1) % 1) as D
		,count(*) as N
		,t.percentile
		,a.gppraccode
into ##mgw_percentileHolding
from ##mgw as a join @t as t on 1=1
group by t.percentile,a.gppraccode

--fold it all over in a manual-ish pivot
select 
	gppraccode
	,max(case when percentile=0 then PercentileValue end) as [0%]
	,max(case when percentile=0.25 then PercentileValue end) as [25%]
	,max(case when percentile=0.5 then PercentileValue end) as [50%]
	,max(case when percentile=0.75 then PercentileValue end) as [75%]
	,max(case when percentile=0.8 then PercentileValue end) as [80%]
	,max(case when percentile=1 then PercentileValue end) as [100%]
	,N
FROM
	(
	SELECT 
	*
	,(SELECT max(age) from ##mgw as b where b.rownumber<=a.act and a.gppraccode=b.gppraccode) as below
	,(SELECT min(age) from ##mgw as b where b.rownumber>=a.act  and a.gppraccode=b.gppraccode) as above
	,(SELECT min(age) from ##mgw as b where b.rownumber>=a.act  and a.gppraccode=b.gppraccode)-(select max(age) from ##mgw as b where b.rownumber<=a.act and a.gppraccode=b.gppraccode ) as f
	,case	when percentile=0 then (SELECT min(age) from ##mgw as b where a.gppraccode=b.gppraccode) 
			when percentile=1 then (SELECT max(age) from ##mgw as b where a.gppraccode=b.gppraccode)  
			else (SELECT max(age) from ##mgw as b where b.rownumber<=a.act and a.gppraccode=b.gppraccode)+D*((select min(age) from ##mgw as b where b.rownumber>=a.act and a.gppraccode=b.gppraccode)-(select max(age) from ##mgw as b where b.rownumber<=a.act and a.gppraccode=b.gppraccode)) end as PercentileValue
	FROM ##mgw_percentileHolding as a
) as internal
group by gppraccode,N

Row_number(), dense Rank rank etc

You need SQL 2005 or greater for this:
Rownumbers with partitions:

SELECT 
    s.gppraccode
    ,age
    ,ROW_NUMBER()OVER (PARTITION BY gppraccode ORDER BY age asc) As [RowNumber]
FROM mgw_ane_base AS s 
SELECT s.gppraccode, age
    ,ROW_NUMBER() OVER (ORDER BY age) AS "RowNumber"
    ,RANK() OVER (ORDER BY age) AS Rank
    ,DENSE_RANK() OVER (ORDER BY age) AS "DenseRank"
    ,NTILE(10) OVER (ORDER BY age) AS Quartile
INTO ##mgw
FROM mgw_ane_base AS s 

Ship wedi crashio

The ship ran aground in rough weather near the gravel pier.

This is about as close as you could get (unless you stand on the rainbow bridge) as ‘they’ had blocked off the paths about 500metres either side of the stricken wessel.

Generate Random data in R

Generate a set of data where the distribution parameters change part way through:

d1<-rnorm(n,mean,sd)
d1<-rnorm(65,98,45)
d2<-rnorm(35,67,35)
d3<-c(d1,d2)
plot(d3,type="b")


The following table gives the distribution and the command for generating n data from each distribution.

Gaussian rnorm(n, mean=0, sd=1)
Exponential rexp(n, rate=1)
Gamma rgamma(n, shape, scale=1)
Poisson rpois(n, lambda)
Weibull rweibull(n, shape, scale=1)
Cauchy rcauchy(n, location=0, scale=1)
Beta rbeta(n, shape1, shape2)
'Student' (T) rt(n, df)
Fisher-Snedecor (F) rf(n, df1, df2)
Pearson (Chi-squared) rchisq(n, df)
Binomial rbinom(n, size, prob)
Multinomial rmultinom(n, size, prob)
Geometric rgeom(n, prob)
Hypergeometric rhyper(nn, m, n, k)
Logistic rlogis(n, location=0, scale=1)
Lognormal rlnorm(n, meanlog=0, sdlog=1)
Negative Binomial rnbinom(n, size, prob)
Uniform runif(n, min=0, max=1)
Wilcoxon's statistics rwilcox(nn, m, n), rsignrank(nn, n)

So farewell then Facebook, you creepy peeping $*!&s.

Hi MonkeyMike,

We have received a request to permanently delete your account. Your account has been deactivated from the site and will be permanently deleted within 14 days.

If you did not request to permanently delete your account, please login to Facebook to cancel this request:

http://www.facebook.com/login.php

Thanks,
The Facebook Team

 

 

 

Dimensions and Measures with BIDS, MDX etc.

So how do you tell what dimension is associated with a measure?

Open up BIDS (SQL Server Business Intelligence Development Studio)
From the file menu select: Open > Analysis Service Database
and choose your databsae and server etc.
Let the old girl sort itself out and then in the Solution Explorer select your cube of interest.

You’ll get something like this interlinked view:

Just switch to the Dimension Usage tab:

and you can see that Department is NOT linked to most measures.

Simple MDX again

WITH
SET [a] AS [Product].[Category].Levels(1).Members
SET [b] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order([a], [b].Item(0), BDESC ) ON ROWS
     ,[b]  ON COLUMNS
FROM [Adventure Works];

The “Category” above is not an MDX keyword, but a part of the hierarchy under the “product” dimension.


What happens if you change the dimension to one in which the measure does not spread over?

WITH
SET [a] AS [Product].[Category].Levels(1).Members
SET [b] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order([a], [b].Item(0), BDESC ) ON ROWS
     ,[b]  ON COLUMNS
FROM [Adventure Works];

Which gives:

CY 2001 CY 2002 CY 2003 CY 2004 CY 2006
Bikes $7,395,348.63 $19,956,014.67 $25,551,775.07 $13,399,243.18 (null)
Components $615,474.98 $3,610,092.47 $5,482,497.29 $2,091,011.92 (null)
Clothing $34,376.34 $485,587.15 $871,864.19 $386,013.16 (null)
Accessories $20,235.36 $92,735.35 $296,532.88 $161,794.33 (null)

Now what do we get if we only look at the top level of the row dimenion?

WITH
SET [a] AS [Product].[Category].Levels(0).Members
SET [b] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order([a], [b].Item(0), BDESC ) ON ROWS
     ,[b]  ON COLUMNS
FROM [Adventure Works];
CY 2001 CY 2002 CY 2003 CY 2004 CY 2006
All Products $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)

Which is just the total “whatever measure this is” for the column dimension.
Now let’s throw in a dimension which doesn’t split the measure.

WITH
SET [a] AS [Department].Levels(1).Members
SET [b] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order([a], [b].Item(0), BDESC ) ON ROWS
     ,[b]  ON COLUMNS
FROM [Adventure Works];

you get this:

CY 2001 CY 2002 CY 2003 CY 2004 CY 2006
Executive General and Administration $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)
Inventory Management $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)
Manufacturing $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)
Quality Assurance $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)
Research and Development $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)
Sales and Marketing $8,065,435.31 $24,144,429.65 $32,202,669.43 $16,038,062.60 (null)

Which looks like a cartesian join enacted on a crosstab. So each example of the row dimention gets an instance of the column dimensions (which the measure is applicable to)
How do I find out how my measures and dimensions are related?
…for that you need BIDS.

MDX again

WITH
SET [a] AS [Product].[Subcategory].Levels(1).Members
SET [b] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order([a],[b].Item(0), BDESC ) ON ROWS,
     Order([b],[a].Item(0), BDESC  ) ON COLUMNS
FROM [Adventure Works] ;

works the same as:

WITH
SET [RowSet]    AS [Product].[Subcategory].Levels(1).Members
SET [ColumnSet] AS [Date].[Calendar].Levels(1).Members
SELECT
     Order( [RowSet], [ColumnSet].Item(0), BDESC ) ON ROWS,
     Order([ColumnSet],[RowSet].Item(0), BDESC  ) ON COLUMNS
FROM [Adventure Works];