Festival travel
SLU family
- Nothing, not interesting
Tacoma
- Air show – very cool
- Sea Park – nice
Shopping
- Macy's: clothing
- Great Wall: Food
- Walmart: bedding
Install
- [X] apt-get install clojure
- [X] apt-get install sbcl clisp
- [X] emacs support: http://dev.clojure.org/display/doc/Getting+Started+with+Emacs
- [X] clojure mode: (setq inferior-lisp-program "sbcl")
- [X] swank clojure
  - Useful link: http://nklein.com/2010/05/getting-started-with-clojureemacsslime/
  - stackoverflow setup: http://stackoverflow.com/questions/2285437/a-gentle-tutorial-to-emacs-swank-paredit-for-clojure
- [ ] babel support: http://orgmode.org/worg/org-contrib/babel/languages/ob-doc-clojure.html
- [ ] iclojure is available (like ipython): https://github.com/cosmin/IClojure
  - iclojure.el is not available. Maybe we can write an iclojure.el modeled on ipython.el
Books
- [ ] Joy of Clojure
- [ ] Seven languages in seven weeks – Reading now
- [ ] Clojure programming
- [ ] Clojure in Action
Feeling
- Simple: code is data
- Functional programming
- High performance
- Cross-platform, runs on the JVM
- Lisp based
- Compojure: web framework to learn
Fun
- Yes. It is worth learning.
Build the graph and show some basic statistics
Load the data and create the graph using the “igraph” package
The vertex type distribution
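# basic statistics of this graph
# vertex types
library(stringr)
library(plyr)   # ddply
library(ascii)  # ascii
vertex <- V(g)
# names containing 'hsa-' are microRNAs, everything else is a gene
vertex.types <- transform(data.frame(vertex=vertex$name),
                          type=ifelse(str_detect(vertex, 'hsa-'), 'microRNA', 'Gene'))
type.counts <- ddply(vertex.types, .(type), summarise, n=length(type))
type.counts <- transform(type.counts, percent=sprintf('%.1f%%', n*100/sum(n)))
print(ascii(type.counts, include.rownames=F, digits=0,
            caption='regulator type distribution'), type='org')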
| type | n | percent |
|---|---|---|
| Gene | 9532 | 98.1% |
| microRNA | 180 | 1.9% |
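For anyone without R at hand, the same tally can be sketched in Python. This is only an illustration: the `type_distribution` helper and the sample names are hypothetical, and the rule "a name containing hsa- is a microRNA" is copied from the R code above.

```python
from collections import Counter

def type_distribution(names):
    """Count regulator types and format each share as a percentage.

    Assumption (copied from the R snippet): a vertex whose name contains
    'hsa-' is a microRNA, everything else is a gene.
    """
    counts = Counter('microRNA' if 'hsa-' in name else 'Gene' for name in names)
    total = sum(counts.values())
    return {t: (n, '%.1f%%' % (n * 100 / total)) for t, n in sorted(counts.items())}

# Hypothetical toy input, not the real data set.
sample = ['MYC', 'TP53', 'hsa-miR-21', 'E2F1']
print(type_distribution(sample))
```

Feeding it 9532 gene names and 180 microRNA names should give the same percentages as the table above.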
Edge type distribution
# edge types
library(plyr)
library(ascii)
edge.counts <- ddply(reg2tar, .(type), summarise, n=length(type))
edge.counts <- transform(edge.counts, percent=sprintf('%.1f%%', n*100/sum(n)))
print(ascii(edge.counts, include.rownames=F, digits=0,
            caption='regulation type distribution'), type='org')
| type | n | percent |
|---|---|---|
| mir2gene | 516 | 2.7% |
| tf2gene | 18455 | 95.3% |
| tf2mir | 403 | 2.1% |
Render the network
Render MYC network
# generate the subgraph
nodes.name <- c('MYC')
# igraph before 0.6 numbers vertices from 0, hence the "- 1"; drop it on igraph >= 0.6
nodes.id <- which(V(g)$name %in% nodes.name) - 1
neighbor.nodes <- neighbors(g, v=nodes.id)
g.sub <- subgraph(g, c(nodes.id, neighbor.nodes))
# plot it; style by the subgraph's own vertices so sizes and colors match the labels
library(stringr)
is.mirna <- str_detect(V(g.sub)$name, 'hsa-')
plot(g.sub, layout=layout.fruchterman.reingold,
     vertex.size=ifelse(is.mirna, 3, 6),
     vertex.label=V(g.sub)$name,
     vertex.color=ifelse(is.mirna, 'pink', 'lightblue'),
     edge.color=ifelse(E(g.sub)$coef > 0, 'red', 'green'))
https://tninja1980msn.files.wordpress.com/2012/05/wpid-mycnetwork.pdf
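The subgraph step (one node plus its direct neighbours, then every edge among them) can be sketched without igraph. This is a rough Python illustration under assumptions: the `ego_subgraph` and `edge_color` helpers and the toy edges are hypothetical, and edge direction is ignored when collecting neighbours, which may differ from what `neighbors` does on a directed graph.

```python
def ego_subgraph(edges, center):
    """Return the edges among `center` and its direct neighbours.

    `edges` is a list of (source, target, coef) tuples. Direction is
    ignored when collecting neighbours (a simplifying assumption).
    """
    keep = {center}
    for src, dst, _ in edges:
        if src == center:
            keep.add(dst)
        elif dst == center:
            keep.add(src)
    return [(s, d, c) for s, d, c in edges if s in keep and d in keep]

def edge_color(coef):
    # Same rule as the R plot: positive coefficient red, otherwise green.
    return 'red' if coef > 0 else 'green'

# Hypothetical toy edges, not taken from the real network.
edges = [('MYC', 'hsa-miR-17', 0.8), ('E2F1', 'MYC', -0.3),
         ('hsa-miR-17', 'E2F1', -0.5), ('TP53', 'CDKN1A', 0.9)]
for s, d, c in ego_subgraph(edges, 'MYC'):
    print(s, d, edge_color(c))
```

The edge between the two neighbours is kept as well, which is what an induced subgraph means; the unrelated TP53 edge is dropped.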
Render TP53 network
# generate the subgraph
nodes.name <- c('TP53')
# igraph before 0.6 numbers vertices from 0, hence the "- 1"; drop it on igraph >= 0.6
nodes.id <- which(V(g)$name %in% nodes.name) - 1
neighbor.nodes <- neighbors(g, v=nodes.id)
g.sub <- subgraph(g, c(nodes.id, neighbor.nodes))
# plot it; style by the subgraph's own vertices so sizes and colors match the labels
library(stringr)
is.mirna <- str_detect(V(g.sub)$name, 'hsa-')
plot(g.sub, layout=layout.fruchterman.reingold,
     vertex.size=ifelse(is.mirna, 3, 6),
     vertex.label=V(g.sub)$name,
     vertex.color=ifelse(is.mirna, 'pink', 'lightblue'),
     edge.color=ifelse(E(g.sub)$coef > 0, 'red', 'green'))
https://tninja1980msn.files.wordpress.com/2012/05/wpid-tp53network.pdf
Render the whole network
I was not able to render it: the job ran out of RAM on my machine (3-4 GB).
Potential solutions:
- Build an R-Cytoscape pipeline to render it on a low-RAM machine; this will take some time.
- Use a high-memory machine, for example a large-RAM Amazon EC2 instance.
I am good!
Let’s do a linear regression
x <- runif(1000) * 100
y <- x * 5 + rnorm(1000)
fit <- lm(y ~ x)
library(ascii)
print(ascii(summary(fit)), type='org')
|             | Estimate | Std. Error | t value | Pr(> \vert t \vert ) |
|-------------+----------+------------+---------+----------------------|
| (Intercept) |    -0.03 |       0.06 |   -0.43 |                 0.67 |
| x           |     5.00 |       0.00 | 4408.99 |                 0.00 |
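To see where the two estimates come from, here is a minimal Python sketch of ordinary least squares on the same kind of simulated data. The `fit_line` helper and the seed are hypothetical; it computes only the coefficients, not the standard errors or p-values that `lm` reports.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

random.seed(0)  # arbitrary seed, only to make the sketch repeatable
xs = [random.uniform(0, 100) for _ in range(1000)]
ys = [5 * x + random.gauss(0, 1) for x in xs]
intercept, slope = fit_line(xs, ys)
print('intercept=%.2f slope=%.2f' % (intercept, slope))
```

The slope should land very close to 5 and the intercept close to 0, in line with the table above.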
Let’s do a pca
x <- runif(1000) * 100
y <- x * 5 + rnorm(1000)
z <- runif(1000)
w <- rnorm(1000)
df <- cbind(x, y, z, w)
# prcomp expects observations in rows and variables in columns, so df is passed as is
p <- prcomp(df)
plot(p)
Do some text-mining work
import nltk
s = 'I love my wife pengpeng.'
print(nltk.pos_tag(s.split(' ')))
[('I', 'PRP'), ('love', 'VBP'), ('my', 'PRP$'), ('wife', 'NN'), ('pengpeng.', 'NNP')]
plot a histogram
x <- rnorm(100)
hist(x)
do a linear regression
x <- runif(1000)
y <- x^2 * 3 + x * 5 + rnorm(1000)
library(ggplot2)
g <- ggplot(data.frame(x, y), aes(x=x, y=y)) + geom_point() + geom_smooth()
print(g)
Cited from http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ProductionServer
Use a clean environment
wget http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py
python virtualenv.py --no-site-packages galaxy_env
Disable the developer settings
in universe_wsgi.ini:
debug = False
use_interactive = False
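A quick way to double-check the two settings above is to read the file with Python's configparser. This is a sketch under assumptions: the `developer_settings_enabled` helper is hypothetical, and I am assuming the settings live in an `[app:main]` section; adjust the section name if your file differs.

```python
import configparser

# Hypothetical excerpt; a real universe_wsgi.ini has many more settings.
SAMPLE_INI = """
[app:main]
debug = False
use_interactive = False
"""

def developer_settings_enabled(ini_text, section='app:main'):
    """Return the names of developer settings that are still switched on."""
    # interpolation=None so that '%' characters in values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return [key for key in ('debug', 'use_interactive')
            if parser.getboolean(section, key, fallback=False)]

print(developer_settings_enabled(SAMPLE_INI))
```

An empty list means both developer settings are off. For a real file, pass in the text of universe_wsgi.ini instead of the sample string.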
Switch to a database server (postgres suggested)
sudo apt-get install postgresql
follow the suggestion here to create galaxy database
Modify universe_wsgi.ini:
database_connection = postgres:///galaxy?host=/var/run/postgresql
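The connection URL above is easier to read once it is split into parts. This Python sketch uses only the standard library; the `describe_db_url` helper and its key names are hypothetical. As far as I understand, the empty part between `//` and `/galaxy` means no network host, and a `host` value that starts with a slash is treated by the PostgreSQL client library as the directory holding the Unix socket.

```python
from urllib.parse import urlsplit, parse_qs

def describe_db_url(url):
    """Split a database URL into driver, database name and socket directory."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        'driver': parts.scheme,
        'tcp_host': parts.hostname,          # None: no network host was given
        'database': parts.path.lstrip('/'),
        'socket_dir': query.get('host', [None])[0],
    }

print(describe_db_url('postgres:///galaxy?host=/var/run/postgresql'))
```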
Using a proxy server
Follow the instructions here: http://bitbucket.org/galaxy/galaxy-central/wiki/Config/ApacheProxy