Wednesday, April 13, 2016

R bibliography

http://stackoverflow.com/questions/16880411/r-tm-big-data-based-on-a-termdocumentmatrix-how-to-set-term-freq-bound-to-ext
Text mining William Shakespeare:
http://www.r-bloggers.com/text-mining-the-complete-works-of-william-shakespeare/


Social network analysis with R using the igraph package:
http://www.r-bloggers.com/an-example-of-social-network-analysis-with-r-using-package-igraph/


Cosine dissimilarities
http://stats.stackexchange.com/questions/78321/term-frequency-inverse-document-frequency-tf-idf-weighting
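The linked question is about tf-idf weighting, which is often applied to a term-document matrix before cosine distances are computed. The sketch below spells out one common variant in base R (term frequency normalised by document length, log2 inverse document frequency). The toy matrix and its terms are hypothetical, not the post's corpus; tm also provides weightTfIdf for this.

```r
# Minimal tf-idf sketch in base R on a hypothetical toy matrix.
# Rows are terms and columns are documents, as in a tm TermDocumentMatrix.
tdm <- matrix(c(2, 0, 1,
                1, 1, 1,
                0, 3, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("love", "king", "sword"), c("d1", "d2", "d3")))

# Term frequency divided by document length (column sums).
tf <- sweep(tdm, 2, colSums(tdm), "/")

# Inverse document frequency: log2(number of documents / documents containing the term).
idf <- log2(ncol(tdm) / rowSums(tdm > 0))

# Multiply each term row by its idf (the vector recycles down the columns).
tfidf <- tf * idf
print(round(tfidf, 3))
```

A term that occurs in every document ("king" here) gets an idf of 0, so its row is zeroed out. The original code of the post continues below.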
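library(proxy)      # dist(..., method = "cosine") returns 1 - cosine similarity
library(wordcloud)
# termDocMatrix: a tm TermDocumentMatrix (terms in rows) built earlier, not shown here
# Alternative form kept for reference; myDtm is not defined in this post:
# cosine_dist_mat <- as.matrix(dist(t(myDtm), method = "cosine"))
# cosine dissimilarity between terms (the rows of the term-document matrix)
docsdissim <- dist(as.matrix(termDocMatrix), method = "cosine")
n <- as.matrix(docsdissim)
# rank terms by their total dissimilarity to all other terms and draw a word cloud
v <- sort(rowSums(n), decreasing = TRUE)
myNames <- names(v)
d <- data.frame(word = myNames, freq = v)
wordcloud(d$word, d$freq, min.freq = 33)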
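# reuse the term-term dissimilarity matrix as input for the graph
termDocMatrix <- n
# cap values at 1 (values below 1 are kept as they are)
termDocMatrix[termDocMatrix >= 1] <- 1
# inspect rows 5 to 10, columns 1 to 20
termDocMatrix[5:10, 1:20]
# term-term adjacency: entry (i, j) sums n[i, k] * n[j, k] over all terms k
termMatrix <- termDocMatrix %*% t(termDocMatrix)
# inspect terms numbered 5 to 10
termMatrix[5:10, 5:10]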
library(igraph)
# build a weighted, undirected graph from the above matrix
g <- graph_from_adjacency_matrix(termMatrix, weighted = TRUE, mode = "undirected")
# remove loops (and merge multiple edges)
g <- simplify(g)
# set labels and degrees of vertices
V(g)$label <- V(g)$name
V(g)$degree <- degree(g)
# set seed to make the layout reproducible
set.seed(3952)
layout1 <- layout_with_fr(g)
plot(g, layout = layout1)
write_graph(g, "e:/siriacosinedisimilarity.graphml", format = "graphml")
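To see what the dist() call above produces, here is a base-R sketch of term-term cosine distance on a hypothetical toy matrix. It assumes, as I understand proxy's registry, that method = "cosine" yields 1 minus the cosine similarity between rows; the helper name cosine_dist is made up for this illustration.

```r
# Base-R sketch of row-wise cosine distance (1 - cosine similarity),
# assumed to match proxy::dist(x, method = "cosine") for the rows of x.
tdm <- matrix(c(2, 0, 1,
                1, 1, 1,
                0, 3, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("love", "king", "sword"), c("d1", "d2", "d3")))

cosine_dist <- function(x) {
  norms <- sqrt(rowSums(x^2))                  # Euclidean length of each term vector
  sim <- (x %*% t(x)) / outer(norms, norms)    # pairwise cosine similarity
  1 - sim                                      # turn similarity into distance
}

term_dist <- cosine_dist(tdm)
print(round(term_dist, 3))
```

"love" and "sword" never occur in the same document, so their distance is exactly 1; "love" and "king" overlap, so their distance is 1 - 3/sqrt(15), about 0.225.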

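Cosine distance 1 => rather dissimilarity :) proxy's "cosine" distance is 1 minus the cosine similarity, so a value of 1 marks terms that are completely dissimilar, not similar.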
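With non-negative term counts, a cosine distance of exactly 1 means the two terms never appear in the same document. The next block of the post relies on this when it sets every value below 1 to 0. The sketch below checks the equivalence on a hypothetical toy matrix; the names disjoint and cooc are illustrative only.

```r
# Hypothetical toy term-document matrix (terms in rows, non-negative counts).
tdm <- matrix(c(2, 0, 1,
                1, 1, 1,
                0, 3, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("love", "king", "sword"), c("d1", "d2", "d3")))

# Cosine distance between terms: 1 - cosine similarity.
norms <- sqrt(rowSums(tdm^2))
dist_mat <- 1 - (tdm %*% t(tdm)) / outer(norms, norms)

# Mirror of termDocMatrix[termDocMatrix < 1] <- 0: keep only distances of 1.
disjoint <- dist_mat
disjoint[disjoint < 1] <- 0

# Number of documents shared by each pair of terms (0/1 presence matrix product).
presence <- (tdm > 0) * 1
cooc <- presence %*% t(presence)

print(disjoint)
print(cooc == 0)
```

The nonzero entries of disjoint line up with the pairs that share no document; the diagonal is 0 in both views, since every term shares its documents with itself.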


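termDocMatrix <- n
# instead of capping at 1, zero out everything below 1 so that only the
# term pairs with cosine distance 1 (completely dissimilar) remain
#termDocMatrix[termDocMatrix>=1] <- 1
termDocMatrix[termDocMatrix < 1] <- 0
# inspect rows 5 to 10, columns 1 to 20
termDocMatrix[5:10, 1:20]
termMatrix <- termDocMatrix %*% t(termDocMatrix)
# inspect terms numbered 5 to 10
termMatrix[5:10, 5:10]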
library(igraph)
# build a weighted, undirected graph from the above matrix
g <- graph_from_adjacency_matrix(termMatrix, weighted = TRUE, mode = "undirected")
# remove loops (and merge multiple edges)
g <- simplify(g)
# set labels and degrees of vertices
V(g)$label <- V(g)$name
V(g)$degree <- degree(g)
# set seed to make the layout reproducible
set.seed(3952)
layout1 <- layout_with_fr(g)
plot(g, layout = layout1)
write_graph(g, "e:/siriacosinedisimilaritynonbollean.graphml", format = "graphml")
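Once the matrix is 0/1, as in this second block, termMatrix <- termDocMatrix %*% t(termDocMatrix) counts, for each pair of rows, the columns where both are 1. The sketch below shows that count on a hypothetical 0/1 matrix and then lists the weighted undirected edges taken from the upper triangle, with the diagonal (loops) dropped. As I understand igraph's documentation, this is roughly what the adjacency-to-graph call followed by simplify() yields for a symmetric matrix; it is an illustration, not igraph's implementation.

```r
# Hypothetical 0/1 matrix: rows are terms, columns are whatever the rows are compared on.
B <- matrix(c(1, 0, 1, 0,
              1, 1, 0, 0,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), NULL))

# (B %*% t(B))[i, j] = number of columns where rows i and j are both 1.
M <- B %*% t(B)

# Check one entry against the explicit count.
stopifnot(M["a", "c"] == sum(B["a", ] == 1 & B["c", ] == 1))

# Weighted undirected edge list: upper triangle only, so each pair appears once
# and the diagonal (self-loops) is left out.
idx <- which(upper.tri(M) & M > 0, arr.ind = TRUE)
edges <- data.frame(from = rownames(M)[idx[, "row"]],
                    to = colnames(M)[idx[, "col"]],
                    weight = M[idx])
print(edges)
```

Here the edges are a-b with weight 1, a-c with weight 2 and b-c with weight 2.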
