r - Phylogenetic tree -

February 15, 2015

I am working to create a phylogenetic tree based on gene-gene data. Below is a subset of my data (test.txt). The tree does not have to be built from DNA sequences; the gene names should simply be treated as words.

  id  gene1   gene2
  1   ADRA1D  ADK
  2   ADRA1B  ADK
  3   ADRA1A  ADK
  4   ADRB1   ASIC1
  5   ADRB1   ADK
  6   ADRB2   ASIC1
  7   ADRB2   ADK
  8   AGTR1   ACH
  9   AGTR1   ADK
  10  ALOX5   ADRB1
  11  ALOX5   ADRB2
  12  ALPPL2  ADRB1
  13  ALPPL2  ADRB2
  14  AMI2    AGR1
  15  AR      ADORA1
  16  AR      ADRA1D
  17  AR      ADRA1B
  18  AR      ADRA1A
  19  AR      ADRA2A
  20  AR      ADRA2B

Below is my R code:

  library(ape)
  tab = read.csv("test.txt", ...)
  ...
My data is attached here.

I have a question about how the tree is clustered. The pair

  17 AR ADRA1B
  18 AR ADRA1A

and

  2 ADRA1B ADK
  3 ADRA1A ADK

should be clustered closely, as they have a gene in common: 17 and 2 should be together, and 18 and 3.

Should I use another method, if this one (Euclidean distance) is the wrong choice?

Should I convert my data to a matrix of rows and columns, where gene1 is on the x-axis and gene2 is on the y-axis, with each cell filled with 1 or 0? (Basically, 1 if the two genes are paired and 0 if not.)
Updated code:

  table = table(tab$gene1, tab$gene2)
  d <- dist(table)
  fit <- hclust(d, method = "ward")
  plot(as.phylo(fit))

However, this only gives me the genes from the gene1 column, not those from gene2. The result is close to what I want, but it should also include the genes from the gene2 column.
   
My answer is only valid if each row represents one individual and lists the only two genes present in that individual. If, however, each row means that gene1 occurs together with gene2, I do not think any useful clustering can be done. In that case I would expect an extra column stating the probability of their joint occurrence, and something like a principal component analysis (PCA) might be the better kind of analysis, but I am very far from being an expert on (hierarchical) clustering.

Before you can use the dist function, you have to bring your data into a suitable format:
  # Convert to a presence/absence format: one row per id, one column per gene
  gene.names <- sort(unique(c(as.character(tab$gene1), as.character(tab$gene2))))
  gene.matrix <- cbind(tab[, "id"], matrix(0L, nrow = nrow(tab), ncol = length(gene.names)))
  colnames(gene.matrix) <- c("id", gene.names)
  # Set the columns of the two genes listed in each row to 1
  for (x in seq_len(nrow(tab))) {
    genes <- c(as.character(tab$gene1[x]), as.character(tab$gene2[x]))
    gene.matrix[x, match(genes, colnames(gene.matrix))] <- 1L
  }

This gives:

       id ACH ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ...
  [1,]  1   0   1      0      0      0      1      0
  [2,]  2   0   1      0      0      1      0      0
  [3,]  3   0   1      0      1      0      0      0
  [4,]  4   0   0      0      0      0      0      0 ...

So each row represents an observation (= an individual). The individual is identified in the first column, and each subsequent column contains 1 if the gene is present and 0 if it is absent. The dist function can then be applied to this matrix (with the id column removed):

  d <- dist(gene.matrix[, -1], method = "euclidean")
  fit <- hclust(d, method = "ward")  # "ward" is named "ward.D" in R >= 3.1.0
  plot(as.phylo(fit))                # as.phylo() comes from the ape package
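For anyone without the data file, the steps above can be sketched end to end on a few rows. This is only a minimal sketch under some assumptions: the data frame `tab` is typed by hand from rows 1-3 and 17-18 of the question, the method is spelled `"ward.D"` (the current name of `"ward"`), and the tree is drawn with base R's `plot()` so that the ape package is not required.

```r
# Hand-typed subset of the question's data (ids 1-3 and 17-18); an assumption
# made for illustration, not the full test.txt
tab <- data.frame(
  id    = c(1, 2, 3, 17, 18),
  gene1 = c("ADRA1D", "ADRA1B", "ADRA1A", "AR", "AR"),
  gene2 = c("ADK", "ADK", "ADK", "ADRA1B", "ADRA1A"),
  stringsAsFactors = FALSE
)

# One column per gene seen in either column
gene.names <- sort(unique(c(tab$gene1, tab$gene2)))

# Presence/absence matrix: rows are ids, columns are genes
gene.matrix <- matrix(0L, nrow = nrow(tab), ncol = length(gene.names),
                      dimnames = list(as.character(tab$id), gene.names))
for (x in seq_len(nrow(tab))) {
  gene.matrix[x, c(tab$gene1[x], tab$gene2[x])] <- 1L
}

d <- dist(gene.matrix, method = "euclidean")
fit <- hclust(d, method = "ward.D")
plot(fit)  # base-R dendrogram; plot(as.phylo(fit)) from ape would also work

# Pairwise distances as a labelled matrix, for inspection
round(as.matrix(d), 3)
```

In this subset, two ids that share one gene end up at distance sqrt(2), and two ids that share none at distance 2. So id 17 is as close to id 2 (shared ADRA1B) as id 2 is to ids 1 and 3 (shared ADK).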

Perhaps it is a good idea to read up on the differences between, for example, Euclidean and Manhattan distance. The Euclidean distance between the individuals with id = 1 and id = 2 is:

  euclidean_dist = sqrt((0-0)^2 + (1-1)^2 + (0-0)^2 + (0-0)^2 + (0-1)^2 + ...)

while the Manhattan distance is:

  manhattan_dist = abs(0-0) + abs(1-1) + abs(0-0) + abs(0-0) + abs(0-1) + ...
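The two formulas can be checked numerically. The sketch below assumes the rows for id = 1 (ADRA1D, ADK) and id = 2 (ADRA1B, ADK) from the question and keeps only five gene columns; every other column is 0 for both individuals and so adds nothing to either sum. The vector names `id1` and `id2` are made up for this example.

```r
# Presence/absence vectors over five genes (all other genes are 0 for both ids)
genes <- c("ADK", "ADORA1", "ADRA1A", "ADRA1B", "ADRA1D")
id1 <- c(1, 0, 0, 0, 1)  # id = 1: ADRA1D and ADK
id2 <- c(1, 0, 0, 1, 0)  # id = 2: ADRA1B and ADK
names(id1) <- names(id2) <- genes

# Distances written out as in the formulas above
euclidean_dist <- sqrt(sum((id1 - id2)^2))
manhattan_dist <- sum(abs(id1 - id2))

# The same distances computed by dist()
m <- rbind(id1, id2)
dist(m, method = "euclidean")
dist(m, method = "manhattan")
```

The two individuals differ in exactly two columns (ADRA1B and ADRA1D), so the Euclidean distance is sqrt(2) and the Manhattan distance is 2.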
