r - Phylogenetic tree -


I am trying to build a phylogenetic tree based on gene data. Below is a subset of my data (test.txt). The tree is not built from DNA sequences; the gene names are treated simply as words.

  id  gene1   gene2
  1   ADRA1D  ADK
  2   ADRA1B  ADK
  3   ADRA1A  ADK
  4   ADRB1   ASIC1
  5   ADRB1   ADK
  6   ADRB2   ASIC1
  7   ADRB2   ADK
  8   AGTR1   ACH
  9   AGTR1   ADK
  10  ALOX5   ADRB1
  11  ALOX5   ADRB2
  12  ALPPL2  ADRB1
  13  ALPPL2  ADRB2
  14  AMI2    AGR1
  15  AR      ADORA1
  16  AR      ADRA1D
  17  AR      ADRA1B
  18  AR      ADRA1A
  19  AR      ADRA2A
  20  AR      ADRA2B

Below is my R code:

  library(ape)
  tab = read.csv("test.txt" ...


My data is attached here.

I have a question about how the clustering is done. Since the pairs

  17 AR ADRA1B
  18 AR ADRA1A

and

  2 ADRA1B ADK
  3 ADRA1A ADK

should be clustered closely as they have a common gene, 17 and 2 should be together, as should 18 and 3.

If Euclidean distance is the wrong method here, should I use a different one?

Should I convert my data to a matrix of rows and columns, where gene1 is on the x-axis and gene2 is on the y-axis, and each cell is filled with 1 or 0? (Basically, 1 if the two genes are paired and 0 if not.)

Updated code:

  table = table(tab$gene1, tab$gene2)
  d <- dist(table)
  fit <- hclust(d, method = "ward.D")  # this method was called "ward" before R 3.1.0
  plot(as.phylo(fit))

However, I only get the genes from the gene1 column and not those from the gene2 column. The output below is actually what I want, but it should also include the genes from the gene2 column.


My answer is only valid if each line represents one individual and the two genes listed are the ones present in that individual. If, however, each line means that gene1 co-occurs with gene2, then in my opinion no useful clustering can be done. In that case I would expect an extra column stating the probability of their co-occurrence, and something like a principal component analysis (PCA) might be the way to go, but I am far from being an expert on (hierarchical) clustering.

Before you can use the dist function, you have to bring your data into a suitable format:

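  # Convert to a suitable format: one row per id, one 0/1 column per gene
  gene.names <- sort(unique(c(as.character(tab[, "gene1"]), as.character(tab[, "gene2"]))))
  gene.matrix <- cbind(tab[, "id"], matrix(0L, nrow = nrow(tab), ncol = length(gene.names)))
  colnames(gene.matrix) <- c("id", gene.names)
  # Set the columns of the two genes found in each row to 1
  for (x in seq_len(nrow(tab))) {
    genes <- c(as.character(tab[x, "gene1"]), as.character(tab[x, "gene2"]))
    gene.matrix[x, match(genes, colnames(gene.matrix))] <- 1L
  }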

This gives:

       id ACH ADK ADORA1 ADRA1A ADRA1B ADRA1D ADRA2A ...
  [1,]  1   0   1      0      0      0      1      0 ...
  [2,]  2   0   1      0      0      1      0      0 ...
  [3,]  3   0   1      0      1      0      0      0 ...
  [4,]  4   0   0      0      0      0      0      0 ...

So each row represents an observation (= individual), where the individual is identified by the first column and each subsequent column contains 1 if the gene is present and 0 if it is absent. The dist function can then be applied appropriately to this matrix (with the id column removed):

  library(ape)  # provides as.phylo()
  d <- dist(gene.matrix[, -1], method = "euclidean")
  fit <- hclust(d, method = "ward.D")  # this method was called "ward" before R 3.1.0
  plot(as.phylo(fit))
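To check what this encoding does for the pairs raised in the question, here is a minimal, self-contained sketch in base R. It hard-codes rows 2, 3, 17 and 18 of the question's table instead of reading test.txt. The inline data frame and the use of matrix indexing to fill the 0/1 matrix are illustrative assumptions, not part of the original code.

```r
# Assumed subset: rows 2, 3, 17 and 18 of the question's table.
tab <- data.frame(
  id    = c(2, 3, 17, 18),
  gene1 = c("ADRA1B", "ADRA1A", "AR", "AR"),
  gene2 = c("ADK", "ADK", "ADRA1B", "ADRA1A"),
  stringsAsFactors = FALSE
)

# One row per id, one 0/1 column per gene (rows are labelled by id).
gene.names <- sort(unique(c(tab$gene1, tab$gene2)))
gene.matrix <- matrix(0L, nrow = nrow(tab), ncol = length(gene.names),
                      dimnames = list(as.character(tab$id), gene.names))
rows <- seq_len(nrow(tab))
gene.matrix[cbind(rows, match(tab$gene1, gene.names))] <- 1L
gene.matrix[cbind(rows, match(tab$gene2, gene.names))] <- 1L

# Pairwise Euclidean distances between the ids.
d <- as.matrix(dist(gene.matrix, method = "euclidean"))
print(round(d, 3))
```

In this subset, ids that share one gene (2 and 17, 3 and 18, but also 2 and 3 via ADK, and 17 and 18 via AR) end up at distance sqrt(2), while ids that share no gene (2 and 18, 3 and 17) are at distance 2. Sharing a gene does bring rows closer, but 17 is no closer to 2 than it is to 18.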

It is probably a good idea to read up on the differences between the distance measures, e.g. Euclidean, Manhattan, etc. For example, the Euclidean distance between the individuals with id = 1 and id = 2 is:

  euclidean_dist = sqrt((0-0)^2 + (1-1)^2 + (0-0)^2 + (0-0)^2 + (0-1)^2 + ...)

while the Manhattan distance is:

  manhattan_dist = abs(0-0) + abs(1-1) + abs(0-0) + abs(0-0) + abs(0-1) + ...
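The two formulas can be evaluated directly. The sketch below assumes, for illustration only, that the individuals with id = 1 and id = 2 differ in exactly two genes (ADRA1D present only in the first, ADRA1B only in the second) and that all genes not listed are absent in both. The vectors `id1` and `id2` are hypothetical stand-ins for two rows of the 0/1 matrix.

```r
# Presence (1) / absence (0) of a few genes for two individuals (assumed values).
genes <- c("ADK", "ADORA1", "ADRA1A", "ADRA1B", "ADRA1D")
id1 <- c(1, 0, 0, 0, 1)  # ADK and ADRA1D present
id2 <- c(1, 0, 0, 1, 0)  # ADK and ADRA1B present
names(id1) <- genes
names(id2) <- genes

# Distances computed by hand, following the two formulas.
euclidean_dist <- sqrt(sum((id1 - id2)^2))
manhattan_dist <- sum(abs(id1 - id2))

# The same distances via dist(), for comparison.
m <- rbind(id1, id2)
euclidean_check <- as.numeric(dist(m, method = "euclidean"))
manhattan_check <- as.numeric(dist(m, method = "manhattan"))

print(c(euclidean = euclidean_dist, manhattan = manhattan_dist))
```

For 0/1 data like this, the Manhattan distance is simply the number of genes in which the two individuals differ (here 2), and the Euclidean distance is the square root of that count (here sqrt(2)).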
