ReXSpecies-Tutorial

First of all: ReXSpecies needs cookies and JavaScript enabled. Furthermore, you should use Mozilla (or a derivative such as Firefox), Opera, or Internet Explorer. Mozilla works from at least version 1.5 on, Opera needs to be version 9, and Internet Explorer can be used in version 6.0.2900.2180.xpsp2_gdr.070227-2254, but 7.0 is recommended. On MacOS 10.3, Safari 1.3.2, Firefox 1.5.0.1, Opera 8.5.4, Camino 1.0, and Shiira 1.2.2 probably work, but could not be tested extensively without full access to an Apple computer. ReXSpecies is hosted at http://bio.math-inf.uni-greifswald.de/ReXSpecies.

Open that link and click on "Register as new user". Enter your e-mail address and choose a password and a user name. You will receive an e-mail containing a confirmation link to ensure that the e-mail address you entered is correct. After you click the link, your account is activated within the next few minutes; depending on server usage, it may sometimes take longer. You will then receive another e-mail containing your user name and password. Once registered, you can log in to ReXSpecies using the login form.

The ReXSpecies main menu

Figure: The ReXSpecies main screen
The main menu appears when you move the mouse pointer over the text "Main menu" shown in the upper left corner of the ReXSpecies screen. In the main menu you find several sub menus and the item "Manage files". Besides "Manage files", this tutorial uses the "Alignment Sub menu >", the "ReXSpecies Sub menu >", and the "TFBS Sub menu >". For the other items, please refer to the manual.
The purpose of ReXSpecies is to analyze transcription factor binding sites (TFBSs) in homologous sequences. Because there are very few experimentally validated TFBSs, we have to rely on TFBS predictions. Currently ReXSpecies supports Mapper and Genomatix. Both tools also require you to obtain a login account. The first thing we need is, obviously, a set of homologous sequences. You can get them via the UCSC Conservation track, for instance. For convenience we provide an example here. The example contains regulatory regions for the CFTR gene in FASTA format. We now explore the pluripotency network with respect to the regulation of CFTR.

Upload Sequences To ReXSpecies

Figure: The file manager
Now choose "Manage files" from the "Main menu" and upload the example file. If you plan to analyze more genes, you should create some folders to organize your files. To do so, use "Create folder" between the "Current folder: [my own files]/" line and the file list. There you also find "Upload file" for uploading a file. Now click on "Upload file". Note: Below the (empty) file list, you see "Upload file" and "Create folder" again; these are copies of the links above, provided for convenience only, in case you have many files and the list becomes really long.
Figure: The upload file form
Click on "Choose file" to select the file to upload. ReXSpecies fills in the other fields automatically once you have uploaded the file. The "Current folder" line shows the current target folder for all file operations; in the example, the file will thus be uploaded to the root folder "/". The "File name" field can contain a name for the uploaded file if it should differ from its local name; otherwise the local name is used. The "File name" field may contain slashes "/" to put the file in a folder other than the current one; that folder is created if it does not exist. If you check the "Overwrite if exists" check box and the server already holds a file with the same name as the uploaded one, the existing file is overwritten; otherwise the uploaded file is renamed (by appending a number). The big text box may contain the file content, if you prefer to enter your files via the keyboard. "Choose file" holds the name of the local file to upload, as mentioned above. By clicking "Send data" you upload your file. If you click on "Back to files overview", nothing is uploaded and the file list is shown again.
Click on "Send data" after choosing the file to upload.
The form is now filled with the uploaded file, and you get feedback at the top of the form: "Your file was uploaded or changed". You are now in editing mode and can change the file if you like. It is therefore important to have the "Overwrite" box checked; otherwise every "Send data" (i.e. save) creates a new file. If you change the file name, you generate a new file with that name and do not lose the old file; it is actually a copy, not a rename action.
On the form, you now have more links below the "Send data" button: "Upload another file" (which is very reasonable, as usually more than one file is needed for an analysis) and "Show this file in file viewer", which shows the file and a menu of things you can do with that file.
We now need to align the sequences, because they are obviously not aligned yet. To do so, we could go back to the "Manage files" page and choose "Align homo_seq.fasta as fasta sequence file using muscle" from the context menu that appears when you move the mouse pointer over the file name. The context menu is also displayed in the file view, so click on "Show this file in file viewer".
The file is now displayed. Below the yellow box you can see the file name again. Choose the "Alignment function" "Align homo_seq.fasta as fasta sequence file using muscle" to align the file using muscle. Confirm that action in the following form with yes, and forgive the programmer that unneeded confirmation request. A new file named "homo_seq.fasta.muscle" is created, which contains the alignment computed by muscle. If there is already a file named "homo_seq.fasta.muscle", the new one will be named "homo_seq.fasta.muscle0", and so on. Now click on "Back to files overview" below the aligned file.
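The naming rule for files that already exist can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not ReXSpecies code: the function name `next_free_name` is hypothetical, and it is an assumption that "and so on" means the suffix continues with 1, 2, ... after 0.

```python
def next_free_name(name, existing):
    """Return `name` if it is unused, otherwise name0, name1, ...

    `existing` is the set of file names already present in the folder.
    Assumption: the numeric suffix starts at 0 and counts upwards.
    """
    if name not in existing:
        return name
    counter = 0
    while f"{name}{counter}" in existing:
        counter += 1
    return f"{name}{counter}"


# Example: the aligned file already exists once in the folder.
folder = {"homo_seq.fasta", "homo_seq.fasta.muscle"}
print(next_free_name("homo_seq.fasta.muscle", folder))
```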


Now we have to "open" the alignment, which is called "Import" in ReXSpecies (hint: ReXSpecies can handle only one alignment at a time). To do so, move the mouse pointer over the alignment file name "homo_seq.fasta.muscle" and choose "Import homo_seq.fasta.muscle as alignment file" from the context menu that appears. Again a confirmation is requested, this time to prevent you from mixing sequences, which happens if you import an alignment and later another one without closing (clearing, in ReXSpecies terms) the first. This behavior exists so that an alignment can be split into several files that are imported step by step; sequences with the same name overwrite each other. Click on "Import this fasta alignment file" and you will read "homo_seq.fasta.muscle imported".
Once the alignment is opened, ReXSpecies can already do some useful things with it, for example showing it with Jalview (this is an optional feature, because the Jalview jar file is not part of the ReXSpecies distribution and has to be installed separately into ReXSpecies). Choose the "Alignment Sub menu" from the main menu and click "Show Alignment using JavaScript" for an example.

Upload TFBS Predictions To ReXSpecies

We now have to upload some TFBS predictions from Mapper or Genomatix. In this tutorial we will only upload Genomatix predictions; Mapper works similarly. For convenience we provide a Genomatix output (the plain HTML file generated by Genomatix, without images) for the data here. Upload the Genomatix file to ReXSpecies in the same way you uploaded the alignment file. Check the uploaded file for completeness with the file viewer by clicking its name in "Manage files".
Now we convert the HTML file to a format that ReXSpecies can import. Choose "Genomatix to Mapper" from the context menu (it is listed below "TFBS functions"). The ReXSpecies status bar gives you feedback: "mat_fam.pl.htm converted from Genomatix to Mapper. Written to mat_fam.pl.htm.mapper". There may be a problem calculating E-values for the Genomatix scores if there are not enough matches to fit an extreme value distribution. In this case, ReXSpecies cannot yet import the Genomatix data with E-values; they will be set to 0.
The created file mat_fam.pl.htm.mapper contains all of the Genomatix predictions in a format similar to the Mapper output format. This file can be imported using "Import as TFBS prediction file", which opens the file and reads all predictions for the already opened alignment. If you import Mapper output files directly, you can import predictions for only one sequence at a time. (Actually, there is a way around this, using the checkboxes beside every file uploaded to ReXSpecies: all Mapper prediction files must be named consistently with the names of the sequences in the FASTA file; then you can check all Mapper files to import and choose "Import as TFBS prediction files" from the context menu that is shown when the mouse pointer moves over the word "[Selection]".) The Mapper file should then have the same name as the sequence in the FASTA file, so that ReXSpecies knows which sequence the predictions belong to; if you import only one prediction file using its context menu, you can assign the sequence yourself. Genomatix exports the sequence names together with the predictions, so no such problem occurs there. Now choose "Import as TFBS prediction file" from the context menu for the created file mat_fam.pl.htm.mapper.
A form appears in which you can enter the source of evidence (that is, the data source for the predictions). Genomatix is already selected; this is for transparency only. The E-value factor should be set to 1.0; here you could balance very different E-values from different sources, but they would of course no longer be statistical values. Now click "Import this ReXSpecies cata file" to import the Genomatix predictions. Then you can read "mat_fam.pl.htm.mapper imported".

Creating An Alignment Annotated With TFBS Predictions

Now choose "Show alignment annotated with TFBS" from the "ReXSpecies Sub menu". You will see the aligned predictions. In the "Image menu" in the upper right corner you can adjust the graphics. Set "Title/Score/E‑value display" and "Draft" to "Off". Now set an E-value threshold that shows many TFBSs, in order to find a reasonable threshold empirically. 13.5 is usually a good starting value.
Now the task of grouping and filtering starts. As you can see, there are many similar predictions. Genomatix output is less polluted by obvious false positives, such as matches of TFBS models built from Arabidopsis thaliana sequences in, e.g., Homo sapiens.

Grouping is nevertheless still necessary: look at Equus and V$BRNF:112 and V$BRNF:121, which could form one group. Furthermore, some matches should be filtered out, depending on your interests. To show how to filter out predictions, we consider the platypus prediction for V$SATB at position 98-112, which is a "Special AT-rich sequence-binding protein 1, predominantly expressed in thymocytes, binds to matrix attachment regions (MARs)" and is unimportant for our purposes. Thus we want to hide this prediction and all others for the V$SATB models.

Choose "Show, filter, and group TFBS's" from the "TFBS Sub menu". Select the "ReXSpecies XML tfbslist format", which contains all columns of the TFBS table, and click on "Change display format". Now we will hide all predictions for V$SATB: choose "Search in field: Factor for regular expression SATB and hide the matching TFBS's". Now look at the status line. When it shows "Filtered TFBS's by Factor =~ /SATB/ - Done.", all SATB predictions are hidden. Now refresh the graphic. The V$SATB predictions have disappeared.
Figure: Form for filtering and grouping TFBS predictions
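What the filter "Factor =~ /SATB/" does can be illustrated with a small Python sketch. It is not the ReXSpecies implementation: the function name `hide_matching` and the dictionary keys are hypothetical, and the two example records only reuse names and positions mentioned in this tutorial. The expression is applied as an unanchored search, so it matches anywhere inside the field value.

```python
import re


def hide_matching(predictions, field, pattern):
    """Split predictions into (visible, hidden) by a regular expression.

    A prediction is hidden when `pattern` is found anywhere in the value
    of `field` (an unanchored search, like =~ /pattern/).
    """
    regex = re.compile(pattern)
    visible = [p for p in predictions if not regex.search(p[field])]
    hidden = [p for p in predictions if regex.search(p[field])]
    return visible, hidden


# Illustrative records; the key names are assumptions, not a ReXSpecies format.
predictions = [
    {"Factor": "V$SATB", "Sequence": "platypus", "Start": 98, "End": 112},
    {"Factor": "V$BRNF", "Sequence": "Equus", "Start": 102, "End": 123},
]

visible, hidden = hide_matching(predictions, "Factor", "SATB")
print([p["Factor"] for p in visible])
```

Because the search is unanchored, a short expression such as SATB also hides any other factor whose name merely contains that string, so it is worth checking the hidden list before continuing.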

Figure: Defining user-defined TFBS prediction groups
Now we want to combine V$BRNF:112 and V$BRNF:121 into a group named BRNF. To do so, choose "Manage manual group filters" from the "TFBS Sub menu". Enter BRNF in the text box beside the "Add Group" button and click that button. Look at the status bar and wait for "Added group BRNF". Now search for "V$BRNF/BRN4.01[<] (102-123) (RealLen=18, EV=1.0e+01)" and select BRNF from the drop-down list. Do the same with "V$BRNF/BRN4.01[>] (111-132) (RealLen=18, EV=1.0e+01)". A warning appears: ReXSpecies would automatically group "V$BRNF/BRN4.01[<] (102-123) (RealLen=18, EV=1.0e+01)" together with "V$BRNF/BRN4.01[<] (102-121) (RealLen=18, EV=1.0e+01)" because its heuristic considers the two equivalent. If we manually define a group that contains only one of these two factors, the result is undefined. Thus we also put "V$BRNF/BRN4.01[<] (102-121) (RealLen=18, EV=1.0e+01)" into the group with the others. The warning disappears. Always watch the status line as well. It now shows "Put V$BRNF/BRN4.01[<] (102-121) into BRNF" and no error warning.

After defining the groups, we have to apply them to our predictions. Go back to the "Show TFBS" page and click on "Grouping similar TFBS's (fills represents column)". The status line now shows "Filtered TFBS's by grouping similar ones - Done.". Refresh the graphics: the TFBS predictions we just put into a manual group are now grouped, as are some others that ReXSpecies considers equivalent because they have the same start or end coordinate and a very similar name. Look at the grouped TFBS: the group name is prepended in square brackets, as in "[BRNF:117]: V$BRNF:121", which means that the manual group BRNF matches, the center of the match is at position 117, and the best matching member of the group is V$BRNF, whose center is at position 121. Move the mouse pointer over the predictions and read the "represents" entry: here the other matching models of the group are listed by name. The match is shown over the full extent of the group, so that it overlaps all members. Automatically grouped predictions only have the "represents" entry, because no group name exists that could be prepended.
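The numbers in these labels can be reproduced with simple interval arithmetic. The following Python sketch is an illustration only: the helper names are hypothetical, and taking the center as the midpoint rounded down is an assumption that happens to agree with the positions quoted in this tutorial (112, 121, and 117).

```python
def center(start, end):
    """Midpoint of a match, rounded down (assumed rounding rule)."""
    return (start + end) // 2


def group_extent(members):
    """Smallest interval that overlaps all (start, end) members."""
    start = min(s for s, _ in members)
    end = max(e for _, e in members)
    return start, end


# The three BRNF matches put into the manual group above.
brnf = [(102, 123), (111, 132), (102, 121)]

start, end = group_extent(brnf)
label = f"[BRNF:{center(start, end)}]"
print(start, end, label)
```

With these inputs the group spans 102-132, so its center falls between the centers of the individual members, which is why the group label and the member label carry different positions.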

Calculating A MrBayes Tree From The Predictions And The Alignment

Choose "Calculate MrBayes tree" from the "ReXSpecies Sub menu". In the following form you should fill in the fields "Output folder", "e-mail", and "Burnin". The e-mail field contains the address to which the notification is sent when the job has finished and the MrBayes tree has been calculated. The output folder is the folder that ReXSpecies creates in your file store (if it does not exist) and copies the MrBayes results to. In the burnin field you can enter the burnin parameter for MrBayes sumt. You may leave it blank; MrBayes will then discard the first file it created. Please read the MrBayes manual for more information about the burnin parameter. Click on "run mrbayes" and wait for the e-mail.

The other fields on the form allow you to set up different output formats for the matrix shown below the form. You can hide certain rows or columns of the matrix by clicking on "hide". This affects the matrix only, not the TFBS predictions. "Hide filter the TFBS of infoless groups" hides TFBS predictions that occur in at most one sequence, as well as TFBS predictions that occur in all sequences or in all but one. These predictions carry no parsimony information, so one may hide them for tree calculation.
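The "infoless" rule can be written down compactly for a 0/1 matrix: a column is kept only if the prediction is present in at least two sequences and absent from at least two. The Python sketch below illustrates that rule under stated assumptions; the function names, the sequence names, and the group names A to E are hypothetical, and this is not the ReXSpecies code.

```python
def is_informative(column):
    """True if a 0/1 column has at least two 1s and at least two 0s."""
    present = sum(column)
    absent = len(column) - present
    return present >= 2 and absent >= 2


def informative_groups(matrix, group_names):
    """Return the names of the columns that carry parsimony information.

    `matrix` maps a sequence name to its row of 0/1 values, one value
    per entry in `group_names`.
    """
    rows = list(matrix.values())
    keep = []
    for j, name in enumerate(group_names):
        column = [row[j] for row in rows]
        if is_informative(column):
            keep.append(name)
    return keep


# Hypothetical presence/absence matrix for five sequences.
groups = ["A", "B", "C", "D", "E"]
matrix = {
    "human":    [1, 1, 1, 1, 1],
    "mouse":    [0, 1, 1, 1, 1],
    "horse":    [0, 0, 1, 1, 1],
    "platypus": [0, 0, 1, 1, 0],
    "chicken":  [0, 0, 1, 0, 0],
}
print(informative_groups(matrix, groups))
```

In this example, A occurs in one sequence, C in all, and D in all but one, so only B and E remain.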

You may use E-values instead of bits (1/0 values) in the matrix, but currently MrBayes cannot handle this: it aborts with an exception, although according to the manual it should work. You can also download the bit matrix in phylip or nexus (mrbayes) format to calculate the tree yourself.
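For readers who want to build such an input file themselves, the following Python sketch writes a 0/1 matrix as a NEXUS data block. It is a minimal illustration under assumptions: the function name `to_nexus` and the example sequence names are hypothetical, the `DATATYPE=Standard` setting is one common choice for binary characters, and the layout of the file that ReXSpecies offers for download may differ.

```python
def to_nexus(matrix):
    """Format a dict of sequence name -> list of 0/1 values as NEXUS text."""
    names = list(matrix)
    nchar = len(next(iter(matrix.values())))
    width = max(len(name) for name in names)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(names)} NCHAR={nchar};",
        "  FORMAT DATATYPE=Standard MISSING=? GAP=-;",
        "  MATRIX",
    ]
    for name in names:
        bits = "".join(str(bit) for bit in matrix[name])
        # Pad the names so that the bit strings line up in one column.
        lines.append(f"  {name.ljust(width)}  {bits}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence example.
print(to_nexus({"human": [1, 0, 1], "mouse": [0, 1, 1]}))
```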

Finally, you can calculate trees with or without the matrix, and with or without the sequences in the input data for MrBayes. The trees should become more reasonable with the TFBS prediction matrix than with sequences only; if not, the TFBS predictions may not carry any phylogenetic information, in which case we consider them to be false positives.

Creating An Annotated Tree Output

Finally we will create an annotated tree. There are two options for this. First, we can look at our MrBayes tree. Go to "Manage files" and open the folder that MrBayes created. Move the mouse cursor over the file named "data.con" and choose "Label/Manage nexus tree data.con with current predictions" from "Tree functions". MrBayes writes its trees in Nexus format, so you have to choose this function and not the one for Newick trees; otherwise ReXSpecies will show an "Internal error".

Below the tree that now appears, there is a small menu where you can configure it. You will probably want to change the font size and the tree size. The box "Without preserved TFBS" allows you to hide the blue predictions on the inner nodes.

Dollo and Fitch parsimony are the two methods that ReXSpecies currently supports for labeling the inner nodes. Please read the paper for details.
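As a rough orientation, the bottom-up pass of textbook Fitch parsimony for a single presence/absence character can be sketched in Python as follows. This is a generic illustration, not the ReXSpecies implementation: it assumes a strictly binary tree given as nested tuples of leaf names, it omits the top-down pass that picks one state per inner node, and it does not cover Dollo parsimony. The species names are hypothetical.

```python
def fitch_sets(tree, leaf_states):
    """Bottom-up Fitch pass for one character.

    `tree` is either a leaf name (str) or a pair (left, right).
    Returns (set of possible states at this node, number of changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_sets(left, leaf_states)
    right_set, right_cost = fitch_sets(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and presence (1) / absence (0) of one TFBS group.
tree = ((("human", "mouse"), "horse"), "platypus")
states = {"human": 1, "mouse": 1, "horse": 0, "platypus": 0}
print(fitch_sets(tree, states))
```

In this example the root is assigned absence and a single gain on the branch leading to human and mouse explains the data.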

You can also upload your own tree in Newick or Nexus format and display it with the same functions. It is important that the leaves have the same names as the corresponding sequences and TFBS predictions.

If the tree you use has other labels, the predictions may not be displayed, but you can use the "Manage synonyms" function from the "Alignment Sub menu" to tell ReXSpecies which species names can be considered equivalent. Simply enter a trivial name and a scientific name for each sequence (the sequence names are listed in the drop-down list and should be translated to scientific names), click on "Save", and wait for the status line to report that the "translation" has been saved. After doing so for all synonyms, click on "Finalize", which creates dummy entries for all scientific names that translate them to themselves when they are searched as trivial names. Translating the names has another useful effect: the sequences will now appear with their scientific names in all graphical output of ReXSpecies.
Figure: Synonym manager

That's it...

Thank you for using ReXSpecies!


Footnotes:
Consistently means that a file with the predictions for sequence XYZ in the FASTA file must be named XYZ:
In the FASTA file:
>XYZ
ATTGCTTAA...
The Mapper prediction file must be called XYZ (without extension or, for convenience, XYZ.txt, because MS Windows often appends .txt to downloaded FASTA files).
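Before a batch import, this naming convention can be checked locally with a short Python sketch. It is an illustration only: the function names are hypothetical, and treating the whole header line after ">" as the sequence name is an assumption about how the names are compared.

```python
def fasta_names(fasta_text):
    """Return the sequence names from the '>' header lines of FASTA text.

    Assumption: the full header line after '>' is the sequence name.
    """
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            names.append(line[1:].strip())
    return names


def missing_predictions(names, file_names):
    """List sequence names that have no prediction file XYZ or XYZ.txt."""
    available = set()
    for file_name in file_names:
        if file_name.endswith(".txt"):
            available.add(file_name[:-4])
        else:
            available.add(file_name)
    return [name for name in names if name not in available]


fasta = ">XYZ\nATTGCTTAA\n>ABC\nGGCTTAACC\n"
print(missing_predictions(fasta_names(fasta), ["XYZ.txt"]))
```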