Upload TFBS Predictions To ReXSpecies
We now have to upload some TFBS predictions from Mapper or Genomatix.
In this tutorial we will only upload Genomatix predictions; Mapper
works similarly. For convenience we provide a Genomatix output (the
plain html file generated by Genomatix without images) for the data
here. Upload the Genomatix file to ReXSpecies; this works the same way
you uploaded the alignment file. Check the uploaded file for
completeness with the file viewer by clicking its name in "Manage
files".
Now we convert the html file to a format that ReXSpecies can import.
Choose "Genomatix to Mapper" from the context menu (it is listed below
TFBS functions). The status bar of ReXSpecies gives you feedback:
"mat_fam.pl.htm converted from Genomatix to Mapper. Written to
mat_fam.pl.htm.mapper". There may be a problem calculating E-values for
the Genomatix scores if there are not enough matches to calculate an
Extreme Value Distribution. In this case, ReXSpecies cannot yet import
the Genomatix data with E-values; they will be set to 0.
The created file mat_fam.pl.htm.mapper contains all of the Genomatix
predictions in a format similar to the Mapper output format. This file
can be imported using "Import as TFBS prediction file", which opens the
file and reads all predictions for the already opened alignment. If you
import Mapper output files directly, you can import predictions for
only one sequence at a time. (Actually, there is a way to import
several at once, using the checkboxes beside every file uploaded to
ReXSpecies. All Mapper prediction files must be named consistently with
the names of the sequences in the FASTA file; then you can check all
Mapper files to import and choose "Import as TFBS prediction files"
from the context menu that is shown when the mouse pointer moves over
the word "[Selection]".) Each Mapper file should have the same name as
its sequence in the FASTA file, so that ReXSpecies knows which sequence
the predictions belong to; if you import only one prediction file using
its context menu, you can assign the sequence yourself. Genomatix
exports the sequence names with the predictions, so no such problem
occurs here. Now choose "Import as TFBS prediction file" from the
context menu for the created file mat_fam.pl.htm.mapper.
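Before importing several Mapper files at once, it can help to check the
naming rule offline. The following is only a minimal sketch, not part of
ReXSpecies: the function names are hypothetical, and it assumes that the
sequence name is the first token of each FASTA header line and that a
prediction file may be called either XYZ or XYZ.txt.

```python
from pathlib import Path


def fasta_sequence_names(fasta_text):
    """Return the sequence names found on FASTA header lines ('>name ...')."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                # Assumption: the first token after '>' is the sequence name.
                names.append(tokens[0])
    return names


def match_prediction_files(fasta_text, file_names):
    """Map each sequence name to the file named after it, or None if missing."""
    by_name = {}
    for file_name in file_names:
        path = Path(file_name)
        # Accept 'XYZ' as well as 'XYZ.txt'.
        key = path.stem if path.suffix == ".txt" else path.name
        by_name[key] = file_name
    return {name: by_name.get(name) for name in fasta_sequence_names(fasta_text)}


fasta = ">XYZ\nATTGCTTAA\n>ABC\nGGCATTA\n"
print(match_prediction_files(fasta, ["XYZ.txt", "other.txt"]))
```

Any sequence that maps to None has no consistently named prediction
file and would have to be assigned by hand.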
A form appears where you can enter the source of evidence (that is,
the data source for the predictions). Genomatix is already selected;
this is for transparency only. The E-value factor should be set to 1.0.
With this factor you could balance very different E-values from
different sources, but the results will of course no longer be
statistical values. Now click "Import this ReXSpecies cata file" to
import the Genomatix predictions. You should then see the message
"mat_fam.pl.htm.mapper imported".
Creating An Alignment Annotated With TFBS Predictions
Now choose "Show alignment annotated with TFBS" from the "ReXSpecies
Sub menu". You will see the aligned predictions. In the "Image menu" in
the upper right corner you can adjust the graphics. Set
"Title/Score/E‑value display" and "Draft" to "Off". Now set an E-value
threshold that shows many TFBSs, so that you can find a reasonable
threshold empirically. 13.5 is usually a good starting value.
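Finding the threshold empirically amounts to trying a few values and
watching how many predictions remain visible. The sketch below
illustrates this with made-up E-values; it assumes that a prediction is
shown when its E-value is at or below the threshold, which the tutorial
does not state explicitly.

```python
def count_visible(e_values, threshold):
    """Count predictions whose E-value does not exceed the threshold."""
    # Assumption: predictions at or below the threshold are displayed.
    return sum(1 for e_value in e_values if e_value <= threshold)


# Hypothetical E-values, for illustration only.
e_values = [0.5, 3.2, 10.0, 12.9, 14.1, 40.0]

for threshold in (5.0, 13.5, 20.0):
    print(threshold, count_visible(e_values, threshold))
```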
Now the task of grouping and filtering starts. As you can see, there
are many similar predictions. Genomatix output is less polluted by
obvious false positives, such as matches of TFBS models built from
Arabidopsis thaliana sequences in, e.g., Homo sapiens. Grouping is
nevertheless still necessary: look at Equus and V$BRNF:112 and
V$BRNF:121, which could be one group. Furthermore, some matches should
be filtered out, depending on your interests. To show how to filter out
predictions, we will consider the platypus prediction for V$SATB at
position 98-112. It is a "Special AT-rich sequence-binding protein 1,
predominantly expressed in thymocytes, binds to matrix attachment
regions (MARs)", which is unimportant for our interests. Thus we want
to hide this prediction and all others for the V$SATB models.
Choose "Show, filter, and group TFBS's" from the "TFBS Sub menu".
Select the "ReXSpecies XML tfbslist format", which contains all columns
of the TFBS table, and click on "Change display format". Now we will
hide all predictions for V$SATB: choose "Search in field: Factor for
regular expression SATB and hide the matching TFBS's". Now look at the
status line. When it shows "Filtered TFBS's by Factor =~ /SATB/ -
Done.", all SATB predictions are hidden. Refresh the graphic now. The
V$SATB predictions have disappeared.
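The status message "Factor =~ /SATB/" describes an unanchored regular
expression match on the Factor column. As a rough illustration of what
such a filter does, here is a minimal sketch; the function name, the
dictionary layout, and the example factor names are hypothetical and
are not taken from ReXSpecies.

```python
import re


def hide_matching(predictions, field, pattern):
    """Return only the predictions whose field does NOT match the pattern."""
    regex = re.compile(pattern)
    # search() matches anywhere in the string, like an unanchored /SATB/.
    return [p for p in predictions if not regex.search(p[field])]


# Hypothetical rows of the TFBS table, for illustration only.
predictions = [
    {"Factor": "V$SATB/EXAMPLE.01", "Sequence": "platypus"},
    {"Factor": "V$BRNF/BRN4.01", "Sequence": "Equus"},
]

visible = hide_matching(predictions, "Factor", "SATB")
print([p["Factor"] for p in visible])
```

Note that "$" is a regular expression metacharacter, so searching for
SATB rather than V$SATB avoids having to escape it.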

Now we want to group V$BRNF:112 with V$BRNF:121 into a group named
BRNF. To do so, choose "Manage manual group filters" from the "TFBS Sub
menu". Enter BRNF in the text box beside the "Add Group" button and
click that button. Look at the status bar and wait for "Added group
BRNF". Now you can search for "V$BRNF/BRN4.01[<] (102-123) (RealLen=18,
EV=1.0e+01)" and select BRNF from the drop-down list. Do the same with
"V$BRNF/BRN4.01[>] (111-132) (RealLen=18, EV=1.0e+01)". A warning
appears: ReXSpecies would automatically group "V$BRNF/BRN4.01[<]
(102-123) (RealLen=18, EV=1.0e+01)" together with "V$BRNF/BRN4.01[<]
(102-121) (RealLen=18, EV=1.0e+01)" because its heuristic considers
both equivalent. If we manually define a group that contains only one
of these two factors, the result is undefined. Thus we also put
"V$BRNF/BRN4.01[<] (102-121) (RealLen=18, EV=1.0e+01)" into the group
with the others. The warning disappears. Always watch the status line
as well. It now shows "Put V$BRNF/BRN4.01[<] (102-121) into BRNF" and
no error warning.
After defining the groups we have to apply them to our predictions. Go
back to the "Show TFBS" page and click on "Grouping similar TFBS's
(fills represents column)". The status line now shows "Filtered TFBS's
by grouping similar ones - Done.". Refresh the graphics: the TFBS
predictions we just put into a manual group are now grouped, as are
some others that ReXSpecies considers equivalent because they have the
same start or end coordinate and a very similar name. Look at the
grouped TFBS: the group name is prepended in square brackets, as in
"[BRNF:117]: V$BRNF:121". This means that the manual group BRNF
matches, the center of the match is at position 117, and the best
matching member of the group is V$BRNF, whose center is at position
121. Move the mouse pointer over the predictions and read the
"represents" entry: here the other matching models of the group are
listed by name. The match is shown over the full extent of the group so
that it overlaps all members. Automatically grouped predictions only
have the "represents" entry, because no group name exists that could be
prepended.
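The numbers in the label can be reproduced from the coordinates of the
three BRNF matches used in this tutorial. The sketch below is not the
ReXSpecies code: the function names are hypothetical, and it assumes
that a center is the midpoint of start and end, rounded down, which is
consistent with the positions 112, 121, and 117 shown here. The best
matching member is passed in by the caller, because the tutorial does
not say how ReXSpecies picks it.

```python
def center(start, end):
    """Midpoint of a match, rounded down (an assumption, see the text)."""
    return (start + end) // 2


def group_label(group_name, members, best):
    """Build a label such as '[BRNF:117]: V$BRNF:121'.

    members is a list of (name, start, end); best is one of its entries.
    """
    # The group is drawn over the full extent of all its members.
    group_start = min(start for _, start, _ in members)
    group_end = max(end for _, _, end in members)
    best_name, best_start, best_end = best
    return (
        f"[{group_name}:{center(group_start, group_end)}]: "
        f"{best_name}:{center(best_start, best_end)}"
    )


# Coordinates of the three matches grouped in the tutorial.
members = [("V$BRNF", 102, 123), ("V$BRNF", 111, 132), ("V$BRNF", 102, 121)]
print(group_label("BRNF", members, best=members[1]))
```

With these coordinates the group spans positions 102 to 132, which
gives the center 117, and the member at 111-132 has the center 121.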
Calculating A MrBayes Tree From The Predictions And The Alignment

Choose "Calculate MrBayes tree" from the "ReXSpecies Sub menu". In the
following form you should fill in the fields "Output folder", "e-mail",
and "Burnin". The e-mail field contains the address to which the
notification will be sent when the job has finished and the MrBayes
tree is calculated. The output folder is the folder that ReXSpecies
will create in your file store (if it does not exist) and copy the
MrBayes results to. In the burnin field you can enter the burnin
parameter for MrBayes sumt. You may leave it blank; MrBayes will then
discard the first file it created. Please read the MrBayes manual for
more information about the burnin parameter. Click on "run mrbayes" and
wait for the e-mail.
The other fields on the form allow you to set up different output
formats of the matrix shown below the form. You can hide certain rows
or columns of the matrix by clicking on "hide". This affects the matrix
only, not the TFBS predictions. "Hide filter the TFBS of infoless
groups" hides TFBS predictions that occur in at most one sequence, as
well as TFBS predictions that occur in all sequences or in all but one.
These predictions do not carry parsimony information, so you may hide
them for the tree calculation.
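The rule for "infoless" groups can be written down in a few lines. The
following sketch only restates the rule from the paragraph above; the
function name and the example matrix are hypothetical. A column of the
presence/absence matrix is kept when the prediction occurs in at least
two sequences and is absent from at least two.

```python
def is_informative(column):
    """True if a 0/1 column is present in >= 2 and absent in >= 2 sequences."""
    present = sum(column)
    total = len(column)
    return 2 <= present <= total - 2


# Hypothetical presence/absence columns for five sequences.
matrix = {
    "group_a": [1, 1, 1, 0, 0],  # kept: present in 3, absent in 2
    "group_b": [0, 0, 0, 0, 1],  # hidden: occurs in only one sequence
    "group_c": [1, 1, 1, 1, 1],  # hidden: occurs in all sequences
    "group_d": [1, 1, 1, 1, 0],  # hidden: occurs in all but one
}

kept = [name for name, column in matrix.items() if is_informative(column)]
print(kept)
```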
You may use E-values instead of bits (1/0 values) in the matrix, but
MrBayes currently cannot handle this: it aborts with an exception,
although according to the manual it should work. You can also download
the bit matrix in phylip or nexus (mrbayes) format to calculate the
tree yourself.
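To give an idea of what a 0/1 matrix looks like in Nexus form, here is a
minimal sketch that writes a data block. It is not the exporter used by
ReXSpecies, and the file ReXSpecies produces may be laid out
differently. The function name and the example rows are hypothetical;
datatype=restriction is the MrBayes data type for binary characters.

```python
def to_nexus(matrix):
    """Render a dict of sequence name -> '0'/'1' string as a NEXUS data block."""
    lengths = {len(bits) for bits in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same number of characters")
    nchar = lengths.pop()
    width = max(len(name) for name in matrix)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(matrix)} nchar={nchar};",
        "  format datatype=restriction;",
        "  matrix",
    ]
    for name, bits in matrix.items():
        # Pad the names so that the character columns line up.
        lines.append(f"  {name.ljust(width)}  {bits}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


# Hypothetical rows: one bit per TFBS group, one row per sequence.
example = {"human": "1011", "mouse": "1001", "platypus": "0110"}
print(to_nexus(example))
```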
Finally, you can calculate trees with or without the matrix, and with
or without the sequences in the input data for MrBayes. The trees
should become more reasonable with the TFBS prediction matrix than with
sequences only. If they do not, the TFBS predictions may not carry any
phylogenetic information, and we consider them to be false positives.
Creating An Annotated Tree Output
Finally we will create an annotated tree. There are two options for
this. First, we can look at our MrBayes tree. Go to "Manage files" and
open the folder that MrBayes created. Move the mouse cursor over the
file named "data.con" and choose "Label/Manage nexus tree data.con with
current predictions" from "Tree functions". MrBayes writes its trees in
Nexus format, so you have to choose this function and not the one for
newick trees. Otherwise ReXSpecies will show an "Internal error".
Below the tree that now appears, there is a small menu where you can
configure it. You will probably want to change the font size and the
tree size. The box "Without preserved TFBS" allows you to hide the blue
predictions on the inner nodes.
Dollo and Fitch parsimony are the two methods that ReXSpecies currently
supports for labeling the inner nodes. Please read the paper for details.
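For readers who have not met Fitch parsimony before, the following
sketch shows the textbook bottom-up pass for a single 0/1 character on
a rooted binary tree. It is meant as an illustration only and is not the
ReXSpecies implementation. The tree layout (nested tuples), the
function name, and the example states are all hypothetical.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass; returns (possible states at this node, changes).

    tree is either a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_changes = fitch(tree[0], leaf_states)
    right_set, right_changes = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical presence (1) / absence (0) of one TFBS group per species.
tree = (("human", "mouse"), ("horse", "platypus"))
states = {"human": 1, "mouse": 1, "horse": 1, "platypus": 0}
print(fitch(tree, states))
```

In this example the root is labeled with state 1 and the whole tree
needs one change, on the branch leading to platypus.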
Second, you can upload your own tree in Newick or Nexus format and show
it with the same functions. It is important that the leaves have the
same names as the corresponding sequences and TFBS predictions. If the
tree you use has other labels, the predictions may not be displayed,
but you can use the "Manage synonyms" function from the "Alignment Sub
menu" to tell ReXSpecies which species names can be considered
equivalent. Simply enter a trivial name and a scientific name for each
sequence (the sequence names are listed in the drop-down list and
should be translated to scientific names), click on "Save", and wait
for the status line to report that the "translation" has been saved.
After doing so for all synonyms, just click on "Finalize", which
creates dummy entries for all scientific names that translate them to
themselves when they are searched as trivial names. Translating the
names has another useful effect: the sequences will now appear with
their scientific names in all graphical output of ReXSpecies.
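The effect of "Finalize" can be pictured as a lookup table in which
every scientific name also maps to itself. The sketch below illustrates
that idea with hypothetical function names and example entries; how
ReXSpecies stores its synonyms internally is not described here.

```python
def finalize(synonyms):
    """Add an identity entry for every scientific name in the table."""
    table = dict(synonyms)
    for scientific in sorted(set(synonyms.values())):
        # A scientific name looked up as a trivial name returns itself.
        table.setdefault(scientific, scientific)
    return table


def translate(table, name):
    """Translate a name; unknown names are returned unchanged."""
    return table.get(name, name)


# Hypothetical synonym entries: trivial name -> scientific name.
synonyms = {"human": "Homo sapiens", "horse": "Equus caballus"}
table = finalize(synonyms)

print(translate(table, "human"))
print(translate(table, "Homo sapiens"))
```

Both lookups return "Homo sapiens", so a tree labeled with either kind
of name resolves to the same sequence.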
That's it...
Thank you for using ReXSpecies!
Footnotes:
Consistently means that a file with the predictions for sequence XYZ in
the FASTA file must be named XYZ.
In the FASTA file:
>XYZ
ATTGCTTAA...
The Mapper prediction file must be called XYZ (without extension, or,
for convenience, XYZ.txt, because MS Windows often appends .txt to
downloaded FASTA files).