FunTFBS

FunTFBS screens for functional transcription factor binding sites (TFBS) and regulatory interactions by coupling the base-specific binding affinities of transcription factors with the corresponding evolutionary constraints on their binding sites.



Introduction to FunTFBS

    FunTFBS is a tool for identifying transcription factor binding sites (TFBS) that have transcriptional regulatory functions. Binding motifs show that the sequence affinity of a TF for the genome differs among the base pairs within a binding site. Mutations at different base pairs therefore affect TF binding to different degrees, so the selective constraints on those base pairs also differ. As a result, in functional binding sites, unlike non-functional ones, the base pairs with higher affinity are more conserved across species. Building on this, FunTFBS identifies functional TFBS from the correlation between base frequencies in the binding motif and conservation scores. Given the genomic positions of candidate TFBS, the TF binding motifs, the genome sequence and PhyloP scores, FunTFBS keeps the candidates for which this correlation across base pairs is strong and significant.

    Functional TFBS have been identified for 63 plant species in our database, covering all main branches of green plants.


Work Flow of FunTFBS
  1. Given the genomic position of a candidate TFBS, the genomic sequence is extracted.
  2. The motif frequency of each base pair is extracted according to the genomic sequence and the binding motif of the corresponding TF.
  3. The PhyloP scores of the same region are extracted in the same order as the motif frequencies (the PhyloP scores are reversed for binding sites on the negative strand).
  4. The Pearson correlation between the motif base frequencies and the absolute values of the PhyloP scores is calculated.
  5. If the correlation test is significant (p <= 0.05) and the correlation coefficient is > 0.5, the binding site is treated as a functional TFBS.
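
Steps 2 to 5 can be illustrated with a minimal Python sketch. It is not the FunTFBS implementation (which is written in Perl and R); it only shows the idea under stated assumptions. The function names (`score_site`, `pearson_p_value`) are hypothetical, the motif is assumed to be a list of rows in A, C, G, T order, the "motif frequency" of a position is assumed to be the frequency of the base observed there, and the p-value is assumed to come from the usual Student's t test for Pearson correlation with n - 2 degrees of freedom.

```python
import math

BASES = "ACGT"  # assumed column order of the motif matrix rows
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq.upper()))


def pearson(x, y):
    """Pearson correlation coefficient; assumes both vectors vary."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for r from n points (t test, n - 2 df, n >= 3).

    With t = sqrt(df) * tan(a), the t density becomes k * cos(a)**(df - 1)
    and the tail starts at a = asin(|r|); it is integrated by Simpson's rule.
    """
    df = n - 2
    k = math.gamma((df + 1) / 2) / (math.sqrt(math.pi) * math.gamma(df / 2))
    lo, hi = math.asin(min(1.0, abs(r))), math.pi / 2
    h = (hi - lo) / steps
    f = lambda a: math.cos(a) ** (df - 1)
    area = f(lo) + f(hi)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * f(lo + i * h)
    return min(1.0, 2 * k * area * h / 3)


def score_site(plus_seq, strand, matrix, phylop):
    """Return (r, p, is_functional) for one candidate site.

    plus_seq and phylop are given in plus-strand genomic order.
    """
    seq = plus_seq.upper()
    scores = list(phylop)
    if strand == "-":
        # Orient both the sequence and the scores along the motif.
        seq = reverse_complement(seq)
        scores.reverse()
    # Step 2: frequency of the observed base at each motif position.
    freqs = [row[BASES.index(base)] for row, base in zip(matrix, seq)]
    # Step 4: correlate with the absolute PhyloP scores.
    r = pearson(freqs, [abs(s) for s in scores])
    p = pearson_p_value(r, len(freqs))
    # Step 5: thresholds stated in the work flow.
    return r, p, (p <= 0.05 and r > 0.5)
```

Calling `score_site(sequence, strand, matrix, phylop_scores)` for each candidate and keeping those whose third return value is true mirrors the filtering rule in step 5. Sites containing bases other than A, C, G and T are not handled in this sketch.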

Download
Version    Date          Size      Link
1.1.0      2018-03-25    255 kb    download

You can also download the pre-release version from GitHub.

git clone https://github.com/gao-lab/FunTFBS.git

Installation

Prerequisite
  1. Perl (5.010 or later)
  2. R (3.0.1 or later)
  3. R package: data.table (1.10.4 or later)

Unpack the tarball

tar -zxvf funTFBS-1.1.0.tar.gz

Then go into the directory and the "funTFBS" file can be run directly.

cd funTFBS-1.1.0/
./funTFBS

You can also add this directory to the PATH environment variable and run funTFBS from any location (replace /path/to with the location of the package):

export PATH="$PATH:/path/to/funTFBS-1.1.0"
funTFBS

General usage
funTFBS -t TFBS.bed -m motifs -f motif-format -p PhyloP -g genome -o output
        -t [TFBS.bed]  the file containing positions of candidate TFBS in bed format.
        -m [motifs]    the file containing binding motifs in specified format.
        -f [format]    the format of binding motifs.
        -p [PhyloP.bg] the file containing PhyloP scores in bedGraph format.
        -g [genome.fa] the file containing genomic sequence in fasta format.
        -o [output]    the output directory.
        -h             show this help information.

There are four parts of information for FunTFBS as input:

  1. Candidate TFBS in bed format with strand information, for example:
    Chr1	3140	3161	AT3G27010	7.46e-06	+
    Chr1	3148	3162	AT5G08330	8.37e-06	-
    Chr1	3214	3229	AT2G20110	9.93e-06	+
    Chr1	3246	3258	AT1G69120	7.7e-06	+
    Chr1	3296	3311	AT2G32460	5.13e-06	+
    Note: The 4th column of the file is used as the TF ID, which must match the TF ID in the motif file.
    This file can be generated by ChIP-seq analysis or motif scanning.

  2. Binding motif of TF
    This file can be in one of the following formats:
    meme/beeml/chen/jaspar-pfm/jaspar-sites/jaspar-cm/transfac/uniprobe
    For example (meme format):
    MEME version 4.4
    ALPHABET= ACGT
    strands: + -
    Background letter frequencies (from file `../../promoter_background/Ath.bg'):
    A 0.33230 C 0.16770 G 0.16770 T 0.33230
    MOTIF AT1G01060 MP00119
    letter-probability matrix: alength= 4 w= 10 nsites= 599 E= 4.8e-803
    0.854758        0.001669        0.051753        0.091820
    0.000000        0.000000        1.000000        0.000000
    1.000000        0.000000        0.000000        0.000000
    0.000000        0.000000        0.000000        1.000000
    0.996661        0.003339        0.000000        0.000000
    0.000000        0.013356        0.008347        0.978297
    0.005008        0.000000        0.005008        0.989983
    0.085142        0.010017        0.000000        0.904841
    0.287145        0.076795        0.056761        0.579299
    0.188648        0.240401        0.195326        0.375626
    URL http://planttfdb.gao-lab.org/tf.php?sp=Ath&did=AT1G01060.1#bind_motif
    Note: The TF IDs must match those in the TFBS file (the 4th column); otherwise the motif base frequencies cannot be extracted.
    This file can be generated by experiments (such as SELEX, PBM or ChIP-seq) or downloaded from a TF motif database.

  3. PhyloP score in bedGraph format, for example:
    Chr1    116     117     0.263925266926064
    Chr1    117     118     0.27368586219026
    Chr1    118     119     -0.991320815703633
    Chr1    119     120     0.27368586219026
    Chr1    120     121     0.387503444300272
    Chr1    121     122     0.27368586219026
    Chr1    122     123     0.387503444300272
    Chr1    123     124     -0.822481891189735
    Chr1    124     125     -0.651786473358111
    Chr1    125     126     0.404937218146016
    Note: This file should be sorted by coordinate (sort -k 1,1 -k 2,2n).
    Tip: Because the PhyloP file may be very large, it is recommended to split it and run FunTFBS on each chromosome separately.
    This file can be generated from multiple genome alignments or downloaded from the UCSC Genome Browser.

  4. Genome sequences in fasta format, for example:
    >Chr1
    CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAATCTTTAAATCC
    TACATCCATGAATCCCTAAATACCTAATTCCCTAAACCCGAAACCGGTTTCTCTGGTTGAAAATCATTGT
    GTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTATTGTTGTGTGTAGATTTTTTAAAAATATCA
    TTTGAGGTCAATACAAATCCTATTTCTTGTGGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTC
    ATTTGTTATATTGGATACAAGCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTA
    GGGTTGGTTTATCTCAAGAATCTTATTAATTGTTTGGACTGTTTATGTTTGGACATTTATTGTCATTCTT
    ACTCCTTTGTGGAAATGTTTGTTCTATCAA
    This file can be generated by genome sequencing or downloaded from a genome database.
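
Because the TF IDs in the TFBS file and the motif file must agree, it can help to check them before a run. The following Python sketch lists TF IDs that appear in the 4th column of the bed file but have no motif. It is not part of FunTFBS; the function names are hypothetical, and it assumes a MEME-format motif file in which the TF ID is the first token after `MOTIF`, as in the example above.

```python
def meme_motif_ids(meme_lines):
    """Collect the identifier that follows 'MOTIF' on each motif header line."""
    ids = set()
    for line in meme_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "MOTIF":
            ids.add(fields[1])
    return ids


def bed_tf_ids(bed_lines):
    """Collect the TF IDs from the 4th column of a bed file with strand."""
    ids = set()
    for number, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"line {number}: expected at least 6 columns")
        ids.add(fields[3])
    return ids


def unmatched_tf_ids(bed_lines, meme_lines):
    """Return the TF IDs present in the bed file but absent from the motifs."""
    return sorted(bed_tf_ids(bed_lines) - meme_motif_ids(meme_lines))
```

With the demo files from the package, this could be used as `unmatched_tf_ids(open("demo/test_TFBS.bed"), open("demo/Ath.meme"))`; an empty list means every candidate TFBS has a motif.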

Here is a simple example using the demo files in the package:

funTFBS -t demo/test_TFBS.bed -m demo/Ath.meme -f meme -p demo/test_PhyloP.bed -g demo/Ath_test.fa -o test

The following information will be shown on the screen:

TFBS before filtering: 100
motif format: meme
TFBS after filtering: 20
test/TFBS_filtered.bed

After the run, two files are generated in the output directory, both in bed6+ format (9 columns).

TFBS_unfiltered.bed: Total candidate TFBS before filtering.
TFBS_filtered.bed: Functional TFBS after filtering.

The 9 columns of output files:

1    Chromosome     The chromosome name
2    Start          The starting position of the TFBS
3    End            The ending position of the TFBS
4    TF             The TF ID
5    Value          (kept from input file and not used)
6    Strand         The strand information of the TFBS
7    Sequence       The genomic sequence of the TFBS
8    Correlation    Pearson correlation between motif base frequencies and the absolute value of PhyloP
9    P-value        P-value in the correlation test

For example:

Chr1	8649	8666	AT4G18450	3.51e-06	-	ATGGCGGCGAGTGAACA	0.5576	2.003e-02
Chr1	8650	8671	AT1G72360	8.17e-07	+	GTTCACTCGCCGCCATTGCTC	0.5802	5.829e-03
Chr1	8650	8671	AT3G11020	4.61e-06	-	GAGCAATGGCGGCGAGTGAAC	0.5283	1.382e-02
Chr1	8650	8671	AT4G17490	7.99e-07	+	GTTCACTCGCCGCCATTGCTC	0.5882	5.042e-03
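
The 9-column output can be loaded for downstream analysis with a few lines of Python. The sketch below is only an illustration: the `Site` record and the function names are hypothetical, and it assumes whitespace-delimited lines laid out as in the column list above.

```python
from collections import namedtuple

# Field names follow the 9 output columns described above.
Site = namedtuple(
    "Site", "chrom start end tf value strand sequence correlation p_value"
)


def parse_site(line):
    """Parse one line of TFBS_filtered.bed or TFBS_unfiltered.bed."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    return Site(
        fields[0], int(fields[1]), int(fields[2]), fields[3], fields[4],
        fields[5], fields[6], float(fields[7]), float(fields[8]),
    )


def sites_by_tf(lines):
    """Group parsed sites by TF ID, skipping blank lines."""
    grouped = {}
    for line in lines:
        if line.strip():
            site = parse_site(line)
            grouped.setdefault(site.tf, []).append(site)
    return grouped
```

For the run shown earlier, `sites_by_tf(open("test/TFBS_filtered.bed"))` would return a dictionary mapping each TF ID to its functional binding sites.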

Related Resource

1) Source of TF binding motifs

PlantTFDB - Plant Transcription Factor Database
JASPAR - The high-quality transcription factor binding profile database
UniPROBE - Universal PBM Resource for Oligonucleotide Binding Evaluation
CIS-BP - The online library of transcription factors and their DNA binding motifs
TRANSFAC - The database of eukaryotic transcription factors
New PLACE - A Database of Plant Cis-acting Regulatory DNA Elements
PlantPAN - The Plant Promoter Analysis Navigator
PlantCistromeDB - Base-pair Resolution Atlases of the Plant Cistrome and Epicistrome

2) Source of sequence conservation scores

PlantRegMap
UCSC Genome Browser

3) Dataset used to evaluate different methods in Arabidopsis thaliana

Binding sites and regulations for 21 TFs at different levels: Download
High-confidence transcriptional regulatory map: ATRM
Gene ontology slim: TAIR
Gene expression correlation: ATTED-II

4) Functional TFBS and regulations generated by FunTFBS for 63 species

Explore predicted TFBS in genome browser
Download predicted TFBS
Retrieve regulations online
Download regulations


How To Cite