Fish Vision – Part 2: Identifying Potential Cis-Regulatory Elements for the Opsin Genes
O’Quin KE, Smith D, Naseer Z, Schulte J, Engel SD, Loh YH, Streelman JT, Boore JL, & Carleton KL (2011). Divergence in cis-regulatory sequences surrounding the opsin gene arrays of African cichlid fishes. BMC evolutionary biology, 11 (1) PMID: 21554730

Last week I introduced a paper that I recently published in the journal BMC Evolutionary Biology, “Divergence in cis-regulatory sequences surrounding the opsin gene arrays of African cichlid fishes”. If you read that post, you’ll remember that cis-regulatory sequences are bits of non-coding DNA found near a gene that help turn that gene on or off; opsins are a family of genes that are expressed (i.e., turned on) in the eye and allow us to see light; and cichlids are a wonderful group of fish that differ in which opsins they express and therefore in which colors of light they can see. The goal of this paper was to identify potential cis-regulatory sequences and mutations that may contribute to the variable opsin gene expression found in African cichlids. That first post covered some of the concepts and background information for this study; this week, I’ll introduce the methods and the first part of our results.

To identify potential cis-regulatory sequences for the opsins, we took a two-pronged approach that relied on some theory from evolutionary biology. In this post I’ll discuss the first part of this approach, called phylogenetic footprinting. This method compares DNA sequences from very different organisms (for example, humans and mice, which last shared a common ancestor 80 million years ago) in search of segments of DNA that are very similar (though not necessarily identical). Based on common ancestry and descent with modification, we do not expect many parts of the genome to stay very similar over 80 million years in two species as different as humans and mice. However, some parts of the genome will stay the same, and these parts must therefore be important: they must have some function that is so integral to survival and fitness that the DNA sequence has been forced to stay very similar despite many millions of years of evolution. Although it is common for the sequences of protein-coding genes (for example, opsins, hemoglobin, or histones) to stay conserved over such a time frame, it is less common for non-coding DNA to do so. We call these potentially important non-coding regions conserved non-coding elements, or CNEs.
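For readers who like to see an idea as code, here is a minimal Python sketch of the core of phylogenetic footprinting: slide a window along two sequences that have already been aligned and keep the stretches that stay highly similar. This is only an illustration, not the pipeline used in the paper (which relied on BLAST comparisons); the function names, the 10-base window, and the 80% identity cutoff are all assumptions chosen for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    # A gap ('-') never counts as a match, even if both sequences have one.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(a: str, b: str, window: int = 10, min_identity: float = 80.0):
    """Return merged half-open (start, end) intervals whose windows meet the cutoff.

    The window size and cutoff are illustrative assumptions, not values
    taken from the study.
    """
    hits = []
    for start in range(0, len(a) - window + 1):
        identity = percent_identity(a[start:start + window], b[start:start + window])
        if identity >= min_identity:
            end = start + window
            if hits and start <= hits[-1][1]:
                # Overlapping or adjacent windows are merged into one block.
                hits[-1] = (hits[-1][0], end)
            else:
                hits.append((start, end))
    return hits
```

Calling `conserved_windows(seq_a, seq_b)` on two aligned strings returns the coordinates of the conserved blocks; in a real analysis, blocks that fall outside annotated genes would be the candidate CNEs.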

For our study, we performed phylogenetic footprinting by comparing > 100,000 base pairs of DNA surrounding the opsin genes of the Nile tilapia (Oreochromis niloticus) with the genomic sequences of several model fish species that diverged from each other over 300 million years ago. Since there were no genome sequences for cichlids when we started this study, we had to make them ourselves. We sequenced only those parts of the genome that contained the opsins, which were stored in special DNA constructs called bacterial artificial chromosomes (BACs) that are maintained inside bacterial cells. With these initial sequences and comparisons, we hoped to find CNEs that serve as candidate cis-regulatory sequences for the opsins.

Phylogenetic footprinting of opsin gene arrays across fishes. The opsins are found in three parts of the cichlid genome, shown in panels A, B, and C. The top line of each panel represents the genome sequence of cichlids, while each of the remaining rows in the panel represents the corresponding genome sequence from another fish species. Black dots in each row represent regions of DNA that are highly similar (50–100% identical) between cichlids and the other fish species as identified via BLAST. Highlighted in red are regions of non-coding DNA that are highly similar among two or more fishes -- these are our conserved non-coding elements (CNEs) that we use as candidate cis-regulatory elements. Highlighted in green are CNEs that are composed largely of repetitive DNA, which we ignored. Highlighted in blue are promoter regions that we analyzed further at the end of the study.

In the figure above, the top line of each panel represents the cichlid genome sequence. Boxes atop each panel indicate the protein-coding genes found in the cichlid sequence. So, for example, in Panel A you can see we found three genes: TNPO3, SWS1 (the ultraviolet-sensitive opsin), and CALUA. The little black dots beneath the top line represent bits of DNA that are very similar between the cichlids and each fish genome examined. Again focusing on Panel A, if you look beneath the SWS1 opsin gene, you’ll find that all three species have a series of black dots along that region; this means that all three species have SWS1 opsin sequences very similar to the cichlid one. Now, look at the regions highlighted in red. These are regions of non-coding DNA (i.e., found outside of a protein-coding gene) that still show lots of similarity between cichlids and at least one other fish species. These are our conserved non-coding elements (CNEs). The regions in green are also technically CNEs, but we found that these were mostly made up of repetitive sequences that are found throughout the genome and are generally not related to gene function. We ignored these in our analysis. In all, we found 20 CNEs that have remained conserved over 100 – 300 million years of fish evolution. These CNEs now serve as our candidate cis-regulatory elements for opsin gene expression in cichlids.
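The sorting of conserved blocks described above (coding, repetitive, or candidate CNE) can be sketched in a few lines of Python. This is a hypothetical illustration of the logic, not the procedure from the paper: the `classify_hit` function, the interval format, and the rule that a block counts as repetitive when more than half of it overlaps annotated repeats are all assumptions made for the example.

```python
from typing import List, Tuple

# Half-open (start, end) coordinates on the cichlid sequence.
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def classify_hit(
    hit: Interval,
    genes: List[Interval],
    repeats: List[Interval],
    max_repeat_fraction: float = 0.5,  # assumed cutoff, not from the study
) -> str:
    """Label a conserved block as 'coding', 'repetitive', or 'CNE'.

    Assumes the repeat intervals do not overlap one another, so their
    overlaps with the hit can simply be summed.
    """
    if any(overlap(hit, gene) > 0 for gene in genes):
        return "coding"
    length = hit[1] - hit[0]
    repeat_bases = sum(overlap(hit, rep) for rep in repeats)
    if length > 0 and repeat_bases / length > max_repeat_fraction:
        return "repetitive"
    return "CNE"
```

Given gene and repeat annotations for a region, each conserved block from the cross-species comparison gets exactly one label, which mirrors the black (gene), green (repetitive), and red (CNE) categories in the figure.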

But how do we know that these bits of DNA actually help turn the opsins on and off? The short answer is that we don’t, at least not for all of them. But, happily, we do for some of them. Specifically, the CNEs labeled 7a and 7b (boxed in red near the LWS opsin in Panel B) are very similar to two known elements that control gene expression. CNE 7a encodes a functional RNA that binds to the 3′-UTR of the messenger RNAs of newly expressed genes in order to turn them off. The RNA encoded by CNE 7a is commonly found in the eyes of fish and other vertebrates. CNE 7b matches a cis-regulatory sequence in humans that helps turn the LWS (red-sensitive) opsins on and off. This CNE 7b region is called the LWS opsin Locus Control Region, or LWS-LCR. A recent (and freely available) study with zebrafish found that this sequence also works the same way in fish. Thus, there is evidence that at least some of the CNEs we treat as candidate cis-regulatory sequences may actually help turn the opsin genes on and off. It is therefore possible that the remaining 19 CNEs could also help control cichlid opsin expression, but simply haven’t been identified before.

Functional conserved non-coding elements (CNEs) near the opsin genes of cichlids. CNE 7a matches a functional RNA that regulates gene expression in the eye by binding to the 3'-UTR of the messenger RNAs of newly expressed genes. CNE 7b matches a cis-regulatory sequence that helps turn on LWS expression in both humans and fish. The box in Panel A indicates the functional part of the RNA, while the boxes in Panel B indicate transcription factor binding sites -- those places where proteins that help drive gene expression bind, and which make this sequence functional.

Next week I’ll discuss the second part of our analysis, called phylogenetic shadowing. This method compares potential cis-regulatory sequences and CNEs among more closely related species in hopes of finding mutations that may explain the altered patterns of opsin expression. See you then.

