of all sequences assayed (literally ‘one in a million’). While
we have no idea how many of these random sequences were
severely detrimental to the cell because these would quickly
disappear from the culture, one would expect that most
random sequences would have no effect at all.
They make an additional error by assuming that the
random sequences add biological novelty to the cell. There
is, in fact, no evidence for this. The majority of sequences I
The high proportion of sequences
that match the reverse complement of a
known gene demonstrates that orientation
is unimportant. But functional areas can
include non-genic areas like promoter
regions. Thus, the protein sequence, at least
in most cases, though perhaps all, is also
unimportant.
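To make the orientation point concrete, here is a minimal sketch of what matching the reverse complement of a known gene means. The function names are hypothetical, and the exact substring test is a simplification: the matches reported in the table are partial-identity BLAST alignments, not exact substrings.

```python
# Illustrative only: exact-substring matching on either strand.
# BLAST itself performs gapped, partial-identity alignment.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_either_strand(query: str, target: str) -> bool:
    """True if query occurs in target in forward or reverse orientation."""
    return query in target or reverse_complement(query) in target
```

A query that is absent from the forward strand can still be found through its reverse complement, which is the sense in which a hit is recorded as 'reverse'.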
If these ‘bioactive’ DNA sequences are
not producing functional proteins, they
must be acting on the level of RNA–RNA
or RNA–DNA interactions. The annealing
temperatures of ribonucleic acids depend
on their length and percent identity.
Biological function in this case does not
depend on sequence specificity. Also, the
triple-hydrogen bonding G and C bind
more tightly than the double-hydrogen
bonding A and T, meaning sequences
rich in G and C have a higher melting
temperature (the temperature at which the
two nucleic acids will separate in solution).
The placement of G and C along the strand
also impacts annealing, with terminal
Gs and Cs serving to anchor the strand
more so than internal ones. The skewed
frequencies of A (low) and G (high) seen in
the data are quite interesting in this context.
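The dependence of melting temperature on length and G/C content can be illustrated with the Wallace rule, a common rule of thumb for short DNA oligonucleotides (roughly 2°C per A/T and 4°C per G/C). This is a sketch under stated assumptions, not the method used in the study: it ignores salt concentration, mismatches, RNA-specific thermodynamics, and the position of G and C along the strand.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> float:
    """Rough melting temperature (deg C) by the Wallace rule.

    Assumption: a short, perfectly matched oligonucleotide.
    U is counted with A/T so RNA strings can be passed in,
    although the rule was formulated for DNA.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T") + seq.count("U")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc
```

Under this rule, two strands of equal length differ in estimated melting temperature only through their G/C content, and lengthening a strand always raises the estimate.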
Why do we not see longer or shorter
‘bioactive’ sequences? First, due to the
sheer number of permutations along
a DNA strand, as the search string gets
longer, the expected number of matches
drops off exponentially. Second, it may
be that the BLAST algorithm is cutting
off less-than-perfect, but still functional,
leading or trailing sequences that are
beneath the detection threshold. Third,
shorter sequences will not have a high enough annealing
temperature to interact directly with the genome.
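The first point, the exponential drop-off, can be sketched with a simple model: if all four bases are equally likely and independent, a specific string of length k is expected to occur (n - k + 1) / 4^k times in a random sequence of length n. The function name and the database size used below are hypothetical, and the model counts exact matches only; BLAST reports partial matches and uses its own E-value statistics.

```python
def expected_exact_matches(k: int, db_length: int) -> float:
    """Expected exact occurrences of one specific k-mer in a random
    sequence of db_length bases (uniform, independent bases assumed)."""
    if k <= 0 or k > db_length:
        return 0.0
    return (db_length - k + 1) * 0.25 ** k


if __name__ == "__main__":
    # Hypothetical database size, chosen only for illustration.
    db = 10**12
    for k in (10, 20, 30, 40, 50):
        print(k, expected_exact_matches(k, db))
```

Each additional base divides the expected count by roughly four, so perfect matches much longer than a few tens of bases become vanishingly rare even against a very large database.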
What we see are the sequences at just the right length.
Their RNA transcripts are long enough (20–40 nucleotides)
that they could bind tightly to both RNA and DNA under
physiological conditions (e.g. 37°C). The two RNA ends that
have no match to the surrounding sequence would not anneal,
however. This will affect the annealing of the ‘random’ RNA
strand, but to an unknown extent. The RNAs produced in
their experiment were on the order of 700 nucleotides, only
150 of which were the ‘random’ component. Since these
| Test | Hits | Identity statistics | Species |
|---|---|---|---|
| 1 | 3 | 38 bp, 84% | Vibrio bivalvicida [pathogenic bacterium] |
| | | | Vibrio tubiashii [pathogenic bacterium] (match identical to above) |
| | | | Aggregatibacter aphrophilus [proteobacterium] (non-overlapping with above) |
| 2 | 3 | 28–46 bp, 80–93% | Babesia equi [protozoan] |
| | | | Baudoinia panamericana [fungus] |
| | | | Panthera pardus [leopard] (all overlapping) |
| 3 | 3 | 23–43 bp, 81–100% | Numida meleagris [helmeted guineafowl] |
| | | | Schistosoma mansoni [parasitic flatworm] (overlapping) |
| | | | Helicoverpa armigera [cotton bollworm] (non-overlapping) |
| 4 | 5 | 23–49 bp, 81–100% | Branchiostoma floridae [lancelet] |
| | | | Asparagus officinalis [asparagus] |
| | | | Algoriphagus marincola [marine sediment bacterium] |
| | | | Paraphaeosphaeria sporulosa [fungus] |
| | | | Oncorhynchus kisutch [coho salmon] |
| 5 | 2 | 32–36 bp, 89–91% | Plasmodium berghei [protozoan] |
| | | | Labrus bergylta [Ballan wrasse] |
| 6 | 8 | 24–47 bp, 83–100% | Pseudomonas fluorescens [Gram-negative bacterium]: plasmid pQBR55, gene CEK42535, reverse |
| | | | Yersinia ruckeri [Gram-negative bacterium]: gene ARZ027031, reverse |
| | | | Apteryx australis mantelli [North Island (NZ) brown kiwi]: no gene annotations provided |
| | | | Angiostrongylus costaricensis [parasitic nematode]: no gene annotations provided |
| | | | Mus musculus [house mouse] (3 identical matches): immunoglobulin heavy chain complex (Igh), reverse |
| | | | Mus musculus [house mouse]: a few hundred bp above and below two copies of MER1, a gene involved in chromosome pairing in yeast |

Table 4. BLAST results generated from random sequences. Tests 1–5 used 150-nucleotide
test sequences. Test 6 used a 1,500-nucleotide test sequence. Genomic contexts (if available)
are provided for test 6.