Lack of biased gene conversion repair favoring G/C nucleotides in D. melanogaster
The analysis of the distribution of γ along chromosomes at the 100-kb scale reveals a more uniform distribution than that of CO (c) rates, with no reduction near telomeres or centromeres (Figure 5). More than 80% of 100-kb windows show γ within a 2-fold range of the chromosome average, in contrast with the distribution of CO, where only 26.3% of 100-kb windows show c within a 2-fold range of the chromosome average. To test specifically whether the distribution of CO events is more variable across the genome than either GC events or the combination of GC and CO events (i.e., the number of DSBs), we estimated the coefficient of variation (CV) along chromosomes for each of the three parameters for different window sizes and chromosome arms. In all cases (window size and chromosome arm), the CV for CO is much greater (more than 2-fold) than that for either GC or DSBs (CO+GC), while the CV for DSBs is only marginally greater than that for GC: for 100-kb windows, the average CV per chromosome arm for CO, GC and DSBs is 0.90, 0.37 and 0.38, respectively. Nevertheless, we can rule out the possibility that the distribution of GC events or DSBs is completely random: there is significant heterogeneity along each chromosome (P<0.0001 at all physical scales analyzed, from 100 kb to 10 Mb; see Materials and Methods for details). Not surprisingly, given the excess of GC over CO events, GC is a much better predictor of the total number of DSBs (i.e., total recombination events) across the genome than CO rates, with semi-partial correlations of 0.96 for GC and 0.38 for CO in explaining the overall variance in DSBs (excluding the fourth chromosome).
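The window-level statistics described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: all function names are hypothetical, the CV uses the population standard deviation (ddof=0), the "2-fold range" is taken as [mean/2, 2 × mean], the semi-partial correlation uses the textbook formula that removes from one predictor only its linear overlap with the other, and the heterogeneity test is a Monte Carlo chi-square test against events placed uniformly at random across equal-sized windows, which may differ from the test described in Materials and Methods.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV across windows: population standard deviation (ddof=0) divided by the mean."""
    x = np.asarray(values, dtype=float)
    return float(x.std() / x.mean())


def fraction_within_two_fold(rates):
    """Fraction of windows whose rate lies in [mean / 2, 2 * mean] (assumed 2-fold range)."""
    x = np.asarray(rates, dtype=float)
    m = x.mean()
    return float(np.mean((x >= m / 2) & (x <= 2 * m)))


def semipartial_r(y, x1, x2):
    """Semi-partial correlation of y with x1, removing from x1 only its linear overlap with x2."""
    r = np.corrcoef(np.vstack([y, x1, x2]))
    r_y1, r_y2, r_12 = r[0, 1], r[0, 2], r[1, 2]
    return float((r_y1 - r_y2 * r_12) / np.sqrt(1.0 - r_12 ** 2))


def heterogeneity_pvalue(counts, n_perm=1000, seed=1):
    """Monte Carlo P value for the null that events fall uniformly at random across windows.

    Statistic: chi-square distance from equal expected counts per window.
    Null: the same total number of events redistributed multinomially with
    equal window probabilities (assumes windows of equal size).
    """
    obs = np.asarray(counts, dtype=float)
    k, n = obs.size, int(obs.sum())
    expected = n / k

    def chi2(c):
        return float(((np.asarray(c, dtype=float) - expected) ** 2 / expected).sum())

    observed = chi2(obs)
    rng = np.random.default_rng(seed)
    null = rng.multinomial(n, [1.0 / k] * k, size=n_perm)
    exceed = sum(chi2(row) >= observed for row in null)
    return (exceed + 1) / (n_perm + 1)
```

With hypothetical per-window count arrays `co`, `gc` and `dsb = co + gc` for one chromosome arm, `coefficient_of_variation` would be applied to each series, and `semipartial_r(dsb, gc, co)` gives the correlation of DSBs with the part of GC that is independent of CO (and `semipartial_r(dsb, co, gc)` the converse).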
DSB repair requires the formation of heteroduplex sequences (for both CO and GC events; Figure S1). Such heteroduplex sequences usually include A(T):C(G) mismatches that are repaired either at random or in favor of particular nucleotides. In Drosophila, there is no direct experimental evidence supporting G+C-biased gene conversion repair, and evolutionary analyses have yielded inconsistent results when using CO rates as a proxy for heteroduplex formation ( – but see , ). Note, however, that GC events are much more frequent than CO events in Drosophila and in other organisms , , , and therefore GC (γ) rates may be more relevant than CO (c) rates when investigating the potential consequences of heteroduplex repair.
In several species, gene conversion mismatch repair has been proposed to be biased, favoring G and C nucleotides –, which predicts a positive association between recombination rates (sensu frequency of heteroduplex formation) and the G+C content of noncoding DNA , .
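Why biased mismatch repair predicts a link between heteroduplex frequency and G+C content can be shown with a deliberately simplified toy model (an illustration only, not a model used in this study). Assumptions: in a fraction h of meioses a heteroduplex covering an A/T versus G/C heterozygous site forms on one chromatid of the tetrad, the mismatch is always converted (no restoration), and it is converted toward G/C with probability q, giving 3:1 or 1:3 segregation; otherwise segregation is Mendelian. The function name `gc_transmission` is hypothetical.

```python
def gc_transmission(h, q):
    """Toy model: probability that an A/T vs G/C heterozygote transmits the G/C allele.

    h: fraction of meioses with a heteroduplex (one chromatid) covering the site.
    q: probability that the A(T):C(G) mismatch is converted toward G/C (0.5 = unbiased).
    Conversion toward G/C gives 3:1 segregation (0.75), toward A/T gives 1:3 (0.25);
    meioses without a heteroduplex segregate 2:2 (0.5).
    """
    if not (0.0 <= h <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("h and q must be probabilities")
    with_heteroduplex = q * 0.75 + (1.0 - q) * 0.25
    return (1.0 - h) * 0.5 + h * with_heteroduplex


# Unbiased repair (q = 0.5) keeps transmission at 0.5 for any h;
# G/C-biased repair (q > 0.5) raises it in proportion to heteroduplex frequency h.
for h in (0.0, 0.1, 0.2, 0.4):
    print(h, round(gc_transmission(h, 0.5), 3), round(gc_transmission(h, 0.7), 3))
```

Under this sketch the excess transmission of G/C alleles equals h(q − 0.5)/2, so any G/C bias (q > 0.5) would accumulate fastest where heteroduplexes form most often, which is the positive association with recombination rates that the hypothesis predicts.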
Our data reveal no association of γ with G+C nucleotide composition in intergenic sequences (R = +0.036, P>0.20) or introns (R = −0.041, P>0.16). A similar lack of association is observed when G+C nucleotide composition is compared with c (P>0.25 for both intergenic sequences and introns). We therefore find no evidence, based on nucleotide composition, of a gene conversion bias favoring G and C nucleotides in D. melanogaster. The reasons for some of the previous results that inferred a gene conversion bias toward G and C nucleotides in Drosophila may be multiple and include the use of sparse CO maps as well as incomplete genome annotation. Because gene density in D. melanogaster is higher in regions with non-reduced CO , , many recently annotated transcribed regions and G+C-rich exons , , might have been previously analyzed as neutral sequences, particularly in those genomic regions with non-reduced CO.
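The composition test above can be sketched as follows, as a minimal illustration rather than the exact procedure used here. Assumptions: G+C content is computed over unambiguous bases only (N and other symbols ignored), R is taken to be a Pearson correlation (the statistic is not specified in this section; for a rank correlation, `scipy.stats.spearmanr` could be used instead), and P comes from a two-sided permutation test. The function names and inputs are hypothetical.

```python
import numpy as np


def gc_fraction(seq):
    """G+C fraction over unambiguous bases only (A, C, G, T); N and other symbols are ignored."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else float("nan")


def correlation_with_permutation(x, y, n_perm=1000, seed=1):
    """Pearson R between x and y, with a two-sided permutation P value from shuffling y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = float(np.corrcoef(x, y)[0, 1])
    rng = np.random.default_rng(seed)
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    p = (np.sum(np.abs(null) >= abs(r_obs)) + 1) / (n_perm + 1)
    return r_obs, float(p)
```

For example, with hypothetical lists `intergenic_seqs` and `gamma_per_seq`, one would compute `gc = [gc_fraction(s) for s in intergenic_seqs]` and then `correlation_with_permutation(gamma_per_seq, gc)`; a value of R near zero with a large P, as reported above, indicates no detectable association.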
Motifs of recombination in Drosophila
To discover DNA motifs associated with recombination events (CO or GC), we focused on 1,909 CO and 3,701 GC events delimited by 500 bp or less (CO500 and GC500, respectively). Our D. melanogaster data reveal many motifs significantly enriched in sequences surrounding recombination events (18 and 10 motifs for CO and GC, respectively) (Figure 6 and Figure 7). Individually, the motifs surrounding CO events (MCO) are present in 6.8 to 43.2% of CO500 sequences, while motifs surrounding GC events (MGC) are present in 7.8 to 27.6% of GC500 sequences. Note that 97.7% of all CO500 sequences contain at least one MCO motif and 85.0% of GC500 sequences contain one or more MGC motifs (Figure S4).
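The presence percentages above (per motif, and for at least one motif per sequence) can be computed with a short sketch like the one below. It is an illustration under assumptions not stated in the text: motifs are treated as exact strings (real discovered motifs are often degenerate or position weight matrices), both strands are scanned by default, and the function names are hypothetical. It does not reproduce the enrichment test itself.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def contains_motif(seq, motif, both_strands=True):
    """True if the exact motif (or, optionally, its reverse complement) occurs in seq."""
    s, m = seq.upper(), motif.upper()
    return m in s or (both_strands and revcomp(m) in s)


def motif_presence(seqs, motifs, both_strands=True):
    """Per-motif fraction of sequences containing it, and the fraction with at least one motif."""
    n = len(seqs)
    per_motif = {m: sum(contains_motif(s, m, both_strands) for s in seqs) / n for m in motifs}
    any_hit = sum(any(contains_motif(s, m, both_strands) for m in motifs) for s in seqs) / n
    return per_motif, any_hit


per_motif, any_hit = motif_presence(["AAACGT", "CCCC", "GGGG"], ["ACG", "GGG"])
print(per_motif, any_hit)
```

Applied to hypothetical lists of CO500 sequences and MCO motifs, `any_hit` corresponds to the "at least one motif" percentage; scanning both strands matters because a motif and its reverse complement mark the same double-stranded site.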