Get ranges for synonymous and non-synonymous nucleotide positions within a codon separately





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







1















I have GRanges object (coordinates of all gene exons); coding_pos defines what is the start position of a codon in a particular exon (1 means that first nucleotide in exon is also the first nt in a codon, and so on).



grTargetGene itself looks like this



> grTargetGene

GRanges object with 11 ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype prev_exons_length coding_pos
<Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
[1] chr2 [148602722, 148602776] + | ENSG00000121989 protein_coding 0 1
[2] chr2 [148653870, 148654077] + | ENSG00000121989 protein_coding 55 2
[3] chr2 [148657027, 148657136] + | ENSG00000121989 protein_coding 263 3
[4] chr2 [148657313, 148657467] + | ENSG00000121989 protein_coding 373 2
[5] chr2 [148672760, 148672903] + | ENSG00000121989 protein_coding 528 1
[6] chr2 [148674852, 148674995] + | ENSG00000121989 protein_coding 672 1
[7] chr2 [148676016, 148676161] + | ENSG00000121989 protein_coding 816 1
[8] chr2 [148677799, 148677913] + | ENSG00000121989 protein_coding 962 3
[9] chr2 [148680542, 148680680] + | ENSG00000121989 protein_coding 1077 1
[10] chr2 [148683600, 148683730] + | ENSG00000121989 protein_coding 1216 2
[11] chr2 [148684649, 148684843] + | ENSG00000121989 protein_coding 1347 1
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


I am interested in looking at coordinates separately for [1,2] positions in each codon and [3]. In other words, I would like to have 2 different GRanges objects that look approximately like this (here it is only the beginning)



> grTargetGene_Nonsynonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602722, 148602723] + | ENSG00000121989 protein_coding
[2] chr2 [148602725, 148602726] + | ENSG00000121989 protein_coding
[3] chr2 [148602728, 148602729] + | ENSG00000121989 protein_coding
[4] chr2 [148602731, 148602732] + | ENSG00000121989 protein_coding



> grTargetGene_Synonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602724, 148602724] + | ENSG00000121989 protein_coding
[2] chr2 [148602727, 148602727] + | ENSG00000121989 protein_coding
[3] chr2 [148602730, 148602730] + | ENSG00000121989 protein_coding
[4] chr2 [148602733, 148602733] + | ENSG00000121989 protein_coding


I was planning to do it through the loop that creates a set of granges for each exon according to coding_pos and strand, but I suspect there is a smarter way or maybe even a function that can do it already, but I couldn't find a simple solution.



Important: I do not need the sequence itself (the easiest way, in that case, would be to extract DNA first and then work with the sequence), but instead of doing this I only need the positions which I will use to overlap with some features.



> library("GenomicRanges")
> dput(grTargetGene)

new("GRanges"
, seqnames = new("Rle"
, values = structure(1L, .Label = "chr2", class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, ranges = new("IRanges"
, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L,
148674852L)
, width = c(55L, 208L, 110L, 155L, 144L, 144L)
, NAMES = NULL
, elementType = "integer"
, elementMetadata = NULL
, metadata = list()
)
, strand = new("Rle"
, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, elementMetadata = new("DataFrame"
, rownames = NULL
, nrows = 6L
, listData = structure(list(ensembl_ids =
c("ENSG00000121989","ENSG00000121989",
"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"
), gene_biotype = c("protein_coding", "protein_coding", "protein_coding",
"protein_coding", "protein_coding", "protein_coding"), cds_length =
c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L,
148602086L, 148602086L, 148602086L, 148602086L, 148602086L),
gene_end_position = c(148688393L, 148688393L, 148688393L,
148688393L, 148688393L, 148688393L), prev_exons_length = c(0,
55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =
c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",
"gene_end_position",
"prev_exons_length", "coding_pos"))
, elementType = "ANY"
, elementMetadata = NULL
, metadata = list()
)
, seqinfo = new("Seqinfo"
, seqnames = "chr2"
, seqlengths = NA_integer_
, is_circular = NA
, genome = NA_character_
)
, metadata = list()
)









share|improve this question

























  • Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

    – Maurits Evers
    Nov 22 '18 at 10:43











  • Full grTargetGene is shown above - not sure what exactly you need.

    – lizaveta
    Nov 22 '18 at 11:22











  • It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

    – Maurits Evers
    Nov 22 '18 at 11:59











  • @MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

    – lizaveta
    Nov 22 '18 at 13:06











  • Thanks for providing sample data, I've posted a solution below.

    – Maurits Evers
    Nov 24 '18 at 8:13


















1















I have GRanges object (coordinates of all gene exons); coding_pos defines what is the start position of a codon in a particular exon (1 means that first nucleotide in exon is also the first nt in a codon, and so on).



grTargetGene itself looks like this



> grTargetGene

GRanges object with 11 ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype prev_exons_length coding_pos
<Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
[1] chr2 [148602722, 148602776] + | ENSG00000121989 protein_coding 0 1
[2] chr2 [148653870, 148654077] + | ENSG00000121989 protein_coding 55 2
[3] chr2 [148657027, 148657136] + | ENSG00000121989 protein_coding 263 3
[4] chr2 [148657313, 148657467] + | ENSG00000121989 protein_coding 373 2
[5] chr2 [148672760, 148672903] + | ENSG00000121989 protein_coding 528 1
[6] chr2 [148674852, 148674995] + | ENSG00000121989 protein_coding 672 1
[7] chr2 [148676016, 148676161] + | ENSG00000121989 protein_coding 816 1
[8] chr2 [148677799, 148677913] + | ENSG00000121989 protein_coding 962 3
[9] chr2 [148680542, 148680680] + | ENSG00000121989 protein_coding 1077 1
[10] chr2 [148683600, 148683730] + | ENSG00000121989 protein_coding 1216 2
[11] chr2 [148684649, 148684843] + | ENSG00000121989 protein_coding 1347 1
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


I am interested in looking at coordinates separately for [1,2] positions in each codon and [3]. In other words, I would like to have 2 different GRanges objects that look approximately like this (here it is only the beginning)



> grTargetGene_Nonsynonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602722, 148602723] + | ENSG00000121989 protein_coding
[2] chr2 [148602725, 148602726] + | ENSG00000121989 protein_coding
[3] chr2 [148602728, 148602729] + | ENSG00000121989 protein_coding
[4] chr2 [148602731, 148602732] + | ENSG00000121989 protein_coding



> grTargetGene_Synonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602724, 148602724] + | ENSG00000121989 protein_coding
[2] chr2 [148602727, 148602727] + | ENSG00000121989 protein_coding
[3] chr2 [148602730, 148602730] + | ENSG00000121989 protein_coding
[4] chr2 [148602733, 148602733] + | ENSG00000121989 protein_coding


I was planning to do it through the loop that creates a set of granges for each exon according to coding_pos and strand, but I suspect there is a smarter way or maybe even a function that can do it already, but I couldn't find a simple solution.



Important: I do not need the sequence itself (the easiest way, in that case, would be to extract DNA first and then work with the sequence), but instead of doing this I only need the positions which I will use to overlap with some features.



> library("GenomicRanges")
> dput(grTargetGene)

new("GRanges"
, seqnames = new("Rle"
, values = structure(1L, .Label = "chr2", class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, ranges = new("IRanges"
, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L,
148674852L)
, width = c(55L, 208L, 110L, 155L, 144L, 144L)
, NAMES = NULL
, elementType = "integer"
, elementMetadata = NULL
, metadata = list()
)
, strand = new("Rle"
, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, elementMetadata = new("DataFrame"
, rownames = NULL
, nrows = 6L
, listData = structure(list(ensembl_ids =
c("ENSG00000121989","ENSG00000121989",
"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"
), gene_biotype = c("protein_coding", "protein_coding", "protein_coding",
"protein_coding", "protein_coding", "protein_coding"), cds_length =
c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L,
148602086L, 148602086L, 148602086L, 148602086L, 148602086L),
gene_end_position = c(148688393L, 148688393L, 148688393L,
148688393L, 148688393L, 148688393L), prev_exons_length = c(0,
55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =
c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",
"gene_end_position",
"prev_exons_length", "coding_pos"))
, elementType = "ANY"
, elementMetadata = NULL
, metadata = list()
)
, seqinfo = new("Seqinfo"
, seqnames = "chr2"
, seqlengths = NA_integer_
, is_circular = NA
, genome = NA_character_
)
, metadata = list()
)









share|improve this question

























  • Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

    – Maurits Evers
    Nov 22 '18 at 10:43











  • Full grTargetGene is shown above - not sure what exactly you need.

    – lizaveta
    Nov 22 '18 at 11:22











  • It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

    – Maurits Evers
    Nov 22 '18 at 11:59











  • @MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

    – lizaveta
    Nov 22 '18 at 13:06











  • Thanks for providing sample data, I've posted a solution below.

    – Maurits Evers
    Nov 24 '18 at 8:13














1












1








1








I have GRanges object (coordinates of all gene exons); coding_pos defines what is the start position of a codon in a particular exon (1 means that first nucleotide in exon is also the first nt in a codon, and so on).



grTargetGene itself looks like this



> grTargetGene

GRanges object with 11 ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype prev_exons_length coding_pos
<Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
[1] chr2 [148602722, 148602776] + | ENSG00000121989 protein_coding 0 1
[2] chr2 [148653870, 148654077] + | ENSG00000121989 protein_coding 55 2
[3] chr2 [148657027, 148657136] + | ENSG00000121989 protein_coding 263 3
[4] chr2 [148657313, 148657467] + | ENSG00000121989 protein_coding 373 2
[5] chr2 [148672760, 148672903] + | ENSG00000121989 protein_coding 528 1
[6] chr2 [148674852, 148674995] + | ENSG00000121989 protein_coding 672 1
[7] chr2 [148676016, 148676161] + | ENSG00000121989 protein_coding 816 1
[8] chr2 [148677799, 148677913] + | ENSG00000121989 protein_coding 962 3
[9] chr2 [148680542, 148680680] + | ENSG00000121989 protein_coding 1077 1
[10] chr2 [148683600, 148683730] + | ENSG00000121989 protein_coding 1216 2
[11] chr2 [148684649, 148684843] + | ENSG00000121989 protein_coding 1347 1
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


I am interested in looking at coordinates separately for [1,2] positions in each codon and [3]. In other words, I would like to have 2 different GRanges objects that look approximately like this (here it is only the beginning)



> grTargetGene_Nonsynonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602722, 148602723] + | ENSG00000121989 protein_coding
[2] chr2 [148602725, 148602726] + | ENSG00000121989 protein_coding
[3] chr2 [148602728, 148602729] + | ENSG00000121989 protein_coding
[4] chr2 [148602731, 148602732] + | ENSG00000121989 protein_coding



> grTargetGene_Synonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602724, 148602724] + | ENSG00000121989 protein_coding
[2] chr2 [148602727, 148602727] + | ENSG00000121989 protein_coding
[3] chr2 [148602730, 148602730] + | ENSG00000121989 protein_coding
[4] chr2 [148602733, 148602733] + | ENSG00000121989 protein_coding


I was planning to do it through the loop that creates a set of granges for each exon according to coding_pos and strand, but I suspect there is a smarter way or maybe even a function that can do it already, but I couldn't find a simple solution.



Important: I do not need the sequence itself (the easiest way, in that case, would be to extract DNA first and then work with the sequence), but instead of doing this I only need the positions which I will use to overlap with some features.



> library("GenomicRanges")
> dput(grTargetGene)

new("GRanges"
, seqnames = new("Rle"
, values = structure(1L, .Label = "chr2", class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, ranges = new("IRanges"
, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L,
148674852L)
, width = c(55L, 208L, 110L, 155L, 144L, 144L)
, NAMES = NULL
, elementType = "integer"
, elementMetadata = NULL
, metadata = list()
)
, strand = new("Rle"
, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, elementMetadata = new("DataFrame"
, rownames = NULL
, nrows = 6L
, listData = structure(list(ensembl_ids =
c("ENSG00000121989","ENSG00000121989",
"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"
), gene_biotype = c("protein_coding", "protein_coding", "protein_coding",
"protein_coding", "protein_coding", "protein_coding"), cds_length =
c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L,
148602086L, 148602086L, 148602086L, 148602086L, 148602086L),
gene_end_position = c(148688393L, 148688393L, 148688393L,
148688393L, 148688393L, 148688393L), prev_exons_length = c(0,
55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =
c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",
"gene_end_position",
"prev_exons_length", "coding_pos"))
, elementType = "ANY"
, elementMetadata = NULL
, metadata = list()
)
, seqinfo = new("Seqinfo"
, seqnames = "chr2"
, seqlengths = NA_integer_
, is_circular = NA
, genome = NA_character_
)
, metadata = list()
)









share|improve this question
















I have GRanges object (coordinates of all gene exons); coding_pos defines what is the start position of a codon in a particular exon (1 means that first nucleotide in exon is also the first nt in a codon, and so on).



grTargetGene itself looks like this



> grTargetGene

GRanges object with 11 ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype prev_exons_length coding_pos
<Rle> <IRanges> <Rle> | <character> <character> <numeric> <numeric>
[1] chr2 [148602722, 148602776] + | ENSG00000121989 protein_coding 0 1
[2] chr2 [148653870, 148654077] + | ENSG00000121989 protein_coding 55 2
[3] chr2 [148657027, 148657136] + | ENSG00000121989 protein_coding 263 3
[4] chr2 [148657313, 148657467] + | ENSG00000121989 protein_coding 373 2
[5] chr2 [148672760, 148672903] + | ENSG00000121989 protein_coding 528 1
[6] chr2 [148674852, 148674995] + | ENSG00000121989 protein_coding 672 1
[7] chr2 [148676016, 148676161] + | ENSG00000121989 protein_coding 816 1
[8] chr2 [148677799, 148677913] + | ENSG00000121989 protein_coding 962 3
[9] chr2 [148680542, 148680680] + | ENSG00000121989 protein_coding 1077 1
[10] chr2 [148683600, 148683730] + | ENSG00000121989 protein_coding 1216 2
[11] chr2 [148684649, 148684843] + | ENSG00000121989 protein_coding 1347 1
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths


I am interested in looking at coordinates separately for [1,2] positions in each codon and [3]. In other words, I would like to have 2 different GRanges objects that look approximately like this (here it is only the beginning)



> grTargetGene_Nonsynonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602722, 148602723] + | ENSG00000121989 protein_coding
[2] chr2 [148602725, 148602726] + | ENSG00000121989 protein_coding
[3] chr2 [148602728, 148602729] + | ENSG00000121989 protein_coding
[4] chr2 [148602731, 148602732] + | ENSG00000121989 protein_coding



> grTargetGene_Synonym

GRanges object with X ranges and 7 metadata columns:
seqnames ranges strand | ensembl_ids gene_biotype
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 [148602724, 148602724] + | ENSG00000121989 protein_coding
[2] chr2 [148602727, 148602727] + | ENSG00000121989 protein_coding
[3] chr2 [148602730, 148602730] + | ENSG00000121989 protein_coding
[4] chr2 [148602733, 148602733] + | ENSG00000121989 protein_coding


I was planning to do it through the loop that creates a set of granges for each exon according to coding_pos and strand, but I suspect there is a smarter way or maybe even a function that can do it already, but I couldn't find a simple solution.



Important: I do not need the sequence itself (the easiest way, in that case, would be to extract DNA first and then work with the sequence), but instead of doing this I only need the positions which I will use to overlap with some features.



> library("GenomicRanges")
> dput(grTargetGene)

new("GRanges"
, seqnames = new("Rle"
, values = structure(1L, .Label = "chr2", class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, ranges = new("IRanges"
, start = c(148602722L, 148653870L, 148657027L, 148657313L, 148672760L,
148674852L)
, width = c(55L, 208L, 110L, 155L, 144L, 144L)
, NAMES = NULL
, elementType = "integer"
, elementMetadata = NULL
, metadata = list()
)
, strand = new("Rle"
, values = structure(1L, .Label = c("+", "-", "*"), class = "factor")
, lengths = 6L
, elementMetadata = NULL
, metadata = list()
)
, elementMetadata = new("DataFrame"
, rownames = NULL
, nrows = 6L
, listData = structure(list(ensembl_ids =
c("ENSG00000121989","ENSG00000121989",
"ENSG00000121989", "ENSG00000121989", "ENSG00000121989", "ENSG00000121989"
), gene_biotype = c("protein_coding", "protein_coding", "protein_coding",
"protein_coding", "protein_coding", "protein_coding"), cds_length =
c(1542,1542, 1542, 1542, 1542, 1542), gene_start_position = c(148602086L,
148602086L, 148602086L, 148602086L, 148602086L, 148602086L),
gene_end_position = c(148688393L, 148688393L, 148688393L,
148688393L, 148688393L, 148688393L), prev_exons_length = c(0,
55, 263, 373, 528, 672), coding_pos = c(1, 2, 3, 2, 1, 1)), .Names =
c("ensembl_ids", "gene_biotype", "cds_length", "gene_start_position",
"gene_end_position",
"prev_exons_length", "coding_pos"))
, elementType = "ANY"
, elementMetadata = NULL
, metadata = list()
)
, seqinfo = new("Seqinfo"
, seqnames = "chr2"
, seqlengths = NA_integer_
, is_circular = NA
, genome = NA_character_
)
, metadata = list()
)






r bioconductor genomicranges






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 23 '18 at 10:06









zx8754

30.7k766105




30.7k766105










asked Nov 22 '18 at 10:23









lizavetalizaveta

657




657













  • Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

    – Maurits Evers
    Nov 22 '18 at 10:43











  • Full grTargetGene is shown above - not sure what exactly you need.

    – lizaveta
    Nov 22 '18 at 11:22











  • It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

    – Maurits Evers
    Nov 22 '18 at 11:59











  • @MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

    – lizaveta
    Nov 22 '18 at 13:06











  • Thanks for providing sample data, I've posted a solution below.

    – Maurits Evers
    Nov 24 '18 at 8:13



















  • Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

    – Maurits Evers
    Nov 22 '18 at 10:43











  • Full grTargetGene is shown above - not sure what exactly you need.

    – lizaveta
    Nov 22 '18 at 11:22











  • It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

    – Maurits Evers
    Nov 22 '18 at 11:59











  • @MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

    – lizaveta
    Nov 22 '18 at 13:06











  • Thanks for providing sample data, I've posted a solution below.

    – Maurits Evers
    Nov 24 '18 at 8:13

















Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43





Can you provide some minimal & reproducible sample data? E.g. use dput to include the output of part of grTargetGene.

– Maurits Evers
Nov 22 '18 at 10:43













Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22





Full grTargetGene is shown above - not sure what exactly you need.

– lizaveta
Nov 22 '18 at 11:22













It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59





It's much easier to play around with your your data if you use dput to share sample data. This is even more true if make use of non-base R objects such as GRanges. So please review how to provide a minimal reproducible example/attempt and then edit your post accordingly.

– Maurits Evers
Nov 22 '18 at 11:59













@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06





@MauritsEvers, thank you for the explanation. I changed the question, hope it helps.

– lizaveta
Nov 22 '18 at 13:06













Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13





Thanks for providing sample data, I've posted a solution below.

– Maurits Evers
Nov 24 '18 at 8:13












2 Answers
2






active

oldest

votes


















2














How about the following:



grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {
ranges(grTargetGene) <- IRanges(
start = start(grTargetGene) + x[1] - 1,
end = start(grTargetGene) + x[2] - 1)
return(grTargetGene) })
grl
#$Nonsym
#GRanges object with 6 ranges and 7 metadata columns:
# seqnames ranges strand | ensembl_ids gene_biotype
# <Rle> <IRanges> <Rle> | <character> <character>
# [1] chr2 148602722-148602723 + | ENSG00000121989 protein_coding
# [2] chr2 148653870-148653871 + | ENSG00000121989 protein_coding
# [3] chr2 148657027-148657028 + | ENSG00000121989 protein_coding
# [4] chr2 148657313-148657314 + | ENSG00000121989 protein_coding
# [5] chr2 148672760-148672761 + | ENSG00000121989 protein_coding
# [6] chr2 148674852-148674853 + | ENSG00000121989 protein_coding
# cds_length gene_start_position gene_end_position prev_exons_length
# <numeric> <integer> <integer> <numeric>
# [1] 1542 148602086 148688393 0
# [2] 1542 148602086 148688393 55
# [3] 1542 148602086 148688393 263
# [4] 1542 148602086 148688393 373
# [5] 1542 148602086 148688393 528
# [6] 1542 148602086 148688393 672
# coding_pos
# <numeric>
# [1] 1
# [2] 2
# [3] 3
# [4] 2
# [5] 1
# [6] 1
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
#
#$Sym
#GRanges object with 6 ranges and 7 metadata columns:
# seqnames ranges strand | ensembl_ids gene_biotype cds_length
# <Rle> <IRanges> <Rle> | <character> <character> <numeric>
# [1] chr2 148602724 + | ENSG00000121989 protein_coding 1542
# [2] chr2 148653872 + | ENSG00000121989 protein_coding 1542
# [3] chr2 148657029 + | ENSG00000121989 protein_coding 1542
# [4] chr2 148657315 + | ENSG00000121989 protein_coding 1542
# [5] chr2 148672762 + | ENSG00000121989 protein_coding 1542
# [6] chr2 148674854 + | ENSG00000121989 protein_coding 1542
# gene_start_position gene_end_position prev_exons_length coding_pos
# <integer> <integer> <numeric> <numeric>
# [1] 148602086 148688393 0 1
# [2] 148602086 148688393 55 2
# [3] 148602086 148688393 263 3
# [4] 148602086 148688393 373 2
# [5] 148602086 148688393 528 1
# [6] 148602086 148688393 672 1
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths


grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.






share|improve this answer
























  • If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

    – lizaveta
    Nov 30 '18 at 13:48











  • @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

    – Maurits Evers
    Dec 1 '18 at 7:53













  • by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

    – lizaveta
    Dec 7 '18 at 14:46



















-1














I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)



CodonPosition_separation = function(grTargetGene) {
grTargetGene = sort(grTargetGene)
grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])
if (length(grTargetGene) >1) {
for (l in 2:length(grTargetGene)) {
grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]
}
}
grTargetGene$coding_pos = grTargetGene$prev_exons_length%%3+1
grTargetGene_N = GRanges()
grTargetGene_S = GRanges()
for (l in 1:length(grTargetGene)) {
for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}
if (as.character(strand(grTargetGene)[1]) =="+"){
start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]
end_ns = end(grTargetGene[l])
if (start_ns <=end_ns) {
start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
}
start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]
end_s = end(grTargetGene[l])
if (start_s <=end_s) {
start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
end_syn = start_syn
}

} else {
start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]
end_ns = start(grTargetGene[l])
if (start_ns >=end_ns) {
start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
}
start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]
end_s = start(grTargetGene[l])
if (start_ns >=end_ns) {
start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
end_syn = start_syn
}
}
if (exists("start_nonsyn")) {
length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)
gr_nonsyn = GRanges(
seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),
strand = rep(strand(grTargetGene[l]), length_nonsyn),
ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))
)
gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])
grTargetGene_N = append(grTargetGene_N, gr_nonsyn)
}
if (exists("start_syn")) {
length_syn = length(start_syn)
gr_syn = GRanges(
seqnames = rep(seqnames(grTargetGene[l]), length_syn),
strand = rep(strand(grTargetGene[l]), length_syn),
ranges = IRanges(start = start_syn, end = end_syn)
)
gr_syn = intersect(gr_syn,grTargetGene[l])
grTargetGene_S = append(grTargetGene_S, gr_syn)
}
}
return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))
}


It works nicely:



> CodonPosition_separation(grTargetGene)
$grTargetGene_S
GRanges object with 514 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [148602724, 148602724] +
[2] chr2 [148602727, 148602727] +
[3] chr2 [148602730, 148602730] +
[4] chr2 [148602733, 148602733] +
[5] chr2 [148602736, 148602736] +
... ... ... ...
[510] chr2 [148684831, 148684831] +
[511] chr2 [148684834, 148684834] +
[512] chr2 [148684837, 148684837] +
[513] chr2 [148684840, 148684840] +
[514] chr2 [148684843, 148684843] +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

$grTargetGene_N
GRanges object with 517 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [148602722, 148602723] +
[2] chr2 [148602725, 148602726] +
[3] chr2 [148602728, 148602729] +
[4] chr2 [148602731, 148602732] +
[5] chr2 [148602734, 148602735] +
... ... ... ...
[513] chr2 [148684829, 148684830] +
[514] chr2 [148684832, 148684833] +
[515] chr2 [148684835, 148684836] +
[516] chr2 [148684838, 148684839] +
[517] chr2 [148684841, 148684842] +
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths





share|improve this answer
























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53428778%2fget-ranges-for-synonymous-and-non-synonymous-nucleotide-positions-within-a-codon%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    2














    How about the following:



    grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {
    ranges(grTargetGene) <- IRanges(
    start = start(grTargetGene) + x[1] - 1,
    end = start(grTargetGene) + x[2] - 1)
    return(grTargetGene) })
    grl
    #$Nonsym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype
    # <Rle> <IRanges> <Rle> | <character> <character>
    # [1] chr2 148602722-148602723 + | ENSG00000121989 protein_coding
    # [2] chr2 148653870-148653871 + | ENSG00000121989 protein_coding
    # [3] chr2 148657027-148657028 + | ENSG00000121989 protein_coding
    # [4] chr2 148657313-148657314 + | ENSG00000121989 protein_coding
    # [5] chr2 148672760-148672761 + | ENSG00000121989 protein_coding
    # [6] chr2 148674852-148674853 + | ENSG00000121989 protein_coding
    # cds_length gene_start_position gene_end_position prev_exons_length
    # <numeric> <integer> <integer> <numeric>
    # [1] 1542 148602086 148688393 0
    # [2] 1542 148602086 148688393 55
    # [3] 1542 148602086 148688393 263
    # [4] 1542 148602086 148688393 373
    # [5] 1542 148602086 148688393 528
    # [6] 1542 148602086 148688393 672
    # coding_pos
    # <numeric>
    # [1] 1
    # [2] 2
    # [3] 3
    # [4] 2
    # [5] 1
    # [6] 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths
    #
    #$Sym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype cds_length
    # <Rle> <IRanges> <Rle> | <character> <character> <numeric>
    # [1] chr2 148602724 + | ENSG00000121989 protein_coding 1542
    # [2] chr2 148653872 + | ENSG00000121989 protein_coding 1542
    # [3] chr2 148657029 + | ENSG00000121989 protein_coding 1542
    # [4] chr2 148657315 + | ENSG00000121989 protein_coding 1542
    # [5] chr2 148672762 + | ENSG00000121989 protein_coding 1542
    # [6] chr2 148674854 + | ENSG00000121989 protein_coding 1542
    # gene_start_position gene_end_position prev_exons_length coding_pos
    # <integer> <integer> <numeric> <numeric>
    # [1] 148602086 148688393 0 1
    # [2] 148602086 148688393 55 2
    # [3] 148602086 148688393 263 3
    # [4] 148602086 148688393 373 2
    # [5] 148602086 148688393 528 1
    # [6] 148602086 148688393 672 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths


    grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.






    share|improve this answer
























    • If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

      – lizaveta
      Nov 30 '18 at 13:48











    • @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

      – Maurits Evers
      Dec 1 '18 at 7:53













    • by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

      – lizaveta
      Dec 7 '18 at 14:46
















    2














    How about the following:



    grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {
    ranges(grTargetGene) <- IRanges(
    start = start(grTargetGene) + x[1] - 1,
    end = start(grTargetGene) + x[2] - 1)
    return(grTargetGene) })
    grl
    #$Nonsym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype
    # <Rle> <IRanges> <Rle> | <character> <character>
    # [1] chr2 148602722-148602723 + | ENSG00000121989 protein_coding
    # [2] chr2 148653870-148653871 + | ENSG00000121989 protein_coding
    # [3] chr2 148657027-148657028 + | ENSG00000121989 protein_coding
    # [4] chr2 148657313-148657314 + | ENSG00000121989 protein_coding
    # [5] chr2 148672760-148672761 + | ENSG00000121989 protein_coding
    # [6] chr2 148674852-148674853 + | ENSG00000121989 protein_coding
    # cds_length gene_start_position gene_end_position prev_exons_length
    # <numeric> <integer> <integer> <numeric>
    # [1] 1542 148602086 148688393 0
    # [2] 1542 148602086 148688393 55
    # [3] 1542 148602086 148688393 263
    # [4] 1542 148602086 148688393 373
    # [5] 1542 148602086 148688393 528
    # [6] 1542 148602086 148688393 672
    # coding_pos
    # <numeric>
    # [1] 1
    # [2] 2
    # [3] 3
    # [4] 2
    # [5] 1
    # [6] 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths
    #
    #$Sym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype cds_length
    # <Rle> <IRanges> <Rle> | <character> <character> <numeric>
    # [1] chr2 148602724 + | ENSG00000121989 protein_coding 1542
    # [2] chr2 148653872 + | ENSG00000121989 protein_coding 1542
    # [3] chr2 148657029 + | ENSG00000121989 protein_coding 1542
    # [4] chr2 148657315 + | ENSG00000121989 protein_coding 1542
    # [5] chr2 148672762 + | ENSG00000121989 protein_coding 1542
    # [6] chr2 148674854 + | ENSG00000121989 protein_coding 1542
    # gene_start_position gene_end_position prev_exons_length coding_pos
    # <integer> <integer> <numeric> <numeric>
    # [1] 148602086 148688393 0 1
    # [2] 148602086 148688393 55 2
    # [3] 148602086 148688393 263 3
    # [4] 148602086 148688393 373 2
    # [5] 148602086 148688393 528 1
    # [6] 148602086 148688393 672 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths


    grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.






    share|improve this answer
























    • If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

      – lizaveta
      Nov 30 '18 at 13:48











    • @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

      – Maurits Evers
      Dec 1 '18 at 7:53













    • by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

      – lizaveta
      Dec 7 '18 at 14:46














    2












    2








    2







    How about the following:



    grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {
    ranges(grTargetGene) <- IRanges(
    start = start(grTargetGene) + x[1] - 1,
    end = start(grTargetGene) + x[2] - 1)
    return(grTargetGene) })
    grl
    #$Nonsym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype
    # <Rle> <IRanges> <Rle> | <character> <character>
    # [1] chr2 148602722-148602723 + | ENSG00000121989 protein_coding
    # [2] chr2 148653870-148653871 + | ENSG00000121989 protein_coding
    # [3] chr2 148657027-148657028 + | ENSG00000121989 protein_coding
    # [4] chr2 148657313-148657314 + | ENSG00000121989 protein_coding
    # [5] chr2 148672760-148672761 + | ENSG00000121989 protein_coding
    # [6] chr2 148674852-148674853 + | ENSG00000121989 protein_coding
    # cds_length gene_start_position gene_end_position prev_exons_length
    # <numeric> <integer> <integer> <numeric>
    # [1] 1542 148602086 148688393 0
    # [2] 1542 148602086 148688393 55
    # [3] 1542 148602086 148688393 263
    # [4] 1542 148602086 148688393 373
    # [5] 1542 148602086 148688393 528
    # [6] 1542 148602086 148688393 672
    # coding_pos
    # <numeric>
    # [1] 1
    # [2] 2
    # [3] 3
    # [4] 2
    # [5] 1
    # [6] 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths
    #
    #$Sym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype cds_length
    # <Rle> <IRanges> <Rle> | <character> <character> <numeric>
    # [1] chr2 148602724 + | ENSG00000121989 protein_coding 1542
    # [2] chr2 148653872 + | ENSG00000121989 protein_coding 1542
    # [3] chr2 148657029 + | ENSG00000121989 protein_coding 1542
    # [4] chr2 148657315 + | ENSG00000121989 protein_coding 1542
    # [5] chr2 148672762 + | ENSG00000121989 protein_coding 1542
    # [6] chr2 148674854 + | ENSG00000121989 protein_coding 1542
    # gene_start_position gene_end_position prev_exons_length coding_pos
    # <integer> <integer> <numeric> <numeric>
    # [1] 148602086 148688393 0 1
    # [2] 148602086 148688393 55 2
    # [3] 148602086 148688393 263 3
    # [4] 148602086 148688393 373 2
    # [5] 148602086 148688393 528 1
    # [6] 148602086 148688393 672 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths


    grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.






    share|improve this answer













    How about the following:



    grl <- lapply(list(Nonsym = c(1, 2), Sym = c(3, 3)), function(x) {
    ranges(grTargetGene) <- IRanges(
    start = start(grTargetGene) + x[1] - 1,
    end = start(grTargetGene) + x[2] - 1)
    return(grTargetGene) })
    grl
    #$Nonsym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype
    # <Rle> <IRanges> <Rle> | <character> <character>
    # [1] chr2 148602722-148602723 + | ENSG00000121989 protein_coding
    # [2] chr2 148653870-148653871 + | ENSG00000121989 protein_coding
    # [3] chr2 148657027-148657028 + | ENSG00000121989 protein_coding
    # [4] chr2 148657313-148657314 + | ENSG00000121989 protein_coding
    # [5] chr2 148672760-148672761 + | ENSG00000121989 protein_coding
    # [6] chr2 148674852-148674853 + | ENSG00000121989 protein_coding
    # cds_length gene_start_position gene_end_position prev_exons_length
    # <numeric> <integer> <integer> <numeric>
    # [1] 1542 148602086 148688393 0
    # [2] 1542 148602086 148688393 55
    # [3] 1542 148602086 148688393 263
    # [4] 1542 148602086 148688393 373
    # [5] 1542 148602086 148688393 528
    # [6] 1542 148602086 148688393 672
    # coding_pos
    # <numeric>
    # [1] 1
    # [2] 2
    # [3] 3
    # [4] 2
    # [5] 1
    # [6] 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths
    #
    #$Sym
    #GRanges object with 6 ranges and 7 metadata columns:
    # seqnames ranges strand | ensembl_ids gene_biotype cds_length
    # <Rle> <IRanges> <Rle> | <character> <character> <numeric>
    # [1] chr2 148602724 + | ENSG00000121989 protein_coding 1542
    # [2] chr2 148653872 + | ENSG00000121989 protein_coding 1542
    # [3] chr2 148657029 + | ENSG00000121989 protein_coding 1542
    # [4] chr2 148657315 + | ENSG00000121989 protein_coding 1542
    # [5] chr2 148672762 + | ENSG00000121989 protein_coding 1542
    # [6] chr2 148674854 + | ENSG00000121989 protein_coding 1542
    # gene_start_position gene_end_position prev_exons_length coding_pos
    # <integer> <integer> <numeric> <numeric>
    # [1] 148602086 148688393 0 1
    # [2] 148602086 148688393 55 2
    # [3] 148602086 148688393 263 3
    # [4] 148602086 148688393 373 2
    # [5] 148602086 148688393 528 1
    # [6] 148602086 148688393 672 1
    # -------
    # seqinfo: 1 sequence from an unspecified genome; no seqlengths


    grl contains a list of two GRanges, one with ranges based on positions 1 and 2, and the other with ranges based on position 3.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 23 '18 at 1:56









    Maurits EversMaurits Evers

    30.5k41637




    30.5k41637













    • If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

      – lizaveta
      Nov 30 '18 at 13:48











    • @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

      – Maurits Evers
      Dec 1 '18 at 7:53













    • by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

      – lizaveta
      Dec 7 '18 at 14:46



















    • If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

      – lizaveta
      Nov 30 '18 at 13:48











    • @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

      – Maurits Evers
      Dec 1 '18 at 7:53













    • by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

      – lizaveta
      Dec 7 '18 at 14:46

















    If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

    – lizaveta
    Nov 30 '18 at 13:48





    If I understand correctly, it assumes that each element of grTarget starts with Nonsym and this is only true for a very first exon. I posted my current solution below; it also takes into account a chain that gene is encoded on.

    – lizaveta
    Nov 30 '18 at 13:48













    @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

    – Maurits Evers
    Dec 1 '18 at 7:53







    @lizaveta I’m sorry but I don’t understand a word you said. What’s a chain of a gene? Do you mean a transcript? The way I understood your OP was to extract the first two and last position of the genomic ranges given in grTargetGene. I don’t understand what you’re calculating. Additionally and from a pure coding POV I’m sure you can avoid most of these explicit for loops.

    – Maurits Evers
    Dec 1 '18 at 7:53















    by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

    – lizaveta
    Dec 7 '18 at 14:46





    by a chain of a gene I mean a chain of DNA which is coding for a future transcript. And the question was about extraction every third position in a given gene (which is going to be the third position in a codon). It is not exactly the same as extracting every third position in each genomic range, because some exons have a "frameshift" and start with 2nd or 3rd nucleotide of a codon. I am sure you can use better coding style than in function I presented; the question was about it and I wondered how someone can write it more elegant. Would be glad to hear any suggestions.

    – lizaveta
    Dec 7 '18 at 14:46













    -1














    I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)



    CodonPosition_separation = function(grTargetGene) {
    grTargetGene = sort(grTargetGene)
    grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])
    if (length(grTargetGene) >1) {
    for (l in 2:length(grTargetGene)) {
    grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]
    }
    }
    grTargetGene$coding_pos = grTargetGene$prev_exons_length%%3+1
    grTargetGene_N = GRanges()
    grTargetGene_S = GRanges()
    for (l in 1:length(grTargetGene)) {
    for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}
    if (as.character(strand(grTargetGene)[1]) =="+"){
    start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]
    end_ns = end(grTargetGene[l])
    if (start_ns <=end_ns) {
    start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
    end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
    }
    start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]
    end_s = end(grTargetGene[l])
    if (start_s <=end_s) {
    start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
    end_syn = start_syn
    }

    } else {
    start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]
    end_ns = start(grTargetGene[l])
    if (start_ns >=end_ns) {
    start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
    end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
    }
    start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]
    end_s = start(grTargetGene[l])
    if (start_ns >=end_ns) {
    start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
    end_syn = start_syn
    }
    }
    if (exists("start_nonsyn")) {
    length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)
    gr_nonsyn = GRanges(
    seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),
    strand = rep(strand(grTargetGene[l]), length_nonsyn),
    ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))
    )
    gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])
    grTargetGene_N = append(grTargetGene_N, gr_nonsyn)
    }
    if (exists("start_syn")) {
    length_syn = length(start_syn)
    gr_syn = GRanges(
    seqnames = rep(seqnames(grTargetGene[l]), length_syn),
    strand = rep(strand(grTargetGene[l]), length_syn),
    ranges = IRanges(start = start_syn, end = end_syn)
    )
    gr_syn = intersect(gr_syn,grTargetGene[l])
    grTargetGene_S = append(grTargetGene_S, gr_syn)
    }
    }
    return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))
    }


    It works nicely:



    > CodonPosition_separation(grTargetGene)
    $grTargetGene_S
    GRanges object with 514 ranges and 0 metadata columns:
    seqnames ranges strand
    <Rle> <IRanges> <Rle>
    [1] chr2 [148602724, 148602724] +
    [2] chr2 [148602727, 148602727] +
    [3] chr2 [148602730, 148602730] +
    [4] chr2 [148602733, 148602733] +
    [5] chr2 [148602736, 148602736] +
    ... ... ... ...
    [510] chr2 [148684831, 148684831] +
    [511] chr2 [148684834, 148684834] +
    [512] chr2 [148684837, 148684837] +
    [513] chr2 [148684840, 148684840] +
    [514] chr2 [148684843, 148684843] +
    -------
    seqinfo: 1 sequence from an unspecified genome; no seqlengths

    $grTargetGene_N
    GRanges object with 517 ranges and 0 metadata columns:
    seqnames ranges strand
    <Rle> <IRanges> <Rle>
    [1] chr2 [148602722, 148602723] +
    [2] chr2 [148602725, 148602726] +
    [3] chr2 [148602728, 148602729] +
    [4] chr2 [148602731, 148602732] +
    [5] chr2 [148602734, 148602735] +
    ... ... ... ...
    [513] chr2 [148684829, 148684830] +
    [514] chr2 [148684832, 148684833] +
    [515] chr2 [148684835, 148684836] +
    [516] chr2 [148684838, 148684839] +
    [517] chr2 [148684841, 148684842] +
    -------
    seqinfo: 1 sequence from an unspecified genome; no seqlengths





    share|improve this answer




























      -1














      I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)



      CodonPosition_separation = function(grTargetGene) {
      grTargetGene = sort(grTargetGene)
      grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])
      if (length(grTargetGene) >1) {
      for (l in 2:length(grTargetGene)) {
      grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]
      }
      }
      grTargetGene$coding_pos = grTargetGene$prev_exons_length%%3+1
      grTargetGene_N = GRanges()
      grTargetGene_S = GRanges()
      for (l in 1:length(grTargetGene)) {
      for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}
      if (as.character(strand(grTargetGene)[1]) =="+"){
      start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]
      end_ns = end(grTargetGene[l])
      if (start_ns <=end_ns) {
      start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
      end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
      }
      start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]
      end_s = end(grTargetGene[l])
      if (start_s <=end_s) {
      start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
      end_syn = start_syn
      }

      } else {
      start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]
      end_ns = start(grTargetGene[l])
      if (start_ns >=end_ns) {
      start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
      end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
      }
      start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]
      end_s = start(grTargetGene[l])
      if (start_ns >=end_ns) {
      start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
      end_syn = start_syn
      }
      }
      if (exists("start_nonsyn")) {
      length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)
      gr_nonsyn = GRanges(
      seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),
      strand = rep(strand(grTargetGene[l]), length_nonsyn),
      ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))
      )
      gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])
      grTargetGene_N = append(grTargetGene_N, gr_nonsyn)
      }
      if (exists("start_syn")) {
      length_syn = length(start_syn)
      gr_syn = GRanges(
      seqnames = rep(seqnames(grTargetGene[l]), length_syn),
      strand = rep(strand(grTargetGene[l]), length_syn),
      ranges = IRanges(start = start_syn, end = end_syn)
      )
      gr_syn = intersect(gr_syn,grTargetGene[l])
      grTargetGene_S = append(grTargetGene_S, gr_syn)
      }
      }
      return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))
      }


      It works nicely:



      > CodonPosition_separation(grTargetGene)
      $grTargetGene_S
      GRanges object with 514 ranges and 0 metadata columns:
      seqnames ranges strand
      <Rle> <IRanges> <Rle>
      [1] chr2 [148602724, 148602724] +
      [2] chr2 [148602727, 148602727] +
      [3] chr2 [148602730, 148602730] +
      [4] chr2 [148602733, 148602733] +
      [5] chr2 [148602736, 148602736] +
      ... ... ... ...
      [510] chr2 [148684831, 148684831] +
      [511] chr2 [148684834, 148684834] +
      [512] chr2 [148684837, 148684837] +
      [513] chr2 [148684840, 148684840] +
      [514] chr2 [148684843, 148684843] +
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths

      $grTargetGene_N
      GRanges object with 517 ranges and 0 metadata columns:
      seqnames ranges strand
      <Rle> <IRanges> <Rle>
      [1] chr2 [148602722, 148602723] +
      [2] chr2 [148602725, 148602726] +
      [3] chr2 [148602728, 148602729] +
      [4] chr2 [148602731, 148602732] +
      [5] chr2 [148602734, 148602735] +
      ... ... ... ...
      [513] chr2 [148684829, 148684830] +
      [514] chr2 [148684832, 148684833] +
      [515] chr2 [148684835, 148684836] +
      [516] chr2 [148684838, 148684839] +
      [517] chr2 [148684841, 148684842] +
      -------
      seqinfo: 1 sequence from an unspecified genome; no seqlengths





      share|improve this answer


























        -1












        -1








        -1







        I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)



        CodonPosition_separation = function(grTargetGene) {
        grTargetGene = sort(grTargetGene)
        grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])
        if (length(grTargetGene) >1) {
        for (l in 2:length(grTargetGene)) {
        grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]
        }
        }
        grTargetGene$coding_pos = grTargetGene$prev_exons_length%%3+1
        grTargetGene_N = GRanges()
        grTargetGene_S = GRanges()
        for (l in 1:length(grTargetGene)) {
        for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}
        if (as.character(strand(grTargetGene)[1]) =="+"){
        start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]
        end_ns = end(grTargetGene[l])
        if (start_ns <=end_ns) {
        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
        }
        start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]
        end_s = end(grTargetGene[l])
        if (start_s <=end_s) {
        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
        end_syn = start_syn
        }

        } else {
        start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]
        end_ns = start(grTargetGene[l])
        if (start_ns >=end_ns) {
        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
        }
        start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]
        end_s = start(grTargetGene[l])
        if (start_ns >=end_ns) {
        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
        end_syn = start_syn
        }
        }
        if (exists("start_nonsyn")) {
        length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)
        gr_nonsyn = GRanges(
        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),
        strand = rep(strand(grTargetGene[l]), length_nonsyn),
        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))
        )
        gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])
        grTargetGene_N = append(grTargetGene_N, gr_nonsyn)
        }
        if (exists("start_syn")) {
        length_syn = length(start_syn)
        gr_syn = GRanges(
        seqnames = rep(seqnames(grTargetGene[l]), length_syn),
        strand = rep(strand(grTargetGene[l]), length_syn),
        ranges = IRanges(start = start_syn, end = end_syn)
        )
        gr_syn = intersect(gr_syn,grTargetGene[l])
        grTargetGene_S = append(grTargetGene_S, gr_syn)
        }
        }
        return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))
        }


        It works nicely:



        > CodonPosition_separation(grTargetGene)
        $grTargetGene_S
        GRanges object with 514 ranges and 0 metadata columns:
        seqnames ranges strand
        <Rle> <IRanges> <Rle>
        [1] chr2 [148602724, 148602724] +
        [2] chr2 [148602727, 148602727] +
        [3] chr2 [148602730, 148602730] +
        [4] chr2 [148602733, 148602733] +
        [5] chr2 [148602736, 148602736] +
        ... ... ... ...
        [510] chr2 [148684831, 148684831] +
        [511] chr2 [148684834, 148684834] +
        [512] chr2 [148684837, 148684837] +
        [513] chr2 [148684840, 148684840] +
        [514] chr2 [148684843, 148684843] +
        -------
        seqinfo: 1 sequence from an unspecified genome; no seqlengths

        $grTargetGene_N
        GRanges object with 517 ranges and 0 metadata columns:
        seqnames ranges strand
        <Rle> <IRanges> <Rle>
        [1] chr2 [148602722, 148602723] +
        [2] chr2 [148602725, 148602726] +
        [3] chr2 [148602728, 148602729] +
        [4] chr2 [148602731, 148602732] +
        [5] chr2 [148602734, 148602735] +
        ... ... ... ...
        [513] chr2 [148684829, 148684830] +
        [514] chr2 [148684832, 148684833] +
        [515] chr2 [148684835, 148684836] +
        [516] chr2 [148684838, 148684839] +
        [517] chr2 [148684841, 148684842] +
        -------
        seqinfo: 1 sequence from an unspecified genome; no seqlengths





        share|improve this answer













        I created a function that can account for a chain and allows to process exons that length is not divisible by 3 (and might be even less than 3)



        CodonPosition_separation = function(grTargetGene) {
        grTargetGene = sort(grTargetGene)
        grTargetGene$prev_exons_length = c(0,width(grTargetGene)[1:length(grTargetGene)-1])
        if (length(grTargetGene) >1) {
        for (l in 2:length(grTargetGene)) {
        grTargetGene$prev_exons_length[l] = grTargetGene$prev_exons_length[l]+grTargetGene$prev_exons_length[l-1]
        }
        }
        grTargetGene$coding_pos = grTargetGene$prev_exons_length%%3+1
        grTargetGene_N = GRanges()
        grTargetGene_S = GRanges()
        for (l in 1:length(grTargetGene)) {
        for (obj in c("start_nonsyn","start_syn", "end_nonsyn", "end_syn","gr_nonsyn","gr_syn")) {if(exists(obj)) {rm(obj)}}
        if (as.character(strand(grTargetGene)[1]) =="+"){
        start_ns = start(grTargetGene[l])+1-grTargetGene$coding_pos[l]
        end_ns = end(grTargetGene[l])
        if (start_ns <=end_ns) {
        start_nonsyn = seq(from = start(grTargetGene[l])+1-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
        end_nonsyn = seq(from = start(grTargetGene[l])+2-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
        }
        start_s =start(grTargetGene[l])+3-grTargetGene$coding_pos[l]
        end_s = end(grTargetGene[l])
        if (start_s <=end_s) {
        start_syn = seq(from = start(grTargetGene[l])+3-grTargetGene$coding_pos[l],to = end(grTargetGene[l]), by=3)
        end_syn = start_syn
        }

        } else {
        start_ns = end(grTargetGene[l])-1+grTargetGene$coding_pos[l]
        end_ns = start(grTargetGene[l])
        if (start_ns >=end_ns) {
        start_nonsyn = seq(from = end(grTargetGene[l])-1+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
        end_nonsyn = seq(from = end(grTargetGene[l])-2+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
        }
        start_s =end(grTargetGene[l])-3+grTargetGene$coding_pos[l]
        end_s = start(grTargetGene[l])
        if (start_ns >=end_ns) {
        start_syn = seq(from = end(grTargetGene[l])-3+grTargetGene$coding_pos[l],to = start(grTargetGene[l]), by=-3)
        end_syn = start_syn
        }
        }
        if (exists("start_nonsyn")) {
        length_nonsyn = length(start_nonsyn)+ length(end_nonsyn)
        gr_nonsyn = GRanges(
        seqnames = rep(seqnames(grTargetGene[l]), length_nonsyn),
        strand = rep(strand(grTargetGene[l]), length_nonsyn),
        ranges = IRanges(start = c(start_nonsyn, end_nonsyn), end = c(start_nonsyn, end_nonsyn))
        )
        gr_nonsyn = intersect(gr_nonsyn,grTargetGene[l])
        grTargetGene_N = append(grTargetGene_N, gr_nonsyn)
        }
        if (exists("start_syn")) {
        length_syn = length(start_syn)
        gr_syn = GRanges(
        seqnames = rep(seqnames(grTargetGene[l]), length_syn),
        strand = rep(strand(grTargetGene[l]), length_syn),
        ranges = IRanges(start = start_syn, end = end_syn)
        )
        gr_syn = intersect(gr_syn,grTargetGene[l])
        grTargetGene_S = append(grTargetGene_S, gr_syn)
        }
        }
        return(list("grTargetGene_S"=grTargetGene_S,"grTargetGene_N"=grTargetGene_N))
        }


        It works nicely:



        > CodonPosition_separation(grTargetGene)
        $grTargetGene_S
        GRanges object with 514 ranges and 0 metadata columns:
        seqnames ranges strand
        <Rle> <IRanges> <Rle>
        [1] chr2 [148602724, 148602724] +
        [2] chr2 [148602727, 148602727] +
        [3] chr2 [148602730, 148602730] +
        [4] chr2 [148602733, 148602733] +
        [5] chr2 [148602736, 148602736] +
        ... ... ... ...
        [510] chr2 [148684831, 148684831] +
        [511] chr2 [148684834, 148684834] +
        [512] chr2 [148684837, 148684837] +
        [513] chr2 [148684840, 148684840] +
        [514] chr2 [148684843, 148684843] +
        -------
        seqinfo: 1 sequence from an unspecified genome; no seqlengths

        $grTargetGene_N
        GRanges object with 517 ranges and 0 metadata columns:
        seqnames ranges strand
        <Rle> <IRanges> <Rle>
        [1] chr2 [148602722, 148602723] +
        [2] chr2 [148602725, 148602726] +
        [3] chr2 [148602728, 148602729] +
        [4] chr2 [148602731, 148602732] +
        [5] chr2 [148602734, 148602735] +
        ... ... ... ...
        [513] chr2 [148684829, 148684830] +
        [514] chr2 [148684832, 148684833] +
        [515] chr2 [148684835, 148684836] +
        [516] chr2 [148684838, 148684839] +
        [517] chr2 [148684841, 148684842] +
        -------
        seqinfo: 1 sequence from an unspecified genome; no seqlengths






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 30 '18 at 14:42









        lizavetalizaveta

        657




        657






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53428778%2fget-ranges-for-synonymous-and-non-synonymous-nucleotide-positions-within-a-codon%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

            ComboBox Display Member on multiple fields

            Is it possible to collect Nectar points via Trainline?